Database Optimization for Rails Apps
November 20th, 2007
Summary
- Use STRAIGHT_JOIN if MySQL is doing something silly
- MySQL Datetimes are not slow
- MyISAM is much, much faster than InnoDB
- Consider MySQL partitioning; it's not hard
- Watch out for background optimisation and query caching
- Log everything you do, and why
The reporting application I'm working on does a date-range select against a table with ten million rows. It took over two minutes to execute. I cut it down to around two seconds without significantly changing the application. Here's how.
Straight Joins
The original SQL statement was 12 lines, featuring three joins, a group-by and a computed column.
To start with I used EXPLAIN SELECT to look at the execution plan. MySQL had made a bad guess, and was table-scanning a four-million-row related table that could have been joined by ID instead.
I added STRAIGHT_JOIN to the SQL statement and dropped the execution time to 20 seconds.
Use STRAIGHT_JOIN if MySQL is doing something silly.
Finding the Weak Point
Progress; MySQL was now table-scanning a 10 million row 'events' table to satisfy the WHERE clause, then joining the other three tables by ID, grouping and summing. I cut the query down progressively to a simple "SELECT count(*) FROM events WHERE starttime < '2006-12-30 08:30:00' AND endtime > '2006-12-30 08:15:00" and found that performance didn't significantly change.
This isolated the slow part of the query. My original plan had been to denormalize the table structure, but there was no point.
At this point I built a test database of two million rows on my local machine with only the one table in question, and only the three columns that were needed.
Indexes
I added indexes on starttime, endtime, and (starttime,endtime), but the MySQL optimiser chose not to use them. When I forced it to use them, using FORCE INDEX, performance deteriorated (!). OPTIMIZE TABLE made no difference.
Dead End: Int Conversion
The application needs only 15-minute resolution on the date times, so I created start_int and finish_int columns and filled them:
update events set start_int = floor(unix_timestamp(starttime)/900) update events set finish_int = floor(unix_timestamp(endtime)/900)
Then I created indexes on the columns and tested again. The plan here was that, by reducing the resolution, there'd be fewer unique values and the indexes would be more effective.
It didn't work. Tablescanning against datetimes was faster than indexed lookup against ints.
MySQL DateTimes are not slow.
Switching To MyISAM
I'd noticed that my test database was performing much faster than the production database, even taking into account the smaller size. Running SHOW CREATE TABLE against the two revealed that I was using MyISAM, but the production database had InnoDB. I converted my local database to InnoDB and took a speed hit on the order of 6x.
MyISAM is much faster than InnoDB. (but see notes)
Note that the conversion process is hideously slow; it's faster to create a new table and load it up with your test data yourself. If you do this, use extended inserts; they're about 25x faster than singular inserts.
You give up some goodies, including transactions, by making the switch.
Partitioning
The application never queries over a time range of greater than 15 minutes, and always aligns its requests on 15-minute boundaries. If we had to tablescan, I could use this to at least make sure we didn't have to tablescan all ten million rows.
I installed MySQL 5.1, which is the first release that supports partitioning, and created a new table using MyISAM, with a partition per 3 months, copying data from the old table in one hit:
CREATE TABLE `events2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `starttime` datetime DEFAULT NULL, `endtime` datetime DEFAULT NULL, KEY `id` (`id`), KEY `starttime` (`starttime`), KEY `endtime` (`endtime`), KEY `starttime_and_endtime` (`starttime`,`endtime`), ) ENGINE=MyISAM DEFAULT CHARSET=utf8 PARTITION BY RANGE (TO_DAYS(starttime)) ( PARTITION p0 VALUES LESS THAN (732312), PARTITION p1 VALUES LESS THAN (732677), PARTITION p2 VALUES LESS THAN (732767), ... PARTITION p17 VALUES LESS THAN MAXVALUE ) SELECT * FROM events;
I'm using TO_DAYS because partitions by range can only be described in terms of a single integer (not a datetime).
Now I could adjust my query to address a single partition only:
SELECT count(*) FROM events WHERE starttime < '2006-12-30 08:30:00' AND endtime > '2006-12-30 08:15:00" AND starttime > '2006-12-01 00:00:00'.
I used EXPLAIN PARTITION SELECT ... to confirm that MySQL was accessing only a single partition.
The benefit of this particular partitioning strategy is that it's transparent to the application. More advanced approaches involve splitting records across database _servers_, rather than simply changing the 'create table' statement.
Consider MySQL partitioning; it's not hard.
Results
The full query originally took around 152 seconds to run. It's now down to around two seconds, or a 75x speed improvement. This is running essentially the same query, and returning data in the same format; the only change necessary at the application level was to reword the SQL slightly.General Tactics
Those were the specific victories. Here are the overall tactics which are more generally applicable:
I found it helpful to think not about what my application needed to know, but what it didn't need to know. For example, rules like "we never run a query that does not align on a 15-minute boundary" can simplify the problem.
Nothing substitutes for benchmarks. Things that ought to be fast are sometimes slow. Things that ought to be slow are sometimes fast.
But don't trust the benchmarks. Sometimes I'd return to my machine to find the *same query* running in less than a second instead of over a minute. I was using SQL_NO_CACHE, so I think background optimisation is responsible here. Since the database was very large, it's possible that OSX's virtual memory management was also interfering.
Watch out for background optimisation and query caching.
The only reason I caught this was that I kept notes of every single query I ran, which machine I ran it on, the time it took to execute, and anything interesting about the results. It's a simple routine to create or edit the query in Textmate, then paste it into MySQL, instead of working in the MySQL window directly.
Log everything you do, and why.
I also kept notes about theories and ideas, inline with the sql, and later marked them with PROVED / DISPROVED and the evidence so that I could be sure I'd covered all the bases.
Conclusion and Notes
Tweaking the database lets you make huge gains in performance without modifying your complex application. The assertions above are only guidelines based on the evidence I collected on one project; ultimately it's your own benchmarks that will matter. Simplify the problem, record your investigations, and benchmark the results.
21 Responses to “Database Optimization for Rails Apps”
Sorry, comments are closed for this article.






I’m sure I’ll re-read this article a lot in future.
gooood one , 10x
Awesome tips. I’d never heard of partitioning before. EXPLAIN is certainly a life-saver.
I wrote a script recently that automates EXPLAINS by parsing the rails logs. I’ve saved a ton of time using it. Perhaps you and your readers might find it useful.
http://pixels-and-politics.blogspot.com/2007/11/rails-sql-optimization.html
Hi,
You article is very good but.. INNODB is in some conditions faster than MyISAM (but of course with Rails it is better to use MyISAM) and we can use transaction and constraints. It often is really usefull. So if our application is only some web site, shop etc. it is better to use MyISAM (because using SELECT it will work about 2-3 times faster than using INNODB). If your data in database should be more safe and the most important is for you insertion data into database than pulling it so you should to use INNODB.
MySQL’s idiosyncrasies – some of which outlined above – make me question why it is so popular in the first place. Some behavior, e.g. concerning index performance or datetime lookup being faster than an indexed integer, is not just counterintuitive but outright twisted. My question to you: Why do seemingly smart guys like New Bamboo stick to software like MySQL?
@marcus: Agreed. This is a reporting-only application, so it’s not an issue for us. If you’re up for it, you can do something cool with master-slave replication: the master uses InnoDB and handles all the writes, and the slave uses MyISAM and handles the reads. MySQL-Proxy can make this invisible to your application. Best of both worlds!
@DerGuteMoritz: I haven’t done this level of analysis on Postgresql, but I imagine it has its own quirks. We do have a couple of projects that are using it. It looks like this has been debated in detail before.
@Patrick: Looks cool – I’ll give it a go when the next project goes into the optimisation stage.
Yeah, I’ve been waiting on MySQL 5.1 table partitioning for a while, any ideas on when it’s production worthy?
@Gwyn: Given that Postgres mostly conforms to standard SQL wheres MySQL hardly does, and most of the time acts as one would expect (deviances are very well documented), MySQL’s quirks clearly overweigh. Unless, of course, you are used to its quirks so much that SQL standard seems quirky to you ;) The search results yielded by your link are interesting. The first one is, in fact, 5 years old. The second (http://monstera.man.poznan.pl/wiki/index.php/Mysql_vs_postgres) only 2 years (at least) and it claims MySQL’s performance to be “faster” while Postgres’ is considered “slower.” Scrolling down to the graphs clearly shows that it’s only MyISAM which is slightly slower than Postgres. Contextualize this with the feature comparison table above. Why use an inferior, only slightly faster product in the first place when the more advanced features (whereas I wouldn’t consider transactions an advanced feature but a necessity) may be needed sooner or later anyway. Just curious :)
Thanks for your reply and keep up the good work here!
Hey,
Sorry not to continue the db-pissing-contest, just wanted to point out that rather than copy/paste from TextMate, you can run your queries from there directly. ctrl-shift-Q
@DerGuteMoritz – We really don’t have a bias as far as persistance stores go. I for example am happily using CoucheDB with a Merb app at the minute, but just to fill you in on why we’re using MySQL for most of our projects:
- There are a lot of very large high profile sites using MySQL. They’ve already solved the scalability problems so we don’t have to. Can you guess how many of the top 5 social networking sites are running MySQL as their relational persistance store?
- All of our business logic goes into the object model. Features such as triggers, views, stored procs are pretty much useless to us. By not coupling/polluting our domain objects with SQL it allows us to move quickly during an Agile project and model complex business rules that can change one day to the next. The only time we do start getting heavy with the old SQL is for those specific optimisations towards the end of a project.
- Vi is the best editor! No, emacs is! No wait Textmate! See what I mean? If we argued about this kind of thing in the office we wouldn’t get any work done;) It’s one of those holy war topics that are usually never very constructive.
- We use what fits best with the clients needs, technology stack and knowledge. Most of our clients are comfortable with MySQL and use it already within their organisation, which is why we have a slight bias towards MySQL. Mind you, when we’ve had clients ask for Postgres we’ve given them a Postgress solution and not tried to sell them a solution they don’t want.
I hope this has answered your question.
MyISAM is much, much faster than InnoDB
Stopped reading right there.
Just get a clue, mkay.
“MyISAM is much, much faster than InnoDB”...
Yeah, I stopped reading right there too. It is faster in certain cases… and those should be noted when using such a bold statement ;)
I don’t use MySQL primarily due to their licensing… so, Postgres takes the cake.
“MyISAM is much, much faster than InnoDB”...
Yeah, I stopped reading right there too. It is faster in certain cases… and those should be noted when using such a bold statement ;)
I don’t use MySQL primarily due to their licensing… so, Postgres takes the cake.
@guy: I’ve added a note to make it explicit that the performance results should be read in the context of this particular project.
Thanks for your detailed reply, Jonathan. Didn’t mean to start a flame war (though I don’t think that text editor preference is comparable to SQL-RDBMS preference), so let’s leave it at that and keep on doing what makes us happy :)
The database is the model. Treating the db as a dumb data store is misguided.
@plu – I think you’re going to have to back that up to be taken seriously.
Just curious, what version of MySQL were you on before the 5.1 upgrade? It’s strange that MySQL didn’t do an index merge on the two conditions that were indexed, instead of a table scan. The index merge feature was included in 5.0, so if you were using an order version, you would see a table scan.
http://www.google.com/search?hl=en&q=ativan&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=ambien&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=didrex&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=celebrex&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=cialis&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=ionamin&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=levitra&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=meridia&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=phentermine&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=tenuate&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=tramadol&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=ultram&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=xanax&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=viagra&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=zyban&hq=site:www.pharmacy.com.md http://www.google.com/search?hl=en&q=zoloft&hq=site:www.pharmacy.com.md
I have Bad Credit Debt Consolidation Non Secured Personal Loans and I search where i can go to get a Personal Unsecured Bad Credit Loans