Let's say that we have three models:
class PetShop < ActiveRecord::Base has_many :dogs end class Dog < ActiveRecord::Base belongs_to :pet_shop end class Cats < ActiveRecord::Base belongs_to :pet_shop endIn Rails 2.0, this:
PetShop.find(:all, :include => [:cats, :dogs])Will yield this:
SELECT `pet_shops`.`id` AS t0_r0, `pet_shops`.`name` AS t0_r1, `pet_shops`.`created_at` AS t0_r2, `pet_shops`.`updated_at` AS t0_r3, `dogs`.`id` AS t1_r0, `dogs`.`name` AS t1_r1, `dogs`.`pet_shop_id` AS t1_r2, `dogs`.`created_at` AS t1_r3, `dogs`.`updated_at` AS t1_r4, `cats`.`id` AS t2_r0, `cats`.`name` AS t2_r1, `cats`.`pet_shop_id` AS t2_r2, `cats`.`created_at` AS t2_r3, `cats`.`updated_at` AS t2_r4 FROM `pet_shops` LEFT OUTER JOIN `dogs` ON dogs.pet_shop_id = pet_shops.id LEFT OUTER JOIN `cats` ON cats.pet_shop_id = pet_shops.id;WTF?!
This is a cartesian join; if you had 100 petshops, with 100 dogs and 100 cats each, the number of rows returned will be 100 * 100 * 100 = 1,000,000. A million friggin' rows. Add another association, like say has_many :birds, and the count further skyrockets.
In Rails 2.1, the same call will result to:
SELECT * FROM `pet_shops`; SELECT `dogs`.* FROM `dogs` WHERE (`dogs`.pet_shop_id IN (1,2,3,4,5,6,7,8,9,10)); SELECT `cats`.* FROM `cats` WHERE (`cats`.pet_shop_id IN (1,2,3,4,5,6,7,8,9,10));Back to our assumption of having 100 petshops, with 100 dogs and 100 cats each. The number of rows returned is now 100 + 100(100) + 100(100) = 20,100.
There were reports though - unconfirmed, since I could not find a concrete article in the intertubes - that MySQL was taking a hit CPU-wise because of the additional queries. I have a different hunch though.
Assuming we had the "usual" migration:
create_table :pet_shops do |t| t.string :name t.timestamps end create_table :cats do |t| t.string :name t.references :pet_shop t.timestamps end create_table :dogs do |t| t.string :name t.references :pet_shop t.timestamps endWe can analyze the SQL statements generated by Rails 2.1's eager loading mechanism by using MySQL's EXPLAIN command. We will not attempt to analyze the first statement since it is basically a fetch all.
EXPLAIN SELECT `dogs`.* FROM `dogs` WHERE (`dogs`.pet_shop_id IN (1,2,3,4,5,6,7,8,9,10)); EXPLAIN SELECT `cats`.* FROM `cats` WHERE (`cats`.pet_shop_id IN (1,2,3,4,5,6,7,8,9,10));And here's what we got in return:
ALL - A full table scan is done for each combination of rows from the previous tables. This is normally not good if the table is the first table not marked const, and usually very bad in all other cases. Normally, you can avoid ALL by adding indexes that allow row retrieval from the table based on constant values or column values from earlier tables."Normally not good, and usually very bad in all other cases." The MySQL guys sure didn’t mince words here.
Let’s do what they suggested and add an index to our reference field:
add_index :dogs, :petshop_id add_index :cats, :petshop_idOr if you’re using the easier_indexes plugin, we can revise our migration:
create_table :cats do |t| t.string :name t.references :pet_shop, :index => true t.timestamps end create_table :dogs do |t| t.string :name t.references :pet_shop, :index => true t.timestamps endRunning the same EXPLAIN command this time yields us with:
range - Only rows that are in a given range are retrieved, using an index to select the rows.I'm too lazy to do a benchmark now, so let me know how this one works for you.


1 comments:
Hey Erol. Sweet post -- the explanation between 2.0 / 2.1 is nice and precise. I'm just trying out my first project on 2.1 and i'm going to use both the easier-indexes plugin and see what kind of performance I can get with the polymorphic belongs_to relationships... usually the bane of my existance.
Post a Comment