This is the second entry in a three post series on a few intermediate Arel recipes that I’ve used in my professional life and is adapted from my posts on the SciMed blog, which can be found here. For an introduction to this series—as well as some personal background—check out my previous post.
I was heading up a project about a year ago when a new requirement came through from our client: some of their biostatisticians wanted an easy way to query data that was spread across the majority of the tables in our database. Since they didn’t know our architecture terribly well, we decided the best thing that we could provide them was a one-stop-shop database view that collated information from all the tables that were relevant to their request.
For anyone new enough to database design to have never used a view, they’re
incredibly useful. You can define a view similarly to a table, but, unlike a
table, it’s built up of columns and data that actually live elsewhere—whether
that be a table or even another view. Since you just have to write the query to
define your view once, it makes querying data that originates from multiple
locations a cinch since you can just write a simple
SELECT without any
WHERE statements. Some databases allow data to be
updated via the view itself, though I much prefer keeping views read-only and
updating the rows in the original tables for consistency and clarity. If you
don’t want your view to be calculated on the fly for performance purposes,
certain databases even allow you to materialize your views and only refresh them
when desired. Read more about views
(the answer by Mike McAllister is currently my favorite).
After writing the first draft of the database view for the client’s biostatisticians, it was time to test it out. After a little bit of tinkering, we discovered that, while the data was all there, the performance degraded rapidly as our dataset increased in size. With the help of one of my colleagues, we rewrote the view to use common table expressions (CTEs) and bettered our performance by several orders of magnitude.
If you’re relatively unfamiliar with CTEs, you can think of them as a sort of named, temporary table built by querying existing artifacts in your database. They’re quite similar to views themselves with the notable exception of only existing within the scope of the current query. If you’re curious, you can read more about them here or here.
While we didn’t use Arel to construct the aforementioned view, it provided a great example of the power behind CTEs. Even though ActiveRecord doesn’t give us CTE support out of the box, we can craft our own CTEs with Arel (or you can use the postgres_ext gem if you’re using Postgres).
For the purposes of this exercise, imagine that we have the following models:
class Recipe < ActiveRecord::Base belongs_to: :author end class Author < ActiveRecord::Base has_many :recipes end
Both models have a primary key of
id, as usual. The
recipes table also has
an integer column called
rating that represents the quality of the recipe as
well as the standard
author_id foreign key. For this exercise, our CTE will
represent recipes whose rating is between 60 and 80.
To start off, we need to instantiate a new
Arel::Table. We’ll pass in a symbol
representing the name that we want our CTE to be referenced by in our final
query as the only argument.
cte_table = Arel::Table.new(:recipes_rated_between_60_and_80)
Now, we have to create the definition of our CTE.
recipes = Recipe.arel_table cte_definition = recipes .project(recipes[:id], recipes[:author_id]) .where(recipes[:rating].between(60..80))
Lastly, we need to combine our
cte_table and our
cte_definition in an
composed_cte = Arel::Nodes::As.new(cte_table, cte_definition)
We can now use our CTE in other queries. For example, if we wanted to grab all the authors who have written a recipe with a rating between 60 and 80, we could do the following:
authors = Author.arel_table author_ids = authors .project(authors[:id]) .join(cte_table).on(authors[:id].eq(cte_table[:author_id])) .with(composed_cte) Author.where(authors[:id].in(author_ids))
This leads to Rails executing the following SQL query:
SELECT "AUTHORS".* FROM "AUTHORS" WHERE "AUTHORS"."ID" IN ( WITH "RECIPES_RATED_BETWEEN_60_AND_80" AS ( SELECT "RECIPES"."ID", "RECIPES"."AUTHOR_ID" FROM "RECIPES" WHERE "RECIPES"."RATING" BETWEEN 60 AND 80 ) SELECT "AUTHORS"."ID" FROM "AUTHORS" INNER JOIN "RECIPES_RATED_BETWEEN_60_AND_80" ON "RECIPES_RATED_BETWEEN_60_AND_80"."AUTHOR_ID" = "AUTHORS"."ID" )
Thanks to ActiveRecord and Arel, we now have an
authors that fulfill our specified conditions.