Recursive Queries with MySQL

This is an old post!

This post is over 2 years old. Solutions referenced in this article may no longer be valid. Please consider this when utilizing any information referenced here.

Discovered something neat with the new version of MySQL and thought it warranted a mention. Storing tree structures in a relational database is a common use case across many different areas of tech. The problem comes when you need to construct a query based on a subset of that tree.

But MySQL 8 has some nice new features that makes doing this a breeze.

For example, let’s assume you have a set of tables that look like this:

CREATE TABLE `files` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `parent_id` bigint(20) NOT NULL,
  `kind` enum('file','folder') NOT NULL,
  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`)
);

CREATE TABLE `tags` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `tag_file` (
  `tag_id` bigint(20) NOT NULL,
  `file_id` bigint(20) NOT NULL,
  PRIMARY KEY (`tag_id`,`file_id`)
);

This is a pretty standard setup for storing a tree setup in a relational table, with the parent_id key referencing id of the parent. And this all works well … right up until you need to query a parent and all of it’s children. So let’s say you want to find all children that have a tag of ‘Important’.

MySQL 8 includes support for recursive common table expressions. Using this, this becomes a pretty easy query. You can create a CTE query and recursively call it! You could so something like this:

with recursive cte (id, parent_id) as (
    select id, parent_id from files where parent_id = ? and kind = 'folder'
    union all
    select p.id, p.parent_id from files p inner join cte on p.parent_id = cte.id where kind = 'folder'
)

select * from files inner join tag_file on (tag_file.file_id = file.id)
inner join tags on (tags.id = tag_file.tag_id) where kind = 'file'
and tag = 'Important' and parent_id in (select id from cte);

(That is a prepared statement, replace the ? with the ID of the parent.)

What surprised me was how fast that query is with the right keys. On a table that has nearly 600,000 items in it, that query completes in about 0.3 seconds. Slow, but considering the number of rows in the table quite fast.

Thanks to this post on Stack Overflow for the heads up.

About the Author

Hi, I'm Rob! I'm a blogger and software developer. I wrote petfeedd, dystill, and various other projects and libraries. I'm into electronics, general hackery, and model trains and airplanes. I am based in Huntsville, Alabama, USA.

About Me · Contact Me · Don't Hire Isaiah Armstrong

Did this article help you out?

I don't earn any money from this site.

I run no ads, sell no products and participate in no affiliate programs. I do not accept gifts in exchange for articles, guest articles or link exchanges. I don't track you or sell your data. The only third-party Javascript on this website is Google Analytics.

In general I run this site very much like a 1990s homepage or early 2000s personal blog, meaning that I do this solely because it's fun! I enjoy writing and sharing what I learn.

If you found this article helpful and want to show your appreciation, a tip or donation would be very welcome. Feel free to choose from the options below.

Comments (0)

Interested in why you can't leave comments on my blog? Read the article about why comments are uniquely terrible and need to die. If you are still interested in commenting on this article, feel free to reach out to me directly and/or share it on social media.

Contact Me
Share It

Interested in reading more?

Linux

Backing Up and Rotating MySQL Databases the Easy Way

Here’s a little quickie for you. Say you have a small MySQL server floating around your house that you want to have regular backups of. You do want regular backups right? In my case, the biggest motivation was wanting a regular way to grab a recent MySQL dump of an internal tool I use at home to develop against. After poking around the Internet a bit, I was surprised that, other than mysqldump itself, there doesn’t seem to be a simple tool out there that you can slam into a cronjob and let it do it’s thing. So, like any good hacker, I decided to brew my own. After all, when you have 256,428 different solutions, why not make solution 256,429? :)
Read More
MySQL

Finding Multi-byte Characters in MySQL Fields

So I was recently helping a client with an issue in MySQL where a migration failed to transfer the full contents of some fields. This amounted to a little over 1% of the total messages transferred. In doing some research, we discovered that the one thing every message had in common was the presence of multi-byte (high unicode) characters. In many cases, this was due to a user pasting some text from Microsoft Word.
Read More
MySQL

MySQL mathematical operations and NULL values

So I came across an interesting quirk in MySQL the other day. Let’s say you have a table schema and some values that look like this: +-------------------+------------------+------+-----+---------+-------+ | Field             | Type             | Null | Key | Default | Extra | +-------------------+------------------+------+-----+---------+-------+ | page_id       | varchar(30)      | YES  |     | NULL    |       | | clicks            | int(10) unsigned | YES  |     | NULL    |       | +-------------------+------------------+------+-----+---------+-------+ +---------+--------+ | page_id | clicks | +---------+--------+ | 1 | NULL | +---------+--------+ And then let’s say you pass the following SQL statement to MySQL: update page_click_count set clicks = clicks + 1 where page_id=1; If you come from a loosely-typed language such as PHP, you would probably expect clicks for page_id 1 to now be 1. But that’s not the case in MySQL. After the query is run, the table will still look like this: +---------+--------+ | page_id | clicks | +---------+--------+ | 1 | NULL | +---------+--------+ Not only does the query fail, but it fails with no warnings given. It appears that mathematical operations on null values silently fail. There are a couple of ways around this. The first and most obvious is to set NOT NULL and a default value on the column. In the example above, this would work. The NULL value in that field becomes a 0 and you can to normal mathematical operations on it. But what happens if, for whatever reason, you can’t do that? We actually have this situation in a few places at dealnews, where NULL represents a distinct value of that field that is different from 0. In this case, you can use COALESCE() to fill in the appropriate value for the field. update page_click_count set clicks = coalesce(clicks, 0) + 1 where page_id=1; Edit: Brian Moon informs me that this is actually part of the SQL specification. So hooray for specifications. Still, it’s kind of arcane; in working with MySQL (and PHP) for a decade now, this is the first time I’ve ever actually encountered this. Hopefully this helps someone who was as confused as I was.
Read More