HABTM and JOIN trickery with CakePHP

Update (10/7/2009)… this functionality has been rolled into Habtamable behavior

An interesting question came up on IRC today…

If I have Post HABTM Tag, how do I select only Posts that have Tags “new” AND “cakephp”?

A seemingly simple problem, actually required a pretty tricky solution. I do have to say that it’s not because of some cake limitation, but really (at least for me) it’s not at all straight forward to do that type of query in good ol’ SQL (yeah, I don’t really know SQL all that well… good thing we’ve got google).

Before going further, I’ll assume that you have a solid understanding of the way HABTM works, concepts of join tables, auto “with” models and cake conventions. If not, you should probably refer to my previous posts about HABTM and/or read up in the manual.

Alright, let’s analyze the problem for a second… basically we need to grab all Posts where in the join table a single post_id would match two tag_id’s (i.e. the tag_id’s of “new” and “cakephp”). Note, that we cannot match just one or the other, we must have both tag_id’s matching the same post_id. Not only that, we don’t know tag_id’s as we only know the search terms (“new” and “cakephp”).

Looks like we’ll need some creative SQL and JOINs to get this working right…

Surely, we could rely on Model::query(), but let’s see if we can get this working with find() instead.

First of all we have to force cake to build a join query.

So let’s do something like this:


$searchTerms = array('cakephp', 'new');

$this->Post->unbindModel(array('hasAndBelongsToMany'=>array('Tag')));
		
$this->Post->bindModel(array('hasOne'=>array(
                                                     'PostsTag'=>array(
                                                         'foreignKey'=>false,
                                                         'type'=>'INNER',
                                                         'conditions'=>array('PostsTag.post_id = Post.id')
                                                     ), 
                                                    'Tag'=>array(
                                                          'foreignKey'=>false,
                                                          'type'=>'INNER',
                                                          'conditions'=>array(
                                                              'Tag.id = PostsTag.tag_id',
                                                              'Tag.tag'=>$searchTerms
)))));

I will briefly explain, what’s going on here (if you need more details, see the post linked above on how to force CakePHP to do a JOIN)…

We are telling cake to JOIN our Post model with the PostsTag model (join table: posts_tags) and then JOIN our PostTag model with our Tag model (tag table: tags).
The JOIN conditions are pretty simple, we ensure that Post.id matches the PostsTag.post_id and Tag.id matches PostsTag.tag_id. Of course we need to also ensure that we only grab the tags where Tag.tag is IN our search terms (see the $searchTerms array).

Once all of that is accomplished, we build our find() method:

$this->Post->find('all', array(
                              'group' => array('Post.id','Post.title HAVING COUNT(*) = '.$numCount)))

What?

Let’s break it down…

If we were to do a simple find(‘all’) we’d get all Posts that happen to have Tag.id’s matching either one of our search terms. This is not what we need.

By adding the GROUP BY and HAVING COUNT(*) = $numCount, we ensure that we match both of our Tag ids and not just one or the other. In other words, COUNT (*) must equal to the number of search terms.

OK, so what is $numCount?
In our example we know that we searched for two terms ($searchTerms array), therefore we could have done:

HAVING COUNT(*) = 2.

However, if our $searchTerms array had an unknown number of items, we’d do something like this prior to our find() call, to determine how many Tag.id’s must be matched in order for our query to be correct:

$numCount = count($searchTerms);

I can imagine that this is probably a bit over the top, but it certainly taught me a few things and reinforced some others, hopefully you’ll learn a thing or two from this as well ;)

‘fields’, ‘conditions’ and associated models in CakePHP 1.2

Sometimes you see code that tries to do something like this:

$this->Company->find('all', array('conditions' =>array('Profile.name'=>'test'),
                                            'fields'=>array('Profile.id, Profile.name')))

Of course most of the time you’ll hear: “Well, this doesn’t work!”
Yet, some people will disagree: “Works for me!”

So, what’s really going on here?

Whether the above syntax works or not, would depend on your model association. Or, more specifically, whether or not an SQL JOIN is built.

If we assume that Company hasOne Profile, then the above find() will build a query like:

SELECT 
   Profile.id, Profile.name 
FROM 
   `companies` 
AS 
   `Company` 
LEFT JOIN 
   `profiles` AS `Profile` 
ON 
   (`Profile`.`company_id` = `Company`.`id`) 
WHERE 
   `Profile`.`name` = 'test'

… which is perfectly legal.

Of course if you attempt to do this for hasMany or HABTM, the query is going to fail.
You’ve probably guessed that the same works for belongsTo.

Forcing an SQL JOIN in CakePHP

Update (23/7/2010) The improved way to handle this now-a-days: http://book.cakephp.org/view/872/Joining-tables
Update (10/7/2009)… this functionality has been rolled into Habtamable behavior

You’ve probably noticed that by default CakePHP will only create a JOIN query if you have a hasOne or belongsTo associations between your models. There are cases, however, when a JOIN is necessary to get just the right data.

So how do you tell cake to create a JOIN?

Well, you have to rely on some crafty methods, but all in all it’s not very hard, once you get the hang of it…

I will summarize here a few tricks I’ve encountered on the google group, so big thanks to the original authors of these ideas.

Let’s take our favorite sample models:

User hasMany Post
and
Post hasAndBelongsToMany Tag

If we were to do: $this->User->find(‘all’);
CakePHP will run a few selects and return you all the User data. The problem, however, is that if a User doesn’t have any posts CakePHP will still return you the User and an empty array for Post (I’m sure you’ve noticed that before)…

So let’s try to force cake to do a JOIN…

First things first, we have to make cake forget about our previous bindings between the models:


$this->User->unbindModel(array('hasMany'=>array('Post')));
$this->User->Post->unbindModel(array('hasAndBelongsToMany'=>array('Tag')));

Now, we need to trick CakePHP into thinking that we’ve got a hasOne relationship between our models, so that it will build a JOIN query…

We can use bindModel() to achieve that:

$this->User->bindModel(array('hasOne'=>array('Post'=>array(),
                    'PostsTag'=>array(
                        'foreignKey'=>false,
                        'conditions'=>array('PostsTag.post_id = Post.id')),
                    'Tag'=>array(
                        'foreignKey'=>false,
                        'conditions'=> array('PostsTag.tag_id = Tag.id')
))));

So what’s going on here?

We are telling cake to bind the User model using a hasOne relationship to the Post model, which forces cake to build the JOIN query like:

LEFT JOIN `posts` AS `Post` ON (`Post`.`user_id` = `User`.`id`)

Next, we need to ensure that we get all related information from our joinTable (i.e. posts_tags), so that we can eventually get the relevant Tags. Just as before we tell cake to bind PostsTag using a hasOne relationship. Note, we need to tell CakePHP how to do the JOIN condition by specifying the ‘conditions’ key. Therefore, cake will do:

LEFT JOIN `posts_tags` AS `PostsTag` ON (`PostsTag`.`post_id` = `Post`.`id`)

Last, but not least, we have to grab all the Tags. Again, we tell cake to join our Tag model using hasOne (of course here we also have to specify the conditions).

The complete query looks like:

SELECT `User`.`id`, `User`.`username`, `User`.`password`, `User`.`name`, `User`.`created`, `Post`.`id`, `Post`.`title`, `Post`.`post`, `Post`.`created`, `Post`.`modified`, `Post`.`user_id`, `PostsTag`.`id`, `PostsTag`.`post_id`, `PostsTag`.`tag_id`, `PostsTag`.`status`, `Tag`.`id`, `Tag`.`tag`, `Tag`.`status` FROM `users` AS `User` LEFT JOIN `posts` AS `Post` ON (`Post`.`user_id` = `User`.`id`) LEFT JOIN `posts_tags` AS `PostsTag` ON (`PostsTag`.`post_id` = `Post`.`id`) LEFT JOIN `tags` AS `Tag` ON (`PostsTag`.`tag_id` = `Tag`.`id`) WHERE 1 = 1

OK, this is much better and pretty much what we need. However, we are still getting records with empty values, yet we need only the records where the User has a Post and a Post has a Tag.
Well, this is actually due to the fact that we are using a LEFT JOIN, what we really need is an INNER JOIN. You can look up the difference, if you are not sure, but basically INNER JOIN ensures that matching records must exist in all tables.

Remember that for a hasOne (or belongsTo) relationship you can specify a ‘type’ key, so go ahead and modify the above bindModel() call to include ‘type’ => ‘INNER’ for both PostsTag and Tag association.
How? Here’s a hint: ‘foreignKey’=>false, ‘type’=>’INNER’, ‘conditions’=>array(‘PostsTag.post_id=Post.id’)).

Now, we’ve got just the records we were looking for.

A quick note on ‘foreignKey’=>false…
It’s necessary to specify that, so that CakePHP does not attempt to automagically establish a relationship between the models, instead it forces cake to use our conditions for the JOIN
(i.e. ‘conditions’=>array(‘PostsTag.post_id = Post.id’)).

P.S. as josoroma pointed out, if you use this method with paginate() be sure to supply a ‘false’ param, to your bindModel() so that the binding persists for all subsequent methods, i.e. paginate() and paginateCount()

LEFT JOIN vs INNER JOIN

By default, if you have a hasOne or belongsTo relationship between models, CakePHP will build a JOIN query using LEFT JOIN. In some cases, however, you would really prefer to do an INNER JOIN instead.

This is easily accomplished by specifying the ‘type’ key in your association.

So, for example, User hasOne Post:

var $hasOne = array(‘Post’=>array(‘type’=>’INNER’));

That’s all. Cake will now build a query using the INNER JOIN.

Remember that this key only works in the hasOne or belongsTo associations. If you would like to know how to force CakePHP to do JOIN for other types of associations, please take a look at this post.