HABTM and JOIN trickery with CakePHP
Update (10/7/2009)… this functionality has been rolled into Habtamable behavior
An interesting question came up on IRC today…
If I have Post HABTM Tag, how do I select only Posts that have Tags “new” AND “cakephp”?
A seemingly simple problem, actually required a pretty tricky solution. I do have to say that it’s not because of some cake limitation, but really (at least for me) it’s not at all straight forward to do that type of query in good ol’ SQL (yeah, I don’t really know SQL all that well… good thing we’ve got google).
Before going further, I’ll assume that you have a solid understanding of the way HABTM works, concepts of join tables, auto “with” models and cake conventions. If not, you should probably refer to my previous posts about HABTM and/or read up in the manual.
Alright, let’s analyze the problem for a second… basically we need to grab all Posts where in the join table a single post_id would match two tag_id’s (i.e. the tag_id’s of “new” and “cakephp”). Note, that we cannot match just one or the other, we must have both tag_id’s matching the same post_id. Not only that, we don’t know tag_id’s as we only know the search terms (“new” and “cakephp”).
Looks like we’ll need some creative SQL and JOINs to get this working right…
Surely, we could rely on Model::query(), but let’s see if we can get this working with find() instead.
First of all we have to force cake to build a join query.
So let’s do something like this:
$searchTerms = array('cakephp', 'new');
$this->Post->unbindModel(array('hasAndBelongsToMany'=>array('Tag')));
$this->Post->bindModel(array('hasOne'=>array(
'PostsTag'=>array(
'foreignKey'=>false,
'type'=>'INNER',
'conditions'=>array('PostsTag.post_id = Post.id')
),
'Tag'=>array(
'foreignKey'=>false,
'type'=>'INNER',
'conditions'=>array(
'Tag.id = PostsTag.tag_id',
'Tag.tag'=>$searchTerms
)))));
I will briefly explain, what’s going on here (if you need more details, see the post linked above on how to force CakePHP to do a JOIN)…
We are telling cake to JOIN our Post model with the PostsTag model (join table: posts_tags) and then JOIN our PostTag model with our Tag model (tag table: tags).
The JOIN conditions are pretty simple, we ensure that Post.id matches the PostsTag.post_id and Tag.id matches PostsTag.tag_id. Of course we need to also ensure that we only grab the tags where Tag.tag is IN our search terms (see the $searchTerms array).
Once all of that is accomplished, we build our find() method:
$this->Post->find('all', array(
'group' => array('Post.id','Post.title HAVING COUNT(*) = '.$numCount)))
What?
Let’s break it down…
If we were to do a simple find(‘all’) we’d get all Posts that happen to have Tag.id’s matching either one of our search terms. This is not what we need.
By adding the GROUP BY and HAVING COUNT(*) = $numCount, we ensure that we match both of our Tag ids and not just one or the other. In other words, COUNT (*) must equal to the number of search terms.
OK, so what is $numCount?
In our example we know that we searched for two terms ($searchTerms array), therefore we could have done:
HAVING COUNT(*) = 2.
However, if our $searchTerms array had an unknown number of items, we’d do something like this prior to our find() call, to determine how many Tag.id’s must be matched in order for our query to be correct:
$numCount = count($searchTerms);
I can imagine that this is probably a bit over the top, but it certainly taught me a few things and reinforced some others, hopefully you’ll learn a thing or two from this as well ;)
Post->PostsTag->bindModel(
‘belongsTo’ => array(‘Post’)
);
$this->Post->PostsTag->find(
‘all’, array(
‘conditions’ => array(
‘PostsTag.tag_id’ => $tag_ids_array
)
)
);
?>
But first you should find the $tag_ids_array.
@wargoth
That’s not going to work for this case, because array(1,2,3,4) produces an IN clause, which is essentially an OR. We need to ensure that for a given post_id there is matching tag_id=1 AND tag_id=2
Ugh… there’s gotta be a simpler way of doing this. I mean, this works and great job figuring it out, but it seems to lack… elegance.
@wargoth, that also won’t work where you’re trying to do a MATCH() AGAINST() fulltext search.
@Finster
It doesn’t even look all that elegant in plain SQL ;)
That being said, I don’t know of any other way to force a JOIN in cake … and generally speaking this post was more of an experiment and a learning experience.
Oh, I wasn’t trying to criticize your method at all, more I was criticizing the fact that CakePHP still requires this level of “hackery” for HABTM. This is a very helpful post, all around!
@Finster
I agree, hopefully in 2.0 we’ll see a more elegant way to manage JOINs.
Oh finally, i’ve been looking for this solution for a long time..
But, then how can I add pagination for this result..?
@labanux
Just use $this->paginate(‘Post’);
I’m sorry, I’m still a little bit confused with pagination..
Consider I have Posts Controller, so is it will be :
some_function()
{
$this->Post->find(‘all’);
$this->set(‘posts’, $this->paginate(‘Post’);
}
So is that mean everytime I called $this->paginate->(‘Post’), the result will automatically taken from find() function before ?
Oh sorry, I mean
$this->Post->find(’all’); <– replaced with your code
I’ve tried that, and it didn’t work.
@labanux
You don’t need find(‘all’) if you use paginate() because paginate() does really the same thing (i.e. finds some data) except it applies the limit and order to your query. Well, that’s not fair… it does a lot of other great things, but on the basic level it’s just a way to find data for a given model.
The technique should work (don’t remember if I’ve tested it), since paginate() actually calls find() internally and applies the same JOINs as find() would.
$searchTerms = array(‘cakephp’, ‘new’);
$this->Post->unbindModel(array(‘hasAndBelongsToMany’=>array(‘Tag’)));
$this->Post->bindModel(array(‘hasOne’=>array(
‘PostsTag’=>array(
‘foreignKey’=>false,
‘type’=>’INNER’,
‘conditions’=>array(‘PostsTag.post_id = Post.id’)
),
‘Tag’=>array(
‘foreignKey’=>false,
‘type’=>’INNER’,
‘conditions’=>array(
‘Tag.id = PostsTag.tag_id’,
‘Tag.tag’=>$searchTerms
)))));
$this->set(‘posts’, $this->paginate(‘Post’)); Post->find(‘all’, array(‘group’ => array(‘Post.id’,'Post.title HAVING COUNT(*) = ‘.$numCount))) <– return the right value..
help me please.. :(
I’m dying solving this..
@labanux
You never explained what the actual problem is… you should really post it at the google group as you’ll get better help there.
But take a look at the queries generated, are you getting expected query for your paginate()?
[...] “jouons avec les habtm”, en un peu plus élaboré, je vous invite également à lire ce post de [...]
Hi
I gone through your article and i am currently user the following code inside my controller action
$this->User->unbindModel(array(‘hasAndBelongsToMany’=>array(‘Plan’)));
$this->User->bindModel(array(‘hasOne’=>array(
‘userPlan’=>array(
‘foreignKey’=>false,
‘type’=>’INNER’,
‘conditions’=>array(‘userPlan.UserId = User.UserId’)
),
‘Plan’=>array(
‘foreignKey’=>false,
‘type’=>’INNER’,
‘conditions’=>array(
‘Plan.PlanId = userPlan.PlanId’
)))));
Here my User table and Plan tables are not directly related to each other. UserPlan table having the userid and the planid.
But when i execute this gives me the following error
Query: SELECT `User`.`UserId`, `User`.`UserCode`, `User`.`Title`, `User`.`FirstName`, `User`.`LastName`, `User`.`Email`, `User`.`Password`, `User`.`Phone`, `User`.`Address1`, `User`.`Address2`, `User`.`City`, `User`.`State`, `User`.`PostalCode`, `User`.`OrganizationName`, `User`.`OrganizationUrl`, `User`.`UserStatus`, `User`.`UserIdentificationFlag`, `User`.`SupervisorAccess`, `User`.`CreateDate`, `User`.`ModifyDate`, `userPlan`.`Id`, `userPlan`.`UserId`, `userPlan`.`PlanId`, `userPlan`.`ActivationDate`, `userPlan`.`ExpirationDate`, `userPlan`.`ModifyDate`, `Plan`.`PlanId`, `Plan`.`PlanName`, `Plan`.`PlanDescription`, `Plan`.`PlanType`, `Plan`.`PlanFee`, `Plan`.`PlanPeriod`, `Plan`.`PlanStatus`, `Plan`.`PlanCreateDate`, `Plan`.`PlanModifyDate` FROM `MTS_User` AS `User` LEFT JOIN `MTS_UserPlan` AS `userPlan` ON (`userPlan`.`MTS_User_id` = `User`.`UserId` AND `userPlan`.`UserId` = `User`.`UserId`) LEFT JOIN `MTS_Plan` AS `Plan` ON (`Plan`.`MTS_User_id` = `User`.`UserId` AND `Plan`.`PlanId` = `userPlan`.`PlanId`) WHERE 1 = 1
Warning: SQL Error: 1054: Unknown column ‘userPlan.MTS_User_id’ in ‘on clause’ in C:\wamp\www\WMT\cake\libs\model\datasources\dbo_source.php on line 439
Please suggest me how to put the innter join in my case and get the desired result.
Any help would be greatly appriciated. I am really banging my head for this issue from last 3 days.
Thanks
@Gaurav
What version of cake are you using?
‘type’=>’INNER’ and ‘foriegnKey’=>false were introduced somewhere around RC1 release.
Hi
I am using CakePHP 1.1.
Thanks
@Gaurav
Yep, none of this is going to work on 1.1… so your best hope is custom SQL or upgrade, if you can (should).
I am getting an error if I put in ‘recursive’ =>2 in the find statement. I think it’s because we unbinded the relationship and the ‘recursive’ is trying to go back and use that relationship.
Do you have this problem?
@Jesse
The bindModel() plays the role of the recursive, so you don’t need to include recursion level, you just manually bind all the required Models in your JOIN.
Then again, you didn’t specify what error you are getting, so it’s a bit hard to guess ;)
I’ll include the SQL statement as well as the error:
SQL:
SELECT `OnsaleCoupon`.`id`, `OnsaleCoupon`.`onsale_merchant_id`, `OnsaleCoupon`.`title`, `OnsaleCoupon`.`content`, `OnsaleCoupon`.`datestart`, `OnsaleCoupon`.`dateend`, `OnsaleCoupon`.`retail_price`, `OnsaleCoupon`.`sale_price`, `OnsaleCouponsCategory`.`title`, `OnsaleCouponsCategory`.`onsale_coupons_categories_division_id` FROM `onsale_coupons` AS `OnsaleCoupon` INNER JOIN `onsale_coupons_categories` AS `OnsaleCouponsCategory` ON (`OnsaleCouponsCategory`.`id` = `OnsaleCouponsOnsaleCouponsCategory`.`onsale_coupons_category_id` AND `OnsaleCouponsCategory`.`id` IN (6, 11)) WHERE `OnsaleCoupon`.`datestart` NOW() GROUP BY `OnsaleCoupon`.`id`, `OnsaleCoupon`.`title` HAVING COUNT(*) = 2
Error:
1054: Unknown column ‘OnsaleCouponsOnsaleCouponsCategory.onsale_coupons_category_id’ in ‘on clause’
I’ve got my problems outlined here: http://groups.google.com/group/cake-php/browse_thread/thread/9a985e04ddf076bf/315c5a809af3c61b?lnk=gst&q=HABTM+searcg#315c5a809af3c61b
Any suggestions?
@Jesse
I haven’t looked at the issue very closely, as I am a bit short on time… but here are my suggestions:
1. Bind the Merchant model as part of your JOIN.
OR
2. Grab the required id’s in an array, perhaps using Set::extract() and run a separate query to grab the Merchant data.
… well I hope it sets you on the right track, if anything. Good luck ;)
I had to solve a similar problem, so here’s my solution.
I’m new to CakePHP, so i might have done something really horrible. On the other hand, my SQL statement should be good. I just tried to translate that to CakePHP.
/* SQL ***********************************************************************/
SELECT posts.*
FROM posts
WHERE posts.id IN ( SELECT t1.post_id
FROM posts_tags AS t1
JOIN tags ON tags.id = t1.tag_id
WHERE tags.name = “cakephp” )
AND posts.id IN ( SELECT t2.post_id
FROM posts_tags AS t2
JOIN tags ON tags.id = t2.tag_id
WHERE tags.name = “new” );
/* CakePHP *******************************************************************/
$args = array();
$args['tags'] = ‘cakephp, new’;
$filters = array();
if( isset( $args['tags'] ) && trim( $args['tags'] ) != ” ) {
$i = 0;
foreach( explode( ‘,’, $args['tags'] ) as $tag ) {
$i++;
$tag = mb_trim( $tag );
$filters[] = ‘Post.id IN ( SELECT t’.$i.’.post_id FROM posts_tags AS t’.$i.’ JOIN tags ON tags.id = t’.$i.’.tag_id WHERE tags.name = “‘.Sanitize::clean( $tag, array( ‘encode’ => false ) ).’” )’;
}
}
$posts = $this->paginate( ‘Post’, $filters );
@jmjjg
Thank you for sharing your approach. The sub-select, I’m personally not a big fan of… but if it does the trick for you, then that’s fine. That being said, the goal of the experiment was to avoid any straight SQL and I don’t see anyway to do so with sub-select.
@teknoid
Well, the sub-select is what naturally comes to my mind, and I have trouble (at the moment) “thinking in CakePHP”. So I’ll try your approach soon.
Also, I’m quite happy that you say “… avoid any straight SQL and I don’t see anyway to do so with sub-select [with CakePHP]“, which means that I at least searched correctly for a solution to that.
@jmjjg
Yep, well… I hope it works out for ya.
Hey All,
I’ve only started using cakephp recently, but have already learned a great deal in the short time I’ve been using it. To respond to earlier concerns of pagination, the reason this won’t work with the paginate function is because it calls find() twice. The bind and unbind functions will only apply your changes until the search has been performed. Unfortunately, I have no good advice on how to remedy this.
Cheers!
@Munks
The remedy is quite simple, you need to supply the false param to the bindModel() method, which makes the bind persistent. Please refer to the API for more detailed info.
i am new to cake php i get select query problem when i create my login page i wanted to clear it so please suggest some answers
@teknoid: I have been using your code fine up until I needed to get events created by Causes OR EventOrganisers and my inner join is on the Causes table so instantly discounts any events from EventOrganisers.
I’ve added a topic on Google Groups if you have the time to help me out:
http://groups.google.com/group/cake-php/browse_thread/thread/1222420fd151fefe
Thanks,
Paul.
The answer was a simple as changing the JOIN type to LEFT which I had tried earlier but at a time when I had tried a few other things at the same time which were the real issues.
I take it your use of INNER was a personal preference in a situation where you only wanted those records that had matching associations?
@Paul Gardner
INNER type, was indeed the situation during the write up.
While the manual approach is good, and I’m glad you’ve got it sorted out, I would totally love to hear any feedback (experience) with the habtamable behavior linked above ;)
I may just have to have a play with your habtamable behaviour. Would it resolve the following issue I am having with what I posted above?
Page: http://www.localcause.org.uk/causes
In the right column you can set Content Filters to limit the causes shown by one or more regions and categories. My categories are HABTM associated and selecting two categories that both match a cause (i.e. the cause says they work in both categories) results in the cause appearing twice in the results as the INNER JOIN to category created a new non-distinct data row.
Am I missing something obvious or is this really an issue that maybe your behaviour automatically filter this out? At present I am resolving it by specifying my fields as follows to force DISTINCT, but seems a little dirty to not be setting my fields as a true array:
‘fields’=>array(
‘DISTINCT Model.field1, Model.field2, Model.field3, Model.field3′
)
OK, so to get the values as I need them I only needed to add DISTINCT to Model.id rather than all fields, but that’s when I ran into the fun issue of paginate not using DISTINCT for it’s COUNT.
I have spent a few hours on this and came up with a little core hack (I hate doing this, but generally leads to a suggestion as to how to un-hack and achieve same result).
/cake/libs/controller/controller.php: #1066 (Edit)
Change $parameters = compact(‘conditions’); to
$parameters = compact(‘conditions’, ‘fields’);
/cake/libs/model/model.php #2042 (New line)
if (is_array($query['fields']) && stristr($query['fields'][0], ‘distinct ‘)) $query['fields'] = $query['fields'][0];
The first hack passes your fields array to the find(‘count’) call and the second hack then checks to see if the first field array value (normally your main models primary id) is DISTINCT.
If it is, it overwrites $query['fields'] to with $query['fields'][0] to satisfy the elseif(is_string()) check that already exists. This makes $db->calculate generate a query with COUNT(Model.primary_id) rather than COUNT(*) giving me the right paginate count INT.
Anyone know if we can implement this change without hacking the core?
@Paul Gardner
In order to make the “fake” bind work with pagination, you need to pass a false param. That makes the bind permanent and will be used with both the first (count) query and the second query, which actually fetches the results.
I don’t have that feature built-in in he behavior, but it should be extremely easy to add.
Hacking the core, is just not a good idea :)
@teknoid: that is true for most cases, but as paginate’s core Model::_findCount() method always does its counts as COUNT(*) when you are searching for Posts using say 3 tags and one post matches all 3 tags, then it counts the Post 3 times even if you have DISTINCT Post.id in your field list.
My hack, which I hate doing and don’t recommend to anyone, checks to see if my field list has DISTINCT Post.id as the first value in my fields array and if so use COUNT(Post.id).