Skip to content

Conversation

@rPraml
Copy link
Contributor

@rPraml rPraml commented Jul 15, 2025

Hello @rbygrave,

we found an issue, when a wrong query was generated.

in our application, we got duplicate rows when an exists subquery was used.

In the provided test case, I've used the following path cachedBean.countries.name in an exists subquery.
The db model looks like this OBeanChild --m2o--> OCachedBean --m2m--> Country

I've tracked down the problem, that a SqlTreeNodeExtraJoin for OBeanChild --m2o--> OCachedBean was added.
This join was (correctly) initialized with manyJoin = false
Later, the OCachedBean --m2m--> Country join was added. This is a M2M join, but the ExtraJoin still means it is not a many join.

So we got the query

select t1.* from o_bean_child t1
  left join o_cached_bean t2 ... -- this is a M2O join and OK
  left join country t3 ... -- this is a M2M join and requires a distinct not to multiply rows
  where exists (select 1 ... t3.name = ...)

In this query, we would need a distinct, as a many join is in the join-path.

As possible fix, I've set manyJoin = true, when a many child was added. See 4f82a82

I don't know exactly, if this would be the correct fix as they may be children of children that have to convert the upper node ti hasMany = true

So maybe we would need to check all childrens in the following methods

boolean isManyJoin() {  return manyJoin;  } // CHECKME: Can we remove this and call `hasMany()` instead?
public boolean hasMany() { return manyJoin; }

and there is also a pathContainsMany: technically, the second join is a many, so pathContainsMany may be more acurate here, but just setting pathContainsMany=true would not be sufficient here.

Please take a look at the fix and the other tests that breaks this fix.
(I'll try to explain, what happened in the other tests)


shipQuery.findList();
assertSql(shipQuery.getGeneratedSql()).isEqualTo("select t0.id "
assertSql(shipQuery.getGeneratedSql()).isEqualTo("select distinct t0.id, z_bt1.total_amount "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- new SQL (with distinct - and orderby field)
select distinct t0.id, z_bt1.total_amount from or_order_ship t0 
    left join o_order t1 on t1.id = t0.order_id 
    left join (select order_id, count(*) as total_items, sum(order_qty*unit_price) as total_amount
    from o_order_detail group by order_id) z_bt1 on z_bt1.order_id = t1.id
    order by z_bt1.total_amount
-- old SQL (without distinct)
select          t0.id                     from or_order_ship t0 
    left join o_order t1 on t1.id = t0.order_id 
    left join (select order_id, count(*) as total_items, sum(order_qty*unit_price) as total_amount 
    from o_order_detail group by order_id) z_bt1 on z_bt1.order_id = t1.id
    order by z_bt1.total_amount

The problem here is, that totalAmount is a @Formula property with join. And that join could be theoretically a M2M
Ebean generates a SqlTreeNodeFormulaWhereJoin here, which always returns true for hasMany()


shipQuery.findList();
assertSql(shipQuery.getGeneratedSql()).isEqualTo("select t0.id "
assertSql(shipQuery.getGeneratedSql()).isEqualTo("select distinct t0.id "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--- new SQL (with distinct)
select distinct t0.id from or_order_ship t0
    left join o_order t1 on t1.id = t0.order_id
    left join (select order_id, count(*) as total_items, sum(order_qty*unit_price) as total_amount
    from o_order_detail group by order_id) z_bt1 on z_bt1.order_id = t1.id
    where z_bt1.total_amount is not null
-- old SQL (without distinct)
select          t0.id from or_order_ship t0
    left join o_order t1 on t1.id = t0.order_id
    left join (select order_id, count(*) as total_items, sum(order_qty*unit_price) as total_amount 
    from o_order_detail group by order_id) z_bt1 on z_bt1.order_id = t1.id 
    where z_bt1.total_amount is not null

Here is also just a distinct added

assertEquals(1, loggedSql.size());
assertThat(loggedSql.get(0))
.contains("select t0.identifier from child_person t0")
.contains("select distinct t0.identifier from child_person t0")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- new SQL (with distinct)
select distinct t0.identifier from child_person t0
    left join parent_person t1 on t1.identifier = t0.parent_identifier
    left join (select i2.parent_identifier, count(*) as child_count, sum(i2.age) as child_age
    from child_person i2 group by i2.parent_identifier) f2 on f2.parent_identifier = t1.identifier
    where coalesce(f2.child_age, 0) = ?
-- old SQL (without distinct)
select          t0.identifier from child_person t0
    left join parent_person t1 on t1.identifier = t0.parent_identifier
    left join (select i2.parent_identifier, count(*) as child_count, sum(i2.age) as child_age
    from child_person i2 group by i2.parent_identifier) f2 on f2.parent_identifier = t1.identifier
    where coalesce(f2.child_age, 0) = ?

assertEquals(1, loggedSql.size());
assertThat(loggedSql.get(0)).contains("select count(*) from child_person t0 left join parent_person t1 on t1.identifier = t0.parent_identifier");
assertThat(loggedSql.get(0)).contains("where coalesce(f2.child_age, 0) = ?");
assertThat(loggedSql.get(0)).contains("select count(*) from ( select distinct t0.identifier from child_person t0 left join parent_person t1 on t1.identifier = t0.parent_identifier");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- new SQL - with subquery & distinct
select count(*) from ( select distinct t0.identifier 
    from child_person t0 
    left join parent_person t1 on t1.identifier = t0.parent_identifier
    left join (select i2.parent_identifier, count(*) as child_count, sum(i2.age) as child_age
    from child_person i2 group by i2.parent_identifier) f2 on f2.parent_identifier = t1.identifier
    where coalesce(f2.child_age, 0) = ?) -- subquery ends here
-- old SQL (same as above, but not wrapped in a subquery
select count(*)                                      from child_person t0
    left join parent_person t1 on t1.identifier = t0.parent_identifier
    left join (select i2.parent_identifier, count(*) as child_count, sum(i2.age) as child_age
    from child_person i2 group by i2.parent_identifier) f2 on f2.parent_identifier = t1.identifier
    where coalesce(f2.child_age, 0) = ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant