Piotr Nowicki's Blog

FETCH JOIN is still a JOIN

2012-10-05

tags: boardwalk empire, distinct, fetch, hibernate, join, jpa, many-to-many, sql

JPA’s FETCH JOIN is useful in cases when you want to eagerly load some lazy loaded collection of your entity. By default all @OneToMany and @ManyToMany relationships are lazy loaded. There are some valid cases when (despite the settings in your mapped entity), you want to eagerly load the related entity.

One of the way of doing it is to use JOIN FETCH JPQL clause.

It’s worth remembering that JOIN FETCH is used mainly because of its side-effect of populating the pointed relationship – it’s still a JOIN and must be treated like it when it comes to the duplicated result set problem.

As a quick reminder, assume you have the following entities:

Author.java

@Entity
public class Author {

    @Id
    @GeneratedValue
    private long id;

    private String firstname;

    private String lastname;

    @ManyToMany(cascade = CascadeType.ALL)
    private List<Article> articles = new ArrayList<>(); 

    // c-tors, getters, setters and other methods
}

Article.java

@Entity
public class Article {

    @Id
    @GeneratedValue
    private long id;

    private String title; 

    // c-tors, getters, setters and other methods
}

Now if you want to fetch all Authors and eagerly load their Articles you might try invoking the following query:

SELECT a FROM Author a JOIN FETCH a.articles

Well, at the first sight it looks fine – I’m interested in all Authors and I just want their Articles to be populated. This example, however, suffers the same problem as regular OUTER | INNER JOIN: after invoking SQL’s JOIN clause, we’ll get all the rows that represents the Authors and each of their Articles – every pair will be resulted as a single row.

So, for the following data:

Author a1 = new Author("Manny", "Horvitz");
Author a2 = new Author("Margaret", "Schroeder");
Author a3 = new Author("Eli", "Thompson");
Author a4 = new Author("Enoch", "Thompson");

Article art1 = new Article("Bone for tuna");
Article art2 = new Article("Once upon a time");
Article art3 = new Article("Saint Valentine's Massacre");

a1.addArticles(art1, art2, art3);
a2.addArticles(art2, art3);
a3.addArticles(art1);
a4.addArticles(art3);

the above query will return:

Author [id=1, firstname=Manny, lastname=Horvitz]

Author [id=1, firstname=Manny, lastname=Horvitz]

Author [id=1, firstname=Manny, lastname=Horvitz]

Author [id=5, firstname=Margaret, lastname=Schroeder]

Author [id=5, firstname=Margaret, lastname=Schroeder]

Author [id=6, firstname=Eli, lastname=Thompson]

Author [id=7, firstname=Enoch, lastname=Thompson]

It’s very common problem with JOINs where each table from the left hand side is connected with more than one related rows in the right hand side table. As usual, this kind of problems might be solved using the DISTINCT keyword, so:

SELECT DISTINCT a FROM Author a JOIN FETCH a.articles

Will yield the proper result:

Author [id=5, firstname=Margaret, lastname=Schroeder]

Author [id=1, firstname=Manny, lastname=Horvitz]

Author [id=6, firstname=Eli, lastname=Thompson]

Author [id=7, firstname=Enoch, lastname=Thompson]

Still, each of the rows will have their Articles collection populated but now we’ve got only the unique Authors entities returned.

The conclusion is – even if JOIN FETCH is used for eagerly loading collections and not for using as a path variable in the JPA Query, you still should treat it as a full-blown JOIN in the cases like above.