JPA Eclipselink optimizing query performance on a large resultset

Recently I had to look for ways of optimizing a very slow database request on a very large ResultSet (more than 5000 results with several @OneToMany associations).

While reading EclipseLink's documentation I stumbled upon an interesting piece of information regarding handling large ResultSets. EclipseLink recommends using Streamed Cursors when handling large resultsets

According to the documentation : Cursored streams provide the ability to read back a query result set from the database in manageable subsets, and to scroll through the result set stream.

The initial query would load practically the entire database into memory in one request (all the associations are marked with QueryHint.LEFT_FETCH in order to eagerly fetch the associations upon the initial request).

I will not go into details regarding the process since it's irrelevant to this post not the goal of this post, but basically the program would query an Oracle Database transform every JPA Entity and store it in a somewhat different object in a Mongo Database

The process was really long (more than 1 hour between recovering all the objects from the Oracle Database and processing them into the Mongo Database

The DAO class

public CursoredStream getMyLargeResultset(){

   Query programsQuery = em.createNamedQuery("MyNamedQuery");
   // tell the EntityManager to return a StreamedCursor
   programsQuery.setHint("eclipselink.cursor", true);    
   return (CursoredStream) programsQuery.getSingleResult();
}

The client


public void synchronizeDatabases(){


    final int PAGE_SIZE = 50;

    //get the cursor
    CursoredStream cursor= myDao.getMyLargeResultset();
      
    // iterate through curor 
    cursor(!cursor.atEnd()){ 
 
        // get the next batch of objects and
        // cast it to the target entity 
        List myEntities = (List)(List)cursor.next(PAGE_SIZE);
 processEntities(myEntities);
      }

    cursor.close();

}

By using this technique I was able to reduce by a factor of 10 the process time of this operation!

PS: I realize that the double casting on the List is not very pretty and I could use a different approach by using the next() method without arguments but the existent method processEntities() accepted a List as a parameter and I wasn't allowed to modify that method

Documentation sources


No comments:

Post a Comment

OSX show used ports or listening applications with their PID

On OSX you can display applications listening on a given port using the lsof the commands described below will show listening application...