Java: Aggregate Data Off-Heap

Explore how to create off-heap aggregations with minimal garbage collection impact and maximum memory utilization.

Creating large aggregations using Java Map, List and Object normally creates a lot of heap memory overhead. This also means that the garbage collector will have to clean up these objects once the aggregation goes out of scope.

Read this short article and discover how we can use Speedment Stream ORM to create off-heap aggregations that can utilize memory more efficiently and with little or no GC impact.


Let’s say we have a large number of Person objects that take the following shape:

    public class Person {
        private final int age;
        private final short height;
        private final short weight;
        private final String gender;
        private final double salary;
        // Getters and setters hidden for brevity
    }

For the sake of argument, we also have access to a method called persons() that will create a new Stream with all these Person objects.

Salary per Age

We want to compute the average salary for each age bucket. To represent the results of the aggregation we will be using a data class called AgeSalary, which associates a certain age with an average salary.

    public class AgeSalary {
        private int age;
        private double avgSalary;
        // Getters and setters hidden for brevity
    }

Age grouping for salaries normally entails fewer than 100 buckets, so this example only illustrates the principle. The more buckets, the more sense it makes to aggregate off-heap.
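For comparison, the conventional on-heap version of this aggregation can be sketched with the standard library alone. This is only a baseline under stated assumptions: the nested Person class is a hypothetical, trimmed-down stand-in for the article's class, and the result is an ordinary Map of boxed values rather than AgeSalary objects.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OnHeapBaseline {

    /** Hypothetical, trimmed-down Person with only the two fields this aggregation needs. */
    static final class Person {
        private final int age;
        private final double salary;

        Person(int age, double salary) {
            this.age = age;
            this.salary = salary;
        }

        int getAge() { return age; }
        double getSalary() { return salary; }
    }

    /** Groups persons by age and averages the salary per group, entirely on the heap. */
    static Map<Integer, Double> averageSalaryByAge(Stream<Person> persons) {
        return persons.collect(Collectors.groupingBy(
            Person::getAge,
            TreeMap::new, // sorted by age; every key and value is a boxed heap object
            Collectors.averagingDouble(Person::getSalary)));
    }

    public static void main(String[] args) {
        Map<Integer, Double> result = averageSalaryByAge(Stream.of(
            new Person(30, 1000.0),
            new Person(30, 3000.0),
            new Person(41, 5000.0)));
        result.forEach((age, avg) -> System.out.println(age + " -> " + avg));
    }
}
```

Every bucket here costs at least a map entry, an Integer key and a Double value on the heap, which is the overhead the off-heap approach aims to avoid.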


Using Speedment Stream ORM, we can derive an off-heap aggregation solution with these three steps:

Create an Aggregator

    var aggregator = Aggregator.builderOfType(Person.class, AgeSalary::new)

The aggregator can be reused over and over again.

Compute an Aggregation

    var aggregation = persons().collect(aggregator.createCollector());

Using the aggregator, we create a standard Java stream Collector that has its internal state completely off-heap.

Since the Aggregation holds data that is stored off-heap, it may benefit from explicit closing rather than just being cleaned up potentially much later. The Aggregation can be closed by calling its close() method, possibly by taking advantage of the AutoCloseable interface (for example, in a try-with-resources statement), or by using streamAndClose(), which returns a stream that closes the Aggregation after stream termination.
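To make the idea of off-heap state and explicit closing more concrete, here is a minimal sketch that uses only the JDK. It is not Speedment's implementation: OffHeapAgeSalaryTable is a hypothetical class invented for illustration, and it assumes ages fall in a small fixed range so that each age can own a fixed slot in a direct ByteBuffer.

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {

    /**
     * Hypothetical fixed-size table that keeps one (salary sum, count) pair per age
     * in a direct ByteBuffer, so the aggregation state lives outside the Java heap.
     */
    static final class OffHeapAgeSalaryTable implements AutoCloseable {
        private static final int SLOT_BYTES = Double.BYTES + Long.BYTES;
        private ByteBuffer buffer;

        OffHeapAgeSalaryTable(int maxAge) {
            // allocateDirect reserves native memory; all bytes start out as zero.
            this.buffer = ByteBuffer.allocateDirect((maxAge + 1) * SLOT_BYTES);
        }

        /** Adds one salary to the slot of the given age (0..maxAge). */
        void add(int age, double salary) {
            int offset = age * SLOT_BYTES;
            buffer.putDouble(offset, buffer.getDouble(offset) + salary);
            buffer.putLong(offset + Double.BYTES, buffer.getLong(offset + Double.BYTES) + 1);
        }

        /** Average salary for the given age, or NaN if nothing was added for it. */
        double average(int age) {
            int offset = age * SLOT_BYTES;
            long count = buffer.getLong(offset + Double.BYTES);
            return count == 0 ? Double.NaN : buffer.getDouble(offset) / count;
        }

        @Override
        public void close() {
            // A direct ByteBuffer's native memory is released only after the buffer
            // object becomes unreachable, so closing simply drops the reference.
            buffer = null;
        }
    }

    public static void main(String[] args) {
        // try-with-resources calls close() as soon as the block ends.
        try (OffHeapAgeSalaryTable table = new OffHeapAgeSalaryTable(120)) {
            table.add(30, 1000.0);
            table.add(30, 3000.0);
            table.add(41, 5000.0);
            System.out.println("30 -> " + table.average(30));
            System.out.println("41 -> " + table.average(41));
        }
    }
}
```

No per-bucket objects are created while aggregating, and the lifetime of the native memory is tied to an explicit close() rather than left entirely to the garbage collector.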

Everything in a One-Liner

    persons().collect(Aggregator.builderOfType(Person.class, AgeSalary::new)

There is also support for parallel aggregations. Just add the stream operation Stream::parallel and aggregation is done using the ForkJoin pool.
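The effect of adding parallel() to a collecting stream can be tried out without Speedment. The sketch below is a stand-in under clear assumptions: it uses the standard on-heap groupingBy collector instead of the aggregator's collector, and the ages and salaries are synthetic values derived from a counter. By default, parallel streams in the JDK run on the common ForkJoin pool.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelAggregationSketch {

    /**
     * Averages synthetic salaries per synthetic age over n generated entries.
     * The data is made up: ages cycle through 20..69, salaries through 1000..1600.
     */
    static Map<Integer, Double> averageSalaryByAge(int n, boolean parallel) {
        IntStream ids = IntStream.range(0, n);
        if (parallel) {
            ids = ids.parallel(); // the only change needed to aggregate in parallel
        }
        return ids.boxed().collect(Collectors.groupingBy(
            (Integer i) -> 20 + i % 50,
            Collectors.averagingDouble((Integer i) -> 1000.0 + (i % 7) * 100)));
    }

    public static void main(String[] args) {
        Map<Integer, Double> sequential = averageSalaryByAge(10_000, false);
        Map<Integer, Double> parallel = averageSalaryByAge(10_000, true);
        System.out.println("buckets: " + parallel.size());
        System.out.println("same result: " + sequential.equals(parallel));
    }
}
```

The collector is responsible for merging the partial results produced by the worker threads, so the calling code stays the same apart from the parallel() call.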

  • The root interface defines the contract
  • An intermediate class implements common behavior i.e. Bar
  • If necessary, a class in the hierarchy overrides this behavior e.g. Corge

A wrench in the works

This is perfect, until classes outside the reach of the API designer can implement the interface. The following hierarchy describes the List part of the Java Collections API, with an additional custom class:

Now, let’s introduce the sort() method in the List interface. Only classes, i.e. AbstractList and MyList, can actually implement this method.

Obviously, it’s impossible to enforce the same sort() implementation in both classes, even though it makes sense. Direct implementations of List have to duplicate (yuck!) the sort() of AbstractList.

In order to remove the duplication and DRY the design, Java API designers have moved the sort() method out of List to an unrelated class with only static methods.

On the flip side, static methods are not object-oriented. Worse, there’s no relationship from List to Collections in the code (though there’s one in the opposite direction). Hence, if one is not aware of the Collections class and its features, there’s no way to know about it.
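The discoverability problem is easy to see in code. The sketch below sorts a list in two ways: with the static helper on Collections, and with the sort() method that List itself has offered as a default method since Java 8. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortDiscoverability {

    /** Sorts a copy with the static helper; nothing on List points to Collections. */
    static List<String> sortedWithCollections(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts a copy with List.sort, which is found directly on the list object. */
    static List<String> sortedWithListMethod(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("corge", "bar", "foo");
        System.out.println(sortedWithCollections(names));
        System.out.println(sortedWithListMethod(names));
    }
}
```

Both calls produce the same ordering; the difference is that the second one shows up when exploring the List type, while the first requires already knowing that Collections exists.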
