1BRC: Who's the Fastest to Process a Billion Java Records? - JVM Weekly vol. 160

1. 1BRC – Who’s the Fastest to Process a Billion Java Records?

As I’ve repeatedly stated, December is a significant month in our coding community thanks to Advent of Code – a programming contest that involves solving often highly abstract tasks every morning. Most people, me included 🥲, struggle to keep up the pace and drop out before the end. Every year, though, a bunch of enthusiasts also gather who are already so bored of occupying the top of the leaderboard that they try to make the whole game even more difficult, whether by optimising everything or through twisted visualisations (which is certainly helped by the elaborate “storylines” of the individual tasks).

The above image is not a joke – it’s reality. I confirm first-hand.

January arrives, however, and it would seem that everyone is exhausted after December’s challenges. Yet when Gunnar Morling announced The One Billion Row Challenge, the software community woke up from its winter slumber and decided to pick up the gauntlet once again.

The One Billion Row Challenge (1BRC) is a project to test the performance of modern Java in processing large data sets, a billion rows from a text file to be exact. The text file contains temperature data from various weather stations, and each line consists of the station name and the temperature measurement to one decimal place. The task is to write a Java program that reads the file, calculates the minimum, average and maximum temperature for each station and then displays the results. Participants can use all available means, such as virtual threads, SIMD, GC optimisation and other tricks, to build the fastest possible solution to this task.
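To give a feel for the task, here is a deliberately naive baseline sketch, nowhere near the tricks used by the leaders. It assumes that each line looks like `Hamburg;12.0` (station name, a semicolon, then the temperature) and that results are printed as min/mean/max per station in alphabetical order; the class name `BaselineSketch` and the sample values are made up for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class BaselineSketch {
    // Running statistics for a single weather station.
    public static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        @Override
        public String toString() {
            // Assumed output format: min/mean/max, one decimal place each.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    // Aggregates lines of the assumed form "station;temperature".
    public static Map<String, Stats> aggregate(Stream<String> lines) {
        Map<String, Stats> result = new TreeMap<>(); // TreeMap keeps stations sorted by name
        lines.forEach(line -> {
            int sep = line.indexOf(';');
            if (sep < 0) {
                return; // skip lines that do not match the assumed format
            }
            String station = line.substring(0, sep);
            double temperature = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(temperature);
        });
        return result;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample instead of the real billion-row file.
        System.out.println(aggregate(Stream.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;-3.4")));
    }
}
```

To run it against a real file, the stream could come from `Files.lines(Path.of("measurements.txt"))` instead, where the file name is again an assumption. A single-threaded loop like this is only a reference point for what the competitors are trying to beat.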

You are required to use Java 21 to take part in the challenge, although I can’t really imagine a situation in which someone would deliberately want to use an older one – unless just as an additional challenging constraint. Any optimisation of the programme is possible, subject to certain rules. Participants can use any OpenJDK distribution (I can already see space for each vendor), but cannot use external libraries and must provide an implementation in a single source file. The challenge is open until the end of January 2024 and the results themselves will be judged on the average of three trials on a dedicated server.

For me, the whole situation proves two things. The first is that developers like a challenge and like comparing solutions with each other – but that probably comes as no surprise to anyone. The second observation is of a slightly different nature – the whole Christmas setting in Advent of Code is a cool, atmosphere-building addition, but definitely not essential. Even a seemingly trivial task (loading the data) can find fertile ground, as long as it is demanding enough to leave room for showing off. At this point, more than 100 people have already proposed their solutions.

Adding another reference to the past – just a week ago, in my yearly roundup, I expressed dissatisfaction that Java only becomes a topic of discussion in the development community when there’s controversy such as a pricing alteration or Log4Shell. Hence, it’s heartening to see that even more complex matters like the One Billion Row Challenge are able to gain broader awareness.

Gunnar’s initiative hit the very top of Hacker News.

At the time of writing this, it has received two thousand stars on GitHub and more than a hundred proposed solutions, including, among others, one from Thomas Wuerthinger, founder of GraalVM. The leaderboard is currently led by Roy van Rijn, founder of JUG Rotterdam. I recommend taking a look at the solutions list – the higher you go, the more you’ll encounter some serious black magic.

Interestingly, individuals from different programming backgrounds are participating and attempting to counter Java solutions. This stepping out of the bubble is very gratifying – I am of the opinion that such endeavors best illustrate the significant evolution of Java in recent years. Moreover, it demonstrates that Java is in no way inferior to other popular solutions in terms of its capability to handle large data sets.

2. Release Radar

Phoenix – a modern template engine for Spring

We will start today with an interesting Spring project.

Phoenix Template Engine is an experimental template engine that remains in the early stages of development. It aims to facilitate the development of complex web applications by providing an easy-to-use and understandable syntax for templating HTML code generated on the backend – so-called server-side rendering (SSR).

Phoenix sets itself apart from other template engines available for Spring because it only uses one special ‘@’ character to distinguish the HTML code from the programmable part of the template. This feature allows Java to be used directly in the template, removing the necessity to learn a new language syntax. Phoenix enables the creation of templates using constructors, the importation of Java classes, the definition of variables, for loops, and conditional if statements. It also safeguards inputs from CSRF attacks. Moreover, it claims to be quicker and more lightweight than Thymeleaf, due to the compilation of template files.

However, I personally always have concerns when I hear the phrase “lightweight” technology.

There’s a fly in the ointment, though – any controller using Phoenix must inherit from the PhoenixController class, which will probably be at least a yellow light for many people. In return, it does offer some pretty interesting possibilities. One of these is reverse routing, which is becoming increasingly popular in frontend frameworks. It lets us create URLs dynamically in our templates – instead of hardcoding href="/user/profile/123", you can use the reverse routing function to generate that URL by specifying the name of a particular route and providing a parameter (such as a user ID).

<a href="@routes.ProfileController.renderPage(123)">Go to this page</a>
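To make the idea more concrete, here is a minimal plain-Java sketch of what reverse routing boils down to. It is not Phoenix’s implementation or API: the route table, the `urlFor` helper and the `{id}` placeholder syntax are all hypothetical and exist only to illustrate the concept of resolving a URL from a route name plus a parameter.

```java
import java.util.Map;

public class ReverseRouting {
    // Hypothetical route table: route name -> URL pattern with an {id} placeholder.
    static final Map<String, String> ROUTES = Map.of(
            "ProfileController.renderPage", "/user/profile/{id}");

    // Builds a URL from a route name and a parameter instead of a hardcoded string.
    public static String urlFor(String routeName, Object id) {
        String pattern = ROUTES.get(routeName);
        if (pattern == null) {
            throw new IllegalArgumentException("Unknown route: " + routeName);
        }
        return pattern.replace("{id}", String.valueOf(id));
    }

    public static void main(String[] args) {
        System.out.println(urlFor("ProfileController.renderPage", 123));
    }
}
```

The practical benefit is that the URL pattern lives in one place: if the route changes, the links generated from it follow along.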

I will quietly wait for the stable version and play around. I wish the developers the best of luck with the project!

Memories: the worst code I’ve ever seen was templating code with a triple-nested if spanning three different technologies – inside a JSP tag there was a JavaScript scriptlet that concatenated strings containing JavaScript. If anyone from Team Phoenix happens to be reading this – avoid this trap.

Instancio 4.0

Instancio is a tool that automatically generates and populates test data objects for unit tests. It eliminates the need for manual preparation of test data by creating complete objects, including nested objects and collections, with a single line of code. These objects are filled with random data, which can be reproduced if a test fails.

The code is said to be worth a thousand paragraphs of text, so the whole thing is used as follows – instead of:

Address address = new Address();
address.setStreet("street");
address.setCity("city");

Person person = new Person();
person.setFirstName("first-name");
person.setLastName("last-name");
person.setAddress(address);

You can easily generate such an object by utilizing the factory provided below:

Person person = Instancio.create(Person.class);

Version 4.0 extends the handling of edge cases further, including an enhancement to method assignment that enables objects to be dynamically populated even in the absence of corresponding fields. This is particularly beneficial for objects with dynamic attributes. Support for the sequenced collections introduced in Java 21 has been added as well. A newly introduced Cartesian product generation API facilitates the creation of intricate data sets. The operation of state generators in the stream() function has been modified too, ensuring the generated objects are entirely independent of each other.

jOOQ 3.19

One of the most popular libraries for handling SQL in Java is taking a big step forward, as its free version is dropping support for Java 1.8. Java 8 support will admittedly remain, but only for Enterprise Edition users.

That seems like a fair approach. If there’s anyone to make money from, it’s legacy projects at enterprise companies – let them factor that cost into their architectural fitness functions.

And companies will definitely be eager to upgrade, as jOOQ 3.19 also brings some really useful features:

Explicit Path Joins: This feature lets you declare path joins explicitly in a query. For instance:

ctx.select(CUSTOMER.FIRST_NAME, CUSTOMER.LAST_NAME, CUSTOMER.address().city().country().NAME)
  .from(CUSTOMER)
  .leftJoin(CUSTOMER.address().city().country())
  .fetch();

To-Many Path Joins: This feature allows one-to-many joins using paths.

ctx.select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME, ACTOR.film().TITLE)
  .from(ACTOR)
  .leftJoin(ACTOR.film())
  .fetch();

Implicit Join Path Correlation: This feature makes it possible to correlate subqueries with the outer query using paths.

ctx.select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
  .from(ACTOR)
  .where(exists(
      selectOne()
      .from(ACTOR.film())
      .where(ACTOR.film().TITLE.like("A%"))
  ))
  .fetch();

Each of these features makes complex SQL queries easier to write and improves code readability.
