fastutil - how to optimize your collections by primitives
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion
Homepage of fastutil
What
So what does the quote above actually mean?
First we need to dive into, what is a Java Collection, and why are they "bad" for performance and memory requirements?
Java Collections (Collection<E>
) only works with objects, meaning that if we have a List<E>
which we populate with int
it’ll actually "autobox" the int
into a Integer
, i.e. the class, rather than the primitive type.
What does this mean for you as a user?
- It’s done through "autoboxing" which means automatic casting to the
Integer
-type, so nothing required for you - It allocates more memory than the primitive
int
.
So how do you create an effective List
that contains primitives, such as int, boolean
and float
? You can’t.
What you can do is to create an array, int[]
, which will contain the actual primitives, no autoboxing applied.
But what if you want to have the methods from List
, such as find
and "auto-resizing"? Then you’ll have to research and find a library, fastutil
to the rescue!
How
fastutil
implements their own versions of List
, HashMap
and so on which actually use the raw primitives, thereby increasing throughput while lowering memory used (as we’re not allocating as many objects anymore, when using primitives).
These types of libraries are only required once you hit an enormous amount of data or very strict requirement.
Installation
Using gradle
implementation: 'it.unimi.dsi:fastutil:8.4.1'
(latest version as of Aug 2020, mvnrepository)
Usage
DoubleToDoubleMap
val d2dMap: Double2DoubleMap = Double2DoubleOpenHashMap().apply {
(2.0, 5.5)
put(3.0, 6.6)
put}
(5.5, d2dMap.get(2.0)) assertEquals
This map is not only less memory-hungry (because using double
rather than Double
) but is also faster with insertions & get, than the Java Collections counterpart.
Why
Less space used & faster - it is as simple as that!
- No "AutoBoxing"
- No Object allocations for primitives
Side-note Something I noticed while working on my Language Model in Kotlin, with some strict requirements and a lot of data, was that even when using fastutil
I wasn’t gaining that much as I was mainly using views
of my Lists, further optimizing memory. Views are what the name implies, a view of the List. It never creates a copy but just the indices and make use of the original structure. Using immutable data this is very effective, but if you’d been using mutable data it could prove dangerous as someone can change the structure and data of your view (even if your view is immutable the underlying List might not be).
Alternatives
Goldman Sachs Collection - now Eclipse Collections - Probably the best alternative, in my opinion. HPPC - Carrot Search Labs Trove4j - Not as active as other alternatives, but who cares when it’s performant and "done"?
Find a 2015 benchmark of the libraries here At least both fastutil
& Eclipse Collections
are updated for Java 8 streams!
-Hampus Londögård