Amy Pearson is the product lead for JDeli, with expertise in image code, Java, web development, and cloud computing. She focuses on JDeli and has also contributed to JPedal, cloud services, and support. Outside work, she enjoys gaming, F1, and music.

Benchmarking Java Image Library Encoding: Speed, Quality, and File Size — What Really Matters?

7 min read

TL;DR

When benchmarking Java image libraries, what matters most depends on your use case. Speed is critical for high-throughput services, file size drives storage and bandwidth costs, and quality is non-negotiable when visual output matters. Testing across all three reveals a more honest picture. In this benchmark covering seven formats, JDeli leads overall on speed, quality, and reliability, producing virtually no broken files and handling every format without plugins. ImageIO (with plugins) holds its own on file size and raw speed in certain formats, and Apache delivers solid, consistent quality where it has coverage. But if you need one library that performs well across all three metrics, JDeli is the clear all-rounder. Check out our performance comparisons page if you’re just interested in the numbers.


As a developer and product manager, I’m always looking closely at how our library performs, especially compared to other options. I want to know where we’re doing well, where we can improve, and what actually makes a difference in real-world use.

Over the years, I have benchmarked various image encoding and decoding libraries, including JDeli, and shared the results.

This time, I decided to expand things a bit further and run broader performance benchmarks across the board. Partly because it’s useful, and partly because, let’s be honest, it’s fun to see how your work stacks up. I’ll walk you through how I ran the tests and, more importantly, which metrics I believe actually matter.


So… what actually matters?

In a world obsessed with faster and smaller, it’s tempting to look for a single number that tells you which library is “best”.

But as my boss likes to remind me, referencing Mazzeo’s Law:

“The answer to every strategic question is… It depends.”

The right answer depends entirely on what you care about.

If you’re running a high-throughput service, speed might be everything. If you’re trying to reduce bandwidth or storage costs, file size matters more. And if the visual result is critical, then quality is non-negotiable.

There isn’t one winner across every scenario.

So instead of focusing on just one metric, I’ve looked at the three that tend to matter most in the real world:

  • Speed
  • File size
  • Quality

Together, they tell a much more useful story.


How the benchmarks work

To measure these, I used a combination of JMH for performance testing and SSIM for image quality.

Why JMH?

JMH (Java Microbenchmark Harness) is the gold standard for benchmarking Java code. It handles JVM warm-up, avoids common benchmarking pitfalls, and produces reliable, repeatable results.

This is important because naïve benchmarks can be wildly misleading.

Why use this over simpler approaches like System.nanoTime(), custom timers, or basic loop-based benchmarks? The main reason is accuracy. Benchmarking Java code correctly is surprisingly difficult because of JVM optimisations like JIT compilation, garbage collection, and dead code elimination. These can make naïve benchmarks produce misleading and sometimes completely wrong results.

JMH is designed specifically to handle these issues. It includes proper warm-up phases so the JVM can fully optimise the code before measurements are taken, runs multiple iterations to produce stable averages, and uses techniques to prevent the JVM from optimising away the code being tested. It also helps isolate the benchmark from other system activity as much as possible.

In short, JMH gives you confidence that you’re measuring the actual performance of your code, not the side effects of how the JVM happens to be behaving at that moment.
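
To make that concrete, here is a minimal sketch of the style of JMH benchmark used for these tests. The image name and the ImageIO PNG encode are illustrative stand-ins, and the 5 forks × 5 measurement iterations match the count of 25 shown in the results below:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.concurrent.TimeUnit;
import javax.imageio.ImageIO;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)   // ops/s, as in the tables below
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 5)           // let the JIT fully optimise before measuring
@Measurement(iterations = 5)
@Fork(5)                          // 5 forks x 5 iterations = count of 25
public class PngEncodeBenchmark {

    private BufferedImage image;

    @Setup(Level.Trial)
    public void loadImage() throws Exception {
        // Illustrative test image; the real runs iterate over the whole corpus.
        image = ImageIO.read(getClass().getResource("/basn6a16.png"));
    }

    @Benchmark
    public byte[] encode() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        // Returning the output stops the JVM eliminating the encode as dead code.
        return out.toByteArray();
    }
}
```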


Why SSIM instead of VIF, PSNR or MSE?

For image quality, I’ve used the Structural Similarity Index (SSIM) rather than metrics like PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Squared Error), or Visual Information Fidelity (VIF).

The main reason is simple: I care about how the image looks, not just how different it is numerically.

Metrics like PSNR and MSE work by measuring pixel-by-pixel error. That’s useful, but it doesn’t always match what a person actually sees. You can end up with an image that scores worse on paper but looks better to the eye.

SSIM takes a more perceptual approach, looking at things like structure, luminance, and contrast. In practice, it lines up much more closely with how we judge image quality.
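
For reference, the standard SSIM formula compares two images $x$ and $y$ through their means, variances, and covariance:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where the $\mu$ terms capture luminance, the $\sigma^2$ terms contrast, and the covariance $\sigma_{xy}$ structure, with $C_1$ and $C_2$ small constants that keep the division stable. A score of 1.0 means the images are structurally identical.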

VIF is also a strong option and can often outperform SSIM, but it’s significantly more computationally expensive. For this kind of broad benchmarking, that extra cost doesn’t really justify itself.

SSIM hits a good balance: it’s fast enough to run at scale, and accurate enough to reflect real-world visual quality.


Why not use multiple quality metrics together?

Many benchmarks include SSIM alongside PSNR and MSE, and there’s nothing inherently wrong with that approach.

However, in practice, these additional metrics often don’t change the overall conclusions — they just add more numbers to interpret.

For this benchmark, the goal is to compare practical performance in realistic scenarios, not to perform academic analysis of compression error.

SSIM provides a strong, reliable indicator of perceptual quality on its own, and adding additional error-based metrics wouldn’t materially change the decisions most developers would make based on the results.

It also keeps the results simpler and easier to interpret.

If there’s a visible quality difference, SSIM will reflect it. If there isn’t, that’s usually what matters most.


Benchmark Methodology

Before getting into the results, it’s worth explaining how the benchmarks were run. Performance numbers without context can be misleading, and small differences in setup can produce very different outcomes.

The goal here wasn’t to create a synthetic “best case” scenario, but to measure performance in a way that reflects how image libraries are actually used in real systems.


Test Environment

All benchmarks were run using the following setup:

  • CPU: Apple M1
  • RAM: 16GB
  • OS: macOS Tahoe 26.3.1
  • Java version: 17
  • JDeli version: 2026.04
  • Comparison libraries: ImageIO (with plugins: TwelveMonkeys, JAI, and darkXanther for WebP) and Apache Commons Imaging

Each test was run on an otherwise idle system to minimise interference from background processes.


Test Images

The image set used for testing is the PngSuite test suite.

Using a varied dataset helps ensure the results reflect real-world usage rather than favouring a specific optimisation. I would usually stick to images that every library under comparison can handle, but this time I chose this corpus because it contains a wide variety of images covering every feature that should be supported, and the benchmarks record whether each library can actually handle them.


Ensuring Fair Comparisons

Where possible, equivalent settings were used across libraries to ensure fair comparisons.

However, this is one of the challenges of benchmarking image encoding libraries — different libraries expose different configuration options, and not all settings map perfectly.

The goal was to use realistic, production-style configurations rather than artificially tuning one library to outperform others.
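
As an example of what “equivalent settings” means in practice, the low/high JPEG variants in the results below differ only in the compression quality passed to the encoder. For ImageIO that looks roughly like this (the 0.3f and 0.9f values are illustrative, not necessarily the exact figures used in the runs):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// The _low/_high JPEG variants differ only in the explicit compression
// quality handed to the writer; everything else stays at the defaults.
public class JpegQualityExample {

    public static void encode(BufferedImage image, File file, float quality) throws Exception {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // 0.0f = smallest file, 1.0f = best quality
        try (ImageOutputStream out = ImageIO.createImageOutputStream(file)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }
}

// e.g. encode(image, new File("out-low.jpg"), 0.3f);   // "low" run
//      encode(image, new File("out-high.jpg"), 0.9f);  // "high" run
```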


Speed: The metric everyone looks at first

Let’s start with speed — because nobody wants slow image processing.

Speed benchmarks were run using JMH (Java Microbenchmark Harness).

Each benchmark was run across multiple iterations, and the reported results reflect the average performance after the JVM had fully stabilised.

BMP

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
Apache | 10.684 | ± 0.027
ImageIO | 5.288 | ± 0.484
JDeli | 17.223 | ± 0.064

GIF

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
Apache | 1.705 | ± 0.251
ImageIO | 4.713 | ± 0.025
JDeli | 0.538 | ± 0.003

JPEG

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
ImageIO | 13.345 | ± 0.049
ImageIO_high | 10.049 | ± 0.033
ImageIO_low | 13.989 | ± 0.040
JDeli | 28.073 | ± 0.513
JDeli_high | 15.778 | ± 0.126
JDeli_low | 29.451 | ± 0.491

JPEG 2000

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
ImageIO | 2.306 | ± 0.007
JDeli_JP2 | 6.226 | ± 0.023
JDeli_JPX | 6.228 | ± 0.020

PNG

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
Apache | 5.100 | ± 0.018
ImageIO | 4.819 | ± 0.010
ImageIO_fast | 5.597 | ± 0.014
ImageIO_max_comp | 3.512 | ± 0.008
JDeli_COMPRESS | 5.402 | ± 0.027
JDeli_FAST | 13.061 | ± 0.026
JDeli_QUANT | 1.192 | ± 0.001
JDeli_UNCOMPRESS | 12.985 | ± 0.088

TIFF

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
Apache | 4.140 | ± 0.070
ImageIO_Deflate | 4.533 | ± 0.054
ImageIO_JPEG | 7.788 | ± 0.084
ImageIO_LZW | 4.265 | ± 0.043
ImageIO_uncompressed | 9.213 | ± 0.029
JDeli_LZW | 9.473 | ± 0.027
JDeli_better_comp | 5.484 | ± 0.007
JDeli_better_speed | 13.116 | ± 0.026
JDeli_deflate | 6.361 | ± 0.017
JDeli_jpeg | 25.160 | ± 1.732
JDeli_uncompressed | 99.850 | ± 3.051

WEBP

Mode: throughput
Count: 25
Units: ops/s

Benchmark | Score | Error
ImageIO | 6.508 | ± 0.044
JDeli_lossless | 4.632 | ± 0.049
JDeli_lossy | 3.659 | ± 0.034

Quality: How good does the output actually look?

Next, let’s look at image quality. For this I used the Structural Similarity Index (SSIM).

The purpose here was to measure perceived visual quality, rather than purely mathematical differences.
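
SSIM is normally computed over small sliding windows and the per-window scores averaged, but a simplified single-window version over greyscale values shows the idea. This is a sketch for illustration, not the exact implementation used for these results:

```java
import java.awt.image.BufferedImage;

// Simplified single-window SSIM over greyscale values for two images
// of the same size, illustrating the luminance/contrast/structure terms.
public class Ssim {

    private static final double C1 = Math.pow(0.01 * 255, 2); // stabilising constants
    private static final double C2 = Math.pow(0.03 * 255, 2);

    public static double ssim(BufferedImage a, BufferedImage b) {
        int w = a.getWidth(), h = a.getHeight(), n = w * h;

        // Means (luminance)
        double muA = 0, muB = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                muA += grey(a.getRGB(x, y));
                muB += grey(b.getRGB(x, y));
            }
        }
        muA /= n;
        muB /= n;

        // Variances (contrast) and covariance (structure)
        double varA = 0, varB = 0, cov = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double da = grey(a.getRGB(x, y)) - muA;
                double db = grey(b.getRGB(x, y)) - muB;
                varA += da * da;
                varB += db * db;
                cov += da * db;
            }
        }
        varA /= n - 1;
        varB /= n - 1;
        cov /= n - 1;

        return ((2 * muA * muB + C1) * (2 * cov + C2))
                / ((muA * muA + muB * muB + C1) * (varA + varB + C2));
    }

    private static double grey(int argb) {
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }
}
```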

BMP

Benchmark | SSIM score | Zero-length/broken files | Similarity
Apache | 1.0 | 0 | Identical or virtually identical
ImageIO | 0.4430668957720272 | 30 | Very different
JDeli | 0.9684172944944214 | 0 | Very similar (high quality)

GIF

Benchmark | SSIM score | Zero-length/broken files | Similarity
Apache | 0.8785090614816008 | 0 | Similar (noticeable but minor differences)
ImageIO | 0.18769307741202973 | 4 | Very different
JDeli | 0.8284792072170122 | 0 | Moderately similar (visible differences)

JPEG

Benchmark | SSIM score | Zero-length/broken files | Similarity
ImageIO | 0.9843408924688256 | 36 | Very similar (high quality)
ImageIO_high | 0.9955252550315202 | 36 | Identical or virtually identical
ImageIO_low | 0.753963280422963 | 36 | Moderately similar (visible differences)
JDeli | 0.9635278566129812 | 0 | Very similar (high quality)
JDeli_high | 0.9824853317459952 | 0 | Very similar (high quality)
JDeli_low | 0.7308454760175299 | 0 | Moderately similar (visible differences)

JPEG2000

Benchmark | SSIM score | Zero-length/broken files | Similarity
ImageIO | 0.9659635878077559 | 0 | Very similar (high quality)
JDeli_jp2 | 0.8817187006016771 | 0 | Similar (noticeable but minor differences)
JDeli_jpx | 0.8817187006016771 | 0 | Similar (noticeable but minor differences)

PNG

Benchmark | SSIM score | Zero-length/broken files | Similarity
Apache | 0.967281580718986 | 0 | Very similar (high quality)
ImageIO | 1.0 | 0 | Identical or virtually identical
ImageIO_fast | 1.0 | 0 | Identical or virtually identical
ImageIO_max_comp | 1.0 | 0 | Identical or virtually identical
JDeli | 0.9985227287696926 | 0 | Identical or virtually identical
JDeli_fast | 1.0 | 0 | Identical or virtually identical
JDeli_compress | 1.0 | 0 | Identical or virtually identical
JDeli_uncompressed | 1.0 | 0 | Identical or virtually identical
JDeli_quant | 0.9926136438484626 | 0 | Identical or virtually identical

TIFF

Benchmark | SSIM score | Zero-length/broken files | Similarity
Apache | 1.0 | 1 | Identical or virtually identical
ImageIO_deflate | 1.0 | 1 | Identical or virtually identical
ImageIO_jpeg | 0.9964492586342384 | 114 | Identical or virtually identical
ImageIO_lzw | 1.0 | 1 | Identical or virtually identical
ImageIO_uncompress | 1.0 | 1 | Identical or virtually identical
JDeli_better_speed | 1.0 | 1 | Identical or virtually identical
JDeli_deflate | 1.0 | 1 | Identical or virtually identical
JDeli_jpeg | 0.8720826599375827 | 1 | Similar (noticeable but minor differences)
JDeli_lzw | 1.0 | 1 | Identical or virtually identical
JDeli_uncompressed | 1.0 | 1 | Identical or virtually identical
JDeli_better_comp | 1.0 | 1 | Identical or virtually identical

WEBP

Benchmark | SSIM score | Zero-length/broken files | Similarity
ImageIO | 0.9764829133391945 | 28 | Very similar (high quality)
JDeli_lossy | 0.9539262155199079 | 0 | Very similar (high quality)
JDeli_lossless | 0.9684172944944212 | 2 | Very similar (high quality)

File Size: Smaller isn’t always better — but it helps

Finally, let’s look at file size. This was measured by comparing the output size of encoded images using equivalent settings across each library.

This reflects how efficiently each encoder compresses image data, which directly impacts:

  • Storage requirements
  • Network transfer time
  • Application performance at scale
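
Measuring this is the simple part: encode to memory and count the bytes. A sketch of the idea, shown here with ImageIO but applied to each library's writer in the same way, with a zero-length result recorded as a broken file:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import javax.imageio.ImageIO;

// Encode to memory and count the bytes; a zero-length or failed encode
// is what ends up in the "zero length or poor output" column.
public class SizeCheck {

    public static int encodedSize(BufferedImage image, String format) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean written = ImageIO.write(image, format, out);
        return written ? out.size() : 0; // 0 = no writer available / broken output
    }
}
```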

BMP

Benchmark | Size (bytes) | Zero-length/poor output
Apache | 4,989,192 | 0
ImageIO | 4,960,616 | 30
JDeli | 5,059,924 | 0

GIF

Benchmark | Size (bytes) | Zero-length/poor output
Apache | 604,167 | 0
ImageIO | 413,805 | 4
JDeli | 739,280 | 0

JPEG

Benchmark | Size (bytes) | Zero-length/poor output
ImageIO | 205,925 | 36
ImageIO_high | 944,779 | 36
ImageIO_low | 113,022 | 36
JDeli | 237,632 | 0
JDeli_high | 1,520,609 | 0
JDeli_low | 135,372 | 0

JPEG2000

Benchmark | Size (bytes) | Zero-length/poor output
ImageIO | 1,453,674 | 0
JDeli_jp2 | 295,585 | 0
JDeli_jpx | 283,965 | 0

PNG

Benchmark | Size (bytes) | Zero-length/poor output
Apache | 2,741,666 | 0
ImageIO | 2,768,119 | 0
ImageIO_fast | 2,877,192 | 0
ImageIO_max_comp | 2,749,323 | 0
JDeli | 2,749,038 | 0
JDeli_compress | 2,749,038 | 0
JDeli_fast | 2,876,817 | 0
JDeli_quant | 538,898 | 0
JDeli_uncompress | 2,876,817 | 0

TIFF

Benchmark | Size (bytes) | Zero-length/poor output
Apache | 1,917,348 | 1
ImageIO_deflate | 2,911,052 | 1
ImageIO_jpeg | 911,415 | 114
ImageIO_lzw | 4,074,430 | 1
ImageIO_uncompressed | 5,077,936 | 1
JDeli_better_comp | 2,786,074 | 1
JDeli_better_speed | 2,912,986 | 1
JDeli_deflate | 2,787,004 | 1
JDeli_jpeg | 274,036 | 1
JDeli_lzw | 4,065,664 | 1
JDeli_uncompressed | 5,075,876 | 1

WEBP

Benchmark | Size (bytes) | Zero-length/poor output
ImageIO | 80,430 | 28
JDeli_lossless | 1,542,834 | 2
JDeli_lossy | 90,956 | 0

Summary: How does JDeli perform?

Library | Format coverage | Avg speed (ops/s) | Avg SSIM | Similarity | Avg size (MB) | Avg zero-length/broken files (out of the 166 test images)
Apache | 4/7 | 5.40725 ± 0.0915 | 0.961447660550147 | Very similar (high quality) | 2.444356203079221 | 0.25
ImageIO | 7/7 (with plugins) | 6.85178571428571 ± 0.0676923076923077 | 0.878820368634897 | Similar (noticeable but minor differences) | 2.0123698370797318 | 20.5
JDeli | 7/7 | 15.9943157894737 ± 0.332894736842105 | 0.951211619054583 | Very similar (high quality) | 1.888199090957641 | 0.4

So when we look into the individual tests, each library has its strengths. Apache and JDeli both write formats reliably, producing essentially no zero-length or broken files, while ImageIO does well on speed and output size in some formats. But, as you can see from the summary, if you are looking for a Java image library that can do it all, JDeli is the clear winner!


Over to you

These are my results, but benchmarks are always shaped by real-world workloads and environments. Download JDeli and try it yourself.
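
As a starting point, a round trip looks something like this. The package and method names follow JDeli's documented static read/write helpers, but check the current documentation for the exact signatures in your version:

```java
import java.awt.image.BufferedImage;
import java.io.File;

import com.idrsolutions.image.JDeli;

// Read an image and re-encode it to another format with JDeli.
// Method names follow JDeli's documented static helpers; check the
// current docs for the exact signatures in your version.
public class TryJDeli {
    public static void main(String[] args) throws Exception {
        BufferedImage image = JDeli.read(new File("input.png"));
        JDeli.write(image, "webp", new File("output.webp"));
    }
}
```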

I’d genuinely love to hear how it went for you! Let me know in the comments.

What did you find when testing? Did your results match mine, or were they different?



