#compression

Would adding Brotli Compression help shrink ePubs?

shkspr.mobi/blog/2025/07/would

The ePub format is the cross-platform way to package an eBook. At its heart, an ePub is just a bundled webpage with extra metadata - that makes it extremely easy to build workflows to create them and apps to read them.

Once you've finished authoring your ePub, you've got a folder full of HTML[1], CSS, metadata documents, and other resources. That folder is stored in a standard Zip file, which is then renamed to .epub. This is known as the Open Container Format (OCF).

There are actually a few different compression schemes for Zip files, but the specification says:

OCF ZIP containers MUST include only stored (uncompressed) and Deflate-compressed ZIP entries within the ZIP archive.
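
One rough way to check an existing file against that rule is to look at the compression method recorded for each Zip entry. The sketch below uses Python's standard zipfile module; the function name non_ocf_entries is a hypothetical helper, not something the specification or any ePub tool defines.

```python
import zipfile

# The two ZIP methods the OCF rule quoted above permits.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def non_ocf_entries(epub_path):
    """Return the names of entries that use any other compression method."""
    with zipfile.ZipFile(epub_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.compress_type not in ALLOWED_METHODS
        ]
```

Calling non_ocf_entries("book.epub") on a conforming file should return an empty list. This only inspects compression methods; it is not a full ePub validator.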

The Deflate algorithm is venerable[2] and, while incredible for its time, has been superseded by more modern compression schemes. For example, Brotli.

What happens if we unzip an ePub and then recompress it with Brotli? Will that dramatically reduce the file size?

Steps

  • Unzip the book
    • unzip book.epub -d book/
  • Brotli files can't contain directories, so tar the directory without any compression
    • tar -cvf book.tar book/
  • Create a Zip file with maximum compression
    • zip -9 book.tar.zip book.tar
  • Create a Brotli file with maximum compression
    • brotli -k -q 11 book.tar
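
The steps above can also be approximated in a script, which is handy when comparing many books. This is a minimal sketch, not the method used for the results below: it tars a directory in memory, compresses it with zlib at level 9 (Deflate, roughly what zip -9 applies), and uses the third-party brotli package only if it happens to be installed. The helper names tar_bytes and compressed_sizes are hypothetical, and the byte counts will differ slightly from the command-line tools because the container overheads differ.

```python
import io
import tarfile
import zlib

try:
    import brotli  # third-party package; optional in this sketch
except ImportError:
    brotli = None


def tar_bytes(directory):
    """Bundle a directory into an uncompressed tar archive held in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        archive.add(directory, arcname="book")
    return buffer.getvalue()


def compressed_sizes(data):
    """Return byte counts for the raw data and each available compressor."""
    sizes = {
        "contents": len(data),
        # zlib level 9 is Deflate at maximum effort.
        "deflate": len(zlib.compress(data, 9)),
    }
    if brotli is not None:
        # quality=11 corresponds to the CLI's -q 11.
        sizes["brotli"] = len(brotli.compress(data, quality=11))
    return sizes
```

For example, compressed_sizes(tar_bytes("book/")) returns a dictionary of sizes in bytes for the unzipped book directory.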

Results

I took a random(ish) sample from Standard eBooks and a few from my personal stash[3].

          Book 1   Book 2   Book 3   Book 4
Contents  768KB    911KB    389KB    594KB
Deflate   250KB    248KB    103KB    175KB
Brotli    190KB    187KB    82KB     137KB

The good news is that ePubs compress pretty well already! That isn't much of a surprise - compression algorithms love the repetitious nature of HTML and human-readable text. Obviously Brotli is better but, on the file sizes we're talking about, not dramatically better. Saving 60KB is OK - but in a world of terabyte-sized SD cards does it matter?
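
To put numbers on "not dramatically better", the Deflate and Brotli rows of the table above can be compared directly. The snippet below is just arithmetic on those published figures; the saving helper is a hypothetical name introduced for illustration.

```python
# Sizes in KB, copied from the text-only results table above.
deflate_kb = {"Book 1": 250, "Book 2": 248, "Book 3": 103, "Book 4": 175}
brotli_kb = {"Book 1": 190, "Book 2": 187, "Book 3": 82, "Book 4": 137}


def saving(before, after):
    """Return (absolute saving, percentage saving relative to `before`)."""
    saved = before - after
    return saved, 100 * saved / before


for book, deflate in deflate_kb.items():
    saved_kb, percent = saving(deflate, brotli_kb[book])
    print(f"{book}: Brotli saves {saved_kb}KB ({percent:.0f}%) over Deflate")
```

Across the four books that works out to between 21KB and 61KB saved, or roughly 20% to 25% of the Deflate size.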

Brotli is also computationally harder to decompress, which makes it slightly less attractive for low-powered eReaders.

It's also possible to make a small saving by reducing the complexity and verbosity of the CSS and HTML.

However, that's not the real problem.

I lied to you

An ePub contains more than just text and text-based metadata. It can contain web fonts, images, even music. The above books had all their fonts and media stripped out. Let's run the experiment again but, this time, including everything in the original book.

          Book 1   Book 2   Book 3   Book 4
Contents  23MB     3.8MB    0.76MB   0.93MB
Deflate   22MB     1.7MB    0.46MB   0.51MB
Brotli    22MB     1.5MB    0.43MB   0.47MB

All of a sudden, Brotli makes next to no difference. Yes, the textual compression is still there, but it is overshadowed by the huge cost of the media files.
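
To see how much of a given book is text and how much is everything else, the sizes recorded in the Zip directory can be totalled by file extension. This is a sketch under stated assumptions: the set of extensions counted as text is a guess at what a typical ePub contains, and bytes_by_kind is a hypothetical helper name.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Extensions treated as text here; this grouping is an assumption for illustration.
TEXT_EXTENSIONS = {".xhtml", ".html", ".css", ".opf", ".ncx", ".xml"}


def bytes_by_kind(epub_path):
    """Sum uncompressed and compressed bytes for text entries and all others."""
    totals = Counter()
    with zipfile.ZipFile(epub_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            extension = PurePosixPath(info.filename).suffix.lower()
            kind = "text" if extension in TEXT_EXTENSIONS else "other"
            totals[kind, "uncompressed"] += info.file_size
            totals[kind, "compressed"] += info.compress_size
    return totals
```

The "other" bucket covers images, fonts, audio, and the small mimetype file. If it dwarfs the "text" bucket, a better text compressor has little left to win.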

Mixed Media

The ePub 3.3 specification lays out which multimedia formats are acceptable. As well as older formats like GIF, PNG, and JPEG, newer formats like WebP are acceptable. Similarly, TTF fonts are listed in the standard along with WOFF2.

Modern image and font formats have better compression than their ancestors. Indeed, WOFF2 uses Brotli as its compression scheme.

The biggest filesize saving in ePubs comes from properly compressing images and fonts.

Can You Picture That?

It is a matter of opinion as to what resolution is best suited to an ePub. Most modern eReaders have, at best, 300ppi resolution. They're also normally monochrome. But eBooks aren't always read on low-resolution, black and white eInk screens - so it probably makes sense to have high-resolution colour images in order to future-proof books.

But the compression of those images is not a matter of opinion. Lossless compression algorithms are well supported for legacy and modern image formats.

Let's take a specific example. Twenty Years at Hull House is the 22MB book above. Less than a MB of that is for text, the rest is images.

The largest illustration in the book is a 1937x1971, transparent PNG weighing in at 1MB. Increasing the lossless compression level takes it down to 840KB. Reducing the palette to something more suitable takes it to 640KB. If you were releasing this as an ePub 3.3 file, using WebP would take the image to a hair over 600KB.

Basically, a 20%-40% filesize reduction with no loss of fidelity.

Across all the PNG images in the ePub, I was able to easily get the filesize from 20MB to 16MB.

Converting to lossless WebP got it down to 13MB.
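
The image figures above can be turned into percentages with a few lines of arithmetic. The snippet treats "1MB" as 1000KB, which is an assumption, and percent_reduction is a hypothetical helper name.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100 * (before - after) / before


# Largest illustration, in KB. Treating "1MB" as 1000KB is an assumption.
original_kb = 1000
for label, size_kb in [
    ("higher lossless PNG compression", 840),
    ("reduced palette", 640),
    ("lossless WebP (a hair over this)", 600),
]:
    print(f"{label}: {percent_reduction(original_kb, size_kb):.0f}% smaller")

# All PNG images in the book, in MB.
for label, size_mb in [("recompressed PNG", 16), ("lossless WebP", 13)]:
    print(f"{label}: {percent_reduction(20, size_mb):.0f}% smaller")
```

Under that assumption the single image shrinks by about 16%, 36%, and 40% at each step, and the full set of PNGs by 20% and 35%.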

What The Font?

Fonts can be shrunk in a number of ways. The most obvious way is to compress to WOFF2 which, as described above, uses Brotli compression.

Based on my quick tests, a typical ePub's TTF will see about a 50% reduction in file size. For typical "English" language fonts, that's a reduction from 30KB to 15KB. So big relative compression, but small absolute compression.

Complex decorative fonts can go from 800KB to 80KB. But it is rare for a font to exceed a megabyte.

If it does, that usually means that it has more glyphs than strictly necessary. If your book is written entirely in the Latin alphabet, do you really need all those fancy accents, Chinese ideographs, and emoji? Probably not.

I've previously written about Subsetting Fonts and the perils of excessive trimming.

Back to Basics

Brotli is magic - but changing the compression algorithm for the ePub standard is probably a false economy. The text portion of modern eBooks is already fairly small and compresses with reasonable efficiency.

The best compression gains come from either using next-generation image and font formats or, if legacy compatibility is necessary, using the most aggressive compression settings for traditional images.

  1. OK! It is actually XHTML, but let's not quibble.

  2. That's a fancy way of saying "old".

  3. I couldn't be bothered automating this. Go ahead and run it on every ePub if you want something more representative.

Terence Eden’s Blog · Would adding Brotli Compression help shrink ePubs?

I have to say that iZotope's Nectar remains incredible at cleaning up voices. I had two actors whose recordings carried a very small amount of amplified noise. Small. But, for a guy like me, utterly unacceptable. Nectar's EQ and compression did a better job of sweetening the voice and killing this minor "electric" gain than any other compression software I have.

🗜️ #compression #7zip
To illustrate the "maximum compression" settings I talked about here: sebsauvage.net/links/?s0zmfA

Illustrated with a game: Loophole.

Uncompressed: 3.18 GB
7z "ultra" (-mx=9): 1.33 GB
7z with my settings: 0.53 GB
(And zpaq -m4 does slightly better: 0.49 GB)

Of course, this is an example that works well; the gain won't necessarily be as good on other data.

sebsauvage.net · 7-Zip at maximum compression. - Liens en vrac de sebsauvage

👑 #ArchiveDeJeux
#Compression #7zip
As requested, here are the settings I now use to get the best compression with 7-Zip.

Drawbacks:
- it runs single-threaded (compression takes *much* longer; decompression will be slightly slower).
- it uses more RAM (10 GB for compression, 1 GB for decompression).

1 GB for decompression doesn't seem unreasonable to me for today's machines.

So I have hundreds of videos of ~1 minute recorded from my phone ~10 years ago, and they generally don’t have that great compression, nor are they stored in a modern and advanced video format.

For archiving purposes, I want to take advantage of my workstation’s mighty GPU to process them so that the quality is approximately the same, but the file size would be strongly reduced.

Nevertheless, compressing videos is terribly hard, and way more complex than compressing pictures, so I wouldn’t really know how to do this, what format to use, what codec, what bitrate, what parameters to keep an eye on, etc.

I don’t care if the compression takes a lot of time, I just want smaller but good looking videos.

Any tips? (Links to guides and tutorials are ok too)

Also, unfortunately I am forced to use Windows for this (don’t ask me why 🫠), but I know nothing about Windows because I hate it. Practical software suggestions are very much welcome, too!

#ffmpeg #help #askFedi