What Actually Makes a PDF Smaller — A Compression Guide and Fair Benchmark Method

Quick summary: "Compress PDF" sounds like a single operation, but a PDF can shrink for four completely different reasons, and the one your tool picks determines whether you get a 10% saving or a 90% saving — and whether your images turn to mush. This guide explains the actual mechanics of PDF compression, the three things that really drive a PDF's size, and a fair, reproducible way to benchmark compressors so you're not fooled by a tool that crushes quality to hit a big number.

"Compress PDF" is four different things

When a tool says it compresses your PDF, it is doing one or more of these — and they have very different consequences:

Image resampling and re-encoding. In most large PDFs, embedded images account for most of the size. Downsampling them (e.g. from 600 DPI to 150 DPI) and re-encoding them (e.g. from lossless Flate, the same Deflate compression PNG uses, to lossy JPEG or JPEG 2000) is where the biggest savings live, and where the biggest quality loss hides.
Font subsetting. A PDF can embed entire font files when it only uses a handful of glyphs. Subsetting keeps only the characters actually used, which can save a surprising amount on text-heavy documents.
Deduplication and object optimization. PDFs are a structured object format. They can contain duplicate objects, unused resources, or inefficient encodings. Cleaning these up costs no quality and can trim a messy file substantially.
Stream recompression. The content streams inside a PDF are usually already compressed (typically with FlateDecode). Re-compressing them at a higher compression level, or packing small objects into compressed object streams (PDF 1.5 and later), squeezes out a little more at no quality cost, but the gains are small because the data was already compressed.

A good compressor lets you choose how aggressively it resamples and re-encodes images, because that is the only one of the four techniques with a quality tradeoff. A bad compressor applies maximum image destruction silently and brags about the size reduction. Understanding this is the difference between "my PDF is now 80% smaller" and "my PDF is now 80% smaller and looks terrible."
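Stream recompression, the last of the four techniques above, can be seen in miniature with Python's standard zlib module, which produces the same zlib/Deflate format that PDF's FlateDecode filter reads. This is a sketch under simple assumptions, not how any particular compressor works: a synthetic run of text operators stands in for a page content stream, it is first encoded at level 1 as a fast writer might, then decoded and re-encoded at level 9. The round-trip assertion is the point: the step is lossless by construction. The printed sizes show what the higher level buys on this input; real streams will differ.

```python
import zlib


def recompress_flate(stream: bytes, level: int = 9) -> bytes:
    """Decode a Flate (zlib) stream and re-encode it at a higher level.

    Lossless: the decoded bytes are identical before and after.
    """
    return zlib.compress(zlib.decompress(stream), level)


# Synthetic stand-in for a page content stream (repetitive text operators).
content = b"BT /F1 12 Tf 72 720 Td (Quarterly report) Tj ET\n" * 2000

first_pass = zlib.compress(content, 1)       # what a fast writer might emit
second_pass = recompress_flate(first_pass)   # what a cleanup pass might emit

assert zlib.decompress(second_pass) == content  # nothing was lost
print(f"raw: {len(content)}  level 1: {len(first_pass)}  level 9: {len(second_pass)}")
```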

What actually drives a PDF's file size

If you want to know why your specific PDF is big, it's almost always one of these three:

Images. A scanned document saved as full-resolution images per page is the classic offender. One 600 DPI color scan can be multiple megabytes per page. Resampling to a screen/print-appropriate DPI is the single highest-impact lever.
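Embedded fonts (un-subsetted). Documents from certain authoring tools embed complete font files, sometimes whole families. Each full font is embedded once no matter how little text uses it, so this fixed cost can dominate a short or text-only document; CJK fonts, which cover thousands of glyphs, can run to several megabytes each.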
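Bloat and duplicates. Repeated exports, incremental saves (many editors append each edit to the end of the file instead of rewriting it, so superseded objects stay inside), pages inserted from other PDFs, and embedded resources that are no longer used all accumulate. This is the "my PDF mysteriously grew every time I edited it" effect.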
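The arithmetic behind the image cause is simple enough to run. A minimal sketch, assuming a US Letter page (8.5 by 11 inches) scanned in 8-bit RGB (3 bytes per pixel) and counting uncompressed pixel data only; JPEG or Flate makes the stored file much smaller, but resampling acts on the pixel count, and that scales with the square of the DPI.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Uncompressed pixel data for one scanned page (3 bytes/pixel = 8-bit RGB)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


for dpi in (600, 300, 150, 72):
    size = raw_page_bytes(8.5, 11, dpi)
    print(f"{dpi:>3} DPI: {size / 1_000_000:6.1f} MB of raw pixels")
```

At 600 DPI the page is about 101 MB of raw pixels; dropping to 150 DPI divides that by 16, which is why resampling is the highest-impact lever.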

You can find out which one applies to your file before compressing: a tool like Page Dimensions or View Metadata shows the document's properties, and PDF to JSON exposes the structure. Knowing the cause tells you which compression lever will actually help — there's no point re-encoding images on a file whose size is all un-subsetted fonts.
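If you would rather check from a script, a rough size breakdown is possible with the standard library alone. This is a heuristic sketch, not a PDF parser. It relies on the fact that stream objects are never stored inside compressed object streams, so every stream dictionary is visible in the raw bytes, and it classifies each stream as an image (/Subtype /Image), an embedded font file (the /Length1, /Length2, /Length3 keys of FontFile and FontFile2, or the /Subtype values FontFile3 uses), or everything else. The names size_breakdown and classify are hypothetical. Sizes are encoded (on-disk) bytes, give or take the end-of-line before endstream, and inline images inside content streams count as "other".

```python
import re
from collections import Counter

# "N G obj << ... >> stream": the dictionary may nest << >>, but the match
# must not cross an "endobj" (that would mean we ran into the next object).
STREAM_HEADER = re.compile(rb"obj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n", re.S)
IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# FontFile/FontFile2 carry /Length1 (Type 1 also /Length2, /Length3);
# FontFile3 is marked by one of these /Subtype values.
FONT_FILE = re.compile(rb"/Length[123]\b|/Subtype\s*/(?:Type1C|CIDFontType0C|OpenType)\b")


def classify(stream_dict: bytes) -> str:
    if IMAGE.search(stream_dict):
        return "images"
    if FONT_FILE.search(stream_dict):
        return "fonts"
    return "other"  # page content, object streams, xref streams, ...


def size_breakdown(pdf: bytes) -> Counter:
    """Approximate encoded bytes per category (heuristic, not a full parser)."""
    totals = Counter()
    for match in STREAM_HEADER.finditer(pdf):
        start = match.end()
        end = pdf.find(b"endstream", start)
        if end != -1:
            totals[classify(match.group(1))] += end - start
    return totals
```

To use it, read the file as bytes (for example `with open("report.pdf", "rb") as f: print(size_breakdown(f.read()).most_common())`) and compare the categories: if fonts outweigh images, subsetting is the lever to reach for.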

The quality trap: why "90% smaller" can be a bad sign

The easiest way for a compressor to produce an impressive percentage is to destroy image quality. A 10 MB scanned document can become a 500 KB document if every image is downsampled to 72 DPI and re-encoded as heavily compressed JPEG. On a phone screen that might look acceptable; printed or zoomed, it's unusable.

This is why percentage reduction alone is a misleading metric. A tool that achieves 60% reduction with visually lossless images is far better than one that achieves 90% by nuking them. Any honest benchmark reports quality alongside size, typically the effective DPI of the output images plus a visual or structural comparison.

The corollary: when a "compress PDF" site leads with a giant percentage, treat it as a warning, not a feature. Ask what it did to get there.

A fair, reproducible benchmark method

Most "we compared PDF compressors" articles are meaningless because they run each tool on a different file, or on one unrepresentative file, or with each tool's default (unknown) settings. Here is a method that actually produces comparable numbers. Run it yourself on any set of tools — including CommandPDF's compressor.

1. Build a varied corpus

A single PDF tells you nothing. Build a small set covering the common cases:

A scanned document (image-heavy, high DPI).
A text-heavy document (fonts dominate).
A mixed document (text + charts/photos — the typical business report).
A graphics-heavy document (vector + embedded images).

Use the same input files for every tool. Record the original size of each.
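Recording a hash next to each size turns "same input files" from an assumption into something you can check before every run. A small sketch; corpus_manifest is a hypothetical helper and SHA-256 is an arbitrary choice of stable hash.

```python
import hashlib
from pathlib import Path


def corpus_manifest(paths: list[str]) -> dict[str, dict[str, object]]:
    """Record each input's size and SHA-256 so every tool provably gets the same file."""
    manifest = {}
    for path in map(Path, paths):
        data = path.read_bytes()
        manifest[path.name] = {
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
    return manifest
```

Saving its output as JSON alongside your results lets anyone re-running the benchmark confirm they started from identical files.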

2. Fix the quality target, then measure size

The fair comparison is: given the same acceptable quality, which tool produces the smallest file? So set each tool to a comparable quality target — for example, "output images at 150 DPI, visually lossless" — and record the resulting size. Comparing one tool's "max compression" against another's "recommended" is meaningless.

3. Record three numbers per file per tool

Output size (bytes).
Reduction (percentage of the original size removed; report it alongside, not instead of, the absolute size).
Output image DPI (or a visual side-by-side at 200% zoom) so quality is visible.
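Output DPI here means effective DPI: an image's pixel dimensions divided by the size at which the page draws it. PDF user space defaults to 72 units per inch, so a sketch is one division per axis. The placed size in points is an input you would read from the page's transformation matrix or a PDF inspector; the function name is hypothetical, and it ignores the rarely used /UserUnit entry that can change the unit size.

```python
def effective_dpi(pixel_w: int, pixel_h: int,
                  placed_w_pt: float, placed_h_pt: float) -> tuple[float, float]:
    """Pixels per inch as drawn on the page (1 inch = 72 PDF points by default)."""
    return pixel_w / (placed_w_pt / 72), pixel_h / (placed_h_pt / 72)


# A 1275 x 1650 px scan stretched over a full US Letter page (612 x 792 pt):
print(effective_dpi(1275, 1650, 612, 792))  # (150.0, 150.0)
```

The same image placed at half the width and height would report 300 DPI, which is why DPI has to be measured as placed on the page, not read from the image alone.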

4. Watch for cheating

Does the tool strip metadata, annotations, or bookmarks to hit a number? That's a change, not just compression — record it. (If you want those stripped, that's a separate job — see Remove Metadata and Sanitize.)
Does it rasterize vector text into images? That destroys quality and accessibility while sometimes increasing size. Flag it.
Is the "compressed" file actually a re-encoded image PDF with no remaining text layer? That breaks search and copy.
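Some of these checks can be automated as a first pass. The sketch below is a heuristic, not a PDF parser: it searches the raw file plus every Flate stream that inflates cleanly (page content and compressed object streams included) for a few telltale tokens, and reports features the original had that the output lacks. The names features and lost_features are hypothetical, the patterns can miss legitimate alternative spellings, and encrypted files are not handled; treat a hit as a reason to open both files side by side, not as a verdict.

```python
import re
import zlib

STREAM = re.compile(rb"obj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n", re.S)

# Feature name -> byte pattern. Heuristic: PDF syntax allows other spellings.
CHECKS = {
    "bookmarks": rb"/Outlines\b",
    "annotations": rb"/Annots\b",
    "XMP metadata": rb"/Metadata\b",
    "fonts": rb"/Type\s*/Font\b",
    "text-showing operators": rb"[)>]\s*Tj\b|\]\s*TJ\b",
}


def searchable_bytes(pdf: bytes) -> bytes:
    """Raw file plus every Flate-encoded stream that inflates cleanly."""
    parts = [pdf]
    for match in STREAM.finditer(pdf):
        if b"/FlateDecode" not in match.group(1):
            continue
        end = pdf.find(b"endstream", match.end())
        if end == -1:
            continue
        try:
            parts.append(zlib.decompressobj().decompress(pdf[match.end():end]))
        except zlib.error:
            pass  # encrypted data, filter chains, damage: skip this stream
    return b"\n".join(parts)


def features(pdf: bytes) -> dict[str, int]:
    data = searchable_bytes(pdf)
    return {name: len(re.findall(pattern, data)) for name, pattern in CHECKS.items()}


def lost_features(original: bytes, compressed: bytes) -> list[str]:
    """Features present before compression and absent after it."""
    before, after = features(original), features(compressed)
    return [name for name in CHECKS if before[name] and not after[name]]
```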

5. Repeat and average

Run each tool a few times and average the results, since some tools produce nondeterministic output.
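A sketch of the repeat-and-average step, assuming you wrap each tool in a Python function that takes the input bytes and returns the output bytes (the compress parameter; how you drive a given tool is up to you and not shown). It also hashes every output, so nondeterminism shows up as a flag rather than as noise hidden in the average. The helper names are hypothetical; reduction_pct matches the Reduction column in the table format that follows.

```python
import hashlib
import statistics
from typing import Callable


def reduction_pct(original_size: float, output_size: float) -> float:
    """Share of the original size removed, as a percentage."""
    return 100 * (1 - output_size / original_size)


def benchmark(compress: Callable[[bytes], bytes], pdf: bytes, runs: int = 5) -> dict:
    sizes, digests = [], set()
    for _ in range(runs):
        output = compress(pdf)
        sizes.append(len(output))
        digests.add(hashlib.sha256(output).hexdigest())
    mean_size = statistics.mean(sizes)
    return {
        "original_bytes": len(pdf),
        "mean_output_bytes": mean_size,
        "stdev_bytes": statistics.stdev(sizes) if runs > 1 else 0.0,
        "reduction_pct": reduction_pct(len(pdf), mean_size),
        "deterministic": len(digests) == 1,  # identical output on every run?
    }
```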

Illustrative results (example format — verify with your own run)

The table below shows the format to use when you run the method above on your own corpus. The numbers are illustrative values chosen to demonstrate how to present findings; replace them with your measured results before publishing.

Document type	Original	Tool	Output	Reduction	Output DPI	Quality notes
Scanned (image-heavy)	12.4 MB	Tool A	1.1 MB	91%	150	Light artifacting acceptable
Scanned (image-heavy)	12.4 MB	Tool B	3.8 MB	69%	200	Visually lossless
Text-heavy	2.1 MB	Tool A	0.9 MB	57%	n/a (no images)	Font subsetting
Text-heavy	2.1 MB	Tool B	1.4 MB	33%	n/a	No subsetting
Mixed report	5.6 MB	Tool A	1.6 MB	71%	150	Charts preserved
Mixed report	5.6 MB	Tool B	2.9 MB	48%	200	Charts preserved

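The point of this table is the shape of the comparison, not the numbers: same input, fixed quality target, size, DPI, and quality reported together, and each document type reported separately because the winning tool can differ by type. Reporting DPI also keeps the comparison honest: in these example rows the two tools landed at different DPI, so before declaring a winner you would re-run Tool B at 150 DPI (or Tool A at 200). A tool that wins on scanned documents can lose on text-heavy ones, which is exactly why a single "best compressor" headline is usually wrong.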

Practical recommendations by file type

Once you know why your PDF is big, the right move is obvious:

Scanned/image-heavy document you'll read on screen: resample images to ~150 DPI and re-encode as JPEG. Big savings, acceptable quality. This is the common case and where Compress PDF spends most of its effort.
Scanned document you'll print: keep 300 DPI. Accept smaller savings to preserve legibility.
Text-heavy document: the win is font subsetting and object cleanup, not image work. Quality loss should be zero.
Mysteriously bloated file: run an optimization/cleanup pass (deduplication, remove unused resources) before anything else — you may get a big saving at no quality cost.
Document you also want to clean for sharing: pair compression with a Sanitize or Remove Metadata pass so you're not shipping author names, edit history, and hidden content along with your "compressed" file.
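The screen and print cases above come down to a target DPI plus one rule: downsample images well above the target, and never upsample images below it (that adds bytes without adding detail). A sketch of that decision; resample_plan is a hypothetical name, and the 1.5x threshold, which leaves images only slightly above target alone because re-encoding them costs quality for little saving, is an assumption in the style of the downsample-threshold settings some tools expose.

```python
from typing import Optional


def resample_plan(pixel_w: int, pixel_h: int, effective_dpi: float,
                  target_dpi: float, threshold: float = 1.5) -> Optional[tuple[int, int]]:
    """New pixel size for an image, or None to leave it as it is."""
    if effective_dpi <= target_dpi * threshold:
        return None  # at, near, or below target: never upsample
    scale = target_dpi / effective_dpi
    return max(1, round(pixel_w * scale)), max(1, round(pixel_h * scale))


print(resample_plan(5100, 6600, 600, 150))  # screen copy of a 600 DPI scan: (1275, 1650)
print(resample_plan(5100, 6600, 600, 300))  # print copy: (2550, 3300)
print(resample_plan(1700, 2200, 200, 150))  # only 1.33x over target: None
```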

Frequently asked questions

What DPI should I target? For screen reading, ~144–150 DPI is plenty. For print, 300 DPI is the standard floor. Going above 300 almost never justifies the size unless the document is meant to be enlarged.

Why did my "compressed" PDF get bigger? Usually because the tool rasterized content into images, or re-encoded something that was already efficiently encoded. It can also happen if the tool added its own metadata/watermark. Inspect the output structure to find out.
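The "re-encoded something already efficiently encoded" case is easy to reproduce. JPEG data is already compressed and looks close to random to a general-purpose compressor, so applying Flate on top of it, as a tool might when it blindly recompresses every stream, adds overhead instead of saving space. A sketch using seeded pseudo-random bytes as a stand-in for JPEG data (an approximation, not a real image):

```python
import random
import zlib

# Seeded pseudo-random bytes: a stand-in for already-compressed JPEG data.
jpeg_like = random.Random(0).randbytes(200_000)
rewrapped = zlib.compress(jpeg_like, 9)

print(len(jpeg_like), len(rewrapped))  # the second number is the larger one
```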

Is lossless compression possible? Yes — font subsetting, object deduplication, and stream recompression are lossless. The large savings, though, almost always come from image resampling, which is lossy by nature. A "lossless" compressor will produce smaller percentage savings but identical visual output.

Does CommandPDF process my file locally during compression? Yes — the entire compression runs in your browser, so even a sensitive document you want to shrink never leaves your device. The quality/size tradeoff is yours to set.

Conclusion

Compression isn't magic and it isn't one number. It's four techniques with different tradeoffs, applied to a file whose size is driven by one of three causes. Pick the technique that matches the cause, set a quality target instead of chasing a percentage, and benchmark tools with the same inputs and fixed quality so the comparison means something. A compressor that brags only about percentage reduction is telling you what it destroyed, not what it saved.

Compress a PDF — choose your quality level, all in your browser →
