Midnight Call: Crushing a 983MB PDF Down to 48MB

Midnight. Phone buzzes.

An urgent request:

"This PDF is 983MB. It needs to be smaller. Way smaller. I've given up, I can't even open the file."

The File

983 megabytes.
A document.
That needs to open on a phone.
Over mobile data.

Challenge accepted.

The Problem

The PDF was a product catalog, 329 pages thick. A mix of text, product photos, and detailed technical images. Nearly a gigabyte of pages with barely any meaningful compression.

Open it on a phone? Nearly impossible.
Send it over WhatsApp to distribute? Not a chance.

The Failed Attempts

First step was naturally to try the standard approaches.

Online PDF optimizers? Out of the question. Uploading a 983MB file to some random website is risky, and it would most likely time out halfway through anyway.

A few tools I tried:

Stirling PDF, my go-to self-hosted PDF toolkit. Tried it first. It kept erroring out. A file this massive choked the process every single time.
Ghostscript, the old reliable command-line workhorse. The file size dropped dramatically, but the quality collapsed with it. Text became blurry, photos turned to mush. For a product catalog where people need to read specs and see product details, this was unacceptable.

At this point one thing was clear: this wasn't about tweaking compression sliders.

The problem was in the document's structure. And the solution wasn't to shrink the PDF, but to tear it apart and rebuild it from scratch.

The Strategy: Don't Compress the PDF, Rebuild Its Contents

The usual approach would be converting the PDF to images with pdf2image, which wraps Poppler. But this catalog wasn't big because of what you see on the page. It was big because of what sat underneath:

layers upon layers of embedded fonts
complex vectors
high-resolution assets stacked on top of each other

Converting directly would just transfer all that weight into equally heavy output images.

What I actually needed was the final rendered output. Exactly what you see on screen. Already flattened. No hidden layers. No excessive vectors.

And who already does all that heavy lifting?

The PDF viewer.

The PDF viewer already composites all those layers into a single raster view at screen resolution. I just needed to capture it.

So the strategy broke down into simple steps:

Capture each page directly from the PDF viewer
Crop to just the content area
Optimize the images as aggressively as possible
Export as lightweight assets for phone screens

I wrote a Python script:

mss for super fast screenshots
OpenCV to select the crop area just once
keyboard to auto-advance pages

The workflow was simple: select the area once, hit Enter, and the script loops through every page. Screenshot, crop, save, next page. Press Esc when done.
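The original script isn't shown, but a minimal sketch of that capture loop could look like the following. It assumes the viewer advances with the right arrow key, that the viewer sits on the primary monitor, and that a short fixed delay is enough for each page to render. The names crop_region and capture_pages, the captures/ folder, and the page_N.png naming are all made up for illustration.

```python
import time
from pathlib import Path


def crop_region(roi, monitor):
    """Turn an (x, y, w, h) selection into absolute screen coordinates for mss."""
    x, y, w, h = (int(v) for v in roi)
    return {"left": monitor["left"] + x, "top": monitor["top"] + y,
            "width": w, "height": h}


def capture_pages(out_dir="captures", next_key="right", delay=0.4, max_pages=1000):
    """Screenshot the selected area, save it, flip the page, repeat until Esc."""
    # Third-party imports live here so crop_region stays usable without a display.
    import cv2
    import keyboard
    import mss
    import mss.tools
    import numpy as np

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with mss.mss() as sct:
        monitor = sct.monitors[1]  # assumption: the viewer is on the primary monitor
        frame = cv2.cvtColor(np.array(sct.grab(monitor)), cv2.COLOR_BGRA2BGR)
        roi = cv2.selectROI("Drag the page area, then press Enter", frame)
        cv2.destroyAllWindows()
        region = crop_region(roi, monitor)
        if region["width"] == 0 or region["height"] == 0:
            return 0  # selection was cancelled

        time.sleep(2)  # a moment to click back into the PDF viewer
        saved = 0
        for page in range(1, max_pages + 1):
            if keyboard.is_pressed("esc"):
                break
            shot = sct.grab(region)
            mss.tools.to_png(shot.rgb, shot.size, output=str(out / f"page_{page}.png"))
            saved += 1
            keyboard.send(next_key)  # assumption: right arrow = next page
            time.sleep(delay)        # assumption: enough time for the page to render
        return saved
```

Calling capture_pages() with the viewer open would then fill captures/ with one PNG per page. The delay is the fragile part: too short and the same page gets captured twice.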

329 pages captured in minutes.

The raw output?
300.29 MB of PNGs.

Still too big. Time for the compression war.

Round 1: The Safe Approach

"Don't break anything."

Starting conservative. Focus on preserving quality.

The strategy:

Try WebP lossy, WebP lossless, and optimized PNG
Keep whichever format produces the smallest file per image
No resizing

Key settings:

WebP quality 88
Original resolution
Auto-pick best format per page
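The "keep whichever is smallest" idea can be sketched in a few lines. This is an illustration rather than the original code: it encodes one already-opened Pillow image three ways in memory and keeps the smallest result. The function names and the method=6 setting are assumptions; quality 88 comes from the settings above.

```python
import io


def encode_candidates(img, webp_quality=88):
    """Encode one Pillow image as lossy WebP, lossless WebP, and optimized PNG."""
    variants = [
        ("webp_lossy", "WEBP", {"quality": webp_quality, "method": 6}),
        ("webp_lossless", "WEBP", {"lossless": True, "method": 6}),
        ("png", "PNG", {"optimize": True}),
    ]
    candidates = {}
    for name, fmt, params in variants:
        buf = io.BytesIO()
        img.save(buf, format=fmt, **params)  # encode to memory, not to disk
        candidates[name] = buf.getvalue()
    return candidates


def pick_smallest(candidates):
    """Return (name, bytes) of the smallest encoding."""
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]
```

With Pillow installed, pick_smallest(encode_candidates(Image.open("page_1.png"))) returns the winning format name and its bytes, ready to be written to disk with the matching extension. Encoding every page three times is also a plausible reason this round was slow.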

Result:

Total:  300.29 MB → 68.13 MB  (77.3% smaller)

68MB. Decent. But still too heavy for phones.
And the process was slow: nearly 10 minutes, because it was still single-threaded.

Round 2: Getting Aggressive

"Phones don't need 4K pages."

Time to get realistic. Add downscaling and parallelism.

WebP quality 55
Max 1440px on the long edge
Light sharpen to keep text crisp
8 parallel threads
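A rough sketch of this round, under some assumptions: Pillow's built-in SHARPEN filter stands in for the "light sharpen" (the actual filter isn't specified), input pages are PNGs in one folder, and the function names are invented. The long-edge math is kept in its own small function so it is easy to check.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fit_long_edge(size, max_edge=1440):
    """Scale (w, h) so the longer side is at most max_edge. Never upscale."""
    w, h = size
    scale = min(1.0, max_edge / max(w, h))
    return max(1, round(w * scale)), max(1, round(h * scale))


def optimize_one(src, out_dir, max_edge=1440, quality=55):
    """Downscale, sharpen, and save one page as WebP. Returns the output size in bytes."""
    from PIL import Image, ImageFilter  # Pillow, imported lazily

    dst = Path(out_dir) / (Path(src).stem + ".webp")
    with Image.open(src) as img:
        img = img.convert("RGB").resize(fit_long_edge(img.size, max_edge), Image.LANCZOS)
        img = img.filter(ImageFilter.SHARPEN)  # assumption: stand-in for "light sharpen"
        img.save(dst, "WEBP", quality=quality)
    return dst.stat().st_size


def optimize_all(src_dir, out_dir, workers=8):
    """Process every PNG in src_dir on a thread pool. Returns total output bytes."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pages = sorted(Path(src_dir).glob("*.png"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(lambda p: optimize_one(p, out_dir), pages))
    return sum(sizes)
```

Each page is independent, so the work maps cleanly onto a pool. Threads rather than processes are enough here as long as the heavy lifting happens inside Pillow's C code.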

Result:

Total:  300.29 MB → 43.57 MB  (85.5% smaller)
Time:   16.6s (19.8 images/sec)

43MB. Much more reasonable. But it felt like there was still more to squeeze.

Round 3: Maximum Squeeze

"How low can we go?"

Time to pull out every trick.

WebP quality 35
Max 1080px
Light Gaussian blur (radius 0.5) to smooth noise before compression
Color quantization down to 256 colors
14 threads

Result:

Total:  300.29 MB → 15.22 MB  (94.9% smaller)
Time:   10.4s (31.5 images/sec)

15MB. Incredibly small.
But there was a price. Colors washed out. Product photos lost their character. For a catalog, that's a problem.

Round 4: Full Color, Same Size

"Bring back the colors. Drop the quantization."

Turns out quantization did the most visual damage, but its contribution to file size was almost zero.

So the settings became:

WebP quality 35
Max 1080px
Light Gaussian blur
Full color, no quantization

Result:

Total:  300.29 MB → 15.16 MB  (95.0% smaller)
Time:   5.6s (59.2 images/sec)

15.16MB.
Nearly the same size, but colors came back to life.

One problem remained: small text was still a bit blurry.

Round 5: The Sweet Spot

"Bump the text detail, just a little."

Final tweak.

WebP quality 45
Max 1080px
Swapped Gaussian blur for UnsharpMask
14 threads
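For a single page, the Round 5 settings might translate to something like this. Quality 45 and the 1080px limit come from the list above; the UnsharpMask radius, percent, and threshold values are guesses, since the actual ones aren't given, and the function names are hypothetical. The optimized5 folder name matches the log output later in the post.

```python
from pathlib import Path


def webp_path(src, out_dir):
    """Map captures/page_12.png to <out_dir>/page_12.webp."""
    return Path(out_dir) / (Path(src).stem + ".webp")


def optimize_page(src, out_dir="optimized5", max_edge=1080, quality=45):
    """Downscale, apply UnsharpMask instead of a blur, and save as WebP."""
    from PIL import Image, ImageFilter  # Pillow, imported lazily

    dst = webp_path(src, out_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(src) as img:
        img = img.convert("RGB")
        img.thumbnail((max_edge, max_edge), Image.LANCZOS)  # in place, keeps aspect ratio
        # Assumed parameters: a small radius and moderate strength, with a
        # threshold so flat areas and photo noise are left mostly alone.
        img = img.filter(ImageFilter.UnsharpMask(radius=1.5, percent=80, threshold=2))
        img.save(dst, "WEBP", quality=quality, method=6)
    return dst
```

The threshold is what separates this from a plain sharpen: only edges with enough contrast, such as text strokes, get boosted.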

Final result:

Total:  300.29 MB → 25.64 MB  (91.5% smaller)
Time:   6.2s (52.6 images/sec)

25.64MB.
Text is sharp. Photos stay natural. Comfortable to read on any phone.

One problem remained.

They needed a PDF, not a folder of images.

The Final Step: Back to PDF

"Images look great. Can you make it a PDF again?"

Yes.

The final script:

Load all 329 optimized WebP images
Bump brightness 15% because the output looked a bit dark on phone screens
Natural numeric sort so page 2 doesn't end up after page 19
Stitch into a single PDF
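Those four steps fit in one short function. What follows is a sketch under assumptions, not the actual images_to_pdf.py: the brightness boost is applied as a flat 1.15 factor, and natural_key is a hypothetical helper name. The folder and output names are taken from the log below.

```python
import re
from pathlib import Path


def natural_key(path):
    """Split digits from text so page_2 sorts before page_10."""
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def images_to_pdf(src_dir="optimized5", out_path="output.pdf", brightness=1.15):
    """Load WebP pages in natural order, brighten them, and write one PDF."""
    from PIL import Image, ImageEnhance  # Pillow, imported lazily

    files = sorted(Path(src_dir).glob("*.webp"), key=natural_key)
    if not files:
        raise FileNotFoundError(f"no .webp files in {src_dir}")

    pages = []
    for f in files:
        with Image.open(f) as img:
            rgb = img.convert("RGB")  # PDF pages can't carry an alpha channel
            pages.append(ImageEnhance.Brightness(rgb).enhance(brightness))

    # Note: this keeps every page in memory at once, which is fine for a sketch
    # but can add up for very large documents.
    pages[0].save(out_path, "PDF", save_all=True, append_images=pages[1:])
    return len(pages)
```

Sorting by natural_key is the piece that is easy to forget. A plain sorted() on the filenames would place page_10 through page_19 ahead of page_2.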

First run:

D:\katalog program\python>python images_to_pdf.py
Loading 329 image(s) from 'optimized5/'...
Done! Saved 'output.pdf' 47.74 MB, 329 page(s).

A small tweak to the brightness curve, run again:

D:\katalog program\python>python images_to_pdf.py
Loading 329 image(s) from 'optimized5/'...
Done! Saved 'output.pdf' 46.94 MB, 329 page(s).

46.94MB.

A proper PDF.
329 pages.
Full color.
Readable text.
Ready for phones.

The Final Score

Stage	Size	Reduction vs. original PDF
Original PDF	983 MB	-
Raw captures (PNG)	300.29 MB	69.5%
Round 1, safe	68.13 MB	93.1%
Round 2, downscale	43.57 MB	95.6%
Round 3, maximum	15.22 MB	98.5%
Round 4, full color	15.16 MB	98.5%
Round 5, sweet spot	25.64 MB	97.4%
Final PDF	~48 MB	95.1%

From 983MB to 48MB. Cut by 95.1%.
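One thing to keep straight when reading the numbers: the per-round totals earlier are measured against the 300.29 MB of raw captures, while the table above is measured against the 983 MB original. A few lines are enough to recompute either figure (the reduction helper is just for illustration; the sizes are the ones reported in this post):

```python
def reduction(before_mb, after_mb):
    """Percentage saved when going from before_mb to after_mb."""
    return (1 - after_mb / before_mb) * 100


ORIGINAL_MB = 983
RAW_PNG_MB = 300.29

stages = [
    ("Round 1, safe", 68.13),
    ("Round 2, downscale", 43.57),
    ("Round 3, maximum", 15.22),
    ("Round 4, full color", 15.16),
    ("Round 5, sweet spot", 25.64),
]

for name, size in stages:
    vs_raw = reduction(RAW_PNG_MB, size)    # the figure quoted in each round
    vs_pdf = reduction(ORIGINAL_MB, size)   # the figure in the final table
    print(f"{name:<22}{size:>7.2f} MB  {vs_raw:5.1f}% vs PNG  {vs_pdf:5.1f}% vs PDF")
```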

Result

Side by side comparison. Original on the left, optimized on the right:

[Image: original vs. optimized, side by side]

Can you spot the difference?
And remember, that's 95% smaller.

The PDF is noticeably larger than the WebP folder because PDF has no native WebP support. When Pillow assembles the file, each page is re-encoded as JPEG, which needs more bytes than WebP for similar quality. But 48MB for a 329-page catalog opens instantly on any phone, no hesitation.

Capture took a few minutes. After that, the final optimization pass over all 329 pages finishes in under 10 seconds across 14 threads.

What I Learned

Don't fight the format. If the PDF is beyond saving, tear it apart and rebuild.
WebP is a monster. To my eye, quality 45 still looks better than JPEG quality 80, at a fraction of the size.
Downscaling wins the most. Dropping to 1080px has more impact than any quality slider.
Color quantization is overrated. For photo content, barely saves space but visibly degrades quality.
UnsharpMask beats Gaussian blur. It sharpens text without making photos look crunchy.
Threading is free speed. Every image is independent. From minutes down to just seconds.
Natural sort is mandatory. page_2 must come before page_10. Never trust alphabetical sort.
Brightness matters at the end. A small boost makes pages look clean on phone screens.

The Tools

All Python. Local. No cloud.

mss for multi-monitor screenshots
Pillow for crop, resize, sharpen, brightness, and PDF assembly
OpenCV for interactive area selection
keyboard for hotkeys and auto page flip
ThreadPoolExecutor for parallelism

No paid APIs. No uploading anywhere. Just Python and a midnight deadline.

What's Next: Bringing it to the Browser

These Python scripts are powerful, but they're clearly not for everyone. The next step is to bring this pipeline to the browser.

The vision:

A tool built on WebAssembly and PDF.js
All processing local in the browser, files never leave your machine
Drag and drop, no setup
No Python, no terminal
If your browser runs, your file can be shrunk

At 1 AM, the phone buzzed again:

"Got it. Way beyond my expectation! Thank you so much."

983MB to 48MB.
Six Python scripts.
One midnight.

Worth it.