Question 1

Why is my output empty?

Accepted Answer

The PDF probably contains scanned images instead of real text. PDF.js can only extract text that is stored as text in the file; it cannot read words that exist only as pixels in a page image. For scanned PDFs you need OCR (Optical Character Recognition), which this tool does not include.
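One way to tell a scanned page from a text page is to count the characters in its text layer. The sketch below is only an illustration, not this tool's actual code: `looksScanned` and `TextItemLike` are hypothetical names, and the input is assumed to have the shape of the `items` array that PDF.js returns from `page.getTextContent()`.

```typescript
// Minimal shape of a PDF.js text item; only `str` is used here (assumption).
interface TextItemLike {
  str: string;
}

// Hypothetical helper: true when the page's text layer holds fewer than
// `minChars` non-whitespace characters, which suggests an image-only page.
function looksScanned(items: TextItemLike[], minChars = 1): boolean {
  const total = items.reduce(
    (count, item) => count + item.str.replace(/\s/g, "").length,
    0
  );
  return total < minChars;
}
```

If a check like this returns true for every page, the output will be empty no matter which extraction mode is selected, and OCR is the only way forward.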

Question 2

Why do tables come out garbled?

Accepted Answer

PDFs do not record table structure; they record positioned text runs. Even Adobe Acrobat struggles with tables. The "Preserve lines" mode helps but is not a full table parser.
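To make "positioned text runs" concrete, here is a minimal sketch of how runs could be grouped into lines by their vertical position. It is an assumption about how a line-preserving mode might work, not the tool's real implementation. `PositionedRun` and `groupIntoLines` are hypothetical names; in PDF.js the `x` and `y` values would come from `transform[4]` and `transform[5]` of each text item.

```typescript
// A text run with its page position (hypothetical shape for illustration).
interface PositionedRun {
  str: string;
  x: number; // horizontal position
  y: number; // vertical position; PDF y grows upward
}

// Group runs whose y values lie within `yTolerance` of each other into one
// line, then order each line left to right.
function groupIntoLines(runs: PositionedRun[], yTolerance = 2): string[] {
  // Top of the page first: larger y means higher on the page.
  const byY = [...runs].sort((p, q) => q.y - p.y);
  const lines: { y: number; runs: PositionedRun[] }[] = [];
  for (const run of byY) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(current.y - run.y) <= yTolerance) {
      current.runs.push(run);
    } else {
      lines.push({ y: run.y, runs: [run] });
    }
  }
  return lines.map((line) =>
    [...line.runs]
      .sort((p, q) => p.x - q.x)
      .map((run) => run.str)
      .join(" ")
  );
}
```

Note what is lost even when this works: each row comes back as one string joined by single spaces, so column boundaries, merged cells, and wrapped cell text cannot be recovered from it.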

Question 3

Will it work on large PDFs?

Accepted Answer

Yes, up to roughly 200-300 pages or about 50 MB, beyond which browser memory becomes a bottleneck. Parsing happens page by page, so you see progress incrementally.
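Page-by-page parsing with incremental progress can be sketched roughly as below. This is an illustration under assumptions, not the tool's source: `extractPages`, `DocumentLike`, and `PageLike` are hypothetical names, typed to match the parts of the PDF.js document object used here (`numPages`, `getPage`, and `getTextContent`).

```typescript
// Structural types covering only the PDF.js members this sketch relies on.
interface PageLike {
  getTextContent(): Promise<{ items: Array<{ str?: string }> }>;
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Extract text one page at a time and report progress after each page.
async function extractPages(
  doc: DocumentLike,
  onProgress: (done: number, total: number) => void
): Promise<string[]> {
  const pages: string[] = [];
  // PDF.js page numbers start at 1, not 0.
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    // Items without a `str` field contribute an empty string.
    pages.push(content.items.map((item) => item.str ?? "").join(" "));
    onProgress(n, doc.numPages);
  }
  return pages;
}
```

Because only one page is awaited at a time, a progress bar can be updated from `onProgress` while later pages are still pending. The extracted strings still accumulate in memory, which is one reason very large files eventually hit a limit.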

Question 4

Is the file uploaded?

Accepted Answer

No. The PDF is read into your browser's memory and parsed there by PDF.js. The library's worker script is loaded from a CDN once; after that, all parsing happens locally. You can verify this in the Network tab of your browser's DevTools.

Question 5

Can it extract images or annotations?

Accepted Answer

No, this tool only extracts text. To export pages as images, use the PDF to Images tool.

Free PDF to Text Extractor
