John Batey IV

March 14, 2026

Made a Semantic Diff page to compare text, PDF, and DOC files.

I wanted a way to compare two versions of a document without tripping over page numbers, repeated headers, or whitespace changes. Most diff tools work line-by-line on exact text, which is useless when one file came from a PDF export and the other from a Word doc.

This normalizes the content first (strips page numbers, collapses whitespace, removes repeated header/footer lines), then diffs what’s left. You can paste text in directly, drag files in, or use the file picker. PDF and DOCX extraction, and everything else, happens client-side. Nothing gets uploaded anywhere.
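
For a rough idea of what normalize-then-diff looks like, here is a minimal Python sketch. It is not the page's actual code (that runs in the browser). The names `normalize`, `semantic_diff`, and `repeat_threshold`, the page-number pattern, and the "appears three or more times" rule for headers and footers are all assumptions made for illustration.

```python
import difflib
import re
from collections import Counter

# Assumed pattern: a line holding only a number, "Page 3", "3 of 10", or "3/10".
PAGE_NUMBER = re.compile(
    r"^\s*(?:page\s+)?\d+(?:\s*(?:of|/)\s*\d+)?\s*$", re.IGNORECASE
)


def normalize(text, repeat_threshold=3):
    """Return cleaned lines: whitespace collapsed, page numbers and
    frequently repeated lines (likely headers/footers) removed."""
    # Collapse runs of whitespace inside each line and trim the ends.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.splitlines()]
    # Drop blank lines and lines that look like page numbers.
    lines = [line for line in lines if line and not PAGE_NUMBER.match(line)]
    # Heuristic (assumption): a line seen repeat_threshold or more times
    # is treated as a running header or footer and dropped everywhere.
    counts = Counter(lines)
    return [line for line in lines if counts[line] < repeat_threshold]


def semantic_diff(old_text, new_text):
    """Diff the normalized lines; return only the changed regions as
    (tag, old_lines, new_lines) tuples."""
    old_lines = normalize(old_text)
    new_lines = normalize(new_text)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return [
        (tag, old_lines[i1:i2], new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Heuristics like these can misfire: a line containing only a year would be treated as a page number, and any genuine line repeated three times would be dropped as a header, so a real tool would likely need tunable or smarter rules.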

It also does word-level highlighting within changed lines rather than just “this line is different.”
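
Word-level highlighting can be sketched the same way: split both versions of a changed line into words and diff the word lists instead of the whole line. The Python below is only an illustration, not the page's code. The function name `word_diff` and the `[-removed-]` / `{+added+}` markers are assumptions standing in for whatever highlighting the page actually renders.

```python
import difflib
import re


def word_diff(old_line, new_line):
    """Return one string in which words removed from old_line are wrapped
    in [-...-] and words added in new_line are wrapped in {+...+}."""
    # Tokenize on whitespace so each word is one unit in the diff.
    old_words = re.findall(r"\S+", old_line)
    new_words = re.findall(r"\S+", new_line)
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)

    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])
            continue
        # "delete" and "replace" both carry removed words from the old line.
        if i2 > i1:
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        # "insert" and "replace" both carry added words from the new line.
        if j2 > j1:
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("The quick brown fox", "The quick red fox"))
```

In a browser the markers would more likely be styled spans, but the diffing idea is the same.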

If you want a copy, you can just save the HTML page above and run it locally.