Made a Semantic Diff page to compare text, PDF, and DOC files.
I wanted a way to compare two versions of a document without tripping over page numbers, repeated headers, or whitespace changes. Most diff tools work line-by-line on exact text, which is useless when one file came from a PDF export and the other from a Word doc.
This normalizes the content first (strips page numbers, collapses whitespace, removes repeated header/footer lines), then diffs what’s left. You can paste text in directly, drag files in, or use the file picker. PDF and DOCX extraction, and everything else, happens client-side. Nothing gets uploaded anywhere.
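For anyone curious what that normalization step might look like, here is a rough Python sketch. It is not the page's actual code (that runs in the browser); the `normalize` function, the `PAGE_NUMBER` pattern, and the `repeat_threshold` cutoff are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: a line that is only "12", "Page 12", or "Page 12 of 40".
PAGE_NUMBER = re.compile(r"^(page\s+)?\d+(\s+of\s+\d+)?$", re.IGNORECASE)


def normalize(text: str, repeat_threshold: int = 3) -> list[str]:
    """Return cleaned lines ready for diffing (illustrative heuristics only)."""
    # Collapse every run of whitespace to one space, then drop empty lines.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.splitlines()]
    lines = [line for line in lines if line]

    # Drop lines that look like bare page numbers.
    lines = [line for line in lines if not PAGE_NUMBER.match(line)]

    # Treat any line that appears repeat_threshold times or more as a
    # repeated header/footer and remove every copy of it.
    counts = Counter(lines)
    return [line for line in lines if counts[line] < repeat_threshold]


sample = (
    "ACME Report\nIntro   text\n1\n"
    "ACME Report\nMore\ttext\nPage 2 of 3\n"
    "ACME Report\nEnd\n3"
)
print(normalize(sample))
```

These are heuristics, so they can misfire: a line of real content that happens to be just a number, or a sentence that legitimately repeats three times, would be dropped as well.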
It also does word-level highlighting within changed lines rather than just “this line is different.”
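A minimal sketch of the word-level idea, using Python's standard `difflib` rather than whatever the page actually uses in the browser. The `word_diff` name and the `[-removed-]` / `{+added+}` markers are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    old_words, new_words = old.split(), new.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)

    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words pass through as-is.
            pieces.extend(old_words[i1:i2])
            continue
        if i1 < i2:  # words present only in the old line
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words present only in the new line
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("the quick brown fox", "the slow brown fox"))
```

Splitting on whitespace keeps the sketch short; punctuation stays attached to its word, so "fox." and "fox" would count as different words.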
If you want a copy, you can just save the HTML page above and run it locally.