Related: https://sive.rs/plaintext
We should be using text files for our thinking.
I understand why many file formats are some form of compressed binary. (By ‘compressed’ I mean anything where software needs to ‘inflate’ the data, using knowledge or algorithms, to convert a file into something usable.) The main reasons for this are:
- Processing speed - File formats can be set up to allow faster lookup of specific items, such as having an index of objects with their location offsets.
- Reduced file size via special format - This is more of a reason for older formats. Many current file formats (including MS Office) compress using standard, freely available compression algorithms. I do not consider something like a gz file to be ‘the data format’; it’s just a package.
- Complex structures - 3D files come to mind, or anything with multi-dimensional lookup tables.
- Proprietary formats - Generally, but not always, these exist for one of the reasons above.
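To make the ‘index of objects with the location offset’ idea concrete, here is a minimal sketch of a toy binary container in Python. The layout (a record count, then fixed-size index entries, then the payloads) and the names `pack_records` and `read_record` are my own assumptions for illustration, not any real file format.

```python
import struct

# Hypothetical toy layout: a uint32 record count, then one fixed-size
# index entry per record, then the raw payloads back to back.
COUNT = struct.Struct("<I")
ENTRY = struct.Struct("<16sII")  # name (null-padded), payload offset, payload length


def pack_records(records):
    """Pack a {name: bytes} mapping into the toy container."""
    data_start = COUNT.size + ENTRY.size * len(records)
    index = b""
    payload = b""
    for name, blob in records.items():
        offset = data_start + len(payload)
        index += ENTRY.pack(name.encode("ascii"), offset, len(blob))
        payload += blob
    return COUNT.pack(len(records)) + index + payload


def read_record(buf, wanted):
    """Find one record by scanning only the index, then jump to its offset."""
    (count,) = COUNT.unpack_from(buf, 0)
    for i in range(count):
        raw_name, offset, length = ENTRY.unpack_from(buf, COUNT.size + i * ENTRY.size)
        if raw_name.rstrip(b"\0").decode("ascii") == wanted:
            return buf[offset:offset + length]
    raise KeyError(wanted)


container = pack_records({"mesh": b"vertices", "texture": b"pixels"})
print(read_record(container, "texture"))
```

The lookup never touches the payloads it does not need, which is the speed win. The cost is that the file is unreadable without code that knows the layout.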
The problem with non-text files
We’ve gone from saving documents as paper/scans, to WordPerfect, to MS Word, to Word365. For document storage we’ve gone from using file cabinets, to local zip drives, to a local network drive (with tape backups!), to various cloud services, to SharePoint. Each time we run into new hurdles migrating our documents. The simpler the format, the easier migration will be.
Benefits of Text
The benefits of using text are threefold.
First, it forces you to focus on content, not formatting. I’ve lost so much time when someone finds a new way to break MS Word styles. I’ve also spent a ton of time ‘tinkering’ with formatting that simply doesn’t matter. Text formats greatly reduce this temptation.
Second, text files are maximally future-proof. These files will open on practically any computer, from a Commodore 64 to a modern phone. You can save them anywhere and open them in Notepad, Word, SharePoint, Teams, GitHub, Notion, or anything else that will open text files.
Third, text maximizes the ratio of content to file size: nearly every byte in the file is something you wrote, not markup or metadata.
Flavors of text
Markdown
The first version of 90% of what I write is in a Markdown format. These are just text files, but they have a minimal amount of formatting, similar to 1980s BBS or Usenet posts. The important thing is that, even if you have never seen a Markdown file before, you can still read it and understand the format. Just as important, the format is simple enough that a parser for the common subset can be written with even the most basic coding experience (or by an LLM).
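As a rough illustration of how little code a basic parser needs, here is a sketch in Python that handles only `#` headings, `- `/`* ` list items, and paragraphs. The function name `parse_markdown` and the `(kind, content)` output shape are my own choices, and this is nowhere near a full Markdown implementation.

```python
def parse_markdown(text):
    """Return a list of (kind, content) tuples for a small Markdown subset."""
    blocks = []
    paragraph = []

    def flush():
        # Consecutive plain lines are joined into one paragraph.
        if paragraph:
            blocks.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()
        elif stripped.startswith("#"):
            # Simplification: any line starting with '#' counts as a heading.
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            blocks.append((f"h{level}", stripped[level:].strip()))
        elif stripped.startswith(("- ", "* ")):
            flush()
            blocks.append(("item", stripped[2:].strip()))
        else:
            paragraph.append(stripped)
    flush()
    return blocks


note = "# Title\n\nSome text\ncontinues here.\n\n- one\n- two\n"
for block in parse_markdown(note):
    print(block)
```

Nested lists, emphasis, links, and code fences are all ignored here, but even this much is enough to pull an outline out of a folder of notes.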
I migrate everything possible to Markdown. LLM performance lives and dies based on context, and formatting and file-type overhead dilute the context.
JSON / XML
Where data formats are needed and scripts are used to interface with the data, straightforward JSON and XML are still great. Importantly, these are still viewable in any text editor.
These formats should only be used if ACTUALLY required. Wherever possible, I still try to use Markdown, enforcing formats via whatever interface is being used to create/edit/view the data.
HTML / CSS / JS
This is borderline, but I still consider simple webpages as text. If a compile step is needed, we are crossing the line from thinking to coding an app.
Maybe technically text…
PDFs and DOCX-format Word files - Technically you can open these up and read them. Realistically, most of each file is devoted to formatting, not to the text itself.
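For the DOCX case, ‘opening these up’ looks roughly like the sketch below: a DOCX is a ZIP package, and the body text lives in `<w:t>` elements inside `word/document.xml`. The helper names `docx_text` and `make_stand_in_docx` are hypothetical, and the stand-in package is a deliberately minimal fake built in memory so the example runs without a real file; a document saved by Word contains many more parts.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml.
NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + NS + "}"


def docx_text(data):
    """Pull the visible text out of a DOCX package, ignoring all formatting."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # Text runs are <w:t>; everything around them is structure or styling.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return "\n".join(paragraphs)


def make_stand_in_docx():
    """Build a minimal stand-in package (an assumption, not a full DOCX)."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{NS}"><w:body><w:p>'
        "<w:r><w:rPr><w:b/></w:rPr><w:t>Plain</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> text</w:t></w:r>'
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


print(docx_text(make_stand_in_docx()))
```

Even in this tiny stand-in, two words of content need several hundred bytes of XML plus a ZIP wrapper, which is the point: the text is in there, but you need tooling to get at it.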
Where doesn’t this work?
In situations where our thinking relies on numbers, [[Spreadsheet are Magic]]. I’d argue that there is a thin line between ‘thinking’ about a topic and calculating relevant information. Spreadsheets let us do these concurrently.
I’m trying to use Markdown for work things too, but there the need for other people to edit the same documents is usually a roadblock. My 0.Vehicle Group Guidelines falls into this category, since others need to be able to edit it.