
Files and File Formats


Files have no notion of interconnections, either internally or with other files. As a result, every file format and file-based system must reinvent the mapping of these higher-level concepts onto an 'array of bytes' in myriad different ways, resulting in an explosion of representations for a few ideas.

There is no encapsulation of meaning in a file-centric world - the meaning of a file is always transmitted out of band (in a filename, a mime type, or human-interpreted text).

Working with files requires pre-shared, deep knowledge of 'array of bytes' representations (file formats).

I've written about issues specifically with the plain text file format. One incremental solution can be to produce a better file format. Before we jump into that, let's look at the framing concept of a file itself.

A file is a named array of bytes - this is both incredibly flexible (you can represent anything!) and incredibly low level (you must implement everything!).
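To make the 'low level' half of that concrete, here is a minimal Python sketch in which the same four bytes are read three different ways. The byte values and the three interpretations are illustrative assumptions, not taken from any particular file format; the point is only that nothing in the array itself selects one reading over another.

```python
import struct

# Four arbitrary bytes; nothing in the array itself says what they mean.
data = bytes([72, 105, 33, 10])

# Three equally valid readings of the same array (all chosen for illustration).
as_text = data.decode("ascii")            # ASCII text: 'Hi!\n'
as_uint32 = struct.unpack("<I", data)[0]  # one little-endian unsigned 32-bit integer
as_pixels = list(data)                    # four 8-bit grayscale pixel values

print(repr(as_text), as_uint32, as_pixels)
```

Which reading is 'correct' depends entirely on knowledge held outside the bytes.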

File formats emerge as a means to limit the ways in which we can organize concepts into a flat array of bytes. The 'meaning' of the file is then transmitted outside the file itself.

To take a concrete example, I hand you an array of numbers (0 through 255) and tell you that this is a picture from my vacation. For your system to be able to display a picture from these numbers, I also transmit a 'tag' (i.e. mime type) which identifies a well known file format. This model then works for the well known file formats, as long as they don't evolve.
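The exchange above can be sketched roughly as follows. Everything named here is hypothetical: the `image/x-toy-gray` tag, the toy layout (width byte, height byte, then pixels), and the `DECODERS` table standing in for the receiver's pre-shared knowledge. It is a sketch of the model, not of how any real system dispatches on mime types.

```python
def decode_toy_gray(data: bytes) -> list[list[int]]:
    """Hypothetical format: byte 0 is width, byte 1 is height, the rest are pixels."""
    width, height = data[0], data[1]
    pixels = data[2:2 + width * height]
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


# The receiver's pre-shared knowledge: tag -> decoder (assumed, illustrative).
DECODERS = {"image/x-toy-gray": decode_toy_gray}


def display(data: bytes, tag: str) -> list[list[int]]:
    """Interpret the bytes using the tag, which travels separately from the bytes."""
    decoder = DECODERS.get(tag)
    if decoder is None:
        raise LookupError(f"no decoder installed for {tag!r}")
    return decoder(data)


payload = bytes([2, 2, 0, 255, 255, 0])
rows = display(payload, "image/x-toy-gray")  # rows of pixel values for a 2x2 image
```

Calling `display(payload, "image/x-some-new-format")` raises `LookupError`: the bytes are unchanged, but without a matching entry on the receiving side they cannot be interpreted.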

If I define a new way to represent a picture in an array of bytes, my 'file format' won't just work. I'll have to transmit the format knowledge 'out of band' (i.e. supply a transcoder and make sure you install it). However, there's no way for me to do this in-band! Even if I have a precise, computer-readable description of how to convert my array of bytes into pixels, there's no way to embed this with the array of bytes itself. This is the encapsulation problem.

© Shalabh Chaturvedi. Built using Pelican. Theme by Giulio Fidente on github.