Mounting tar archives as a filesystem in WebAssembly
jeroen.github.io | 125 points by datajeroen a day ago
Only peripherally relevant, but also see Ratarmount: https://github.com/mxmlnkn/ratarmount
It lets you mount .tar files as a read-only filesystem.
It’s cool because you basically get random access to the tarball without paying any decompression costs. (It builds an index recording exactly where every file’s data lives.)
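The core idea can be sketched in a few lines of Python. This is not ratarmount's actual code (which also handles compressed tarballs via decompression checkpoints); it is a minimal illustration for an *uncompressed* tar, with hypothetical function names:

```python
import tarfile

def build_index(path):
    """Scan the tar once, recording each member's data offset and size."""
    index = {}
    with tarfile.open(path, "r:") as tar:  # "r:" = refuse compressed input
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_file(path, index, name):
    """Random access: one seek, one read, no decompression."""
    offset, size = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Once the index exists (built at runtime or shipped alongside the archive), every lookup is a single seek into the tarball.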
Very cool, I wish there were something similar to this for filesystem images though.
Just recently I needed to somehow generate a .tar.gz from a .raw ext4 image and, surprisingly, there's still no better option than actually mounting it and then creating an archive.
I managed to "isolate" it a bit with guestfish's tar-out, but still it's pretty slow as it needs to seek around the image (in my case over NBD) to get the actual files.
There are surprisingly few tools for working with filesystem images in the Linux world; most assume loopback mounting is always available.
There are a few libraries that read ext4, but every time I've tried to use one it was missing some feature that my specific image relied on (mke2fs changes its defaults every couple of years to rely on newer ext4 features).
7-Zip can also read ext4 to some degree; as far as I can tell, they wrote a naive parser of their own to do it: https://github.com/mcmilk/7-Zip/blob/master/CPP/7zip/Archive...
How about using a format that has actually been designed to be a compressed read-only filesystem? Something like a SquashFS or cramfs disk image?
When looking at established file formats, I'd start with zip over tarballs for that use case. Zip has compression and the ability to access any file directly. A tarball you have to decompress first.
SquashFS, cramfs, and the like have less tooling, which makes generating, inspecting, etc. more complex.
You only have to decompress it first if it's compressed (commonly using gzip, which is shown with the .gz suffix).
Otherwise, you can randomly access any file in a .tar as long as:
- the file is seekable/range-addressable
- you scan through it and build the file index first, either at runtime or in advance.
Uncompressed .tar is a reasonable choice for this application because the tools to read/write tar files are very standard, the file format is simple and well-documented, and it incurs no computational overhead.
You've just constructed your own crappy in-memory zip file here. If you have to build your own custom index, you're no longer using the standard tools. If you find yourself building indices of tar files, and you control their creation, give yourself a break and use a zip file instead: it has the index built in. Compression is not required when packing files into a zip, if you don't want it.
Yeah it's pretty common to use zip files as purely a container format, with no compression enabled. You can even construct them in such a way that it's possible to memory map the contents directly out of the zip file, or read them over the network via a small number of range requests.
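A minimal sketch of this pattern, assuming Python's stdlib `zipfile` and `mmap` (the function names are illustrative): store members with `ZIP_STORED`, then read a member's bytes straight out of the file via mmap, skipping past the 30-byte local file header plus its name and extra fields.

```python
import mmap
import zipfile

def write_container(path, files):
    """Pack files uncompressed so their bytes sit verbatim in the archive."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)

def mmap_member(path, name):
    """Read one member's raw bytes via mmap, without ZipFile.read()."""
    with zipfile.ZipFile(path) as zf:
        info = zf.getinfo(name)  # gives header_offset and file_size
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Local file header: 30 fixed bytes; name length at offset 26,
        # extra-field length at offset 28 (both little-endian uint16).
        base = info.header_offset
        name_len = int.from_bytes(m[base + 26 : base + 28], "little")
        extra_len = int.from_bytes(m[base + 28 : base + 30], "little")
        start = base + 30 + name_len + extra_len
        return bytes(m[start : start + info.file_size])
```

Because the data is stored, not deflated, the slice of the mapping *is* the file; the same offset arithmetic works for HTTP range requests.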
> Uncompressed .tar is a reasonable choice for this application
Yes, uncompressed tar (with transfer compression, which HTTP offers) is an option for some amount of data.
Until the point where it isn't. Zip has similar benefits to tar plus transfer compression, but hits that failure point much later in such a scenario.
Zip allows you to set compression algorithm on a per-file basis, including no compression.
You can achieve the same with tar if you individually compress the files before adding them to the tarball instead of compressing the tarball itself.
I don’t see how that plus a small index of offsets would be notably more or less work than using a zip file.
Zip has a central directory you could just query, instead of having to construct one in-memory by scanning the entire archive. That's significantly less work.
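Concretely, the central directory is exposed directly by Python's `zipfile` without touching any member data; a sketch:

```python
import zipfile

def list_members(path):
    """Read the central directory: names, sizes, and offsets, no scanning."""
    with zipfile.ZipFile(path) as zf:
        return [(i.filename, i.file_size, i.header_offset)
                for i in zf.infolist()]
```

`ZipFile` parses only the end-of-central-directory record and the directory entries at the tail of the file, so listing is cheap even for huge archives.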
I mean if they include a pre-made index with it, for example an uncompressed index at byte offset 0 in the tarball that lists what is inside and the offsets. It would still be a comparable amount of work to build software that does that with tar as to use a zip file, if fine-grained per-file compression levels etc. are being used.
But then you are not using tar; you are building your own file format on top of tar.