A file format uncracked for 20 years

landaire.net

282 points by todsacerdoti 11 days ago


amiga386 - 6 hours ago

So if I understand this right:

* common.lin contains filenames, so that filename-expansion code in the game can work. But the offsets and sizes associated with the files are garbage.

* <filename>.lin contains a stream of every byte read from every file, while loading the level <filename>. The stream is then compressed in 16k chunks by zlib.

* There is no indication in that stream of which real file was being read, nor the length of each read, nor what seeking was done (if any). All that metadata is gone.

* The only way to recover this metadata is to run the game code and log the exact sequence of file opens, seeks, reads.

* Alternatively, extract all that Unreal object loader code from the game and reimplement it yourself, so that you can let the contents of the stream drive the correct reading of the stream. The code should be deterministic.
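As a rough illustration of the second and third points, the compressed <filename>.lin body could be turned back into its single linear stream like this. It is a minimal sketch that assumes the body is nothing but back-to-back zlib streams (e.g. one per 16k chunk) with no other headers or size fields; the real layout may carry framing this ignores, and the function name `inflate_concatenated` is hypothetical. Python's `zlib.decompressobj` exposes `unused_data`, which makes walking concatenated streams straightforward.

```python
import zlib

def inflate_concatenated(blob: bytes) -> bytes:
    """Inflate back-to-back zlib streams into one linear byte stream.

    Assumption (not confirmed by the article): the .lin body is nothing but
    concatenated zlib streams, e.g. one per 16 KiB chunk, with no framing.
    """
    out = bytearray()
    rest = blob
    while rest:
        d = zlib.decompressobj()
        out += d.decompress(rest)   # stops at the end of one zlib stream
        out += d.flush()
        if not d.eof:
            raise ValueError("truncated zlib stream")
        rest = d.unused_data        # bytes after this stream: the next chunk
    return bytes(out)

if __name__ == "__main__":
    CHUNK = 16 * 1024
    original = bytes(range(256)) * 200          # 51,200 bytes of sample data
    blob = b"".join(zlib.compress(original[i:i + CHUNK])
                    for i in range(0, len(original), CHUNK))
    # The result is one undifferentiated stream: no file names, offsets or
    # read sizes survive.
    assert inflate_concatenated(blob) == original
```

The output is exactly the problem described above: a single run of bytes with nothing marking where one file's reads end and the next begin.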

This sounds pretty hellish for the game developers, and I bet the debug versions of their game _ignored_ <filename>.lin and used the real source files, but _wrote_ <filename>.lin immediately after every load. Any change to the Unreal objects could alter how they were read, and if the data streamed didn't perfectly match up with what was in the real files, you'd be toast.

It reminds me of the extreme optimisation that Farbrausch did for .kkrieger -- they built a single binary, then ran and played it under instrumentation, and _any_ code path that wasn't taken was deleted from the binary to make it smaller. They forgot to take any damage in that playthrough, so all the code that applies damage to the player was deleted. Oops!

mcdeltat - 11 hours ago

> Compressing data means you save space on the disc... If you conveniently ignore the fact that common.lin is duplicated in each map's directory and is the same for every map I tested, which kinda negates part of this.

This is an interesting thing I've noticed about game dev: it sometimes seems to live in a weird space between strict optimisation requirements and hackiness. You'll have stuff like using instruction data as audio to save space, but then someone forgets to compile in release mode. It's a really odd juxtaposition of near-genius-level optimisation with naive inefficiency. I'm assuming it's because, while there may be strict performance requirements, the devs are under the pump and there's so much going on that silly stuff ends up happening?

LunicLynx - 15 hours ago

The Xbox had strict requirements for loading times. This is probably a linear (lin) record of how the data was loaded, unoptimized, from the disk, just written out to a file.

So in this file, seek doesn't do anything, because seeking would break the requirement of 45 sec per loading screen.

Instead, the logic is as follows: check whether a .lin file exists. Yes: open a handle to it and only ever read from it with fread, taking whatever is at the current file position. No: while reading any file, write the bytes read to a .lin file in the order they are read.

This gives a highly optimized .lin file that can be read from the disk straight into memory, without having to build a better dedicated loading mechanism.
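The record/replay logic described above can be sketched in Python. This is a hypothetical illustration, not the game's actual code: the class name `LinearLoader`, its `open`/`seek`/`read` methods, the `load_level` routine, and the in-memory dictionary standing in for files on disc are all assumptions. In recording mode each read is served from the real file and its bytes are appended to the recording; in replay mode the file name and seek are ignored and bytes are simply taken in order from the stream.

```python
import io
from typing import Dict, Optional

class LinearLoader:
    """Hypothetical record/replay loader.

    Recording (lin is None): reads hit the real files and every byte
    returned is appended to self.recording, which becomes the .lin.
    Replay (lin given): file names and seeks are ignored; each read just
    takes the next bytes from the linear stream.
    """

    def __init__(self, files: Dict[str, bytes], lin: Optional[bytes] = None):
        self.files = files                    # stand-in for files on disc
        self.replay = io.BytesIO(lin) if lin is not None else None
        self.recording = bytearray()
        self.handles: Dict[str, io.BytesIO] = {}

    def open(self, name: str) -> None:
        if self.replay is None:
            self.handles[name] = io.BytesIO(self.files[name])

    def seek(self, name: str, offset: int) -> None:
        if self.replay is None:
            self.handles[name].seek(offset)
        # In replay mode a seek is a no-op: the stream is already in order.

    def read(self, name: str, size: int) -> bytes:
        if self.replay is not None:
            return self.replay.read(size)
        data = self.handles[name].read(size)
        self.recording += data
        return data

def load_level(loader: LinearLoader) -> bytes:
    """Hypothetical loading routine that opens, seeks and reads two files."""
    loader.open("level.unr")
    header = loader.read("level.unr", 4)
    loader.open("textures.utx")
    loader.seek("textures.utx", 2)
    texture = loader.read("textures.utx", 3)
    return header + texture

if __name__ == "__main__":
    files = {"level.unr": b"HDR!rest", "textures.utx": b"..TEX.."}
    recorder = LinearLoader(files)
    first = load_level(recorder)
    # Replay needs no files at all, only the recorded stream.
    replayed = load_level(LinearLoader({}, lin=bytes(recorder.recording)))
    assert first == replayed == b"HDR!TEX"
```

Replay only works if the loading code issues exactly the same sequence of reads, with the same sizes, as it did while recording. That is why the stream carries no file names or offsets, and why recovering them means re-running (or reimplementing) the loader.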

So if you really want to unpack this, the first file being read is most likely the key, as it dictates what comes next. If it is a level model, then the position of the player in it might affect which other files are loaded, etc.

In short it’s not a file format in the classical sense, it’s a linear stream of game data.

blixt - 8 hours ago

The quirks of field values not matching expectations remind me of a rabbit hole I went down when reverse engineering the Starbound engine [1]: I eventually figured out the game was using a flawed implementation of SHA-256 hashing and had to create a replica of it [2]. Originally I used Python [3], which is a really nice language for reverse engineering data formats thanks to its flexibility.

[1] Starbounded was supposed to become an editor: https://github.com/blixt/starbounded

[2] https://github.com/blixt/starbound-sha256

[3] https://github.com/blixt/py-starbound

tapia - 10 hours ago

I'm always amazed by people reverse engineering these kinds of formats. There's a binary format that I've been wanting to reverse engineer, but I don't know exactly how to start. It's the result file of a proprietary finite element program. Could anyone point me to some resources, and tell me what basics I need to learn to achieve this?

vivzkestrel - 13 hours ago

What about Splinter Cell: Conviction? 15 years and nobody has figured out its map file format, .unr, which uses a custom Unreal Engine 2.x. It even has a tool that lets you unpack its UMD files: https://github.com/wcolding/UMDModTemplate The library on GitHub requires this tool, unumd: https://www.gildor.org/smf/index.php/topic,458.msg15196.html... The same tool also works for Blacklist. I would like to change the type of enemy spawned in the map, but I cannot find any help with it. UEExplorer doesn't work because it is some kind of custom map file.

Dwedit - 17 hours ago

Lowercase 'x' (byte 0x78) at the start of a stream is always your dead giveaway that it's zlib.
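The rule comes from RFC 1950: a zlib stream starts with a CMF byte whose low nibble is the compression method (8 for deflate) and whose high nibble encodes the window size. The default 32 KiB window gives 0x78, which is ASCII 'x'. The two header bytes, read as a big-endian 16-bit number, must also be a multiple of 31. A brute-force scanner built on those rules might look like this; the function names are hypothetical, and streams made with a smaller window start with a different byte, so the check accepts any valid window size rather than only 'x'.

```python
import zlib
from typing import Iterator, Tuple

def looks_like_zlib_header(cmf: int, flg: int) -> bool:
    """RFC 1950 sanity check on the first two bytes of a zlib stream."""
    method = cmf & 0x0F          # CM: 8 means deflate
    window_bits = cmf >> 4       # CINFO: log2(window size) - 8, at most 7
    return method == 8 and window_bits <= 7 and (cmf * 256 + flg) % 31 == 0

def find_zlib_streams(blob: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, inflated_length) for every offset that has a plausible
    header and also inflates cleanly to the end of a zlib stream."""
    for i in range(len(blob) - 1):
        if not looks_like_zlib_header(blob[i], blob[i + 1]):
            continue
        d = zlib.decompressobj()
        try:
            data = d.decompress(blob[i:])
        except zlib.error:
            continue                 # header matched by chance; not zlib
        if d.eof:
            yield i, len(data)

if __name__ == "__main__":
    blob = b"junk" + zlib.compress(b"hello world") + b"tail"
    print(next(find_zlib_streams(blob)))
```

Requiring each candidate to inflate all the way to the end of a stream filters out most of the false positives that the two-byte check alone lets through.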

tomaytotomato - 7 hours ago

Loved Splinter Cell.

I wonder if any of the original devs will stumble upon the author's article and then remember why they did those weird file offsets.

There were differences between the PC and Xbox versions, so it will be interesting to find out whether the author sees any snippets or missing game assets in the Xbox version.

hiimkeks - 2 hours ago

> The entire content of the function is:

> retn 4

> That's it.

I've seen this before; it's a random() function.

noufalibrahim - 12 hours ago

This is an interesting post. I've been spending time on a hobby project [1] that requires reading some old archives and game asset files. I didn't have to do any reverse engineering, since it's already been done by others and documented on the ModdingWiki. However, I did implement the algorithms myself to work with the assets.

It's an interesting rabbit hole to go down into and this post makes me appreciate the way in which this kind of forensic analysis is done.

1: https://eye-of-the-gopher.github.io/

rawling - 18 hours ago

Weird that the one of these with no interaction got bumped rather than https://news.ycombinator.com/item?id=45842851

harrylepotter - 15 hours ago

Ironically, this was the game that enabled the savegame exploit with the Bert and Ernie fonts, if I recall correctly.
