Reverse engineering efforts, like what the LibreOffice team did for old "office" file formats, or what various free *CAD teams did for theirs. There are even tools that help with reverse engineering file formats, like Kaitai Struct.
Was looking good until the penultimate paragraph where it says that this tool will not be available to the general public. Would have been nice to have had that up front so that I could have stopped reading right there.
If the physical media are indeed floppies and CDs, these guys will need to take into consideration a high share of corrupted data. I know, I know, this is not what the project is about, but without it, the outcome will be more theoretical than practical.
I remember trying to read a decade-old stash of floppies a few years ago. Too many were unusable. CDs are much better but also relatively unreliable. My USB drives fared much better; even the 64 MB one with the kangaroo logo still works.
I have a PDP-11/23 in my lab right now. It contains the data and code my former advisor used to get his tenure-track job, as well as an interface to an electrophysiology recording rig. Takes up a good part of the room.
I have no idea where to put this thing, but we virtualized the entire thing years ago and spin up the VM when required.
> Each new release tends to only support files created within the last two versions
Ugh, I hate software that does this. If you're creating an obscure proprietary format for your files, the least you can do is support it.
This is a good introduction to some of the issues facing academic libraries these days. For a more in-depth look at what strategies librarians are using to provide access to older operating systems / formats / user experiences, I'd recommend taking a gander at "Emulation & Virtualization as Preservation Strategies" by David S. H. Rosenthal:
Laudable idea, but there's just so much variety of old platforms, file formats, and backup formats. It's hard to guess what might be highest priority. Maybe "Universal" just sounds a little strong to me.
Just covering every popular RISC/Unix platform would be daunting. Ever hear of Pyramid OSx? It was once popular. That's skipping over mainframes (not just IBM either, see https://en.m.wikipedia.org/wiki/BUNCH), mid-range (OS/400, MPE, VMS, Tandem), OS/2, BeOS, embedded platforms, and much, much more.
I've worked on this problem in the past, for modernization and migration away from legacy languages to modern ones. You need to write a parser for the source language, but that is a one-time upfront cost. The parser should populate an intermediate, language-agnostic model, and then you can write as many generators as you require to translate that model into your target language of choice.
We had one-click solutions for COBOL to C, Java to C#, VB to Angular, etc. Once you have the parsers and generators, the work for each project after that is minimal.
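To make the parser / agnostic model / generator split concrete, here is a toy sketch in Python. It is nowhere near a real COBOL front end: the one-statement "MOVE n TO NAME" syntax, the `Assign` model class, and the function names are all made up for illustration, and a real system would need a full grammar and a much richer model.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Assign:
    """Language-agnostic model node: assign an integer constant to a name."""
    target: str
    value: int


# Hypothetical toy source syntax: "MOVE 5 TO TOTAL." (trailing period optional).
_MOVE = re.compile(r"^\s*MOVE\s+(-?\d+)\s+TO\s+([A-Za-z][A-Za-z0-9-]*)\s*\.?\s*$")


def parse_toy_source(source: str) -> List[Assign]:
    """One-time cost: turn source text into the agnostic model."""
    model = []
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = _MOVE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        # Normalize names so every generator sees the same identifier.
        name = match.group(2).lower().replace("-", "_")
        model.append(Assign(target=name, value=int(match.group(1))))
    return model


def generate_c(model: List[Assign]) -> str:
    """One generator per target language; this one emits C declarations."""
    return "\n".join(f"int {a.target} = {a.value};" for a in model)


def generate_python(model: List[Assign]) -> str:
    """A second generator reusing the same model, emitting Python."""
    return "\n".join(f"{a.target} = {a.value}" for a in model)
```

Feeding `"MOVE 5 TO TOTAL.\nMOVE 10 TO ITEM-COUNT"` through `parse_toy_source` and then either generator gives the same two assignments in C or Python. Adding another target only means writing another `generate_*` function; the parser is untouched.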
File formats are fascinating: https://github.com/corkami/pics/blob/master/binary/README.md
Open some files in a hex editor and have a look. Run "strings". Investigate and learn.
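If you don't have a hex editor handy, a rough sketch of both ideas fits in a few lines of Python. The function names `hex_dump` and `find_strings` are my own, and `find_strings` only approximates what "strings" does (runs of printable ASCII, minimum length 4 by default); the real tool has more options.

```python
import re


def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex columns, and a printable-ASCII gutter."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' like most hex viewers do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def find_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Read a file with `open(path, "rb").read()` and pass the first few hundred bytes to `hex_dump`; magic numbers and header fields tend to show up right at the start.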
When I was younger I used to play BBS games. Things like LORD, ArrowBridge, Usurper et al. It would be cool to get those running again as a website.