30th January 2004 · Last updated: 21st December 2011
I realised recently that all web content is doomed. When the paid hosting runs out, the content simply disappears. A typical domain name is registered for only one or two years. If the owner doesn't renew it, the provider will delete everything, and unless the owner has kept a backup, that content is gone for good. I now back up regularly because the server for my site once went down. I had a forum running at the time but hadn't bothered to back it up for a few days. When my site came back up, all the recent files were missing! The hosting company had clearly had to restore from an older backup, so my forum had lost every post made since then. Worse still, I also lost a new member who had just registered.
Even if you keep an up-to-date backup of your website, what happens when you die? Nobody pays for the hosting any more, and eventually the site is removed: no company will keep hosting it on the chance that you pay later. So you've lost your site. Your domain name might still be paid for, and the site could be hosted elsewhere, but even the domain will expire one day. Then your entire web presence is gone. Unless someone buys the domain name back, along with fresh web hosting for your files, you may as well never have been on the web.
Yes, there are caches of web pages. But these are unlikely to last long unless you've been lucky enough to be archived somewhere like the Wayback Machine. Even there, the archive could run out of funding, the servers could break down, or a lightning strike or fire could destroy every backup.
My point is that the web isn't like a book. Books can be lost too, but we can still read books written hundreds of years ago because they keep being reprinted. Who will reprint your website? And unlike books, websites are dynamic: you click on links, images animate, menus slide in and out of view. How can you preserve that?
The next point is that technology is constantly changing. Much of what once worked on Windows 3.11 or in DOS no longer works on Windows XP. Even if a program lasted forever, the hardware to run it most likely would not. Unless the program can be converted, or an old machine found to run it on, the program is effectively dead. And who will still be running old operating systems in a decade's time?
I've been encouraged recently by online ZX Spectrum 48K and Commodore 64 emulators. Because those computers used fairly basic processors, people have been able to replicate their exact workings, so ancient games can be replayed on a PC. Amazingly, they play just like the real thing. I enjoyed replaying Boulderdash and Impossible Mission (see actual size screenshot!). Admittedly, games like Ant Attack (a 3D isometric breakthrough at the time) now look hopelessly rough and are tricky to play by the standards of current PC games like Max Payne 2. But emulation is a superb example of restoring programs I had considered gone for good. If only someone could speed up the 3D in The Sentinel (see actual size screenshot)... surely it should be instant on a 1.7GHz processor?
Websites like Praystation have gotten round the problem of disappearing web pages by releasing their content on CD. This is a clever and effective way to archive a site. But how are you going to sell the CD without another web page? And if that expires...
I don't know what the solution is. Maybe there isn't one. One idea is something many of us do anyway: keep converting our code over time, so that when new browsers are released, the pages can be made to work in them. Often only subtle tweaks are required. Imagine, though, how many early websites, released when HTML was new and Netscape 1 was the dominant browser, have been rewritten again and again to display properly today. No wait, many haven't! Yet thanks to the backwards compatibility built into HTML and current browsers, those pages still display properly today (more or less). Even if they don't, we can always view the page source and read the text from there.
The next hurdle could well be XHTML 2. Why? Because it isn't designed to be backwards-compatible. It introduces new elements that old browsers won't know what to do with. Again, you should still be able to read the text; it just won't be presented correctly.
Another way to 'future-proof' your site may be to convert all your content to XML. You, or someone else, could then write a program to convert it to any format you like. Because XML keeps the content separate from its presentation, it can be processed first and then output in whatever form suits a given device. If internet toasters finally hit the big time, your XML site is safe, so long as you can transform the data into the new format.
Browsers like Opera 7 can already reshape web pages for small and medium-sized screens. This saves you the job, but the results depend on the browser; it's better if you can control the output yourself. But can you predict what new devices, screen sizes and resolutions will emerge in future? Each one could mean a new look for your site. Future screens could have such high resolutions that existing pages sit in a corner of the screen, much as Commodore 64 games do in an emulator. I once played those games on a TV set, filling the screen. Now they run in a tiny window on my desktop, because a typical PC screen has a much higher resolution.
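To make the XML idea a little more concrete, here is a minimal Python sketch of one content source rendered two ways: as an HTML fragment for a browser, and as word-wrapped plain text for a narrow screen. The element names (article, title, para), the function names and the wrap width are all hypothetical choices made for illustration, not any standard format; parsing uses Python's standard xml.etree.ElementTree module.

```python
import textwrap
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content format: <article>, <title> and <para> are
# illustrative names chosen for this sketch, not a standard schema.
SOURCE = """<article>
  <title>My new pet fish</title>
  <para>Bought a goldfish today.</para>
  <para>It has not done much yet.</para>
</article>"""


def to_html(root: ET.Element) -> str:
    """Render the article as a minimal HTML fragment for a browser."""
    parts = [f"<h1>{escape(root.findtext('title', ''))}</h1>"]
    for para in root.iter("para"):
        parts.append(f"<p>{escape(para.text or '')}</p>")
    return "\n".join(parts)


def to_text(root: ET.Element, width: int = 40) -> str:
    """Render the same article as plain text wrapped to `width` columns,
    e.g. for a small screen (the default width is an arbitrary choice)."""
    title = root.findtext("title", "")
    blocks = [title + "\n" + "=" * len(title)]
    for para in root.iter("para"):
        blocks.append(textwrap.fill(para.text or "", width=width))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    article = ET.fromstring(SOURCE)
    print(to_html(article))
    print()
    print(to_text(article, width=20))
```

Supporting a new kind of device would then mean writing one more renderer like these, while the XML source itself stays unchanged.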
But even if you used XML and converted it on the fly to fit an infinite range of resolutions and devices, who's to say XML 2 won't take over one day, leaving your code broken as old XML parsers are abandoned for new ones? Look a hundred years into the future: will we even be using XML? Will there even be an internet as we know it? Will any of the text and photos on your blog be of any relevance to anyone at all? If the future is a direct connection to the brain, giving virtual reality indistinguishable from real life, who'll want to read about your new pet fish? Perhaps then no amount of future-proofing will matter: people will gladly have deleted all your pages.