O'ReillyNet: How the Wayback Machine Works. A great interview with Brewster Kahle about what they're doing at the Internet Archive and how they're doing it:
In the Wayback Machine, currently there are 10 billion Web pages, collected over five years. That amounts to 100 terabytes, which is 100 million megabytes. So if a book is a megabyte, which is about what it is, and the Library of Congress has 20 million books, that's 20 terabytes. This is 100 terabytes. At that size, this is the largest database ever built. It's larger than Walmart's, American Express', the IRS. It's the largest database ever built. And it's receiving queries -- because every page request when people are surfing around is a query to this database -- at the rate of 200 queries per second. It's a fairly fast database engine. And it's built on commodity PCs, so we can do this cost-effectively. It's just using clusters of Linux machines and FreeBSD machines.
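The numbers in that quote are easy to sanity-check. Here is a rough back-of-the-envelope sketch in Python, assuming decimal units (1 terabyte = 1,000,000 megabytes) and taking Kahle's figures at face value. The variable names are mine, and the per-page average and per-day query count are derived here, not stated in the interview.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: decimal units, so 1 TB = 1,000,000 MB and 1 MB = 1,000 KB.
MB_PER_TB = 1_000_000
KB_PER_MB = 1_000
SECONDS_PER_DAY = 86_400

# Figures taken directly from the quote.
archive_tb = 100                 # size of the Wayback Machine collection
pages = 10_000_000_000           # 10 billion web pages
books = 20_000_000               # Library of Congress book count
mb_per_book = 1                  # "a book is a megabyte"
queries_per_second = 200

# Derived values (not stated in the interview).
archive_mb = archive_tb * MB_PER_TB               # 100 million MB
loc_tb = books * mb_per_book / MB_PER_TB          # Library of Congress in TB
ratio = archive_tb / loc_tb                       # archive vs. Library of Congress
avg_page_kb = archive_mb * KB_PER_MB / pages      # average stored page size
queries_per_day = queries_per_second * SECONDS_PER_DAY

print(f"Archive size: {archive_mb:,} MB")
print(f"Library of Congress: {loc_tb:g} TB")
print(f"Archive is {ratio:g}x the Library of Congress")
print(f"Average page: {avg_page_kb:g} KB")
print(f"Queries per day: {queries_per_day:,}")
```

So under those assumptions the archive is about five times the size of the Library of Congress estimate, the average stored page works out to roughly 10 KB, and 200 queries per second is a little over 17 million requests a day if that rate held around the clock.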