The library will copy material from across the web, including news websites and open-access journals and books, and is entitled to archive material held behind paywalls and login facilities. This could eventually build a database holding every public tweet or Facebook page (though not those protected by privacy settings).
Under the project, the library will harvest the web once a year; as the technology is refined, the archiving process will become faster and more comprehensive.
The library expects to capture up to 100 terabytes of data annually in the first few years. The archive will be open to the public from January 2014.
"If you want a picture of what life is like today in the UK you have to look at the web," says project leader Lucie Burgess.
A three-month operation to harvest an initial 4.8 million websites -- or one billion web pages -- began today.
The UK has introduced new regulations that allow a small number of libraries to hold digital content without seeking copyright clearance. From today, the regulations apply to any digital publication or website created in the UK. Offline digital works, including DVDs, must be deposited at the British Library within a month of publication.
The scale of the project can be gauged from the fact that it took the British Library more than 300 years to collect 750 million pages of printed newspapers; it now aims to collect one billion web pages in a single year.