The National Security Agency is, by nature, an extreme example of the e-hoarder. And as the governmental organization responsible for things like, say, gathering intelligence on such Persons of Interest as Osama bin Laden, that impulse makes sense--though once you hear the specifics, it still seems pretty incredible. In a story about the bin Laden mission, the NSA very casually dropped a number: Every six hours, the agency collects as much data as is stored in the entire Library of Congress.
That data includes transcripts of phone calls and in-house discussions, video and audio surveillance, and a massive amount of photography. "The volume of data they're pulling in is huge," said John V. Parachini, director of the Intelligence Policy Center at RAND. "One criticism we might make of our [intelligence] community is that we're collection-obsessed — we pull in everything — and we don't spend enough time or money to try and understand what do we have and how can we act upon it."
NSA's budget is not disclosed by law, but we'd imagine it would be awfully expensive and difficult even to listen to such vast quantities of data, let alone analyze it intelligently. They mostly listen for keywords now--bits that don't make sense (and thus might be code), certain red-flag words (like, well, "bomb," which seems kind of unsubtle, but then we're talking about terrorists here, and it's possible there are intricacies of language that get lost in translation), and any conversation between principals like bin Laden. Still, next time you're aghast at how much space the entire series of Blue Planet takes on your hard drive, just be glad you're not the NSA.
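The keyword triage described above can be sketched as a simple filter. The keyword lists, the principal names, and the sample transcripts below are invented for illustration; nothing here reflects how the agency's actual systems work.

```python
# Minimal sketch of keyword-based triage, as described in the article.
# RED_FLAGS and PRINCIPALS are illustrative, not real watch lists.
RED_FLAGS = {"bomb", "detonator"}
PRINCIPALS = {"bin laden"}

def flag_transcript(text: str) -> list[str]:
    """Return the reasons a transcript would be set aside for an analyst."""
    lowered = text.lower()
    reasons = []
    # Red-flag words are matched as whole tokens.
    if any(word in lowered.split() for word in RED_FLAGS):
        reasons.append("red-flag keyword")
    # Principals are matched as substrings, since names span tokens.
    if any(name in lowered for name in PRINCIPALS):
        reasons.append("mentions a principal")
    return reasons

print(flag_transcript("the package is ready"))
print(flag_transcript("deliver the bomb at noon"))
```

The hard part in practice, of course, is everything upstream of this: speech-to-text, translation, and the "bits that don't make sense" that a keyword list can't catch.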
How much data is that in tera--or peta, as the case may be--bytes?
Another case of government waste. How can they possibly use or analyze the contents of the Library of Congress every six hours?
How much traditional information is there?
The 20-terabyte size of the Library of Congress is widely quoted and as far as I know is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.
Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
The 4 million maps in the Geography Division might scan to 200 TB.
LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.
This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).
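The back-of-envelope totals above can be checked in a few lines. The per-item sizes are the commenter's assumptions, not official Library of Congress figures, and the ingest rate simply takes the article's "one Library every six hours" claim at face value.

```python
# The commenter's own subtotals (assumed per-item sizes, not official LC figures).
subtotals_tb = {
    "books (20M at 1 MB each)": 20,
    "photos (13M at 1 MB JPEG each)": 13,
    "maps (4M, scanned)": 200,
    "movies (500k at 1 GB each)": 500,
    "sound recordings (3.5M, one CD each)": 2000,
}
total_tb = sum(subtotals_tb.values())

# One Library every six hours, expressed as a sustained ingest rate.
rate_gb_s = total_tb * 1000 / (6 * 3600)

print(f"total: {total_tb:,} TB, roughly {total_tb / 1000:.1f} PB")
print(f"ingest rate: about {rate_gb_s:.0f} GB/s")
```

The sum comes out near the "about 3 petabytes" figure, and the implied sustained rate is over a hundred gigabytes per second, which gives some sense of why "collection-obsessed" is the criticism.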
As search algorithms improve, they will be able to churn through the data faster and come up with accurate answers. Plus, hard-drive capacity only increases; in ten years, all this data probably won't seem like much compared with the drives of the day.
Deleting stuff also poses risks, as the data would be lost forever.
I'm going to guess that they process, rather than store, this amount of data every six hours.
One source on the internet stated that the Library of Congress's PRINTED collection is ~10 TB of data (as of that posting; no doubt larger now). 10 TB is really not that much if you are simply processing it through hardware filters, such as the government's Carnivore engine. I can totally see that being the case with audio and video processing (as mentioned above: looking for spoken keywords, and no doubt facial recognition). I'm sure some of the "hits" are stored for further analysis, but I can't fathom all that data being stored for very long. I'd really hate to be that Backup Admin :)
Some people have touched on it, but file size here is key.
One person said each book in the Library of Congress compresses to 1 MB. Well, I can shoot a 10-second Super HD video that is 1 terabyte uncompressed; that 10 seconds alone equals 1/20 of the ENTIRE printed library.
How long would it take to analyze that? It depends, but not that long. We really have ZERO context! To call this e-hoarding, or a waste, is not a valid claim: we have no idea how much data we are talking about. Let me ask you: how big is a one-hour video? The answers are infinite; a one-hour video could be 10 KB or 10 terabytes.
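The commenter's point about context-free sizes is easy to make concrete: a clip's size is just bitrate times duration, so the same hour of video spans several orders of magnitude. The bitrates below are illustrative assumptions, not measurements of any real format.

```python
def video_size_gb(bitrate_mbps: float, hours: float) -> float:
    """Size in gigabytes of a stream at the given bitrate (Mb/s) and duration."""
    seconds = hours * 3600
    bits = bitrate_mbps * 1e6 * seconds
    return bits / 8 / 1e9  # bits -> bytes -> GB

# Illustrative bitrates only: actual figures vary widely by codec and resolution.
for label, mbps in [("heavily compressed", 0.1),
                    ("streaming HD", 5),
                    ("uncompressed high-res", 6000)]:
    print(f"1 hour at {mbps} Mb/s ({label}): {video_size_gb(mbps, 1):,.2f} GB")
```

An hour at 5 Mb/s is a couple of gigabytes; the same hour uncompressed can be terabytes, which is the commenter's point about having no idea what "a Library of Congress of data" really weighs.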
Never mind the comparisons. It's just extremely sad that there's a need for an effort like this.
They are really in need of a true AI to sift through all this and not miss anything.
Redundant! Completely redundant! Can't learn after thousands of years!