A new high-tech catchphrase has sprung into common usage in the past year: "Big Data."
While the history of Silicon Valley is littered with soon-to-be-forgotten buzzwords, Big Data is likely to become central to 21st-century businesses, governments, societies and individuals. So it's worth investigating its benefits and potential dangers.
What is Big Data? It is the practice of gathering unprecedentedly large amounts of digital information generated on the Web and applying sophisticated statistical algorithms to that information to identify new or previously unrecognized trends and predict future behavior.
The applications of Big Data range from consumer marketing (more powerful, personalized Web ads) to the practical (your car's GPS suggesting an alternate route during rush hour based on traffic projections) to public health (identifying epidemics and famines as they begin).
The growth of data globally is staggering, doubling roughly every eighteen months. Most of this data is "unstructured" -- the digital "footprints" generated by individuals on the Internet (think Twitter, Facebook, surfing trails, e-mails) or by what's being called the "Internet of Things" -- machines we use that are connected to the Internet (ATMs, smartphones, cars, refrigerators, traffic lights, electricity meters).
Of course, all data is structured in some sense. But "structured" means something specific in the world of information technology -- data that lends itself to being stored in a relational database, in tables of rows and columns (a spreadsheet being the simplest example), with a defined type of data in each column.
Think of it this way: structured information is a data zoo -- everything in it is precisely located, labeled, maintained, and monitored. Unstructured data is more like a giant nature preserve -- much bigger than the largest zoo, but with less control over any one entity. A zoo counts its lions; a nature preserve surveys them. While zoos are thousands of years old, today's nature preserves would not be viable without modern technology -- electrified fences, radios, radio-tagged animals and so on.
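For readers who want to see the distinction concretely, here is a minimal sketch in Python. The table name, column names, readings and sample posts are all invented for illustration; nothing here comes from a real system. The first half plays the zoo: every value sits in a typed column, so an exact total is a single query. The second half plays the nature preserve: free text has no columns to query, so the best we can do is a rough survey, here a crude keyword match.

```python
import sqlite3

# Structured data: a table with a fixed set of typed columns.
# (Hypothetical schema and readings, made up for this example.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter_readings (meter_id INTEGER, read_at TEXT, kwh REAL)"
)
conn.executemany(
    "INSERT INTO meter_readings VALUES (?, ?, ?)",
    [
        (1, "2012-06-01", 12.5),
        (1, "2012-06-02", 11.0),
        (2, "2012-06-01", 30.25),
    ],
)

# Every row has the same columns, so an exact total is one query.
total_kwh = conn.execute(
    "SELECT SUM(kwh) FROM meter_readings WHERE meter_id = 1"
).fetchone()[0]

# Unstructured data: free text with no predefined columns.
# (Invented posts, standing in for digital "footprints".)
posts = [
    "Stuck on the freeway again, traffic is awful",
    "Power out on our street since noon",
    "traffic finally moving near downtown",
]

# There is no column to query, so we survey rather than count:
# a crude keyword match estimates how many posts are about traffic.
traffic_mentions = sum("traffic" in post.lower() for post in posts)

print(total_kwh, traffic_mentions)
```

A real analysis of unstructured text would use far more sophisticated statistical methods than a keyword match; the point of the sketch is only that the structured query is exact while the unstructured one is an estimate.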
Using algorithms to analyze unstructured data is not new -- for the past two decades it has been called "data mining."