A new high-tech catchphrase has sprung into common usage in the past year: "Big Data."
While the history of Silicon Valley is littered with soon-to-be-forgotten buzzwords, Big Data is likely to become central to 21st-century businesses, governments, societies and individuals. So it's worth investigating its benefits and potential dangers.
What is Big Data? It is the practice of gathering unprecedentedly large amounts of digital information generated on the Web and applying sophisticated statistical algorithms to it in order to identify previously unrecognized trends and predict future behavior.
The applications of Big Data range from consumer marketing (more powerful, personalized Web ads) to everyday conveniences (your car's GPS suggesting an alternate route during rush hour based on traffic projections) to public health (identifying epidemics and famines as they begin).
The growth of data worldwide is staggering: its volume doubles roughly every eighteen months. Most of this data is "unstructured." It consists of the digital "footprints" individuals leave on the Internet (think Twitter, Facebook, Web-browsing trails, e-mails) or of data generated by what's being called the "Internet of Things," the machines we use that are connected to the Internet (ATMs, smartphones, cars, refrigerators, traffic lights, electricity meters).
Of course, all data is structured in some sense. But in information technology "structured" has a specific meaning: data that lends itself to being stored in a relational database, organized into tables of rows and columns with a defined type of data in each column (a spreadsheet is the simplest analogy).
Think of it this way: structured information is a data zoo, where every animal is precisely located, labeled, maintained and monitored. Unstructured data is more like a giant nature preserve: much bigger than the largest zoo, but with far less control over any individual animal. A zoo counts its lions; a nature preserve surveys them. And while zoos are thousands of years old, a large nature preserve would not be viable without modern technology such as electrified fences, radios and radio-tagged animals.
Using algorithms to analyze unstructured data is not new -- for the past two decades it has been called "data mining."
But until recently, data mining was a pricey endeavor, with data storage and access costs prohibitively high for most industries outside of financial services.
Big Data employs a new generation of technologies designed to efficiently and inexpensively manage these previously low-value data sources. For example, Hadoop, a leading "open source" platform for storing and manipulating giant data sets, is rooted in work done by the leading Internet companies, particularly Google and Yahoo, as they strove to manage their ever-growing volumes of data.
Here are my takeaways on Big Data:
Big Data is here to stay -- All the ingredients are already present: massive amounts of information generated as a by-product of our other activities on the Web; inexpensive tools for storing and analyzing it; and the "quants" eager to apply their algorithmic sophistication. It will very likely become foundational in consumer marketing, political campaigns and other industries driven by social science.
Big Data will be clever, not wise -- Will Big Data usher in an age of objective truth? Not likely. Like all computer applications, Big Data mimics human wisdom through brute-force calculation. In other words, it is most useful when asked to do something people already know how to do, only more efficiently.
Big Data will raise serious privacy issues in democracies -- As people spend more of their lives in cyberspace, they may well become uneasy about trading their privacy for access to free Internet applications.
Big Data may be used as a tool of oppression in authoritarian societies -- The information innovations of the past 50 years (copier, fax, PC, Internet, smartphone, Twitter) made top-down government control of a society far more difficult. Big Data has the unfortunate potential to reverse that trend. Indeed, the ability to feed every sort of electronic data into giant databases and search them for signs of rebellion may help dictatorships resist the trend toward democracy.
Ultimately, Big Data can be thought of as the Internet equivalent of "fracking," the controversial revolution in oil and natural gas drilling. Fracking and related technologies, such as horizontal drilling, make it possible to tap previously inaccessible oil and gas deposits trapped in rock miles beneath the surface. Implemented thoughtlessly, fracking also has the potential to poison groundwater, trigger earthquakes and accelerate global warming.
Like fracking, Big Data will inevitably be used, and abused. Society will struggle to find a balance.