Saturday, April 21, 2012


What actually is Big Data?

New technology and innovation often bring about new terminology, and Big Data is exactly such a case. But what does Big Data really mean?

It appears that so far there is no standard definition for the term Big Data. A search reveals that various explanations have evolved over time.

       In 2009, Adam Jacobs, in his interesting article “The Pathologies of Big Data” (http://queue.acm.org/detail.cfm?id=1563874), described Big Data as “data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time.” Jacobs argues that getting data into databases is easy, but getting it out (in a useful form) is hard; the bottleneck lies in the analysis rather than in the raw data manipulation.
       In 2011, IBM, whose nickname "Big Blue" already carries the "Big", in turn focused on the three V’s in its definition of Big Data:
  • Volume – Big Data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
  • Velocity – Often time-sensitive, Big Data must be used as it is streaming into the enterprise in order to maximize its value to the business.
  • Variety – Big Data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more. (http://www-01.ibm.com/software/data/bigdata/)
IBM is one of the pioneers in bringing Big Data analysis to its customers. I highly recommend taking a look at their eBook titled “Understanding Big Data”.
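
To make the Velocity and Variety points above a bit more concrete, here is a minimal, purely illustrative Python sketch (my own, not taken from IBM’s material): it tallies incoming events one record at a time, so nothing has to be loaded in full before the analysis starts. The event format and the summarize_stream helper are invented for the example.

from collections import Counter
from typing import Iterable

def summarize_stream(events: Iterable[str]) -> Counter:
    """Consume events one at a time and keep a running tally.

    A toy illustration of the 'Velocity' idea: records are processed
    as they arrive instead of being collected in full first.
    """
    counts = Counter()
    for event in events:
        # Each event might be a log line, a click, a sensor reading, ...
        # ('Variety'); here we crudely treat the first token as its type.
        event_type = event.split(" ", 1)[0] if event else "unknown"
        counts[event_type] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "click /products/42 user=17",
        "search 'big data' user=99",
        "click /home user=17",
    ]
    print(summarize_stream(sample))  # Counter({'click': 2, 'search': 1})

Of course this is nowhere near “Big” – the point is only the shape of the processing: incremental, order-of-arrival, and tolerant of mixed record types.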

       Recently, the McKinsey Global Institute, the research arm of McKinsey & Company, pointed out that no specific threshold can be set for the amount of data that counts as Big Data: “Big Data” refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a data set needs to be in order to be considered Big Data - i.e., we don’t define Big Data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of data sets that qualify as Big Data will also increase. Also note that the definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of data sets are common in a particular industry. With those caveats, Big Data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes). The consultancy also provides insights into the financial opportunities associated with the topic. Check out their report.

What do all these definitions have in common? They highlight that existing approaches to collecting, handling and analyzing data no longer help companies gain a competitive advantage. Instead, new approaches are needed that take into account the exponential speed of change. It seems that Big Data calls for
     a) radical thinking, and
     b) a willingness to deal with uncertainty.

We will investigate these points further and keep you posted!


