Making sense out of BIG DATA
Alright! We now have tonnes of information about Big
Data. The question is, how do enterprises make sense of it? So let us explore
the various data analysis techniques that are either 1) most commonly used by
companies across various industries or 2) relatively new but showing strong growth
potential in the near future. Through a series of posts, we will try to touch upon these
techniques. The idea is to get familiar with the buzzwords
around Big Data.
Although there is a buzz around “Advanced Analytics” for Big Data analysis these days, researchers claim
that these techniques are mostly built upon the fundamentals of “Business Intelligence” or “BI”.
So, setting aside all the tweaks, customizations and modifications for the
moment, let us grasp the basics first.
BI encompasses a set of computer-based methodologies that help analyze and report on large
amounts of ‘structured’ or ‘unstructured’ data. Is this something new?
Apparently not: businesses have long used it to support various
activities like decision making, prediction and number crunching.
Check out this marketing video by a company called Avitas,
which gives an idea of BI and its prospects: http://goo.gl/blKTe
However, the context in which these techniques are being used
is changing: they are now applied to analyze Big Data, which is just data after all!
Here are some known techniques under BI:
1. OLAP – Online Analytical Processing:
A data retrieval process used for structured databases, more
commonly known as data warehouses. The major focus of this technique is to
query and effectively combine data from multiple sources or dimensions
that has been aggregated into the warehouse. Commonly used are OLAP cubes, which
combine, analyze and present data along several dimensions; three is the usual
illustration, though a cube can have more. A typical data
extraction would read like: sales of a company’s product x in region y for
period z, where x, y and z are values picked from the product, region and
period dimensions respectively.
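
To make the "product x, region y, period z" idea concrete, here is a minimal sketch of a three-dimensional cube in plain Python. The sales figures and the names FACTS, build_cube and query are made up for illustration; real OLAP engines are far more elaborate, so treat this only as a rough picture of slicing and rolling up.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, period, sales). Invented data.
FACTS = [
    ("x", "north", "2012-Q1", 120),
    ("x", "north", "2012-Q2", 150),
    ("x", "south", "2012-Q1", 80),
    ("w", "north", "2012-Q1", 200),
    ("w", "south", "2012-Q2", 60),
]

def build_cube(facts):
    """Aggregate sales into cells keyed by (product, region, period)."""
    cube = defaultdict(int)
    for product, region, period, sales in facts:
        cube[(product, region, period)] += sales
    return cube

def query(cube, product=None, region=None, period=None):
    """Sum sales over all cells matching the given dimension values.

    A dimension left as None is rolled up, i.e. summed over all its values.
    """
    wanted = (product, region, period)
    return sum(
        sales
        for key, sales in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )

cube = build_cube(FACTS)
one_cell = query(cube, product="x", region="north", period="2012-Q1")
product_total = query(cube, product="x")                     # roll up region and period
north_q1 = query(cube, region="north", period="2012-Q1")     # roll up product
print(one_cell, product_total, north_q1)
```

Fixing all three dimensions picks out a single cell of the cube, while leaving a dimension out sums across it.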
2. Data Mining:
A methodology used to
extract patterns from large datasets by combining methods from statistics and
machine learning with database management. Examples of usage might include
mining customer data to determine the segments most likely to respond to an offer,
mining human resources data to identify the characteristics of the most successful
employees, or market basket analysis to model the purchase behavior of
customers.
Drilling further into
this category, the following are methods which are used independently or in
conjunction with one another to analyze data and, by extension, ‘Big Data’:
- Association rule learning
A technique for
discovering interesting relationships, i.e., “association rules,” among
variables in large databases, based upon a set of algorithms. One application is
market basket analysis, in which a retailer can determine which products are
frequently bought together and use this information for marketing. A commonly
cited example is the discovery that many supermarket shoppers who buy diapers
also tend to buy beer. You can refer to the Forbes article about the IBM
computing which brought about that discovery here: http://goo.gl/UNIFS
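
As a rough sketch of how market basket analysis can work, the snippet below counts how often pairs of items appear together and keeps the rules "lhs -> rhs" that clear two thresholds: support (the share of baskets containing both items) and confidence (the share of baskets with lhs that also contain rhs). The baskets, thresholds and the name pair_rules are invented for illustration, and this brute-force pair counting is a simplification rather than any particular production algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets. Invented data.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]

def pair_rules(baskets, min_support=0.3, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = pair_rules(BASKETS)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these made-up baskets, "diapers -> beer" survives because the two items share three of the five baskets, and three of the four diaper baskets also contain beer.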
- Cluster analysis
A method for
splitting a diverse set of objects into smaller groups of ‘seemingly’
similar objects, where the characteristics that make them similar are not known in advance.
An example of cluster analysis is segmenting consumers into self-similar groups
based on collective group behavior
for targeted marketing, e.g. recommending to a customer a movie which was
bought or liked by another customer in the same group. It is almost the opposite of
simple ‘classification’, up next!
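
One well-known way to form such groups is k-means, sketched below in plain Python. The customer figures and the name kmeans are made up, and seeding the centroids with the first k points is a simplification chosen to keep the example repeatable (real implementations usually pick the starting centroids more carefully). Note that no group labels are supplied up front; the groups emerge from the data.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its points, and repeat a fixed number of times."""
    centroids = list(points[:k])  # simplistic, deterministic seeding (assumption)

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return [nearest(p) for p in points], centroids

# Hypothetical customers: (movies rented per month, share of action titles).
CUSTOMERS = [(1, 0.9), (2, 0.8), (1, 0.85), (9, 0.1), (10, 0.2), (8, 0.15)]

labels, centroids = kmeans(CUSTOMERS, k=2)
print(labels)
print(centroids)
```

Customers that end up with the same label form one segment, so a movie liked by one member could be suggested to the others in that segment.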
- Classification
This method identifies the categories to which new data points
belong, based on a training set of data points that have already been
categorized based on similar traits. One application is the prediction of
segment-specific customer buying behavior where there is a clear hypothesis or
objective outcome.
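
As an illustration, here is a minimal sketch of one simple classifier, k-nearest neighbours: a new customer gets the label that is most common among the k most similar customers in the training set. The training data, the two features and the name classify are invented for the example; many other classification methods exist.

```python
import math
from collections import Counter

# Hypothetical training set: ((purchases last year, emails opened last month), label).
TRAINING = [
    ((1, 0), "ignores"),
    ((2, 1), "ignores"),
    ((1, 2), "ignores"),
    ((8, 9), "responds"),
    ((9, 7), "responds"),
    ((7, 8), "responds"),
]

def classify(point, training, k=3):
    """Label a new point by majority vote among its k nearest training points."""
    neighbours = sorted(training, key=lambda row: math.dist(point, row[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

likely_buyer = classify((8, 8), TRAINING)
unlikely_buyer = classify((2, 2), TRAINING)
print(likely_buyer, unlikely_buyer)
```

Unlike clustering, the categories here ("responds" and "ignores") are fixed in advance by the already-labelled training data.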
Dear avid readers! Considering how heavy the data dose
in this post already is, we have decided to use a common technique for
providing the most sought-after information effectively (no, it’s not related to Big Data!).
It’s simply called providing a 'sequel'. So keep visiting to find the next one
soon, where we will talk a bit more about some other basic techniques and
introduce the latest trends in managing BIG DATA, like Hadoop, Mashup and MapReduce…
Sources and references for detailed report and materials: