Friday, June 8, 2012

Big Data Analytics - Techniques and Trends (continued)


Welcome back! In this post we continue our look at techniques and trends for analyzing Big Data. The aim is not to make you an expert in all of them, but to spark your curiosity while touching on the most prevalent concepts.

Here are two more widely used techniques that tap the potential of Big Data:

Sentiment Analysis: A technique to identify and extract subjective information from source text. Key aspects of the analysis include identifying the feature, aspect, or product about which a sentiment is being expressed, and determining the type, “polarity” (i.e., positive, negative, or neutral), and the degree and strength of the sentiment. For example, companies apply sentiment analysis to social media (e.g., blogs, microblogs, and social networks) to determine how different customer segments and stakeholders are reacting to their products and actions.
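To make the idea of polarity concrete, here is a minimal sketch in Python. It assumes a tiny hand-made word list (the POSITIVE and NEGATIVE sets and the polarity function are made up for illustration, not taken from any real product), and it simply counts positive and negative words. Real sentiment tools are far more sophisticated, so treat this as a toy.

```python
import re

# Tiny hand-made lexicon, purely illustrative (an assumption, not a real resource).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def polarity(text):
    """Return (label, score), where score = positive hits minus negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

if __name__ == "__main__":
    for post in ["I love this phone, the camera is great",
                 "Terrible battery and slow delivery",
                 "It arrived on Tuesday"]:
        print(post, "->", polarity(post))
```

The sign of the score gives the polarity and its size gives a rough idea of strength. Note that a word list like this cannot handle negation ("not good") or sarcasm.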

Predictive Analysis: A set of techniques in which a mathematical model is created or chosen to best predict the probability of an outcome. It deals with extracting information from data and using it to predict future trends and behavior patterns. At its core, predictive analytics captures the relationships between explanatory variables and predicted variables in past occurrences, and exploits those relationships to predict future outcomes. An example from customer relationship management is the use of predictive models to estimate the likelihood that a customer will “churn” (i.e., change providers) or the likelihood that a customer can be cross-sold another product. Predictive analysis is used in conjunction with some of the data analysis techniques described earlier, such as data mining. The following video is a short and sweet illustration by a Predictive Analytics company: http://goo.gl/9k0sP
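As a rough illustration of the churn example, the sketch below fits a small logistic regression model in plain Python. Everything in it is an assumption made for the example: the two explanatory variables (support calls and years as a customer), the made-up history, and the function names fit_logistic and churn_probability. It is one simple way to build such a model, not the way any particular vendor does it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.05, epochs=1000):
    """Fit weights and bias by stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the loss with respect to the linear score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def churn_probability(model, x):
    """Estimated probability that a customer with features x will churn."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up past occurrences: (support calls last quarter, years as customer).
history = [(0, 5), (1, 4), (0, 3), (1, 6), (4, 1), (5, 2), (3, 1), (6, 1)]
churned = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = the customer changed providers

model = fit_logistic(history, churned)

if __name__ == "__main__":
    print("frequent caller, new customer:", churn_probability(model, (5, 1)))
    print("rare caller, loyal customer:  ", churn_probability(model, (0, 5)))
```

The model learns from past customers how the explanatory variables relate to churn, and then returns a probability for a customer it has not seen before.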


Now, as promised, we look at some buzzwords around Big Data Analytics. A growing number of technologies are used to aggregate, manipulate, manage, and analyze Big Data. Most of them are built on a distributed computing platform, which is:

 - Massively parallel computing, where a problem is divided into multiple tasks, each of which is solved by one or more computers working in parallel.
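The divide-and-solve-in-parallel idea can be sketched on a single machine. In the Python example below, worker threads stand in for the separate computers of a real distributed system (that substitution is an assumption made to keep the example runnable anywhere), and the helper names split, partial_sum, and parallel_sum_of_squares are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, n_chunks):
    """Divide the problem into roughly equal tasks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def partial_sum(chunk):
    """The task each worker solves on its own piece of the data."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(items, workers=4):
    chunks = split(items, workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 101))))
```

Each worker only ever sees its own chunk, and the partial answers are combined at the end. That is the same shape the technologies below follow at a much larger scale.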

Here are some trendy technologies:

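MapReduce: A programming model and software framework introduced by Google for processing huge data sets in parallel on a distributed system. It suits problems that can be split into independent pieces: a "map" step processes each piece, and a "reduce" step combines the results. Check out this nice online presentation for a simple introduction: http://goo.gl/Qz5PP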
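The classic teaching example for MapReduce is counting words. The Python sketch below imitates the three stages (map, shuffle, reduce) inside a single process. It is only an illustration of the programming model, not Google's implementation, and the function names are chosen for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

In a real cluster, the map calls run on many machines at once, each on its own documents, and the framework takes care of the shuffle between machines.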

Mashup: An application that uses and combines data presentation or functionality from two or more sources to create new services. These applications are often made available on the Web, and frequently use data accessed through open application programming interfaces or from open data sources.

Hadoop: An open source (free) software framework for processing huge data sets on certain kinds of problems on a distributed system. Its development was inspired by Google’s MapReduce and Google File System. It was created by Doug Cutting and Mike Cafarella, grew out of the Apache Nutch project, was developed extensively at Yahoo!, and is now managed as a project of the Apache Software Foundation.

The scope of these technologies is far too vast to cover in a single post, but we hope this has made you familiar with the basic concepts. Do let us know your views. See you soon!

References:
McKinsey report: http://goo.gl/ycvef
TDWI library reports: www.Tdwi.org
Wikipedia

