Big Updates on BIG DATA: June 2012

Thursday, June 14, 2012

Big Data Startups Making for an Easier Commute

Many emerging Big Data startups are smaller B2B solution providers that are not in the headlines, and they may never become mainstream names like Splunk. The recent Wall Street Journal article "Tapping 'Big Data' to Fill Potholes" mentions several of these smaller startups, all with the theme of helping drivers avoid traffic problems. Inrix Inc. has turned its data analysis into a viable commercial business by generating revenue from the state of New Jersey, and there are plenty more highways in the world it could expand to. According to the article, “The New Jersey center offers a glimpse at the power of "big data," a term for techniques to gather reams of computerized information points, analyze them and spit out patterns, often in easy-to-understand visuals like maps or charts.”

Traffic authorities are not the only ones getting better information: Google Maps and navigation systems tell consumers more every day about travel conditions. Both mobile phone applications and in-car services such as OnStar make this possible. These companies use both live updates and historical traffic pattern data to predict congestion and travel time.

Inrix Inc. is not only helping states improve their traffic situation; it was also recently selected by BMW to improve navigation and fuel economy efforts. This is a great opportunity for the company, and we will keep you posted on the progress of the partnership.

In addition to Inrix, both RAC & Waze have interesting related stories:

RAC - Over in the UK, the RAC uses vehicle data to identify congestion. This insurance-based firm has a business model designed to use navigation and vehicle data to add value to its Breakdown Coverage services.

WAZE - Another startup, Waze Inc., concentrates on mobile applications for navigation and traffic patterns. In fact, it tells you the optimal times to travel on holiday weekends! Check it out for your next vacation!

These are just a few of the companies looking to build commercial businesses around traffic and navigation data. If you are interested in other startups leveraging Big Data, another great site called Beautiful Data recently came out with a list of the top 10 hot Big Data startups that is worth a look! Let us know if you know of any other interesting Big Data efforts we should keep an eye on!

Friday, June 8, 2012

Big Data Analytics - Techniques and Trends - continued..

Welcome back! We continue with some more techniques and trends for analyzing Big Data. Our idea is not for you to become an expert in all of these, but to plant a seed of inquisitiveness in your mind while touching upon the most prevalent concepts.

Here are a couple more widely used techniques that try to tap the potential of Big Data:

Sentiment Analysis: A technique to identify and extract subjective information from source text material. Key aspects of these analyses include identifying the feature, aspect, or product about which a sentiment is being expressed, and determining the type, “polarity” (i.e., positive, negative, or neutral) and the degree and strength of the sentiment. Examples of applications include companies applying sentiment analysis to analyze social media (e.g., blogs, microblogs, and social networks) to determine how different customer segments and stakeholders are reacting to their products and actions.
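To make the idea of polarity concrete, here is a minimal sketch of a lexicon-based scorer. The word lists and the function name `polarity` are made up for illustration; real sentiment systems rely on much larger lexicons or trained models and also handle negation, sarcasm, and the feature being discussed.

```python
import re

# Hypothetical, tiny word lists used only for this illustration.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def polarity(text):
    """Return (label, score), where score = positive hits minus negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score
```

The sign of the score gives the polarity, and its size is a very rough stand-in for the strength of the sentiment.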

Predictive Analysis: A set of techniques in which a mathematical model is created or chosen to best predict the probability of an outcome. It deals with extracting information from data and using it to predict future trends and behavior patterns. The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict future outcomes. An example of an application in customer relationship management is the use of predictive models to estimate the likelihood that a customer will “churn” (i.e., change providers) or the likelihood that a customer can be cross-sold another product. This is used in conjunction with some of the data analysis techniques described earlier, such as data mining. The following video is a short and sweet illustration by a predictive analytics company: http://goo.gl/9k0sP
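As a rough illustration of the churn example, the sketch below fits a one-variable logistic regression with plain gradient descent. The data (number of support complaints per customer and whether the customer churned) is invented, and logistic regression is just one of many models that could be chosen here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(churn) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def churn_probability(model, x):
    w, b = model
    return sigmoid(w * x + b)

# Invented history: complaints filed -> churned (1) or stayed (0).
complaints = [0, 1, 2, 3, 4, 5, 6, 7]
churned = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(complaints, churned)
```

Calling `churn_probability(model, 7)` should give a value well above 0.5 and `churn_probability(model, 0)` a value well below it, which is the "likelihood that a customer will churn" mentioned above.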

Now, as promised, we look at some buzzwords in Big Data Analytics. A growing number of technologies are used to aggregate, manipulate, manage, and analyze Big Data, and most of them are based on a distributed computing platform, which is:

- Massively parallel computing, where a problem is divided into multiple tasks, each of which is solved by one or more computers working in parallel.
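The divide-and-combine idea can be sketched on a single machine. In the example below, worker threads stand in for the separate computers of a real cluster (an assumption made purely for illustration; Python threads do not give true parallel speedup for this kind of work), and the helper names `split` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, parts):
    """Divide items into at most `parts` chunks of roughly equal size."""
    size = -(-len(items) // parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_sum(items, workers=4):
    """Sum each chunk in its own worker, then combine the partial results."""
    if not items:
        return 0
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)
```

For example, `parallel_sum(list(range(1, 101)))` splits the numbers 1 to 100 into four chunks, sums each chunk separately, and adds up the four partial sums.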

Here are some trendy technologies:

MapReduce: A software framework introduced by Google for processing huge data sets, for certain kinds of problems, on a distributed system. Check out this nice online presentation for a simple introduction: http://goo.gl/Qz5PP
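The classic teaching example for this programming model is counting words. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process; it only illustrates the model and is not Google's implementation, where each phase would be spread across many machines. All function names are chosen for this illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each document is mapped independently and each word is reduced independently, both phases can be handed out to different machines, which is what makes the model suit distributed systems.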

Mashup: An application that uses and combines data, presentation, or functionality from two or more sources to create new services. These applications are often made available on the Web, and frequently use data accessed through open application programming interfaces or from open data sources.

Hadoop: An open source (free) software framework for processing huge data sets, for certain kinds of problems, on a distributed system. Its development was inspired by Google’s MapReduce and Google File System. Much of its early development took place at Yahoo!, and it is now managed as a project of the Apache Software Foundation.

Although the scope of these technologies is too vast to cover in a single post, we have tried to familiarize you with the basic concepts. Do let us know your views. See you soon!

References:

McKinsey report: http://goo.gl/ycvef

TDWI library reports: www.Tdwi.org

Wikipedia

Friday, June 1, 2012

Big Data Analytics - Techniques and Trends …

Making sense out of BIG DATA

Alright! We now have tonnes of information about Big Data. The question is, how do enterprises make sense of it? Let us explore the data analysis techniques that are either 1) most commonly used by companies across various industries or 2) relatively new but showing strong growth potential for the near future. Through a series of posts, we will touch upon these techniques. The idea is to get familiar with the buzzwords around Big Data.

Although there is a buzz around “Advanced Analytics” for Big Data analysis these days, researchers claim that it is mostly built upon the fundamentals of “Business Intelligence” (“BI”) techniques. So, setting aside all the tweaks, customizations, and modifications for the moment, let us grasp the basics first.

BI encompasses a set of computer-based methodologies that help analyze and report on large amounts of ‘structured’ or ‘unstructured’ data. Is this something new? Apparently not: businesses have long used it to support activities like decision making, prediction, and number crunching. Check out this marketing video by a company called Avitas, which gives an idea of BI and its prospects: http://goo.gl/blKTe

However, the context in which these techniques are used is changing: they are now being applied to Big Data, which is just data after all!

Here are some known techniques under BI:

1. OLAP – Online Analytical Processing:

A data retrieval technique used with structured databases, more commonly known as data warehouses. Its main focus is to query and effectively combine data from multiple sources or dimensions aggregated in a relational structure. OLAP cubes are commonly used; in the classic picture, a cube combines, analyzes, and presents data along three dimensions. A typical query would read: sales of a company’s product x in region y for period z, where x, y, and z are values picked from the product, region, and period dimensions respectively.
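A toy version of the product/region/period cube can be sketched with plain Python. The fact rows, dimension names, and the helpers `roll_up` and `slice_cube` are all invented for illustration; a real OLAP engine precomputes and indexes such aggregates rather than scanning a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, period, sales)
FACTS = [
    ("widget", "east", "2012-Q1", 100),
    ("widget", "east", "2012-Q2", 150),
    ("widget", "west", "2012-Q1", 80),
    ("gadget", "east", "2012-Q1", 60),
    ("gadget", "west", "2012-Q2", 90),
]
DIMENSIONS = ("product", "region", "period")

def roll_up(facts, by):
    """Total sales grouped by the dimensions named in `by`."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

def slice_cube(facts, **fixed):
    """Keep only the rows that match the fixed dimension values."""
    return [
        row for row in facts
        if all(row[DIMENSIONS.index(d)] == v for d, v in fixed.items())
    ]
```

Here `slice_cube(FACTS, product="widget", region="east", period="2012-Q2")` answers the "product x in region y for period z" query, while `roll_up(FACTS, ["product"])` totals sales per product across all regions and periods.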

2. Data Mining:

A methodology used to extract patterns from large datasets by combining methods from statistics and machine learning with database management. Examples of usage include mining customer data to determine the segments most likely to respond to an offer, mining human resources data to identify the characteristics of the most successful employees, or market basket analysis to model the purchase behavior of customers.

Drilling further into this category, the following methods are used independently or in conjunction with one another to analyze data and, by extension, ‘Big Data’:

- Association rule learning

A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing. A commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer. You can read the Forbes article about the IBM computing that brought about that discovery here: http://goo.gl/UNIFS
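Two numbers usually describe how strong a rule like "diapers → beer" is: support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both on a handful of invented shopping baskets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule says nothing
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Invented baskets for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]
```

In these baskets, diapers and beer appear together in 2 of 5 baskets (support 0.4), and 2 of the 3 baskets with diapers also contain beer (confidence of about 0.67).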

- Cluster analysis

A method that splits a diverse group of objects into smaller groups of ‘seemingly’ similar objects, whose characteristics of similarity are not known in advance. An example of cluster analysis is segmenting consumers into self-similar groups, based on collective group behavior, for targeted marketing: for instance, recommending to a customer a movie that was bought or liked by another customer in the same group. This is almost the opposite of simple ‘classification’, which is up next!
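One widely used clustering algorithm is k-means, sketched below in plain Python. It is only one of many clustering methods, and the customer points (think of them as two behavioral measurements per customer) are invented. Starting from the first k points is a deliberately naive initialization chosen to keep the example deterministic.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iters=100):
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = None
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

# Invented customers: two loose groups, with no labels given in advance.
customers = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(customers, k=2)
```

Note that the algorithm is never told what the groups mean; it only discovers that the first three customers resemble each other and the last three resemble each other.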

- Classification

This method identifies the categories to which new data points belong, based on a training set containing data points that have already been categorized based on similar traits. One application is the prediction of segment-specific customer buying behavior where there is a clear hypothesis or objective outcome.
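A simple way to see classification at work is the k-nearest-neighbors method, sketched below. It is only one of many classification algorithms, and the labeled training points and segment names are invented for illustration.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda item: dist2(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented training set: customers that were already categorized.
training = [
    ((1, 1), "budget"), ((2, 1), "budget"), ((1, 2), "budget"),
    ((8, 8), "premium"), ((9, 8), "premium"), ((8, 9), "premium"),
]
```

Unlike the clustering case, the categories here are known up front: `knn_classify(training, (2, 2))` places a new customer in the "budget" segment because its three nearest already-labeled neighbors are all "budget".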

Dear avid readers! Considering the heaviness of the data dose in this post, we have decided to use a common technique for delivering the most sought-after information effectively. (No, it’s not related to Big Data!) It’s simply called providing a 'sequel'. So keep visiting to find the next one soon, where we will talk a bit more about some other basic techniques and introduce the latest trends in managing BIG DATA, like Hadoop, Mashup, and MapReduce.

Sources and references for detailed report and materials:

McKinsey report: http://goo.gl/ycvef

TDWI library reports on BigData: www.Tdwi.org