The Big Data Cheat Sheet
posted by John Spacey, November 21, 2012
Definition: What is big data?
Big data is a dataset that's so large that it can't be managed with traditional information technology tools.
The term big data describes business processes that capture, process, store, search, share, analyze or visualize large datasets.
Big data business processes have fed demand for technologies that are specifically designed to handle large data sets. As a result, big data can also be defined as a technology:
The term big data describes technologies built to efficiently process large, complex datasets.
How big is big?
You're probably wondering: how big is big data?
Unfortunately, there isn't a precise answer to this question. Big data is any dataset that's large enough to push the limits of standard technologies. In other words, you need specialized technologies to tackle big data.
What's considered big data today won't be considered big data tomorrow.
The world's capacity to capture, store, transport and process data has grown rapidly ever since the early 1960s. Over the same time period, new business drivers have doubled the amount of information the average business stores every 1.2 years.
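To make the doubling claim concrete, here is a minimal Python sketch of the arithmetic. It assumes steady exponential growth at the 1.2-year doubling period quoted above. The function names are illustrative only.

```python
def growth_factor(years: float, doubling_period: float = 1.2) -> float:
    """How many times larger a data store is after `years` of steady doubling."""
    return 2 ** (years / doubling_period)


def annual_growth_rate(doubling_period: float = 1.2) -> float:
    """Equivalent year-over-year growth rate (0.5 would mean 50% a year)."""
    return 2 ** (1 / doubling_period) - 1


# Assumption: growth stays exponential over the whole period.
for years in (1.2, 6, 12):
    print(f"after {years} years: {growth_factor(years):,.0f}x the data")
print(f"equivalent annual growth: {annual_growth_rate():.0%}")
```

Under that assumption, a 1.2-year doubling period works out to roughly 78% growth a year, and about a thousandfold increase over 12 years.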
Yeah, but how big is it?
When people speak of big data they're usually talking about data in the petabyte (1 million gigabytes) to exabyte (1 billion gigabytes) range.
In some cases, a terabyte can be considered big data when there's a requirement to process the data in a short time interval. For example, you may need big data technologies to process a terabyte of data in a matter of minutes.
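The unit sizes and the terabyte-in-minutes case can be checked with a short Python sketch. It assumes decimal (SI) units, where each step is a factor of 1,000. The helper names and the five-minute window are illustrative choices, not figures from a specific system.

```python
# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB.
BYTES_PER_UNIT = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a data size between the units listed in BYTES_PER_UNIT."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


def required_throughput_gb_per_s(size: float, unit: str, seconds: float) -> float:
    """Sustained rate (GB per second) needed to process `size` within `seconds`."""
    return convert(size, unit, "GB") / seconds


print(convert(1, "PB", "GB"))  # a petabyte expressed in gigabytes
print(convert(1, "EB", "GB"))  # an exabyte expressed in gigabytes
# Hypothetical deadline: one terabyte in five minutes.
print(f"{required_throughput_gb_per_s(1, 'TB', 5 * 60):.2f} GB/s")
```

The last line shows why a deadline matters: even a single terabyte needs a sustained rate of a few gigabytes per second when the window is only minutes long.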
The Laws of Big Data
Big data is often explained in terms of three laws:
1. Volume
Big data is characterized by large volumes of data. The challenge is to reduce large volumes of data to business results. For example, how can a business use social profiles and click-streams to improve online sales?
2. Variety
Big data is often captured from a wide variety of sources and stored in a variety of formats.
For example, let's say a grocery chain wanted to understand shopper behavior to improve sales. They might collect and analyze video, audio, sensor data, user profiles and point of sale transactions.
3. Velocity
Organizations have dealt with petabytes of data for a long time. The big difference with big data is data velocity: the speed at which data must be captured, stored or processed.
There's a big difference between maintaining a petabyte of historical data and collecting and processing a petabyte of data in an hour.
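A rough Python sketch shows how much the time window changes the problem. The per-worker rate of 200 MB per second is a hypothetical figure chosen for illustration, not a measured value, and the function name is likewise an assumption.

```python
import math

PETABYTE = 10**15                  # bytes, decimal units
PER_WORKER_RATE = 200 * 10**6      # hypothetical: 200 MB/s per disk or node


def workers_needed(size_bytes: int, window_seconds: int, per_worker_rate: int) -> int:
    """Minimum number of parallel workers to process the data within the window."""
    required_rate = size_bytes / window_seconds   # bytes per second overall
    return math.ceil(required_rate / per_worker_rate)


one_hour = 3600
one_month = 30 * 24 * 3600

print("petabyte in an hour: ", workers_needed(PETABYTE, one_hour, PER_WORKER_RATE))
print("petabyte in a month: ", workers_needed(PETABYTE, one_month, PER_WORKER_RATE))
```

Under these assumptions, the same petabyte needs well over a thousand workers running in parallel when the window is an hour, but only a couple when it can trickle in over a month.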
Trends: Where is big data going?
Business trends such as social media, internet of things, crowdsourcing, data integration, natural language processing, analytics and visualization are expected to drive the continued growth of big data.
Corporate data is projected to almost double each year for the next decade. High demand for data scientists and data-literate managers is expected over the same time period.
Why is everyone so excited about big data?
It isn't unusual for large and mid-sized organizations to handle petabytes of data. In other words, big data is a common business problem.
Big data related IT spending is estimated at around $100 billion globally, with an annual growth rate of over 9%.
Real world examples of big data include:
Google's search index increased from 11 billion to 50 billion pages between 2008 and 2012.
Facebook has more than 1 billion active users.
Walmart often processes more than 1 million transactions an hour representing over 2 petabytes of data.
Twitter handles 400 million tweets a day.
An estimated 100 trillion emails are sent each year. That's 14,285 emails for every human on the planet.
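The per-person email figure can be reproduced with a quick calculation. The sketch below assumes a world population of about 7 billion, which is what the 14,285 figure implies. The constant and function names are illustrative.

```python
EMAILS_PER_YEAR = 100 * 10**12   # estimate quoted above
WORLD_POPULATION = 7 * 10**9     # assumption: roughly 7 billion people


def per_person(total: int, population: int) -> int:
    """Whole number of items per person, rounded down."""
    return total // population


emails_each = per_person(EMAILS_PER_YEAR, WORLD_POPULATION)
print(f"{emails_each:,} emails per person per year")
print(f"about {emails_each / 365:.0f} emails per person per day")
```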
Pitfalls: Common big data myths
1. Big data is a technology
Big data is both a business and a technology problem (and opportunity).
2. There's a silver bullet for big data
Big data isn't a product. It's a series of complex and diverse business problems that are addressed with numerous tools and architectures.
3. Big data is only a concern for large organizations
Big data is quickly becoming an issue for organizations large and small.
4. Relational databases (RDBMS) can't handle big data
Relational databases are a common component of big data solutions. For example, massive parallel-processing (MPP) databases are often built with relational technologies.
5. Big data is all hype
Big data is a relatively new term for one of information technology's oldest trends: the exponential growth of business data. In fact, business data has grown dramatically for the past 40+ years.
As hardware storage capacity has grown and prices have fallen, demand for storage has increased. As business data has grown, so has its value.
Quick Summary
A few key points to remember about big data:
Big data is a business and technology term that describes processes and tools for achieving value from large volumes of data.
Big data typically implies datasets of a petabyte or more. However, the term might also be used to describe processing smaller datasets at high speed, such as a terabyte of data in a minute.
Big data is defined by its volume, variety and velocity.
Corporate data is almost doubling each year, driving demand for big data technologies.
Strong demand for data scientists and data-literate managers is expected over the coming decade.