Summary -

In this topic, we covered the following sections -

How did the data increase?

In recent years, communication between people has increased a lot thanks to modern technologies and devices. Every communication a person makes over the internet creates data in the background.

Data is growing rapidly because internet traffic increases day by day. The following statistics show how drastically global internet traffic has increased.

  • In 1992, global internet traffic was 0.0017 GB/sec.
  • In 1997, traffic increased to 0.028 GB/sec.
  • In 2002, traffic increased to 100 GB/sec.
  • In 2013, traffic was 28,875 GB/sec, and forecasts projected that it might reach 50,000 GB/sec in 2018.
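To make the scale of this growth concrete, the figures above can be turned into growth factors. The following is a minimal sketch that uses only the measured values from the list (the 2018 forecast is left out); the names `traffic_gb_per_sec` and `growth_factors` are made up for this illustration.

```python
# Global internet traffic figures quoted above, in GB per second.
traffic_gb_per_sec = {
    1992: 0.0017,
    1997: 0.028,
    2002: 100,
    2013: 28_875,
}

def growth_factors(series):
    """Return (start_year, end_year, factor) for each consecutive pair of years."""
    years = sorted(series)
    return [
        (start, end, series[end] / series[start])
        for start, end in zip(years, years[1:])
    ]

for start, end, factor in growth_factors(traffic_gb_per_sec):
    print(f"{start} -> {end}: about {factor:,.0f}x")
```

Each printed line shows how many times larger the traffic became between two consecutive measurements.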

According to statistics from mid-2015, 2.5 quintillion bytes (2,500,000,000,000,000,000 bytes, or about 2.5 exabytes) of data were being created daily. For better or worse, the data has become enormous: 90% of the world's data has been generated in recent years.

What is Big Data?

Big data is data that exceeds the processing capacity of conventional techniques and database systems. It usually includes datasets whose size is beyond the ability of commonly used software tools to manage and process.

In other words, the data is too large, is created and changes too fast, and often does not have a proper structure.

The data comes from everywhere, for example:

  • Sensors to gather climate information
  • Satellite images
  • Social media posts
  • Digital pictures
  • Videos
  • Purchase transaction records
  • Banking transactions
  • Content of Webpages
  • Cell phone GPS signals
  • Web server logs
  • Financial market data, and so on

Big Data Challenges -

Big data challenges include:

  • Capture
  • Process
  • Store
  • Search
  • Analyze
  • Share
  • Transfer
  • Present

Big Data 3Vs -

There are three Vs that are commonly used to characterize various aspects of big data.

  1. Volume
  2. Velocity
  3. Variety

These three Vs are a helpful lens through which to view and understand the nature of the data. Software platforms are available to exploit each of them, and most data is likely to exhibit each of the Vs at one stage or another.

Big Data Overview

Let’s discuss each V in detail -

Volume -

Volume describes the amount of data generated from different sources, so volume always refers to the size of the data. Processing large volumes is typically a batch operation, which suits analytical or non-interactive computing tasks.

Example:

The size of the data is increasing day by day. Data is generated by individuals, companies, social networking sites and so on.

Let us take a website as an example. On a website, content is published by the publisher. Registered or unregistered users read the content and generate data in the form of comments.

The amount of data generated depends on the number of users visiting the site. If the number of users increases, the data is likely to increase as well, and so does its volume. A sufficiently large volume is what makes the data Big Data.

Velocity -

Velocity describes the rate at which data is generated, captured and shared. It is not just about how fast the data arrives: fast-moving data can also be streamed into bulk storage for later batch processing.

Industry terminology for such fast-moving data tends to be either “streaming data” or “complex event processing”. There are two main reasons to consider stream processing.

The first is when the input data arrives too fast to store in its entirety. The second is when the application mandates an immediate response to the data.
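The difference between the two styles can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the readings are made-up sample values, and `batch_average` and `streaming_averages` are hypothetical helper names, not part of any streaming platform.

```python
from collections import deque

def batch_average(readings):
    """Batch style: store every reading first, then process them together."""
    stored = list(readings)          # the whole input is kept in memory
    return sum(stored) / len(stored)

def streaming_averages(readings, window=3):
    """Streaming style: yield an average over the last `window` readings
    as each new reading arrives, keeping only that window in memory."""
    recent = deque(maxlen=window)    # older readings are dropped automatically
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)

sensor_readings = [10, 20, 30, 40, 50]   # made-up sample values

print(batch_average(sensor_readings))
print(list(streaming_averages(sensor_readings)))
```

The batch function needs the complete input before it can answer, while the streaming function produces a result for every reading and never holds more than `window` values, which mirrors the two reasons given above.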

Example:

In earlier days, data could be processed in batches because the update window was measured in hours. For example, in the news media before 2003, the update window was very long, and the news shown on a given day was the news from the day before.

Today, social media such as Facebook, news channels and similar services provide live updates. Sometimes people are not even interested in news that is an hour old.

Data now moves in almost real time, and the update window has shrunk to fractions of a second. The velocity of real-time data updates has become very high, and this high-velocity data is characteristic of Big Data.

Variety -

Variety describes the type of the data. The data can be structured or unstructured. Examples include text, sensor data, audio, video, etc.

A common use of big data processing is to take unstructured data and extract ordered meaning for consumption. The process of moving data from source to processing application might involve the loss of information.

Example:

If all the data is in the same format, processing is simple and easy. However, data can be stored in multiple formats such as databases, Excel files, documents, text files and so on. Sometimes the data may be in a format that is not readily understandable.

The data can also be in unstructured formats such as audio and video. The big challenge is to arrange the data and process it into something meaningful for consumption. This variety in the data is characteristic of Big Data.
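As a small illustration of handling variety, the sketch below brings records from two different formats (CSV and JSON) into one common structure before processing them. The sample data, the field names `user` and `amount`, and the helper names are assumptions made up for this example.

```python
import csv
import io
import json

# Hypothetical sample inputs: the same kind of record in two formats.
csv_text = "user,amount\nalice,10.5\nbob,4\n"
json_text = '[{"user": "carol", "amount": 7.25}]'

def from_csv(text):
    """Parse CSV text into a list of {'user', 'amount'} dictionaries."""
    return [
        {"user": row["user"], "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]

def from_json(text):
    """Parse a JSON array into the same list-of-dictionaries shape."""
    return [
        {"user": item["user"], "amount": float(item["amount"])}
        for item in json.loads(text)
    ]

# Once every source is in the same shape, processing is straightforward.
records = from_csv(csv_text) + from_json(json_text)
total = sum(record["amount"] for record in records)

print(records)
print(total)
```

Each additional format needs its own small converter, which is one reason variety makes processing harder than volume alone.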

Big Data Uses -

Big data is useful and plays a crucial role in creating statistics and reports from data collected from various sources.

Below are some examples of its uses -

  1. Real-time transport information, produced by collecting data from sensors, GPS devices and other sources.
  2. Healthcare trend information, produced by collecting data about diseases from various places or locations.
  3. Economic development, based on reports that are generated from existing trends and used to predict upcoming trends.
  4. Similarly, social networking reports, retail information reports, crime reports, etc.

Big Data Types -

Big data describes a collection of large and complex datasets. The data in big data can be separated into three types.

  1. Structured data. Ex: Data in DB tables or relational data.
  2. Semi-structured data. Ex: XML.
  3. Unstructured data. Ex: Videos, images, text etc.
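The three types can be illustrated with Python's standard library. This is a minimal sketch with made-up sample data: an in-memory SQLite table stands in for a relational database, a short XML string for semi-structured data, and a plain sentence for unstructured data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows and columns in a relational table (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
structured = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]
conn.close()

# Semi-structured: XML has tags that describe the data, but no fixed table schema.
xml_text = "<user id='2'><name>bob</name></user>"
root = ET.fromstring(xml_text)
semi_structured = root.find("name").text

# Unstructured: free text has no fields, so meaning must be extracted from it.
free_text = "carol wrote a long review about the product"
unstructured = free_text.split()[0]

print(structured, semi_structured, unstructured)
```

The structured value is reached with a query, the semi-structured value by navigating tags, and the unstructured value only through ad hoc text handling.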

We already have RDBMSs to process structured databases and tables, which store data in the form of rows and columns. But nowadays we also receive data in the form of videos, images, text and so on, which is known as semi-structured or unstructured data.

This data cannot be processed by RDBMSs, so an alternative way to store and process unstructured and semi-structured data is needed. The solution to the Big Data problem is Hadoop. Hadoop works entirely differently from traditional processing systems and is designed to overcome the problems described above.