This lesson formally presents the definition of Big Data.

How can we identify Big Data? Is it 1 gigabyte, 10 terabytes, or 1,000 petabytes? With each passing year, as the processing and storage capacity of hardware improves, we can handle larger and larger data sets, so no fixed size threshold can define Big Data. Informally, the term refers to the rapid growth and availability of data in our world.

More formally, Big Data is data that arrives in greater variety, in increasing volumes, and at ever higher velocity, and that requires a scalable architecture for efficient storage, manipulation, and analysis. These three properties (volume, velocity, and variety) are known as the three Vs. Such data sets are so voluminous that traditional data processing software can’t manage them, but they can be used to address business problems that couldn’t be tackled before.

  • Volume: The sheer amount of data matters. One estimate projected that 40 zettabytes of data would be created by 2020, a 300-fold increase over 2005. With Big Data, one has to process high volumes of data, often of unknown value, such as Twitter data feeds, clickstreams from a webpage or a mobile app, or readings from sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.

  • Velocity: Velocity is the rate at which data is received and acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action.

  • Variety: Variety refers to the many types of available data. Data comes in different forms. Structured data can be organized neatly within the columns of a database. This type of data is relatively easy to enter, store, query, and analyze. Unstructured data is more difficult to sort and extract value from. Examples of unstructured data include emails, social media posts, word-processing documents, audio, video, and photo files, web pages, and more.
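
The three Vs above can be made a little more concrete with a small Python sketch. It is only an illustration under stated assumptions, not a Big Data system: the unit table uses decimal (SI) prefixes, and the names to_bytes, RollingAverage, and extract_prices, as well as the sample record and the sample post, are hypothetical and invented for this example.

```python
import re
from collections import deque

# --- Volume: decimal (SI) storage units expressed in bytes (an assumption;
# binary units such as GiB would give slightly different numbers).
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]


# If 40 ZB is a 300-fold increase over 2005, the implied 2005 figure is
# 40 ZB / 300, which is roughly 133 exabytes.
implied_2005_exabytes = to_bytes(40, "ZB") / 300 / UNITS["EB"]


# --- Velocity: keep only the most recent readings in memory and act on
# each new value as it arrives, instead of writing everything to disk first.
class RollingAverage:
    def __init__(self, window_size):
        # deque(maxlen=...) silently drops the oldest reading when full.
        self.window = deque(maxlen=window_size)

    def add(self, value):
        """Record a new reading and return the average of the current window."""
        self.window.append(value)
        return sum(self.window) / len(self.window)


# --- Variety: the same fact in structured and unstructured form.
# Hypothetical sample data, invented for illustration.
structured_row = {"user_id": 42, "country": "DE", "purchase_total": 19.99}
unstructured_post = "Loved the new headphones! Paid 19.99 and shipping to DE was fast."


def extract_prices(text):
    """Pull price-like numbers (digits, a dot, two decimals) out of free text."""
    return [float(match) for match in re.findall(r"\d+\.\d{2}", text)]


if __name__ == "__main__":
    print(f"Implied 2005 volume: about {implied_2005_exabytes:.0f} EB")
    sensor = RollingAverage(window_size=3)
    for reading in [10, 20, 30, 40]:
        print("rolling average:", sensor.add(reading))
    print("structured lookup:", structured_row["purchase_total"])
    print("unstructured extraction:", extract_prices(unstructured_post))
```

Reading the purchase total from the structured record is a simple key lookup, while getting the same number out of the free-text post requires pattern matching that can easily miss or misread values. That gap is what makes unstructured data harder to extract value from.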

