Most common myths

“All companies have big data-related activities and scale”

Big data is a relatively recent trend. Many companies want to work with it, and some already do, but not all of them have implemented it.

Companies may stay away from big data for many reasons. A few of them are:

  • They might not fully understand the importance and power of big data.
  • They do not want to expand their business through big data, due to personal or financial barriers.
  • They cannot hire the right people to set up big data processing.

But most importantly, they do not have a clear use case to justify the return on investment. This is often the case with smaller businesses or businesses that do not have access to existing big datasets.

“Only Google uses big data”

While Google is a major pioneer in the field of big data, it is not the only company creating and leveraging it.

Facebook and Uber are also big players with significant contributions to the open-source community. Sharing their tools this way helps these companies get more value from their data and spreads knowledge across the community.

Other sectors that love big data are banking, streaming services, and gaming.

“Big data is about size only”

As explained in the lesson “What Is Big Data And Why is it Popular?”, size is only one of the characteristics used to define big data. While it matters a lot, it is not the only dimension that we study.

“Big data is expensive”

Let’s be realistic: it’s not cheap to start using big data.

First, it is essential to identify use cases that apply to the given company, and have a potentially good return on investment. Then the company needs to attract good engineers to build the system and pay to set up the infrastructure to collect and use the data.

The costs of recruitment and business analysis cannot be avoided. Infrastructure costs, however, are falling, thanks to the elastic pricing most cloud providers offer: a company pays only for the resources it actually uses.

So this myth has a bit of truth to it: big data is becoming cheaper, but may still be too expensive for some companies.
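To make the effect of elastic pricing concrete, here is a minimal sketch that compares an always-on cluster with one that is billed only while jobs run. Every number in it (node count, hourly rate, busy hours) is a made-up assumption for illustration, not a real provider's price, and the function names are hypothetical.

```python
# Illustrative only: all figures below are invented assumptions, not real prices.
HOURS_PER_MONTH = 730  # approximate number of hours in a month


def always_on_cost(nodes: int, hourly_rate: float) -> float:
    """Monthly cost of keeping `nodes` machines running around the clock."""
    return nodes * hourly_rate * HOURS_PER_MONTH


def elastic_cost(nodes: int, hourly_rate: float, busy_hours: float) -> float:
    """Monthly cost when the same machines are paid for only while jobs run."""
    return nodes * hourly_rate * busy_hours


# Hypothetical workload: 10 nodes at 0.50 per node-hour, busy 60 hours a month.
fixed = always_on_cost(nodes=10, hourly_rate=0.5)
elastic = elastic_cost(nodes=10, hourly_rate=0.5, busy_hours=60)

print(f"always-on: {fixed:.2f}")
print(f"elastic:   {elastic:.2f}")
```

Under these assumptions the elastic bill is a small fraction of the always-on one, but it is still not zero, and it does not include the cost of engineers or business analysis.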

“The big data analysis outcome is always correct and can predict everything”

The only certain thing is that the sun will rise in the east tomorrow morning. Everything else is a matter of probability.

Of course, the higher the probability, the better. And as the saying goes, “garbage in, garbage out”. If we use high-quality data, we are far more likely to get high-quality results. By the same logic, if we use poor data that includes biases or faulty values, we’ll get poor predictions from our model. However, even the best data will sometimes lead to a false prediction.
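The “garbage in, garbage out” effect can be shown with a small sketch. It fits a straight line with ordinary least squares, once on clean data and once on the same data with a single faulty value, and then compares the predictions. The dataset and the helper names `fit_line` and `predict` are invented for this illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]    # follows y = 2x exactly
dirty_ys = [2, 4, 6, 8, 100]   # same data, but the last reading is faulty

clean_model = fit_line(xs, clean_ys)
dirty_model = fit_line(xs, dirty_ys)

# The true value at x = 6 is 12.
print("clean prediction:", predict(clean_model, 6))
print("dirty prediction:", predict(dirty_model, 6))
```

One bad value out of five is enough to pull the fitted line far from the true relationship, so the prediction from the dirty data lands nowhere near 12.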

“Machine learning and big data are the same thing”

This is an interesting misconception. Machine learning models often deal with big datasets, but they do not have to.

Machine learning is about algorithms that improve as they learn from data, whereas big data is a discipline for handling data that cannot be processed with traditional methods. In other words, machine learning and big data are often used together, but they are not the same.
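As a rough illustration that machine learning does not require big data, the sketch below trains a one-parameter model with gradient descent on just four examples. The dataset, the learning rate, and the function names are assumptions chosen for this example.

```python
# Four training examples following y = 3x: nowhere near "big data".
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]


def mean_squared_error(w):
    """Average squared error of the model y = w * x over the dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(epochs, learning_rate=0.01):
    """Fit w by gradient descent and record the error after each epoch."""
    w = 0.0
    history = []
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * gradient
        history.append(mean_squared_error(w))
    return w, history


w, history = train(epochs=200)
print(f"learned weight: {w:.4f}")
print(f"error after first epoch: {history[0]:.4f}")
print(f"error after last epoch:  {history[-1]:.6f}")
```

The error shrinks as training proceeds and the weight approaches 3, which is the "learning" part. The whole dataset fits in one line of code, so nothing here needs big data tooling.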