Common Mistakes Made by Aspiring Data Scientists
Learn about the common mistakes that can keep aspiring data scientists from landing a data science job.
We'll cover the following
- Learning theory without practical application
- Moving to machine learning without prerequisites
- Using inappropriate data science terms in your resume
- Real projects and competition projects
- Accuracy is not the most important thing
- Learning multiple tools
- Neglecting public speaking and communication skills
- Not working on case studies
The mistakes in this lesson are ones people commonly make while learning data science to get a job. Learn from them so that you can avoid them.
Learning theory without practical application
The first mistake is learning theoretical concepts without practicing them on real datasets, which is how you come to understand how those concepts actually work. Learn from books, blogs, videos, and courses, whichever makes learning easiest for you, but that alone is not enough without practice. Whenever you learn a new concept, look for real-life problems that use it; working through them will give you a firmer grasp of the concept than you had before. A search engine will turn up problems, often with solutions, for almost any concept.
Remember: Whenever you learn something, make sure you can use it to solve real problems. Here's a fundamental rule to follow while studying: don't try to learn everything at once. Going slow and steady matters more than rushing.
Moving to machine learning without prerequisites
The second mistake is jumping into machine learning without the prerequisites. Most students are fascinated by machine learning techniques, but to really understand them, you need to know how an algorithm works from scratch.
Mathematics and statistics play an important role here; the key topics are linear algebra, calculus, matrix theory, statistics, and probability. There are many online and offline resources for these topics, so don't get overwhelmed by the options. Start by choosing one resource and following it through. Over time, you'll learn the concepts and their applications. Once you feel comfortable, exploring other books and courses will help you significantly.
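To make "from scratch" concrete, here is a minimal Python sketch (our own illustration, not taken from any specific course or library) of simple linear regression fitted with the ordinary least-squares formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. The function names and the toy data are hypothetical.

```python
# A minimal from-scratch sketch of simple linear regression (ordinary least squares),
# built only from basic statistics: means, covariance, and variance.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel, so plain sums suffice.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical toy data that lies exactly on y = 2x + 1.
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]
    slope, intercept = fit_simple_linear_regression(xs, ys)
    print(slope, intercept)  # 2.0 1.0
    print(predict(slope, intercept, 10))  # 21.0
```

Libraries such as scikit-learn fit models like this for you, but being able to derive and code the formula yourself is the kind of grounding the mathematical prerequisites provide.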
Using inappropriate data science terms in your resume
The third mistake is misusing data science terms in your resume. Hiring managers want to see your actual background in data science, and piling on data science terms without explaining how you used them makes your resume vague.
For example, suppose you write, "I am proficient in the random forest algorithm." This claims knowledge without showing any practical experience. You could revise it to, "Applied random forest to predict loan approval." This shows that you have actually used the random forest algorithm and gives interviewers a project to ask you about.
It is important to mention data science terms, but you need to mention them specifically. Listing too many terms will also make your interview harder, because you'll have to be ready to discuss every one of them. Instead, list only the terms you are very confident about and that appear in the job description, and always add a line about why you used them.
Real projects and competition projects
The fourth mistake is assuming that real data science problems resemble data science competition problems. Competition datasets are usually almost clean, with a few missing values that take little work to fill in.
Real data science problems are different. In most of them, you'll receive messy data and can spend around 80% of your time cleaning it. This can be frustrating, but it's the reality of data science work, and the only way to prepare for it is to gain hands-on experience.
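To give a feel for what cleaning involves, here is a minimal Python sketch (standard library only) of a few typical steps: trimming and normalizing text, parsing numbers stored as strings, treating placeholder and impossible values as missing, removing rows that only become duplicates after normalization, and filling gaps with a median. The records, field names, missing-value markers, and the 0 to 120 age range are all hypothetical assumptions made for illustration, not rules from any real dataset.

```python
import statistics

# Strings that, in this hypothetical dataset, mean "no value recorded".
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "?"}


def parse_text(value):
    """Trim whitespace and normalize case; return None for missing markers."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return text.title()


def parse_number(value):
    """Turn strings like ' 1,200 ' or '$35' into floats; None if missing or invalid."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in MISSING_MARKERS:
        return None
    text = text.replace(",", "").replace("$", "")
    try:
        return float(text)
    except ValueError:
        return None


def clean_records(raw_records):
    cleaned, seen = [], set()
    for record in raw_records:
        city = parse_text(record.get("city"))
        age = parse_number(record.get("age"))
        amount = parse_number(record.get("amount"))
        # Treat impossible ages as missing instead of keeping bad values.
        if age is not None and not 0 <= age <= 120:
            age = None
        row = (city, age, amount)
        # Skip rows that become exact duplicates after normalization.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"city": city, "age": age, "amount": amount})

    # Fill missing ages with the median of known ages (one simple choice among many).
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    if known_ages:
        median_age = statistics.median(known_ages)
        for r in cleaned:
            if r["age"] is None:
                r["age"] = median_age
    return cleaned


if __name__ == "__main__":
    raw = [
        {"city": " new york ", "age": "34", "amount": "$1,200"},
        {"city": "NEW YORK", "age": " 34 ", "amount": "1200"},  # duplicate once cleaned
        {"city": "Chicago ", "age": "28", "amount": "90.5"},
        {"city": "boston", "age": "n/a", "amount": "350"},
        {"city": "", "age": "250", "amount": "?"},
    ]
    for row in clean_records(raw):
        print(row)
```

Each of these choices (for example, median imputation versus dropping incomplete rows) depends on the data and the business question. Libraries such as pandas provide ready-made operations for these steps, but the underlying decisions are still yours to make.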
Accuracy is not the most important thing
The fifth mistake is prioritizing accuracy above everything else. Consider an example. Suppose you have 100 variables for predicting an e-commerce company's sales, such as product type, location, delivery date, and so on. You may not know what some of the variables mean, but you build a model with good accuracy and drop some variables along the way. Some of the dropped variables might be important to the business, yet you dropped them only because they did not improve the model's accuracy.
For example, suppose there is a variable called init, which actually means initial interaction with the website. The documentation only says it is an integer variable, and its values look like random integers that contribute little to accuracy. So you drop it, but this variable is crucial for the marketing team. Dropping init isn't a good idea.
Domain knowledge is what lets you catch this, but don't panic if you don't know much about any business yet. The truth is that nobody starts out knowing; everyone learns by exploring. As we explore, we learn how businesses work and which facts matter.
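One way to put this lesson into practice, sketched below in Python under our own assumptions, is to treat accuracy-based importance scores as only one input when pruning features: anything the business has flagged as important (such as init in the example above) is kept even if the model scores it low. The function name select_features, the feature names, the importance scores, and the threshold are all hypothetical; in practice, the scores might come from a fitted model, and the list of critical features from conversations with stakeholders.

```python
# A minimal sketch of a feature-selection guard: drop low-importance features
# only if stakeholders have not flagged them as business-critical.
# All names, scores, and the threshold below are hypothetical.

def select_features(importances, threshold, business_critical):
    """Return (kept, dropped) feature lists, ordered by descending score.

    importances: dict mapping feature name -> importance score from some model
    threshold: features scoring below this are candidates for dropping
    business_critical: names that must be kept regardless of their score
    """
    kept, dropped = [], []
    for name, score in sorted(importances.items(), key=lambda item: item[1], reverse=True):
        if score >= threshold or name in business_critical:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


if __name__ == "__main__":
    importances = {
        "product_type": 0.41,
        "location": 0.27,
        "delivery_date": 0.18,
        "init": 0.01,  # looks like noise to the model
        "session_id": 0.005,
    }
    # Hypothetical: flagged by the marketing team, not inferred from the data.
    critical = {"init"}
    kept, dropped = select_features(importances, threshold=0.05, business_critical=critical)
    print("kept:", kept)
    print("dropped:", dropped)
```

The code cannot supply domain knowledge; it only makes sure that knowledge, once gathered, is not silently overridden by an accuracy-driven pruning step.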
Author's note: I know a data science consultant who had no background in the banking industry but worked for a bank. Senior executives understand that finding domain experts who also have data science expertise is very difficult, because data science is still a relatively new field.
Learning multiple tools
The sixth mistake is trying to learn too many tools. Some learners struggle to choose a tool, and many think they should learn both R and Python from the start, which is the wrong approach. Think of it this way: if you have a problem with your teeth, who do you go to, a dentist or a general physician? The dentist, of course, because they're a specialist. In the same way, you should become a specialist in R or Python. Once you have mastered one of the two, you can learn the other to broaden your toolkit and use each as your needs dictate.
There's an old saying: "Jack of all trades, master of none." It's better to master one of the two than to be a jack of all tools. If you are inclined toward programming, choose Python; if you are less drawn to solving coding problems, learn R.
Since this course is designed for programmers, we suggest setting R aside at first and focusing on Python. Some things can be done more quickly in R than in Python, but demand for Python professionals is very high. So focus on Python first, and later you can decide for yourself which tool suits you better. The first objective is to get into data science; the rest will follow.
Neglecting public speaking and communication skills
The seventh mistake is neglecting public speaking and communication skills. In a data science job, you'll have to deliver presentations. Most people assume this only happens after a project is complete, but you might have to present during the interview or in your first week on the job.
Author's note: A company may hire a data scientist because it recognizes the power of data without knowing exactly how you can help. In that case, they'll ask you to explain the value you bring, and you'll need to prepare for that. If they already know what they want to achieve with your skills, they might instead ask for a presentation after your analysis, in which you present your insights in a professional setting.
To meet the demand for data science talent, you need to be able to communicate your project's needs and insights to nontechnical people. That is why confident public speaking is a must, and explaining ideas clearly to your audience is where communication skills come in. Interviewers will assess your communication skills throughout the interview process, and they might also ask you to give a presentation.
Not working on case studies
The eighth and final common mistake is not practicing case studies. The final onsite round usually includes case studies, and this is where most people fail. You may have the knowledge and tool expertise, but that is not enough, because a data scientist's job is to solve business problems. Case studies are modeled on real problems the company faces, and the interviewers want you to propose a possible solution.
Since you are not yet part of the team, they don't expect you to fully solve the problem; they are evaluating your problem-solving approach. Your approach should be logical, and you must be able to explain each step. These are the most common mistakes to avoid if you want to become a data scientist, and keeping them in mind will save you a lot of time.