The First Step to Becoming a Data Scientist

Data Scientist is often called the sexiest profession of the 21st century. If you are planning to become one, it will probably help to answer a question that I often hear from others: How do I become a Data Scientist, and where do I start learning? You will find many different answers to this question, but here I will share my thoughts and recommend some books to read.

Programming, or ML?

If you want to become a Data Scientist, you need to learn programming in at least one language. The most popular programming languages among Data Scientists are R and Python. I use and recommend Python, and if you want to learn why Python is a great language for Data Science, then read this article. The next thing you will need to learn and understand is Machine Learning. But should you start your Data Science adventure with either of these?

Your first important step

Making predictions and finding structure in data are among the most important parts of a Data Scientist’s job. It is statistics and probability that enable us to uncover hidden information in big data. Both of these areas of mathematics are also integral to the predictive algorithms used in machine learning. Your task at the beginning of your journey to becoming a Data Scientist is therefore to learn statistics and probability.

About three, maybe four years ago, when I already knew I wanted to apply for a job as a Data Scientist, I put this quote in the description of my LinkedIn profile:

Josh Wills

A Data Scientist is a person who is better at statistics than any programmer and better at programming than any statistician.

If you want to become a Data Scientist, remember this sentence and strive to be just such a person. Now for your first step. Below you will find my recommendations: three books, each of which will help you refresh or acquire enough knowledge of statistics and probability for a future Data Scientist. Future Data Analysts can make use of them too.

Think Stats 

Think Stats emphasizes simple techniques that are useful to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real data sets. This book demonstrates the practical use of statistics and probability while providing code in the Python programming language.
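To give a flavour of this kind of simple, code-first exploration, here is a minimal sketch that uses only Python's standard library. It is not taken from the book: the sample values and the helper names `summarize` and `pearson_r` are invented purely for illustration.

```python
import statistics

# Invented sample (an assumption for illustration only):
# hours studied and exam scores for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 74, 79, 85]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = summarize(scores)
r = pearson_r(hours, scores)

print(summary)
print(round(r, 3))
```

A correlation close to 1 suggests that the two columns rise together, which is the sort of first question a small exploratory script like this can answer before any machine learning is involved.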

Practical Statistics for Data Scientists

Statistics is a very broad field, and only part of it applies to Data Science. This book does an excellent job of focusing on the topics that are closely related to Data Science. If you are looking for a book that can quickly give you enough knowledge, then this one is for you. As in Think Stats, you will find practical examples and code that allow you to replicate what is discussed in the book. This time the code is available in both R and Python.

The Art of Statistics: How to Learn from Data

In this book, the author shows how to use data to solve real-world problems. The book emphasizes mathematical ideas and the connections between them. It can be a valuable addition to your journey into the world of Data Science because it teaches you to think like a statistician. Unfortunately, you won’t find any code here, but it’s still worth keeping this title in mind.


