How to Get into Data Science: Math or Coding
I wish I had known the answer to this question when I was trying hard to break into data science before I graduated from university.
Some background: I studied Mathematics and did not take many programming courses at university. The programming languages I learned there were R, C++, and Matlab.
Matlab is not open source and is mainly used in research. R does not have as large a community as Python, especially around data science libraries. C++ (and the C family in general) is still fundamental to programming, so if you are learning to code, I would still suggest learning a C-family language.
During my internship, I found that Python was the language most used in the industry, so I still had to pick it up on my own. Besides, I had taken only one course related to the math for machine learning.
I felt overwhelmed because I had to not only learn math but also improve my coding skills at the same time. At that point, I wondered whether I should put more effort into coding or into learning math.
Mathematics OR Coding
I will be sharing my perspective on which is actually more sought after in the current industry.
Let me ask you one question. Suppose you are the tech lead of a data science team, you already have a lot of Ph.D. holders working for you, and you would like to expand your team. You have two candidates in mind: one is better at coding and the other is better at math concepts. Which candidate would you prefer?
There is no right or wrong answer to this question, but from what I have observed, tech leads usually prefer the candidate with better coding skills.
You may wonder why.
The reason is quite simple: the direction of most data science projects will be set by the Ph.D. holders, who should be more knowledgeable. Thus, the one who can implement multiple approaches faster will be the last one standing.
Then you might ask: statistics is the root of data science, and you are telling me to just learn how to code well in order to enter data science? 🤔
Nope, math is still very important in data science. The ones who understand math better will be the ones who can come up with new ideas to improve a machine learning model.
There are tons of machine learning models out there. Knowing which models to use in which scenarios will definitely save you a lot of time. Besides, when a model that previously performed very well suddenly drops in performance, you will be able to work out the possible reasons.
However, if you just want to get into the data science field, you do not need to dive too deep into the mathematics. Data science is not all about knowing how to derive or solve mathematical equations. More importantly, it is about knowing how to define and solve the business problem.
For instance, suppose you are working at an e-commerce company and you are given the task of auto-categorizing listings. The first step is probably to define the problem, perhaps stating a timeline and the accuracy you need to achieve. Next, you think about problems the model might face that need clarification.
Say the listing name and the picture belong to different categories. How should the listing be classified: according to the picture or according to the listing name?
Only after understanding the standard operating procedure (SOP) that your team agrees on will you be able to start the project.
Back to the topic: one of the skills most in demand in data science is the ability to fork code from GitHub and try it out on your own dataset. Therefore, if you are good at coding, you will be able to test different approaches no matter what programming language they are written in.
For example, suppose you are training an NER (named entity recognition) model on a given dataset. Let's imagine that there is no NER code written in Python yet, and the only available code is written in Java and provided by Stanford University. Knowing different programming languages is then definitely a plus, because it saves you the time of rewriting the whole thing in Python just to train the model.
On the other hand, if you study more of the mathematics behind machine learning, you will be more sensitive to which metrics matter for different problems. Say you are working on a credit fraud project. The metric to focus on is no longer accuracy; it should be, for instance, the F1-score, because your aim is not only to identify as many fraud cases as possible but also to maintain precision.
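To make the fraud example concrete, here is a minimal sketch in plain Python. The data is made up for illustration (1,000 transactions, 10 of them fraud), and the function names and the two "models" are hypothetical, not taken from any real project. It shows why accuracy alone can mislead when fraud cases are rare.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, where 1 = fraud and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the model never predicts fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: the first 10 transactions are fraud, the other 990 are legitimate.
y_true = [1] * 10 + [0] * 990

# Hypothetical model A: always predicts "legitimate".
always_legit = [0] * 1000

# Hypothetical model B: catches 8 of the 10 fraud cases and raises 4 false alarms.
some_model = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986

baseline = metrics(y_true, always_legit)
model = metrics(y_true, some_model)

for name, result in (("always legitimate", baseline), ("some model", model)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

On this toy data, the model that never flags anything still reaches 99% accuracy, but its F1-score is 0 because it catches no fraud at all. The second model is only slightly more accurate (99.4%), yet its F1-score is about 0.73, which reflects both the fraud it catches (recall 0.8) and its false alarms (precision about 0.67).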
Mathematics and coding are equally important in data science, but if you are considering switching to or starting a career in data science, I would say coding skills matter more than a deep dive into the math behind the various machine learning models.
Doing more real-world projects and being able to present and answer questions clearly during interviews will definitely increase your chances of getting into data science.
Getting into data science is hard, but remember not to give up and to keep working hard.
About the Author
Low Wei Hong is a Data Scientist at Shopee. His experience mostly involves crawling websites, building data pipelines, and implementing machine learning models to solve business problems.
He provides crawling services that deliver the accurate, cleaned data you need. You can visit this website to view his portfolio and to contact him for crawling services.