LEARNING ABOUT DATA SCIENCE THE “SCIENTISTS” WAY
As a kid, I was always fascinated by the term “Scientist”, and I wondered about the wonderful experiments that scientists conducted in their white coats. Back then, scientists were regarded as the most accomplished professionals in their field: the knowledgeable ones, with no shortage of smarts. Today, the word “Scientist” is also a job title that covers many kinds of work. Data science is one of the emerging fields, and “Data Scientist” has become a well-known title within it.
Data Science is an interdisciplinary field that brings together three major areas: Mathematics and Statistics, Computer Science, and domain knowledge (Operations, Marketing, Finance and so on). Picture them as three overlapping bubbles; knowledge of all three helps in solving problems in the data science domain. Programming and computer science fundamentals are required to implement a solution to any problem.
Fundamental subjects such as Networking, Data Structures and Algorithms, Operating Systems, Programming and Databases give a Data Scientist essential skills for solving Data Science problems. A computer engineer can readily relate to the complexities that arise in different kinds of problems and understand their implications, and is also trained to deal with structural problems. Good knowledge of computer science fundamentals is a must for learning data science.
Coming to mathematics and statistics, concepts from both fields are necessary to become a data scientist. You need basic probability and its more advanced concepts, along with linear algebra and basic calculus (derivatives and integration). These concepts form the base of machine learning algorithms and more advanced topics. Preliminary statistics and mathematics are also important for the operations that run behind the scenes, whichever programming language you use.
Once you have the basics, a clear understanding of elementary descriptive statistics, inferential statistics, distributions, hypothesis testing etc. is essential for understanding how things work across different problems. Depending on the problem statement, you will have to identify which statistical method applies before you start building a solution. Solving preliminary problems helps you understand how each mathematical concept is applied.
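To make the difference between describing a sample and testing a hypothesis about it a little more concrete, here is a minimal sketch using only Python's standard library. The sample values and the baseline of 50 are made up for illustration, and a one-sample t statistic is just one of many possible methods; which one fits depends on the problem statement.

```python
import math
import statistics

# Hypothetical sample: daily order counts over ten days (made-up numbers).
sample = [52, 48, 55, 60, 47, 53, 58, 49, 51, 57]

# Descriptive statistics: summarise the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Hypothesis testing: could the true mean plausibly be an assumed baseline of 50?
# The one-sample t statistic measures how many standard errors the sample
# mean lies away from that baseline.
baseline = 50
n = len(sample)
standard_error = stdev / math.sqrt(n)
t_statistic = (mean - baseline) / standard_error

print(f"mean={mean}, median={median}, stdev={stdev:.2f}")
print(f"t statistic with {n - 1} degrees of freedom: {t_statistic:.2f}")
```

The t statistic on its own is not a verdict. It still has to be compared against a t distribution with n - 1 degrees of freedom to get a p-value, which is where knowledge of distributions comes in.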
Moving on to domain knowledge about Operations, Marketing or Finance, this is something you will keep gaining at your workplace, depending on the tasks you have at hand. There are two approaches to gaining domain knowledge. The first is to take a course that specializes in a particular domain. This gives you a lot of theoretical knowledge of the subject, and it also gives you practical knowledge as you work through dummy case studies and encounter some real-life problems.
The second approach is to pick up all of this knowledge directly at your workplace. As long as you have the other two bubbles in place, you can leave the third bubble, domain knowledge, for the job itself. Understanding the business is very important, because the way you approach a problem statement changes altogether depending on the domain. This is one bubble that cannot be ignored, as it is fundamental to solving data science problems.
The three bubbles mentioned above form the core of the “Scientists” approach to data science. A good understanding of the underlying concepts is the key to getting stronger in each bubble. Most algorithms have a mathematical construct and use mathematical terminology, so mathematics builds the base for understanding the various machine learning algorithms. Data Science has been made easier by people who have built libraries containing implementations of these algorithms, which can be readily used to perform the mathematical operations involved.
All machine learning algorithms have mathematical derivations, and Python libraries such as NumPy, pandas and Matplotlib bridge the gap between programming and mathematics. Python is one of the languages with well-established tools for data science and, most importantly, those tools are easy to learn and use. NumPy and pandas allow you to perform mathematical operations on structured data. NumPy provides numerous functions that make mathematics in programming easier. Pandas is a library that reads data into a structured form and lets you perform a wide range of operations on it. Together they form the base of data preparation when the intention is to use machine learning to solve the problem.
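As a rough illustration of that data prep step, the sketch below reads a small table with pandas, fills a missing value, and uses NumPy for the arithmetic. The column names and numbers are made up, and the inline CSV text stands in for a real file; filling gaps with the median is only one possible choice.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: an inline CSV stands in for a real file on disk.
# The last row has no price, so pandas reads it as a missing value (NaN).
csv_text = """region,units,price
north,10,2.5
south,4,3.0
north,7,2.5
east,3,4.0
south,6,
"""

# pandas reads the text into a structured table (a DataFrame).
df = pd.read_csv(io.StringIO(csv_text))

# Data prep: fill the missing price with the median of the known prices.
df["price"] = df["price"].fillna(df["price"].median())

# NumPy performs the arithmetic on whole columns at once, without a loop.
revenue = df["units"].to_numpy() * df["price"].to_numpy()
df["revenue"] = revenue

# Summarise the prepared data per region.
revenue_by_region = df.groupby("region")["revenue"].sum()

print(df)
print(revenue_by_region)
print("average revenue per row:", np.mean(revenue))
```

A table cleaned up this way is typically what gets handed to a machine learning library as input.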
Learning the core concepts and understanding them is central to arriving at a solution in an organised manner. Data Science is a different kind of science, and it has to be approached in a scientific manner to get to its core. If the approach towards learning is right, it will seem much easier. It is a very interesting field because nothing about the solution is defined in advance: the approach is identified only after the problem has been understood.