How Python rose to the top of the data science world
Source – cbronline.com
It’s safe to say that Python is a popular tool across a whole range of industries and professions, thanks, no doubt, to the language’s accessibility, its wealth of libraries and frameworks, and, of course, its huge community of die-hard devs who claim Python should be the tool of choice for any self-respecting developer. Packt’s 2017 Skill Up survey backed up these claims when it revealed that Python is the most-used tool for tech professionals across a range of vastly different job roles, slithering its way up from the number 2 spot in 2016. We asked Sebastian Raschka, applied machine learning and deep learning researcher and the author of Packt’s best-selling book Python Machine Learning, why he always turns to Python and what’s next for what is arguably the most popular language of the last two decades. Here’s what he had to say.
Snaking its way to the top: How did Python establish itself as a lingua franca for Data Scientists?
Python is one of the most popular programming languages of all time, there’s no doubt about that, but it is hard to tell which came first: the language or the love of it. Did we develop all the great open-source libraries for scientific computing, data science, and machine learning first, which drove people to choose Python so that they could take advantage of them, or did we develop them because more and more people were using Python?
Whatever the truth, one thing is obvious enough: Python is a very versatile language that is easy to learn and easy to use. That, in my opinion, is why it’s so commonly used today. While most algorithms for scientific computing are not implemented in pure Python, Python is an excellent language for interacting with very efficient implementations in Fortran, C/C++, and other languages under the hood.
This ability to call code written in computationally efficient low-level languages, together with the very natural and intuitive programming interface Python provides, is probably one of the big reasons behind Python’s rise to popularity as a lingua franca, especially in the data science and machine learning community.
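As a rough illustration of that point, the sketch below computes the same dot product twice: once with a loop that runs entirely in the Python interpreter, and once with NumPy, whose np.dot hands the work to compiled code. The function name dot_pure_python and the sample data are made up for this example, and it assumes NumPy is installed.

```python
import numpy as np


def dot_pure_python(xs, ys):
    """Dot product computed element by element in the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


# Hypothetical sample data: 0.0, 1.0, ..., 999.0 and a constant vector of 2.0.
xs = [float(i) for i in range(1000)]
ys = [2.0] * 1000

# Same calculation, two routes: an interpreted loop and a compiled routine.
slow = dot_pure_python(xs, ys)
fast = float(np.dot(np.array(xs), np.array(ys)))

print(slow, fast)
```

Both routes give the same number. On large arrays the NumPy call is typically much faster, because the loop runs in compiled code rather than in the interpreter.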
Frameworks of the future: What tools, frameworks, and libraries should we be paying attention to?
There are many interesting libraries being developed for Python. As a data scientist or machine learning practitioner, I’d be tempted to highlight the well-maintained tools from Python’s core scientific stack. For example, NumPy and SciPy are efficient libraries for working with data arrays and scientific computing.
When it comes to serious data wrangling, I use the versatile Pandas package. Pandas is an open source library that provides fast, simple data manipulation and analysis tools for the Python programming language. It focuses on practical, real-world data analysis in Python. I’d also recommend Matplotlib for data visualization, and Seaborn for additional plotting capabilities and more specialized plots. Scikit-learn is a great tool for general machine learning, providing efficient tools for data mining and analysis. It’s probably one of my favorites – it has a great and clean API for almost all basic machine learning algorithms and many helpful data processing tools.
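To give a flavor of how these pieces fit together, here is a minimal sketch that summarizes a small table with Pandas and then fits a Scikit-learn classifier on it. The column names and the numbers are invented for illustration, and the example assumes both libraries are installed. It is not taken from the interview.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied and whether the student passed.
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0],
    "passed": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Data wrangling with Pandas: average study time per outcome.
mean_hours = df.groupby("passed")["hours_studied"].mean()

# Scikit-learn estimators share the same fit/predict interface.
model = LogisticRegression()
model.fit(df[["hours_studied"]], df["passed"])

# Predict outcomes for two new, made-up students.
new_students = pd.DataFrame({"hours_studied": [1.5, 8.5]})
predictions = model.predict(new_students)

print(mean_hours)
print(predictions)
```

Because most Scikit-learn estimators share the fit and predict methods, swapping LogisticRegression for another classifier usually requires changing only the line that constructs the model.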
There are, of course, many, many more libraries that I find useful in my projects. I could go on and on. When I need some extra performance, my go-to data frame library is Dask. Dask is an excellent library for working with data frames that are too large to fit into memory and for parallelizing computations across multiple processors. Or take TensorFlow, Keras, and PyTorch, all of which are excellent libraries for implementing deep learning models. What you use depends on your personal preferences and the demands of your project, but there are so many handy and exciting frameworks being developed for use with Python all the time – the key is figuring out what works for you.
What does the future look like for Python?
In my opinion, Python’s future looks very bright! It was just ranked as the number one programming language by IEEE Spectrum in July, and Packt’s recent Skill Up survey showed that it’s the most popular tool in tech at the moment. While I mainly speak of Python from a data science and machine learning perspective, I’ve heard from many people in other domains that they appreciate Python as a versatile language with a rich ecosystem of libraries. Of course, Python may not be the best tool for every problem, but it’s very well regarded as a ‘productive’ language for programmers who want to ‘get things done’.
While the availability of plenty of libraries is one of the strengths of Python, I’d also like to highlight that most of the packages that have been developed are still exceptionally well maintained today – new features and improvements to the core data science and machine learning libraries are being added on a daily basis. For instance, the NumPy project, which has been around since 2006, just received a $645,000 grant to support its continued development as a core library for scientific computing in Python.
Python and its associated libraries have been so useful to me in my career that I would like to personally thank all the developers of Python and its open source libraries who have made Python what it is today. It’s an immensely handy tool for me, and as a Python user, I’d hope that those reading this might consider getting involved in open source. Every contribution is useful and appreciated, from small documentation fixes and bug fixes in the code to new features or entirely new libraries. Thanks to the awesome community around it, I think Python’s future looks very bright indeed.