MODERN REQUIREMENTS FOR SUCCESSFUL DATA SCIENCE ACROSS ORGANIZATIONS
Does the success of companies like Google depend more on their algorithms or on their data? Today’s fascination with artificial intelligence (AI) reflects both our appetite for data and our excitement about the new opportunities in machine learning. Amalio Telenti, Chief Data Scientist and Head of Computational Biology at Vir Biotechnology Inc., argues that newcomers to data science are blinded by the shiny object of magical algorithms, and that they forget the critical infrastructure needed to create and manage data in the first place.
Data management and infrastructure are the ugly duckling of data science, but they are necessary for a successful program and therefore need to be built with purpose. This requires careful consideration of strategies for data capture, for storage of raw and processed data, and for retrieval. Beyond the value of analysis, easier retrieval brings benefits of its own. While many solutions exist for visualizing corporate or industrial data, there is still a need for flexible retrieval tools, such as search engines that can query the diverse sources and forms of data and information generated within a company or institution.
However, well-designed solutions for capturing, storing and retrieving data serve their purpose only if there is buy-in to the intrinsic value of data beyond its original, ad hoc use. Hence the idea that investment in data, meaning maximizing data generation and the completeness of metadata, lies on the critical path to using AI successfully. Over time, the intrinsic value of data may grow to provide insight into manufacturing processes, client engagement, business decisions and other areas, including for third-party use.
While Telenti emphasized the importance of creating one’s own data, it is also worth highlighting technologies that use external data to help analyze data that is internal and unique to a given enterprise. One such approach is “transfer learning”, a machine learning technique, widely used in deep learning, that allows accurate models to be built efficiently. As one author writes, “With transfer learning, instead of starting the learning process from scratch, you start from patterns that have been learned when solving a different problem.” A well-known example is pre-training models on generic images (flowers, cars, animals, etc.) and then adapting them to the analysis of medical images.
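To make the idea concrete, here is a deliberately tiny sketch, assuming a one-feature linear model as a stand-in for a deep network and made-up source and target tasks. The function names `train` and `mse` and all of the data are hypothetical. Parameters learned on a data-rich source task are used as the starting point for training on a related target task that has only five examples, and the result is compared with training from scratch under the same small training budget.

```python
# Toy sketch of transfer learning: parameters learned on a data-rich source task
# initialise training on a data-poor target task. A one-feature linear model
# stands in for a deep network; all names and numbers are illustrative.

def train(xs, ys, w=0.0, b=0.0, lr=0.1, steps=100):
    """Gradient descent on mean squared error for the model y = w*x + b."""
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(2 * r for r in residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Source task: plenty of data (201 points) for the relationship y = 2x + 1.
src_x = [i / 100 for i in range(-100, 101)]
src_y = [2 * x + 1 for x in src_x]
w_src, b_src = train(src_x, src_y, steps=2000)

# Target task: a related relationship (y = 2x + 1.5) with only 5 points
# and a small training budget of 10 gradient steps.
tgt_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
tgt_y = [2 * x + 1.5 for x in tgt_x]

# From scratch: start from w = b = 0.
w_scratch, b_scratch = train(tgt_x, tgt_y, steps=10)
# Transfer: start from the parameters learned on the source task.
w_fine, b_fine = train(tgt_x, tgt_y, w=w_src, b=b_src, steps=10)

print(f"from scratch: MSE = {mse(tgt_x, tgt_y, w_scratch, b_scratch):.4f}")
print(f"fine-tuned:   MSE = {mse(tgt_x, tgt_y, w_fine, b_fine):.4f}")
```

With this setup, the fine-tuned model ends much closer to the target relationship than the one trained from scratch, because the slope it learned on the source task already fits the target and only the offset needs adjusting. Transfer learning on images follows the same principle at far larger scale, typically reusing the layers of a network pre-trained on a large image collection and retraining only part of it on the smaller domain-specific dataset.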
According to Telenti, complete data science operations (data infrastructure, data acquisition, and data analytics) are a key component of implementing AI. Implementing new machine learning algorithms is necessary, but not sufficient, for a successful program. Furthermore, many companies may need to consider the concept of “data as an asset.” What data in your company has intrinsic value for operations or research and development, and how can you preserve and grow this resource? To succeed, therefore, businesses must pursue a combination of dedicated infrastructure, more data, better data, and the best algorithms.