21 Jul - by aiuniverse - In Data Mining

Source: analyticsinsight.net

The number of data science and big data projects is growing, but very little has been said or done to establish industry best practices for data science leaders. The complex world of big data, characterised by its different V’s, further complicates the process. In addition, the proliferation of open-source technologies adds more layers of complexity. To demystify these complexities, here are the industry best practices for data science leaders to follow:

Leverage Open Source Data

Open-source tools are an important part of the data science technology stack. Much of this data resides in silos, and experienced data scientists will have a better understanding of how to evaluate and manage open-source tools by looking at code activity, package metadata, release history, and project contributors.
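As a rough illustration of weighing those signals, the sketch below summarises release history, contributor count, and recent code activity for one package. The PackageSnapshot record, the health_report function, and every threshold are hypothetical and chosen only for illustration; the figures would have to be collected by hand or from a package index.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class PackageSnapshot:
    """Hypothetical record of facts collected about one open-source package."""
    name: str
    release_dates: List[date]    # release history, in any order
    contributors: int            # number of distinct project contributors
    commits_last_90_days: int    # recent code activity


def health_report(pkg: PackageSnapshot, today: date) -> dict:
    """Summarise the signals. All thresholds are illustrative assumptions."""
    if not pkg.release_dates:
        raise ValueError("release history is empty")
    days_since_release = (today - max(pkg.release_dates)).days
    flags = []
    if days_since_release > 365:
        flags.append("no release in over a year")
    if pkg.contributors < 3:
        flags.append("fewer than three contributors")
    if pkg.commits_last_90_days == 0:
        flags.append("no recent code activity")
    return {
        "name": pkg.name,
        "days_since_release": days_since_release,
        "release_count": len(pkg.release_dates),
        "flags": flags,
    }
```

A report with an empty flags list is not proof that a tool is safe to adopt; it only means none of these simple warning signs were triggered.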

Embrace the Changing Data Landscape

The increased use of internet-connected smart devices has changed how live data flows across organisations. The Internet of Things (IoT), which quickly creates large amounts of data from sensors, is touted as one of the contributing factors in creating this category of big data. The rise of data science is primarily a result of big data that goes beyond traditional structured data, such as text, machine-generated data, and geospatial data. Big data and data science go hand in hand; thus, it is imperative for organisations to embrace the changing data landscape. Over time, data scientists will need to collaborate to develop processes and eliminate data redundancy, besides working with IT to understand how to put projects into production, assess the potential resources, and understand security standards.

Integrate Data Management

Data science leaders must integrate the CRISP-DM data mining process to gain clarity on each stage of a project, from business understanding through data preparation, modelling, evaluation and deployment.

To solve business problems, data science teams should understand how to speak the language of the business units they work with. It’s essential that common terms and acronyms are used in presentations with their respective lines of business. This will help establish common ground in defining and evaluating success.

Developing the PoC Phase

Nearly half of data science projects never make it to production. One way to help ensure models ultimately make it into the hands of end-users and bring value to the business is to involve IT and software developers early in the process, especially so that security protocols are met early on.

Promoting Collaborative Efforts

Data scientists should not work in silos. Data scientists scattered across the organization should meet regularly to discuss processes, tools, and projects, while those in centralized structures should meet regularly with business managers. Through regular communication, data scientists will learn more quickly, grow their skill set, make a better case for the resources they need, and provide more value to the organization overall.

Data Risk Mitigation

Data science and machine learning are increasingly used to help make decisions that impact people’s lives through credit scoring, job and college applicant scoring, and even potential healthcare outcomes. When implemented thoughtfully, machine learning can improve human decision-making and reduce racial disparity. On the other hand, when machine learning models are implemented without regard for bias or fairness, they can reinforce and exacerbate human biases.

The most important steps data scientists can take are to understand biases in their data and understand how their models make decisions. Fortunately, several new open-source tools are available to help data scientists do this, such as Fairlearn, InterpretML, and LIME.
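One simple way to start looking for bias is to compare how often a model gives a positive outcome to each group. The sketch below computes per-group selection rates and the gap between them, a measure often called the demographic parity difference. It is written in plain Python with made-up data rather than with any of the tools named above, and the function names are illustrative assumptions; dedicated fairness libraries offer more complete metrics.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Made-up model outputs for two hypothetical groups, "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(selection_rates(preds, groups))                # {'a': 0.75, 'b': 0.25}
print(demographic_parity_difference(preds, groups))  # 0.5
```

A gap of zero means every group receives positive predictions at the same rate. A large gap is a prompt to investigate the data and the model, not a verdict on its own.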

Understanding Ethics and Data Governance

As modern data science becomes more and more ingrained into day-to-day business practices, politics, and society, it’s important that questions around bias and fairness be on the minds of every data scientist, business leader, and academic.

A failure to proactively address these areas poses a strategic risk to enterprises and institutions across competitive, financial, and even legal dimensions. We see an opportunity for data professionals to exert leadership within their organizations and drive change.

Putting Strict Controls in Place

Democratization means that business analysts will try to use more advanced technology. Make sure controls are in place before a model is put into production. This might include confirming the validity of a model.
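One possible form of such a control is an automated gate that checks a model against held-out data before release. The sketch below is a minimal example under stated assumptions: the function names, the choice of accuracy as the metric, and the baseline and margin values are all illustrative, and a real gate would likely check more than one metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def passes_release_gate(y_true, y_pred, baseline_accuracy, min_gain=0.0):
    """True only if holdout accuracy is at least baseline_accuracy + min_gain."""
    return accuracy(y_true, y_pred) >= baseline_accuracy + min_gain


# Made-up holdout labels and model predictions.
holdout_labels = [1, 0, 1, 1, 0]
model_output = [1, 0, 0, 1, 0]

print(accuracy(holdout_labels, model_output))                   # 0.8
print(passes_release_gate(holdout_labels, model_output, 0.7))   # True
```

Keeping the check in code means every model, whoever built it, goes through the same validation before it reaches production.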

Acting on Data

Analytics without action won’t yield measurable impact. Even if you aren’t ready to operationalize your analysis, it makes sense to start implementing a process for taking action, even a manual one. Doing so builds a more analytically driven culture for when you want to build more operational intelligence.

Building a Centre of Excellence

A CoE can be a great way to make sure that the infrastructure and analytics you implement are coherent. CoEs can help you disseminate information, provide training, and establish or maintain governance.

Monitor the Data Structure

Data can get stale. Models can get stale. It’s important to periodically revisit any analysis that drives action to make sure that your data is still relevant and that your model still makes sense.
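One common way to check whether data has gone stale is to compare the distribution of a feature at training time with its distribution today. The sketch below uses the population stability index (PSI) for that comparison. It is a technique chosen here for illustration, not one prescribed by the article, and the function name, bin count, and sample data are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one numeric feature."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from causing a division by zero or log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up samples: the recent data is shifted upwards by 50.
reference = list(range(100))
recent = [v + 50 for v in reference]

print(population_stability_index(reference, reference))  # 0.0
print(population_stability_index(reference, recent))     # a large positive value
```

A PSI near zero suggests the feature still looks like it did when the model was built. What counts as too large is a judgment call that depends on the feature and the business context.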
