Machine learning and information architecture: Success factors
Enterprises have long since moved past manually analyzing data, which is both expensive and impractical given the sheer amount of data organizations generate. For years, the work was delegated to programmers, who often had to write custom scripts that required frequent revision and fine-tuning.
Those days are quickly coming to an end, as both the quantity of data and the variety of sources from which it is collected have grown beyond what that strategy can practically handle. Now, organizations are rapidly adopting machine learning to generate insights from data. However, the transition is not completely seamless. Successful adoption in your organization depends on understanding how to use machine learning efficiently, knowing the data regulations that apply to information processed by machine learning, and contending with how computers inherit bias from human decision making.
How to prepare unstructured data for processing with AI and ML
The preparations for unstructured data depend on what type of data it is and how you define unstructured. “‘Unstructured’ is often a misnomer, as lots of data types associated with ‘big data,’ such as JSON files (associated with mobile and social feeds), log files, text documents, email messages, and more have structure,” Doug Henschen, principal analyst on data-driven decision making at Constellation Research, told ZDNet. “In the case of this semi-structured data, parsing, filtering and transformation steps can be applied in ETL and ETL-like processes. When this happens at scale, [Apache] Spark is often used instead of old-school commercial integration servers. This processing can bring more structure and consistency to the data.”
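The parsing, filtering and transformation steps Henschen describes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample feed, the field names (`user`, `ts`, `text`) and the helper functions are hypothetical, and a production pipeline at scale would more likely run on a framework such as Spark than on a single-process script.

```python
import json
from datetime import datetime, timezone

# Hypothetical semi-structured feed: one JSON document per line,
# with a malformed line and a record missing a field mixed in.
RAW_LINES = [
    '{"user": "Alice", "ts": "2019-03-01T12:00:00Z", "text": " Hello "}',
    '{"user": "bob", "ts": "2019-03-01T12:05:00Z"}',
    'not json at all',
    '{"user": "CAROL", "ts": "2019-03-01T12:10:00Z", "text": "Hi"}',
]

REQUIRED_FIELDS = ("user", "ts", "text")  # assumed schema, for illustration only


def parse(line):
    """Parse one line; return a dict, or None if the line is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def keep(record):
    """Filter step: keep only records that carry every required field."""
    return all(field in record for field in REQUIRED_FIELDS)


def transform(record):
    """Transform step: normalize casing, whitespace and the timestamp format."""
    ts = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "user": record["user"].strip().lower(),
        "timestamp": ts.isoformat(),
        "text": record["text"].strip(),
    }


def run_pipeline(lines):
    """Apply parse -> filter -> transform and return consistent rows."""
    parsed = (parse(line) for line in lines)
    return [transform(r) for r in parsed if r is not None and keep(r)]


rows = run_pipeline(RAW_LINES)
for row in rows:
    print(row)
```

Of the four sample lines, only the two complete records survive, and both come out with the same keys and formats, which is the added "structure and consistency" the quote refers to.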
What is necessary to generate actionable intelligence from AI/ML-powered analysis?
Think back to an introductory statistics class you may have taken as a student: Without a sufficiently large sample size — or in this case, data set — no meaningful conclusions can be drawn. According to Henschen, some machine learning systems “require at least 10,000 rows of data before you can achieve adequate accuracy.”
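To make the statistics-class analogy concrete, the sketch below computes the textbook 95% margin of error for an estimated proportion at several sample sizes. It illustrates only the general point about sample size; it is not how the 10,000-row rule of thumb was derived, and the function name and default values are assumptions chosen for the example.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n samples.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); p=0.5 is the
    worst case, and z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: +/- {margin_of_error(n):.4f}")
```

Under these assumptions, the margin shrinks from about 9.8 percentage points at 100 samples to about 1 percentage point at 10,000, because the error falls with the square root of the sample size.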
The role that machine learning plays in your organization is also worth reconsidering: using it as a drop-in replacement for analytics scripts undercuts the benefits that machine learning offers. “AI has a job to do. You’re defining the model to automate and scale your decisions and actions, to take care of a job,” Forrester enterprise architecture analyst Michele Goetz told ZDNet. “What you’re doing is training a system to be a co-bot with the rest of your organization, not to just say, ‘Oh, look how great our performance was,’ or, ‘This is just where your forecast is going.’”