Oracle Data Science efforts advance with new services
Oracle introduced new data services that expand the machine learning and analytics capabilities of its cloud platform.
The marquee new service from the software giant is the Oracle Cloud Infrastructure Data Science offering — an evolved version of the DataScience.com platform that Oracle acquired in 2018.
The Oracle Data Science service provides an automated workflow for machine learning and data analysis. Oracle is also launching a new data catalog service that helps users organize data for analysis. Another new capability is the Cloud SQL service that enables users to query cloud data stores, while the Data Flow service enables users to run Apache Spark big data analysis as a service.
Oracle is playing to its strength in data with the new services, unveiled Feb. 12, according to Nucleus Research analyst Daniel Elman.
“Oracle made its name on database technology and remains to this day a preeminent leader in the space,” Elman said. “With these services, it’s leveraging this expertise with data management and offering its thousands of database customers a natural route to enabling data science initiatives without having to migrate data or learn new specialized tools.”
Oracle Data Science positioned for ease of use
Oracle is marketing the Data Science service as a way for teams of data scientists to collaborate on building machine learning models and then apply them to production applications.
The data science service has a project environment that sets up all the infrastructure and networking needed to access data assets, as well as providing the tools needed for data science, explained Greg Pavlik, senior vice president of product development, data and AI services at Oracle. Among the tools is an automated machine learning feature that handles common data science tasks such as algorithm selection.
Oracle getting into the data catalog market
Alongside the Oracle Data Science service, the vendor launched a new data catalog to help organizations track all the data sets that come into a cloud deployment.
“Say you’re setting up a data warehouse, we can introspect the data warehouse model, and allow users — it could be data scientists, it could be data stewards, it can be analysts — to find out what data is available, who owns it and what it’s meant to be used for,” Pavlik said.
The Oracle data catalog also provides tagging capabilities that enable administrators to define taxonomies and start to organize data sets hierarchically.
Data Flow service enables Apache Spark big data jobs
The new Data Flow service meets a different need, enabling users to run Apache Spark jobs as a service in the Oracle cloud. One of the challenges some organizations face with running Spark analytics jobs is that they often run on top of Hadoop clusters, which introduces additional complexity, Pavlik noted.
All that’s needed to run a big data workload in the Data Flow service is to upload the script, create an application that points to the script, and then specify how many CPUs the job should run on, Pavlik explained.
“We will synthesize the job on the fly in a totally serverless architecture, executing in tens of seconds,” he said. “We really think about this as a big generational leapfrog in terms of how to make big data workloads consumable by the enterprise.”
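For illustration, the script uploaded in that first step could be as simple as the following PySpark-style sketch. The Object Storage path, application name and word-count workload are all hypothetical stand-ins, not anything Oracle has published; the counting logic is kept in plain Python so it can be read without a Spark runtime.

```python
# Minimal sketch of a script that might be uploaded to a Spark-as-a-service
# offering like Data Flow. Path and app name below are placeholders.

def count_words(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def main():
    # Deferred import: the service provisions the Spark runtime serverlessly,
    # so pyspark is only needed when the job actually executes.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
    # "oci://bucket@namespace/input.txt" is a placeholder, not a real bucket.
    # Collecting to the driver is fine for a small demo; a real job would
    # aggregate with distributed Spark transformations instead.
    rows = spark.read.text("oci://bucket@namespace/input.txt").collect()
    for word, n in sorted(count_words(row.value for row in rows).items()):
        print(word, n)
    spark.stop()

if __name__ == "__main__":
    main()
```

In the workflow Pavlik describes, the user would then point a Data Flow application at this script and choose a CPU count, with the service handling cluster provisioning.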
Oracle is also expanding users’ ability to query data in the cloud with the new Oracle Cloud SQL offering, which enables SQL queries against cloud-based object stores.
“So you can reach out into a cloud-based data lake and apply the full semantic richness of the Oracle database,” Pavlik said.
Data integration service is coming
In addition to the Oracle Data Science services, the vendor has more data services in the works, among them a data integration service that Pavlik said will provide data preparation and ETL capabilities.
“It figures out where’s the most cost-effective way to run elements of the flow so that it’s filtering data and minimizing data movement,” Pavlik said. “It’s also filled with a data immersive view, so you can really drill down, understand your datasets and manipulate the data.”