software engineering Archives - Artificial Intelligence

What’s different about hiring data scientists in 2020?

aiuniverse — Fri, 14 Aug 2020 05:34:19 +0000

Source: techcrunch.com

It’s 2020 and the world has changed remarkably, including in how companies screen data science candidates. While many things have changed, one change stands out above the rest. At The Data Incubator, we run a data science fellowship and are responsible for hundreds of data science hires each year. We have observed project-based data assessments go from a rare practice to being standard for over 80% of hiring companies. Most of the holdouts are the largest (and traditionally most cautious) enterprises. At this point, they are at a serious competitive disadvantage in hiring.

Historically, data science hiring practices evolved from software engineering. A hallmark of software engineering interviewing is the dreaded brain teaser: puzzles like “How many golf balls would fit inside a Boeing 747?” or “Implement the quick-sort algorithm on the whiteboard.” Candidates study for weeks or months for these, and the hiring website Glassdoor has an entire section devoted to them. In data science, the traditional coding brain teaser has been supplemented with statistics ones as well, such as “What is the probability that the sum of two dice rolls is divisible by three?” Over the years, companies have started to realize that these brain teasers are not terribly effective and have begun cutting down on their use.
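For reference, the dice teaser quoted above can be settled by brute-force enumeration rather than mental math. The following is a minimal sketch, assuming two fair six-sided dice; the function name and parameters are illustrative and not from the original article.

```python
from fractions import Fraction
from itertools import product


def prob_sum_divisible(divisor=3, sides=6):
    """Exact probability that the sum of two fair dice is divisible by `divisor`."""
    # Enumerate every ordered pair of faces (36 outcomes for six-sided dice).
    outcomes = list(product(range(1, sides + 1), repeat=2))
    # Keep the pairs whose sum is a multiple of the divisor.
    favorable = [pair for pair in outcomes if sum(pair) % divisor == 0]
    return Fraction(len(favorable), len(outcomes))


print(prob_sum_divisible())  # 1/3
```

The favorable sums are 3, 6, 9 and 12, which account for 2 + 5 + 4 + 1 = 12 of the 36 equally likely outcomes.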

In their place, firms are focusing on project-based data assessments. These ask data science candidates to analyze real-world data provided by the company. Rather than having a single correct answer, project-based assessments are often more open-ended, encouraging exploration. Interviewees typically submit code and a write-up of their results. These have a number of advantages, both in terms of form and substance.

First, the environment for data assessments is far more realistic. Brain teasers unnecessarily put candidates on the spot or compel them to awkwardly code on a whiteboard. Because answers to brain teasers are readily Google-able, internet resources are off-limits. On the job, it is unlikely that you’ll be asked to code on a whiteboard or perform mental math with someone peering over your shoulder. It is inconceivable that you’ll be denied internet access during work hours. Data assessments also allow applicants to complete the assessment at a more realistic pace, using their favorite IDE or coding environment.

“Take-home challenges give you a chance to simulate how the candidate will perform on the job more realistically than with puzzle interview questions,” said Sean Gerrish, an engineering manager and author of “How Smart Machines Think.”

Second, the substance of data assessments is also more realistic. By design, brainteasers are tricky or test knowledge of well-known algorithms. In real life, one would never write these algorithms by hand (you would use one of the dozens of solutions freely available on the internet) and the problems encountered on the job are rarely tricky in the same way. By giving candidates real data they might work with and structuring the deliverable in line with how results are actually shared at the company, data projects are more closely aligned with actual job skills.

Jesse Anderson, an industry veteran and author of “Data Teams,” is a big fan of data assessments: “It’s a mutually beneficial setup. Interviewees are given a fighting chance that mimics the real-world. Managers get closer to an on-the-job look at a candidate’s work and abilities.” Project-based assessments have the added benefit of assessing written communication strength, an increasingly important skill in the work-from-home world of COVID-19.

Finally, written technical project work can help avoid bias by de-emphasizing traditional but prejudicially fraught aspects of the hiring process. Resumes with Hispanic and African American names receive fewer callbacks than the same resume with white names. In response, minority candidates deliberately “whiten” their resumes to compensate. In-person interviews often rely on similarly problematic gut feel. By emphasizing an assessment closely tied to job performance, interviewers can focus their energies on actual qualifications, rather than relying on potentially biased “instincts.” Companies looking to embrace #BLM and #MeToo beyond hashtagging may consider how tweaking their hiring processes can lead to greater equality.

The exact form of data assessments varies. At The Data Incubator, we found that over 60% of firms provide take-home data assessments. These best simulate the actual work environment, allowing the candidate to work from home (typically) over the course of a few days. Another roughly 20% require interview data projects, where candidates analyze data as a part of the interview process. While candidates face more time pressure from these, they also do not feel the pressure to ceaselessly work on the assessment. “Take-home challenges take a lot of time,” explains Field Cady, an experienced data scientist and author of “The Data Science Handbook.” “This is a big chore for candidates and can be unfair (for example) to people with family commitments who can’t afford to spend many evening hours on the challenge.”

To reduce the number of custom data projects, smart candidates are preemptively building their own portfolio projects to showcase their skills, and companies are increasingly accepting these in lieu of custom work.

Companies relying on old-fashioned brainteasers are a vanishing breed. Of the recalcitrant 20% of employers still sticking with brainteasers, most are the larger, more established enterprises that are usually slower to adapt to change. They need to realize that the antiquated hiring process doesn’t just look quaint; it’s actively driving candidates away. At a recent virtual conference, one of my fellow panelists was a data science new hire who explained that he had turned down opportunities based on the firm’s poor screening process.

How strong can the team be if the hiring process is so outmoded? This sentiment is also widely shared by the Ph.D.s completing The Data Incubator’s data science fellowship. Companies that fail to embrace the new reality are losing the battle for top talent.

The post What’s different about hiring data scientists in 2020? appeared first on Artificial Intelligence.

Machine Learning Engineer versus Software Engineer

aiuniverse — Wed, 01 Apr 2020 08:40:06 +0000

Source: towardsdatascience.com

Software engineering has blown up to encompass more than 1 million employees in the US as of 2018, and its growth is not forecast to slow. Next to come is the machine learning engineer, who takes an automation or decision-making problem and applies cutting-edge tools to it.

With machine learning (particularly deep learning) now pervasive across industry, more engineers deploy these tools on a day-to-day basis. The list of deep learning tools that earn companies huge profit margins is effectively endless: search recommendation, speech-to-text, voice assistants, facial recognition, advertisements, and more.

How does implementing these models differ from building vast distributed software systems? The mindset is similar, but the specializations are different.

Software Engineering: building a data network

The data flow is the key to any at-scale software project. Engineers must choose the right algorithm to deploy locally on devices, which languages to develop in (and what those languages compile into), and how many levels the software stack should have.

The software engineer ultimately works in the space of language, data structures, and algorithms.

Language: The development and test languages are the work environment of software engineers. They develop an intimate understanding of what different languages can do, and the tradeoffs grow drastically with scale. Python is a favorite because the downstream decisions become so much more fluid (I agree).
Data structures: Different data structures determine which computer operations are fast. Do we want fast access to data (hash table)? Fast post-processing with the learning tool (tensor)? Or something else? Different languages have different properties to leverage, and the best software engineers are as fluent in them as in a foreign language.
Algorithms: Standard algorithms (sort, search, and so on) are the foundation of technology interviews because they do matter at scale. “Big O” notation is a quirky tool for learning, but the ideas translate massively when working on deployed systems.
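To make the data-structure and “Big O” points above concrete, here is a minimal sketch comparing a membership test on a Python list (a linear scan, O(n)) with the same test on a set (hash-based, O(1) on average). The collection size and repetition count are arbitrary assumptions chosen for illustration, and the absolute timings will vary by machine.

```python
import timeit

N = 100_000
items_list = list(range(N))
items_set = set(items_list)  # hash-based container: average O(1) membership test
target = N - 1               # last element, the worst case for a linear scan


def in_list():
    # Scans the list element by element until the target is found: O(n).
    return target in items_list


def in_set():
    # Hashes the target and checks a single bucket on average: O(1).
    return target in items_set


list_time = timeit.timeit(in_list, number=200)
set_time = timeit.timeit(in_set, number=200)
print(f"list scan: {list_time:.4f}s, set lookup: {set_time:.6f}s")
```

Both calls return the same answer; only the cost differs, and the gap widens as N grows, which is why the choice matters more at scale than in a toy example.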

The downfall of a super-capable engineer is loving the complexity of their own system: it lets them create more within it and show off metrics, but it falls short when others try to use it. Simplicity is king because it scales and enables collaboration at the company scale.

Good software engineering ultimately will make the task of machine learning easier. The data will be more available and more uniform for distillation into products and value.

Machine Learning Engineering: building a knowledge network

Machine learning engineers distill logged knowledge (data) and create decision boundaries. The decision boundaries are frequently nonlinear and frequently difficult to interpret (such as those of a trading agent or a robot planner), but they are decision boundaries informed by data.

Machine learning engineers think in the space of Models, Deployment, and Impact.

Models: When should I use a deep model or a Bayesian approximation? Knowing which systems generalize better, can be fine-tuned on-device, and are interpretable is key for machine learning engineers. This expertise in models is also what makes ML PhDs such valuable hires for technology companies.
Deployment: Many companies have defined their niche in this area. Device-scale artificial intelligence is the current push for consumer electronics companies (ahem, Apple), and model efficiency dominates the costs of the digital goliaths (Facebook, Google, etc.). Tesla dominates the automotive automation market with unmatched cloud car updates. Next is how individual engineers contribute: more specific models for specific tasks will add up in our lives, and the efficiency of models will change internet speeds and battery life.
Impact: Ethics. Does the model I am deploying benefit one subgroup at the cost of another? This is something ML engineers need in their repertoire because the dataset you choose and train on will be reflected in your product. If a dataset is collected from a sample of 100 pre-alpha users, how will that translate when it touches millions of unknowing eyes? Data transparency is lagging, and individuals need to be accountable.

When surveying options for implementation, other machine learners want to be able to extract and mirror useful code in a modular fashion, enabling rapid evolution. I’ve tried to use multiple state-of-the-art projects that were trapped in too many internal layers to take the next step and become impactful at scale in the real world, which is why simplicity is king.

The Theme — Digital is Cheap, Simplicity is King

Both engineering roles leverage the fact that iterating in the digital domain is cheap and fast: every marginal user adds high value at a low cost. As a result, the simplest methods tend to dominate because they can be so pervasive; simple methods have better generalization in learning and better interfacing in software.

The best engineering students don’t optimize within the box given to them; they look for cracks that’ll totally change the nature of the game. In software engineering, that takes the form of using new tools and data structures; in machine learning engineering, it means tweaking a new model type or how it is deployed. I suspect that as software engineering becomes increasingly automated, machine learning engineers will drive the best companies.

This post was inspired by a conversation on the Artificial Intelligence Podcast, in which Lex Fridman hosted Andrew Ng and discussed the impact that Massive Open Online Courses are having, how computer science is taught, and how the big tech companies dominate markets.


DotData’s AI Builds Machine Learning Models All by Itself

aiuniverse — Sat, 07 Mar 2020 07:05:27 +0000

Source: spectrum.ieee.org

Demand for data scientists and engineers has, for the past couple of years, been off the charts. The number of openings for machine learning and data engineers posted on recruiting web sites continues to grow by double digits annually, and those working in the field have been commanding ever-higher salaries.

Joining the ranks of these desperately sought-after techies takes serious coding chops: expertise in Python, definitely, along with familiarity with other languages. That combination of job openings for data engineers and the dominance of Python means Python regularly makes the charts of the most in-demand coding languages.

So anyone contemplating a future in data science or machine learning needs to build up software engineering skills, right?

Wrong, says Ryohei Fujimaki, founder and CEO of dotData. Fujimaki has, for nearly a decade, been working to use AI to automate much of the job of the data scientist.

We can, he says, “eliminate the skill barrier. Traditionally, the job of building a machine learning model can only be done by people who know SQL and Python and statistics. Our system automates the entire process, enabling less experienced people to implement machine learning projects.”

DotData—which is currently offering its tools as a cloud-based service—came out of NEC. Fujimaki, then a research fellow at the company, started thinking about automating machine learning in 2011 as a way to make the 100 or so data scientists on his research team more productive. He got sidetracked for a few years, focused on commercializing an algorithm designed to make machine learning transparent, but in 2015 returned to the machine learning project.

“A typical use case for machine learning in the business world is prediction,” he said, “predicting demand of a product to optimize inventory, or predicting the failure of a sensor in a factory to allow preventive maintenance, or scoring a list of possible customers.”

“The first step in developing a machine learning model for prediction is feature engineering—looking at historical patterns and coming up with hypotheses,” he says. Feature engineering generally requires a team of people with a multitude of skill sets—data scientists, SQL experts, analysts, and domain experts. Typically, only after this team comes up with a set of hypotheses does machine learning step in, combining all those hypotheses to figure out how to best weigh them to come up with accurate predictions.
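The two-stage workflow described above (people hand-craft hypothesis features, then a learning algorithm weighs them) can be sketched in a few lines. This is an illustrative toy only: the purchase records, the labels, the two hypotheses, and the logistic-regression fit are all assumptions made for the example, not dotData’s system or data.

```python
import math

# Hypothetical raw history: (customer_id, days_ago, amount) purchase records.
purchases = [
    ("a", 3, 40.0), ("a", 20, 35.0), ("a", 60, 50.0),
    ("b", 90, 15.0),
    ("c", 5, 80.0), ("c", 12, 75.0),
    ("d", 150, 20.0), ("d", 200, 10.0),
]
# Hypothetical label: did the customer buy again the following month?
bought_again = {"a": 1, "b": 0, "c": 1, "d": 0}


def engineer_features(customer):
    """Step 1 (manual): each feature encodes one human hypothesis."""
    rows = [(days, amt) for cid, days, amt in purchases if cid == customer]
    recent = sum(1 for days, _ in rows if days <= 30)     # hypothesis: recency matters
    avg_amount = sum(amt for _, amt in rows) / len(rows)  # hypothesis: big spenders return
    return [1.0, float(recent), avg_amount / 100.0]       # leading 1.0 is the bias term


def predict(weights, features):
    """Logistic model: weighted sum of the features squashed into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def fit_weights(X, y, lr=0.5, epochs=2000):
    """Step 2 (machine learning): learn how much weight each hypothesis deserves."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = predict(weights, features) - label
            # Stochastic gradient step on the logistic loss.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights


customers = sorted(bought_again)
X = [engineer_features(c) for c in customers]
y = [bought_again[c] for c in customers]
weights = fit_weights(X, y)
scores = {c: predict(weights, f) for c, f in zip(customers, X)}
print({c: round(s, 3) for c, s in scores.items()})
```

In this sketch the creative work sits in `engineer_features`; the fitting step only decides how much each hand-written hypothesis counts.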

In dotData’s system, AI takes over that first step, coming up with and testing its own hypotheses from a set of historical data.

So, he says, “you don’t need domain experts or data scientists, and as a subproduct AI can explore many more hypotheses than human experts—millions instead of hundreds in a limited time window.”

Fujimaki’s group at NEC in 2016 let Japan’s Sumitomo Mitsui Banking Corp. (SMBC) test a prototype against a team using traditional data science tools. “Their team took three months, our process took a day, and our results were better,” he says. NEC spun off the group in early 2018, remaining as a shareholder. Right now dotData has about 70 employees, about 70 percent of whom are engineers and data scientists, along with a few dozen customers, Fujimaki says.

“In the near future,” Fujimaki says, “80 percent of machine learning projects can be fully automated. That will free up the most skilled, computer-science-PhD-type of data scientists, to focus on the other 20 percent.”

Demand for data scientists overall won’t drop from what it is today, Fujimaki predicts, though the double-digit growth may slow. The job, however, will become more focused. “Data scientists today are expected to be superman, good at too many things—statistics, and machine learning, and software engineering.”

And a new role is likely to emerge, he predicts. “Call it the business data scientist, or the citizen data scientist. They aren’t machine learning people, they are more business oriented. They know what predictions they need, and how to use those predictions in their business. It will be useful for them to have basic knowledge of statistics, and to understand data structures, but they won’t need deep mathematical understanding or knowledge of programming languages.

“We can’t eliminate the skill barrier, but we can significantly lower it. And there will be many more potential people who will be able to do this.”
