<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>machine learning technology Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/machine-learning-technology/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/machine-learning-technology/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Wed, 11 Jul 2018 05:31:41 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Google infuses machine learning into suite of ad tools, takes aim at Amazon</title>
		<link>https://www.aiuniverse.xyz/google-infuses-machine-learning-into-suite-of-ad-tools-takes-aim-at-amazon/</link>
					<comments>https://www.aiuniverse.xyz/google-infuses-machine-learning-into-suite-of-ad-tools-takes-aim-at-amazon/#comments</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 11 Jul 2018 05:31:22 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[ad tools]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[machine learning technology]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=2593</guid>

					<description><![CDATA[<p>Source &#8211; zdnet.com Google on Tuesday detailed a new suite of marketing tools that taps into the company&#8217;s vast repertoire of machine learning technology. Overall, the tools are aimed at <a class="read-more-link" href="https://www.aiuniverse.xyz/google-infuses-machine-learning-into-suite-of-ad-tools-takes-aim-at-amazon/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/google-infuses-machine-learning-into-suite-of-ad-tools-takes-aim-at-amazon/">Google infuses machine learning into suite of ad tools, takes aim at Amazon</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source &#8211; zdnet.com</p>
<p>Google on Tuesday detailed a new suite of marketing tools that taps into the company&#8217;s vast repertoire of machine learning technology. Overall, the tools are aimed at helping marketers create more effective and optimized ads &#8212; but there&#8217;s a noticeable element of anti-Amazon underlining some of the key products.</p>
<p>Chief among them is Google&#8217;s new Local Campaigns service, a toolset that uses machine learning to help drive brick-and-mortar store visits by optimizing ad placement across various Google platforms.</p>
<p>Businesses provide a location and ad, and Google automatically optimizes ads across properties &#8220;to bring more customers into your store.&#8221; Google said Local Campaigns reports on store visits using anonymous, aggregated data from Google users who have signed in and turned on their location history.</p>
<p>According to Google, people still make the majority of their purchases in physical stores, with mobile searches for nearby locations growing three-fold in the past two years, while &#8220;almost 80 percent of shoppers will go in store when there&#8217;s an item they want immediately.&#8221;</p>
<p>The local-shopping push is seemingly a direct swipe at Amazon on behalf of brick-and-mortar retailers: by focusing on optimized ads and marketing campaigns that drive store visits, it promotes essentially everything but Amazon.</p>
<p>In April, Amazon stopped buying product listing ads, or PLAs, from Google&#8217;s Shopping ads, signaling that it was eyeing a push into the digital advertising market. Cut to today, and Google is upping the ante for Amazon&#8217;s local competitors.</p>
<p>As for the rest of Google&#8217;s new machine learning ad tools, the company also launched Responsive search ads, which use a machine learning ad format that &#8220;mixes, matches, and optimizes creative assets in real time to show the best-performing ad for each search query.&#8221;</p>
<p>Similarly, Smart Shopping campaigns use machine learning to optimize marketing efforts based on certain criteria and goals. Finally, Maximize Lift for YouTube, already in beta, automatically adjusts bids to optimize ad performance.</p>
<p>The post <a href="https://www.aiuniverse.xyz/google-infuses-machine-learning-into-suite-of-ad-tools-takes-aim-at-amazon/">Google infuses machine learning into suite of ad tools, takes aim at Amazon</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/google-infuses-machine-learning-into-suite-of-ad-tools-takes-aim-at-amazon/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Automation, Machine Learning Key to YouTube Clean-Up</title>
		<link>https://www.aiuniverse.xyz/automation-machine-learning-key-to-youtube-clean-up/</link>
					<comments>https://www.aiuniverse.xyz/automation-machine-learning-key-to-youtube-clean-up/#comments</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 04 Aug 2017 09:15:37 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[machine learning technology]]></category>
		<category><![CDATA[YouTube]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=469</guid>

					<description><![CDATA[<p>Source &#8211; lightreading.com Responding to concerns from advertisers and politicians, YouTube Inc. has added new measures and improved existing ones to better regulate objectionable content uploaded on its platform. The <a class="read-more-link" href="https://www.aiuniverse.xyz/automation-machine-learning-key-to-youtube-clean-up/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/automation-machine-learning-key-to-youtube-clean-up/">Automation, Machine Learning Key to YouTube Clean-Up</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source &#8211; lightreading.com</p>
<p>Responding to concerns from advertisers and politicians, YouTube Inc. has added new measures and improved existing ones to better regulate objectionable content uploaded on its platform. The company detailed new measures in a blog post, chief among which is the use of machine learning and automation to remove objectionable content from the site and also limit access to content that falls into a gray area.</p>
<p>YouTube has been under pressure to clean up hate speech and terrorist-related content on its site. Earlier this year, major global media buyer Havas Media pulled all advertising off Google (Nasdaq: GOOG) and YouTube. Havas is estimated to spend about £175 million ($230 million) every year on behalf of its clients in the UK. The move followed similar pulls by other major advertisers, including the Guardian, the BBC and Transport for London. In fact, Google was summoned by government ministers to explain why government advertising was being placed next to extremist content on YouTube.</p>
<p>The Internet giant promised to improve its ad placement, and announced a four-step strategy to combat extremist content a few months ago. These included: better detection and faster removal driven by machine learning, more experts to identify objectionable content, tougher standards for &#8220;borderline&#8221; videos that are controversial but don&#8217;t violate YouTube&#8217;s stated policies, and more counter-terrorism efforts.</p>
<p>The challenge for Google/YouTube is that digital advertising is increasingly sold programmatically. This term refers to the automated trading of advertising online. Media companies make advertising slots available via a programmatic system and advertisers and media buyers bid on these slots. The entire process is conducted using digital trading desks that match advertising to buyers using various demographic and contextual criteria.</p>
<p>Unfortunately, this can sometimes result in advertising showing up next to exactly the wrong video. For example, a toy manufacturer might find its advertisement placed in an adult video, or a government agency could have its message inserted into a video from a hate preacher. This is a huge concern for advertisers. Prior reports found that messages from advertisers such as UK broadcasters Channel 4 and the BBC, retailer Argos and cosmetics brand L&#8217;Oréal were slotted into extremist content on Google and YouTube.</p>
<p>Google claims it previously removed nearly 2 billion inappropriate advertisements from its platforms, more than 100,000 publishers from its AdSense program and blocked ads from more than 300 million YouTube videos. Examples of inappropriate content that was removed included videos of American white nationalists and extremist Islamist preachers.</p>
<p>Following the theory that the problems created by automation can also be solved by automation, YouTube has invested in using machine learning to try and regulate the content uploaded to the site. The scale of the service makes it impossible for a human-only solution anyway: 400 hours of video are uploaded to YouTube every minute, and 5 billion videos are viewed daily.</p>
<p>The key to Google&#8217;s approach is better detection and faster removal driven by machine learning. It has developed new machine learning technology in-house specifically to identify and remove violent extremism and terrorism-related content &#8220;in a scalable way.&#8221; These tools have now been rolled out, and according to the company it is seeing &#8220;some positive progress&#8221; already.</p>
<p>It cites improvements in speed and efficiency &#8212; more than 75% of the videos removed for violent extremism in the previous month were pulled automatically, before being flagged by a single human. The system is also more accurate due to the new machine learning technology, with Google claiming in many cases it has proven more accurate than humans at flagging objectionable videos. And, lastly, Google points out that given the massive volumes of video uploaded to the site every day, it&#8217;s a significant challenge to root through them and find the problematic ones. But over the past month, the new machine learning technology has more than doubled not only the number of videos removed, but also the rate at which they have been taken down.</p>
<p>Google is also adding new sources of data and insight to increase the effectiveness of its technology, partnering with various NGOs and institutions through its &#8220;Trusted Flagger&#8221; program, such as the Anti-Defamation League, the No Hate Speech Movement, and the Institute for Strategic Dialogue. And it&#8217;s using the YouTube platform to push anti-extremist messages. When users conduct potentially extremist-related searches on YouTube, they are redirected to a playlist of curated YouTube videos that challenge and debunk messages of extremism and violence.</p>
<p>In addition, Google is targeting videos that are flagged by users as being objectionable but don&#8217;t cross the line on hate speech and violent extremism. These videos are placed in what Google calls a &#8220;limited state&#8221;: they are not recommended, monetized and users cannot like, suggest or add comments to them. This will be rolled out in coming weeks for desktops and subsequently for mobiles.</p>
<p>While Google appears to be making a significant effort to make YouTube less likely to be misused by hate groups, advertisers and government agencies will probably have to see the results to believe them. Still, this is likely to help alleviate some of the concerns that have been building up.</p>
<p>However, these efforts seem aimed only at hate speech and extremism. Google has done little to alleviate concerns from brands about context and placement outside of extremist content. Advertisers are concerned about having their brands appear to sponsor content that could be damaging for their brand image even if the content is not hate speech &#8212; like a message from a religious group placed in a video featuring a wild, drunken party.</p>
<p>In the recently concluded &#8220;upfronts,&#8221; an annual event where advertisers buy inventory upfront for the year from broadcasters, &#8220;brand safety&#8221; was an important selling point. NBCUniversal ad sales head Linda Yaccarino pretty much led with that in her address, underscoring the benefits of human ad placement in broadcast advertising.</p>
<p>Google &#8212; and others such as Facebook and Twitter &#8212; will need to develop ways to resolve these advertiser concerns, because the larger brands that control the bulk of advertising expenditure are increasingly worried about where their brands are showing up. If the machine learning technology applied to YouTube is effective then it must be extended to also address objectionable content beyond extremist videos. It should also be able to relate advertiser messages to the videos they are placed in, and create better matches. If Google is able to do that, it will take away one of the most effective selling points from broadcasters, and help shift ad spend towards online video even faster.</p>
<p>The post <a href="https://www.aiuniverse.xyz/automation-machine-learning-key-to-youtube-clean-up/">Automation, Machine Learning Key to YouTube Clean-Up</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/automation-machine-learning-key-to-youtube-clean-up/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Facebook coming with &#8216;modular&#8217; smartphone?</title>
		<link>https://www.aiuniverse.xyz/facebook-coming-with-modular-smartphone/</link>
					<comments>https://www.aiuniverse.xyz/facebook-coming-with-modular-smartphone/#comments</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Tue, 25 Jul 2017 07:46:54 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[3D printing]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[machine learning technology]]></category>
		<category><![CDATA[modular]]></category>
		<category><![CDATA[modular electromechanical device]]></category>
		<category><![CDATA[smartphone]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=268</guid>

					<description><![CDATA[<p>Source &#8211; kashmirmonitor.in In an application filed with the US Patent and Trademark Office, Facebook is exploring the development of a &#8216;modular electromechanical device&#8217; (read smartphone) which will <a class="read-more-link" href="https://www.aiuniverse.xyz/facebook-coming-with-modular-smartphone/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/facebook-coming-with-modular-smartphone/">Facebook coming with &#8216;modular&#8217; smartphone?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source &#8211; kashmirmonitor.in</p>
<p>In an application filed with the US Patent and Trademark Office, Facebook is exploring the development of a &#8216;modular electromechanical device&#8217; (read smartphone) which will allow users to add different components onto a device.<br />
The modular device can incorporate a speaker, microphone, touch display and GPS and function as a phone, Business Insider reported late on Friday.<br />
Facebook&#8217;s hardware lab &#8216;Building 8&#8217;, which is focused on developing cutting-edge camera and machine learning technology, is working on the project.<br />
The patent noted that millions of devices connected to a server could be loaded with different software based on components that are swapped out.<br />
&#8220;Typically, the hardware components included in the consumer electronics that are considered outdated are still usable. However, the hardware components can no longer be re-used since consumer electronics are designed as closed systems. From a consumer prospective, the life cycle of conventional consumer electronics is expensive and wasteful,&#8221; the patent read.<br />
The device could be made using &#8216;3D printing&#8217; technology to function as a phone or a music speaker, the report added.<br />
Not just a &#8216;modular&#8217; device, Facebook is reportedly foraying into consumer hardware products that may involve next-gen cameras, augmented reality (AR) devices, drones and even brain-scanning technology.<br />
At its &#8216;Building 8&#8217; facility, the company is working on at least four unannounced consumer hardware products.<br />
Tech giants like Google and Apple have also been exploring this area.<br />
However, Google suspended its ambitious modular &#8216;Project Ara&#8217; last year.<br />
The Ara team developed a concept design that reimagined the smartphone as a series of smaller, LEGO-style bricks that could be attached, rearranged and swapped out in seconds, media reports have said.</p>
<p>The post <a href="https://www.aiuniverse.xyz/facebook-coming-with-modular-smartphone/">Facebook coming with &#8216;modular&#8217; smartphone?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/facebook-coming-with-modular-smartphone/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Technology Requirements for Deep and Machine Learning</title>
		<link>https://www.aiuniverse.xyz/technology-requirements-for-deep-and-machine-learning/</link>
					<comments>https://www.aiuniverse.xyz/technology-requirements-for-deep-and-machine-learning/#comments</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 15 Jul 2017 07:07:50 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[advances technology]]></category>
		<category><![CDATA[data analytics]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[hardware memory]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[machine learning technology]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=98</guid>

					<description><![CDATA[<p>Source &#8211; nextplatform.com Having been at the forefront of machine learning since the 1980s when I was a staff scientist in the Theoretical Division at Los Alamos performing <a class="read-more-link" href="https://www.aiuniverse.xyz/technology-requirements-for-deep-and-machine-learning/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/technology-requirements-for-deep-and-machine-learning/">Technology Requirements for Deep and Machine Learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source &#8211;<strong> nextplatform.com</strong></p>
<p>I have been at the forefront of machine learning since the 1980s, when I was a staff scientist in the Theoretical Division at Los Alamos performing basic research on the field (and later applying it in many areas, including co-founding a machine-learning-based drug discovery company). I was lucky enough to participate in the creation of the field, and subsequently to observe first-hand how machine learning grew into a &#8216;bandwagon&#8217; that eventually imploded due to misconceptions about the technology and what it could accomplish.</p>
<p>Fueled by across-the-board technology advances including algorithmic developments, machine learning has again become a bandwagon that is becoming rife with misconceptions coupled with misleading marketing.</p>
<p>That said, the extraordinary capabilities of machine learning technology can be realized by understanding what is marketing fluff and what is real. It is truly remarkable that machines, for the first time in human history, can deliver better than human accuracy on complex ‘human’ activities such as facial recognition, and further that better-than-human capability was realized solely by providing the machine with example data. Significant market applicability means that machine learning, and particularly the subset of the field called deep-learning, is now established and is here to stay.</p>
<p>Understanding key technology requirements will help technologists, management, and data scientists tasked with realizing the benefits of machine learning make intelligent decisions in their choice of hardware platforms. Benchmarking projects like Baidu’s ‘Deep Bench’ also provide valuable insight by associating performance numbers with various hardware platforms.</p>
<h3>Understanding what is really meant by ‘deep learning’</h3>
<p>Deep learning is a technical term that describes a particular configuration of an artificial neural network (ANN) architecture that has many ‘hidden’ or computational layers between the input neurons where data is presented for training or inference, and the output neuron layer where the numerical results of the neural network architecture can be read. The values of the output neurons provide the information that companies use to identify faces, recognize speech, read text aloud, and provide a plethora of new and exciting capabilities.</p>
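<p>That input-to-hidden-to-output flow can be sketched in a few lines of plain Python; the layer sizes, weights and biases below are invented purely for illustration:</p>

```python
import math

def forward(x, weights, biases):
    """Propagate an input vector through successive dense layers.

    weights: one matrix (list of rows) per layer
    biases:  one vector per layer
    """
    a = x
    for W, b in zip(weights, biases):
        # each layer computes z = W.a + b, then a sigmoid nonlinearity
        z = [sum(w * ai for w, ai in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = [1.0 / (1.0 + math.exp(-zi)) for zi in z]
    return a

# a tiny 2-input network with one hidden layer of 2 neurons and a
# single output neuron; all parameter values here are made up
weights = [[[0.5, -0.3], [0.8, 0.2]],   # input layer -> hidden layer
           [[1.0, -1.0]]]               # hidden layer -> output layer
biases = [[0.1, -0.1], [0.0]]
output = forward([1.0, 0.0], weights, biases)
```

<p>The final <code>output</code> list is what the text calls the output neuron layer: the numbers a deployed system would interpret as a face match, a phoneme, and so on.</p>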
<p>Originally &#8216;deep learning&#8217; was used to describe the many hidden layers that scientists used to mimic the many neuronal layers in the brain. While deep ANNs (DNNs) are useful, many in the data analytics world will not use more than one or two hidden layers due to the vanishing gradient problem. This means some claims about deep-learning capability will not apply to their work.</p>
<p>More recently, the phrase &#8216;deep learning&#8217; has morphed into a catchphrase that describes the excellent work by many researchers who reinvigorated the field of machine learning. Their deep-learning ANNs have been trained to deliver deployable solutions for speech recognition, facial recognition, self-driving vehicles, agricultural machines that can distinguish weeds from produce and much, much more. Recent FDA approval of a deep-learning product has even opened the door to exciting medical applications.</p>
<p>Unfortunately, the deep-learning catchphrase is now morphing into the still more general and ambiguous term AI, or artificial intelligence. The problem is that terms like &#8216;learning&#8217; and &#8216;AI&#8217; are overloaded with human preconceptions and assumptions – and wildly so in the case of AI.</p>
<p>Let’s cut through the marketing to get to the hardware.</p>
<h3>Training is not ‘learning’ in the human sense nor is it ‘AI’, it is the numerical optimization of a set of model parameters to minimize a cost function</h3>
<p>People use the phrase ‘learn’ when discussing training because we all understand the concept of learning to do something. The danger is that people tend to lose sight of the fact that training is simply the process of fitting a set of model parameters for the ANN (regardless of number of layers) to produce a minimum error on a bunch of examples in a training set.</p>
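<p>That view of training as parameter fitting can be made concrete with gradient descent on a two-parameter linear model; the training data, learning rate and iteration count below are invented for illustration:</p>

```python
# toy training set sampled exactly from y = 2x + 1 (invented data)
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0      # the model parameters being fitted
lr = 0.02            # optimizer step size
n = len(examples)
for step in range(5000):
    # gradient of the mean-squared-error cost over the training set
    grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
    w -= lr * grad_w
    b -= lr * grad_b
# after enough iterations w approaches 2 and b approaches 1
```

<p>Nothing here &#8216;understands&#8217; lines or data; the loop only drives a cost function toward a minimum, which is exactly the point being made about ANN training.</p>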
<p>Unlike humans, ANNs have no concept of a goal or real-world constraints. For example, a project in the 1990s attempted to train an ANN to distinguish between images of a tank vs. a car. A low error was found after training but in the field the real-world accuracy was abysmal. Further investigation found that most of the tank pictures were taken on a sunny day while the pictures of the cars were taken on cloudy days. Thus the network ‘solved’ the optimization problem by distinguishing cloudy vs. sunny days and not cars vs. tanks, (which could have been bad news for people driving on a sunny day).</p>
<p>What is really exciting about machine learning is that once the training examples have been identified, the remainder of the ‘learning’ process becomes a computational problem that does not directly involve people. Thus, faster machines effectively ‘learn’ faster. Given the wide applicability and commercial viability of machine learning, companies such as Intel, NVIDIA, and IBM agree that machine learning will become a dominant workload in the data center in the very near future. Diane Bryant (formerly VP and GM of the Data Center Group, Intel) is well-known for having stated, “By 2020 servers will run data analytics more than any other workload.” In short, big money in the data center is at stake.</p>
<h3>Inferencing is a sequential calculation</h3>
<p>The payoff is achieved when the ANN is used for <em>inferencing</em>, a term that describes what happens when the ANN calculates the numerical result for a given input using the parameters from the completed training process to perform a task. Inferencing can happen quickly and nearly anywhere – even on low-power devices such as cell phones and IoT (Internet of Things) edge devices to name just two.</p>
<p>From a computer science point of view, inferencing is essentially a sequential calculation* that is also subject to memory bandwidth limitations. It only becomes parallel when many inferencing operations are presented in volume so they can be processed in a batch, say in a data center. In contrast, training is highly parallel as most of the work consists of evaluating a set of training parameters across all the examples.</p>
<p>This serial vs. parallel distinction is important because:</p>
<ul>
<li>Most data scientists will not need inferencing-optimized devices unless they plan to perform volume processing of data in a data center. Similarly, IoT edge devices, real-time surveillance, autonomous driving and other verticals will perform sequential rather than massively parallel inferencing.</li>
<li>Inferencing of individual data items will be dominated by the sequential performance of the device. In this case, expect massively parallel devices like accelerators to have poor inferencing performance relative to devices such as CPUs. FPGAs are interesting as they may exhibit some of the lowest inference latencies plus they are field upgradable.</li>
</ul>
<h3>Parallelism speeds training</h3>
<p>All hardware on the market uses parallelism to speed training. The challenge, then, is to determine what kinds of devices can help us speed training to achieve the shortest &#8216;time-to-model&#8217;.</p>
<p>Each step in the training process simply applies a candidate set of model parameters (as determined by a black box optimization algorithm) to inference all the examples in the training data. The values produced by this parallel operation are then used to calculate an error (or energy) that is used by the optimization algorithm to determine success or calculate the next set of candidate model parameters.</p>
<p>This evaluation can be performed very efficiently using a SIMD (Single Instruction Multiple Data) computational model because all the inference operations in the objective function can occur in lock-step.</p>
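<p>A sketch of that lock-step evaluation, assuming a toy one-parameter model and invented data (a Python comprehension stands in for the SIMD map):</p>

```python
def objective(w, examples):
    """Sum-of-squares error of candidate parameter w on every training
    example. Each term is independent of the others, so a SIMD machine
    can evaluate them in lock-step; the generator below is a
    sequential stand-in for that parallel map."""
    return sum((w * x - y) ** 2 for x, y in examples)

# toy data drawn from y = 2x (invented for illustration)
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# the optimizer proposes candidate parameter sets and keeps the one
# with the lowest error
errors = [objective(w, examples) for w in (1.0, 2.0, 3.0)]
```

<p>The per-example terms never depend on one another, which is why this step saturates memory bandwidth long before it exhausts floating-point units.</p>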
<p>Key points:</p>
<ul>
<li><strong>The SIMD computational model maps beautifully and efficiently to processors, vector processors, accelerators, FPGAs, and custom chips alike.</strong> For most data sets, it turns out that training performance is limited by cache and memory performance rather than floating-point capability.</li>
<li><strong>The ability of the hardware to perform all those parallel operations during training depends more on the performance of the cache and memory subsystems than on flops/s</strong>. Once the memory and cache systems are saturated, any additional floating-point capability is wasted. Customers risk shooting themselves in the foot should they base purchase decisions solely on device specifications that claim high peak floating-point performance.</li>
<li><strong>The training set must be large enough to make use of all the device parallelism, or performance is wasted.</strong> Contrary to popular belief, CPUs can deliver higher training performance than GPUs on many training problems. Accelerators achieve high floating-point performance when they have large numbers of concurrent threads to execute, so training with data sets containing hundreds to tens of thousands of examples may utilize only a small fraction of the accelerator parallelism. In such situations, better performance may be achieved on a many-core processor with a fast cache and stacked memory subsystem, like an Intel Xeon Phi processor. Thus, it is important to consider how much data will be available for training when selecting your hardware.</li>
</ul>
<h3>Reduced precision and specialized hardware</h3>
<p>Vendors are also exploring the use of reduced precision for ANNs because half-precision (e.g. FP16) arithmetic can double the performance of the hardware memory and computational systems. Similarly, using 8-bit math can quadruple that performance.</p>
<p>Unfortunately, basing a purchase decision on reduced-precision floating-point performance is a bad idea because it does not necessarily equate to faster time-to-model performance. The reason is that numerical optimization requires repeated iterations of candidate parameter sets while the training process converges to a solution.</p>
<p>The key word is <em>convergence</em>. Reduced precision can slow convergence to the point where the number of training iterations required to find a solution exceeds the speedup accrued from the reduced-precision math. Even worse, the training process can fail to find a solution at all, getting stuck in what is known as a false, or local, minimum due to the reduced-precision math.</p>
<p>Also consider the types of ANNs that will be trained. For example, special-purpose hardware that performs tensor operations at reduced precision benefits only a few, very specific types of neural architectures like convolutional neural networks. It is important to understand if your work requires the use of those specific types of neural architectures.</p>
<p>In general, avoid reduced precision for training as it will likely harm rather than help.** That said, reduced precision can help for many (but not all) inferencing tasks.</p>
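<p>The precision hazard is easy to demonstrate with Python&#8217;s built-in IEEE 754 half-precision packing (<code>struct</code> format code <code>e</code>); the parameter value 100.0 and the update 0.01 below are arbitrary choices for illustration:</p>

```python
import struct

def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# near 100 the spacing between adjacent fp16 values is 0.0625, so a
# small gradient update is rounded away entirely and the optimizer
# stops making progress even though the update is nonzero
w = to_fp16(100.0)
stalled = to_fp16(w + 0.01) == w
```

<p>This is the mechanism behind slowed or failed convergence: once parameters outgrow the precision of the representation, small corrections simply vanish.</p>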
<h3>Memory capacity and bandwidth are key to calculating gradients for orders of magnitude faster time-to-model runtimes.</h3>
<p>Many of the most effective optimization algorithms, such as L-BFGS and Conjugate Gradient, require evaluation of a function that calculates the gradient of the objective function with respect to the ANN parameters.</p>
<p>Use of a gradient provides an algorithmic speedup that can achieve significant – even orders of magnitude – faster time-to-model as well as better solutions than gradient-free methods. Popular software packages such as Theano include the ability to symbolically calculate the gradient through the use of automatic differentiation so native code can be generated, thus getting the gradient function is pretty easy.</p>
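<p>Theano derives the gradient symbolically, as noted above; as a self-contained stand-in, a central-difference approximation sketches what the generated gradient function computes (the objective and evaluation point below are invented):</p>

```python
def numerical_gradient(f, params, h=1e-6):
    """Central-difference gradient of a scalar objective f at params.

    Automatic differentiation (as in Theano) produces this result
    symbolically; finite differences are shown here only as a simple
    stand-in. Note the cost: two objective evaluations per parameter,
    which hints at why gradient work balloons as models grow.
    """
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + h] + params[i + 1:]
        down = params[:i] + [params[i] - h] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# objective with a known gradient: f = w0**2 + 3*w1, grad = [2*w0, 3]
f = lambda p: p[0] ** 2 + 3 * p[1]
grad = numerical_gradient(f, [2.0, 5.0])
```

<p>For a real ANN, <code>grad</code> has one entry per model parameter, which is precisely why its storage and the bandwidth to stream it dominate the runtime as models grow.</p>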
<p>The challenge is that the size of the gradient gets very large, very fast as the number of parameters in the ANN model increases. This means that memory capacity and bandwidth limitations (plus cache and potentially atomic instruction performance) dominate the runtime of the gradient calculation. ***</p>
<p>Further, it is important to verify that the instruction memory capacity of the hardware is large enough to hold all the machine instructions needed to perform the gradient calculation. The generated code for the gradient of even modest ANN models can be very large.</p>
<p>In both cases the adage from the early days of virtual memory applies: “real memory for real performance.”</p>
<p>Given the dependence of gradient calculations on memory, look for hardware and benchmark comparisons using the stacked memory that is now available on high-end devices and systems with large memory capacities. The performance payback can be significant.</p>
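<p>A rough roofline-style estimate shows why stacked, high-bandwidth memory matters so much. The peak and bandwidth figures below are hypothetical, not measurements of any particular device:</p>

```python
# Roofline-style sketch with illustrative (hypothetical) device figures:
# a bandwidth-bound kernel cannot exceed bandwidth * arithmetic intensity,
# no matter how high the advertised peak FLOP rate is.
peak_flops = 10e12   # 10 TFLOP/s peak (hypothetical accelerator)
bandwidth = 900e9    # 900 GB/s memory bandwidth (hypothetical)

# Streaming update akin to SAXPY (y = a*x + y in float32):
# 2 FLOPs per element against 12 bytes moved (read x, read y, write y).
flops_per_byte = 2 / 12

achievable = min(peak_flops, bandwidth * flops_per_byte)
print(f"achievable: {achievable / 1e12:.2f} TFLOP/s "
      f"({100 * achievable / peak_flops:.0f}% of peak)")
```

<p>With these numbers the kernel reaches only 0.15 TFLOP/s, about 1.5% of peak, which is why real (not peak) performance under memory-dominated workloads is the figure to demand from benchmarks.</p>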
<h3>Near-term product expectations</h3>
<p>Recent product announcements show that the industry has recognized the need to provide faster memory for both processors and accelerators. Meanwhile, custom hardware announcements (namely from Google and Intel Nervana) are raising awareness that custom solutions might leapfrog the performance of both CPUs and GPUs for some ANNs. To support custom solutions (ASICs and FPGAs), on-package processor interfaces will be offered on some Intel processor SKUs. These interfaces should provide tight coupling to the custom device’s performance capabilities while the processor acts as the front end to bring custom devices to market.**** However, this is a performance conjecture at this point.</p>
<p>Even without specialized hardware, the wider AVX-512 vector instructions (and extra memory channel) are expected to more than double per-core training and inference performance on Intel Skylake processors, without requiring a larger data set to exploit the added parallelism. (Using more cores should provide an additional performance increase.) Both Intel Xeon Phi and Intel Xeon (Skylake) product SKUs will offer on-package Intel Omni-Path interfaces, which should decrease system cost and network latency while increasing network bandwidth. This is good news for those who need to train (or perform volume inferencing) across a network or within the cloud. We look forward to validating all these points in practice.</p>
<h3>Summary</h3>
<p>When evaluating a new hardware platform consider:</p>
<ul>
<li>Many data scientists don’t need specialized inferencing hardware.</li>
<li>What is the real (not peak) floating-point performance when the calculation is dominated by memory and cache bandwidth?</li>
<li>How much parallelism do I really need to train on my data sets (i.e. many-core or massive parallelism)?</li>
<li>Am I paying for specialized hardware I don’t need?</li>
<li>Reduced-precision data types are currently a niche optimization that may never become mainstream for training – although it is an active research area.</li>
</ul>
<p><em>Rob Farber is a global technology consultant and author with an extensive background in HPC and in developing machine learning technology that he applies at national labs and commercial organizations. Rob can be reached at </em><em>info@techenablement.com</em><em>.</em></p>
<p>*Some acceleration through parallelism can be achieved during a single inferencing operation, but the degree of parallelism is limited by the data dependencies defined by the ANN architecture and is generally low.</p>
<p>** Current research indicates reduced precision helps when the matrices are well-conditioned. Generally, ANNs are not well-conditioned.</p>
<p>*** Chunked calculations can help fit gradient computations into limited-memory devices such as GPUs. However, this introduces inter-device bandwidth limitations such as the PCIe bus, which is famous for acting as a bottleneck in accelerated computing.</p>
<p>The post <a href="https://www.aiuniverse.xyz/technology-requirements-for-deep-and-machine-learning/">Technology Requirements for Deep and Machine Learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/technology-requirements-for-deep-and-machine-learning/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
	</channel>
</rss>
