<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>computer vision Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/computer-vision/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/computer-vision/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Sat, 17 Oct 2020 06:10:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Investment Banking Practice Aimed at Robotics, Automation, AI Launched</title>
		<link>https://www.aiuniverse.xyz/investment-banking-practice-aimed-at-robotics-automation-ai-launched-2/</link>
					<comments>https://www.aiuniverse.xyz/investment-banking-practice-aimed-at-robotics-automation-ai-launched-2/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 17 Oct 2020 06:10:45 +0000</pubDate>
				<category><![CDATA[Robotics]]></category>
		<category><![CDATA[AI Launched]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=12284</guid>

					<description><![CDATA[<p>Source: sme.org SEATTLE – Cascadia Capital said it is launching one of the nation’s first emerging growth investment banking practice groups dedicated to Robotics, Automation, and Artificial <a class="read-more-link" href="https://www.aiuniverse.xyz/investment-banking-practice-aimed-at-robotics-automation-ai-launched-2/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/investment-banking-practice-aimed-at-robotics-automation-ai-launched-2/">Investment Banking Practice Aimed at Robotics, Automation, AI Launched</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: sme.org</p>



<p>SEATTLE – Cascadia Capital said it is launching one of the nation’s first emerging growth investment banking practice groups dedicated to Robotics, Automation, and Artificial Intelligence (RAAI).</p>



<p>Cascadia said its new RAAI group is well-positioned to provide the nuanced M&amp;A and capital raising guidance business owners and entrepreneurs in this sector need as RAAI technology continues to upend critical industries, resulting in permanent market shifts.</p>



<p>The RAAI group is built on the firm’s history of advising companies in the RAAI space through its long-standing Industrials, Energy &amp; Applied Technology, Healthcare, Consumer, and Food &amp; Ag practice groups.</p>



<p>The practice is led by Cascadia Chairman &amp; CEO Michael Butler and Managing Directors Jamie Boyd and Firdaus Pohowalla, with support from Vice Presidents Yee Lee and Jason Lippenberger, Associate Tarek Elmasry, and Analysts Scott Whiting and Mikaela Slade.</p>



<p>“We have been active in the RAAI space for several years with growing excitement and felt that the time was right to formally pull together the firm’s expertise in a way that maximizes the service we deliver to our clients,” Butler said.</p>



<p>“Our bankers have established a deep understanding of the underlying technologies and end markets while simultaneously developing unparalleled relationships with strategic buyers and investors.”</p>



<p>The firm’s most recent RAAI clients include Lucidyne Technologies, Inc., a world-leading manufacturer of AI-enabled computer vision systems; Vexcel Imaging, a leader in digital mapping and drones; and Qi2, a developer of advanced robotic sensor systems for the oilfield. In addition, Cascadia has active engagements with several other current clients in the sector.</p>



<p>The core areas of focus under the RAAI banner include Robotics, Mobility Tech, Manufacturing Automation, Artificial Intelligence/Machine Learning, Computer Vision, Data Management and Analytics, Augmented Reality/Virtual Reality, and the Internet of Things.</p>
<p>The post <a href="https://www.aiuniverse.xyz/investment-banking-practice-aimed-at-robotics-automation-ai-launched-2/">Investment Banking Practice Aimed at Robotics, Automation, AI Launched</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/investment-banking-practice-aimed-at-robotics-automation-ai-launched-2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>StradVision R&#038;D Director To Reveal Insights Into Optimizing DNNs At 2020 Embedded Vision Summit</title>
		<link>https://www.aiuniverse.xyz/stradvision-rd-director-to-reveal-insights-into-optimizing-dnns-at-2020-embedded-vision-summit/</link>
					<comments>https://www.aiuniverse.xyz/stradvision-rd-director-to-reveal-insights-into-optimizing-dnns-at-2020-embedded-vision-summit/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 18 Sep 2020 10:25:09 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[ADAS]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[StradVision]]></category>
		<category><![CDATA[Technology]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11667</guid>

					<description><![CDATA[<p>Source: aithority.com StradVision, whose AI-based camera perception software is a leading innovator in Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles (AVs), will reveal insights into optimizing Deep <a class="read-more-link" href="https://www.aiuniverse.xyz/stradvision-rd-director-to-reveal-insights-into-optimizing-dnns-at-2020-embedded-vision-summit/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/stradvision-rd-director-to-reveal-insights-into-optimizing-dnns-at-2020-embedded-vision-summit/">StradVision R&#038;D Director To Reveal Insights Into Optimizing DNNs At 2020 Embedded Vision Summit</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: aithority.com</p>



<p>StradVision, a leading innovator in AI-based camera perception software for Advanced Driver Assistance Systems (ADAS) and Autonomous Vehicles (AVs), will reveal insights into optimizing Deep Neural Networks (DNNs) for multiple processors at the 2020 Embedded Vision Summit. During the “Designing Bespoke DNNs for Target Hardware” session, which will be held at 9:30am Pacific Time on September 17, R&amp;D Director and Co-founder Woonhyun Nam will also elaborate on StradVision’s patented know-how via its deep learning-based SVNet technology.</p>



<p>The presentation marks the sixth time that StradVision has participated in the pre-eminent conference and expo devoted to practical, deployed computer vision and visual AI.</p>



<p>This year’s conference includes speakers from Google, Intel, Samsung, Qualcomm, and LG Electronics, with a keynote speech from Google’s Distinguished Engineer David Patterson, who is also a professor at the University of California, Berkeley, and the Vice-Chair of the RISC-V Foundation.</p>



<p>During his session, Nam will touch on StradVision’s established patents, including its patented DNN-enabled SVNet software, and discuss the challenges and opportunities of Deep Neural Networks, including cost-effective techniques for optimizing DNNs to better fit different processors while reducing model size and power consumption. The effort required to transform a DNN for each processor can be prohibitive for DNN developers, and Nam will explain how quantization and structured sparsification techniques can help by significantly reducing model size and computation.</p>
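<p>As an illustration of the kind of technique described here, the following sketch shows symmetric post-training quantization of a weight tensor from float32 to int8. This is a generic toy example, not StradVision's method; the helper names are made up for the sketch:</p>

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization: float32 -> int8 plus a scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

# A toy 256x256 weight matrix, like one layer of a small DNN.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)  # 4: int8 storage is a quarter of float32
```

<p>The rounding error per weight is bounded by half a quantization step (scale / 2), which is one reason a well-chosen scale often preserves model accuracy while cutting memory and compute.</p>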



<p>Nam, who holds a Ph.D. from Pohang University of Science and Technology’s Computer Vision Lab, will draw from his experience as head of the algorithm engineering team at StradVision, whose pioneering SVNet software can quickly and accurately identify potentially hazardous objects and road conditions (such as other vehicles, lanes, pedestrians, animals, free space, traffic signs, and lights) for ADAS-assisted and autonomous vehicles, even in harsh weather conditions or poor lighting.</p>



<p>Nam’s team is responsible for defining SVNet’s technical architecture, deploying new algorithm pipelines, and delivering algorithm solutions best optimized for each hardware target.</p>



<p>SVNet relies on deep learning-based embedded perception algorithms that, compared with competing solutions, are more compact and require dramatically less memory and power to run. It supports more than 14 hardware platforms thanks to StradVision’s patented, cutting-edge DNN-enabled software.</p>



<p>Attendees will also learn about StradVision’s experience in bringing AI-based camera perception software to the mass market through international collaboration as well as the view from&nbsp;Asia&nbsp;on the future of the ADAS and AV market.</p>



<p>StradVision’s software is currently deployed in 8.8 million vehicles worldwide, including SUVs, sedans, trucks, and self-driving buses, and the company maintains partnerships with leading global automotive Tier 1 suppliers and five of the world’s top auto OEMs. StradVision’s global partners also include NVIDIA, Aisin Group, Hyundai Motor Group, LG Electronics, Texas Instruments, Renesas, Qualcomm, Xilinx, Socionext, Ambarella, and BlackBerry QNX.</p>



<p>StradVision has obtained China’s Guobiao certification and the coveted ASPICE CL2 (Automotive Software Process Improvement and Capability Determination, Capability Level 2) certification, and was recently awarded the Grand Prize in the Electric/Electronic Category at the 14th Korea Patent Excellence Awards.</p>



<p>Since 2012, the Embedded Vision Summit has been held each May in California by the Edge AI and Vision Alliance, an industry partnership operated by Berkeley Design Technology, Inc. Participants regularly include industry experts, business leaders, leading academics, investors, and entrepreneurs interested in visual AI.</p>



<p>Due to global health concerns, the 2020 Embedded Vision Summit will be held online from&nbsp;September 15 to 25.</p>
<p>The post <a href="https://www.aiuniverse.xyz/stradvision-rd-director-to-reveal-insights-into-optimizing-dnns-at-2020-embedded-vision-summit/">StradVision R&#038;D Director To Reveal Insights Into Optimizing DNNs At 2020 Embedded Vision Summit</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/stradvision-rd-director-to-reveal-insights-into-optimizing-dnns-at-2020-embedded-vision-summit/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>MACHINE LEARNING MODELS CAN REASON ABOUT DAILY TASKS AND ACTIONS</title>
		<link>https://www.aiuniverse.xyz/machine-learning-models-can-reason-about-daily-tasks-and-actions/</link>
					<comments>https://www.aiuniverse.xyz/machine-learning-models-can-reason-about-daily-tasks-and-actions/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Mon, 07 Sep 2020 07:38:50 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11409</guid>

					<description><![CDATA[<p>Source: analyticsinsight.net Late advancements in artificial intelligence have recharged interest in building frameworks that learn and think as individuals. Numerous advances have originated from utilizing deep neural networks trained <a class="read-more-link" href="https://www.aiuniverse.xyz/machine-learning-models-can-reason-about-daily-tasks-and-actions/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/machine-learning-models-can-reason-about-daily-tasks-and-actions/">MACHINE LEARNING MODELS CAN REASON ABOUT DAILY TASKS AND ACTIONS</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: analyticsinsight.net</p>



<p>Recent advances in artificial intelligence have renewed interest in building systems that learn and think the way people do. Many of these advances come from deep neural networks trained end to end on tasks such as object recognition, video games, and board games, where they match or even beat human performance in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in essential ways. A growing body of cognitive science argues that machines that learn and think like humans will need to reach beyond current engineering trends, both in what they learn and in how they learn it.</p>



<p>Machines should:</p>

<ul>
<li>form causal models of the world that support understanding and explanation, rather than merely solving pattern recognition problems;</li>
<li>ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and</li>
<li>leverage compositionality and learning-to-learn to rapidly acquire knowledge and generalize it to new tasks and situations.</li>
</ul>



<p>Thanks to new computing technologies, machine learning today is not the machine learning of the past. The field grew out of pattern recognition and the idea that computers can learn without being explicitly programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see whether computers could learn from data.</p>



<p>The iterative aspect of machine learning matters because, as models are exposed to new data, they can adapt independently. They learn from previous computations to produce reliable, repeatable decisions and results. It is a science that is not new, but one that has gained fresh momentum.</p>



<p>In a recent study presented at the European Conference on Computer Vision, researchers unveiled a hybrid language-vision model that can compare and contrast a set of dynamic events captured on video to tease out the high-level concepts connecting them.</p>



<p>Their model performed better than humans at two kinds of visual reasoning tasks: picking the video that conceptually best completes a set, and picking the video that doesn’t fit. Shown videos of a dog barking and a man yelling next to his dog, for instance, the model completed the set by picking the crying child from a set of five videos. The researchers replicated their results on two datasets for training AI systems in action recognition: MIT’s Multi-Moments in Time and DeepMind’s Kinetics.</p>



<p>According to Mathew Monfort, study co-author and a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), “Language representations permit us to incorporate contextual information learned from text databases into our visual models.”</p>



<p>Words like ‘running,’ ‘lifting,’ and ‘boxing’ share common characteristics that make them more closely related to the concept ‘working out,’ for instance, than to ‘driving.’</p>



<p>Using WordNet, a database of word meanings, the researchers mapped the relationship between each action-class label in Moments and Kinetics and the other labels in both datasets. Words like “sculpting,” “carving,” and “cutting,” for instance, were linked to higher-level concepts like “crafting,” “making art,” and “cooking.” Now, when the model recognizes an action like sculpting, it can pick out conceptually similar activities in the database.</p>
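<p>This mapping can be illustrated with a toy version of the idea: a hand-written concept table standing in for the real WordNet hierarchy, with a simple set-overlap score standing in for the researchers' actual similarity measure. All labels and numbers here are illustrative only:</p>

```python
# Toy stand-in for a WordNet-derived mapping: each action label is linked
# to the higher-level concepts it falls under.
HYPERNYMS = {
    "sculpting": {"crafting", "making art"},
    "carving":   {"crafting", "making art", "cooking"},
    "cutting":   {"crafting", "cooking"},
    "running":   {"working out"},
    "lifting":   {"working out"},
    "driving":   {"traveling"},
}

def similarity(a, b):
    """Jaccard overlap between the higher-level concepts of two labels."""
    sa, sb = HYPERNYMS[a], HYPERNYMS[b]
    return len(sa & sb) / len(sa | sb)

def most_similar(label, candidates):
    """Pick the candidate action conceptually closest to `label`."""
    return max(candidates, key=lambda c: similarity(label, c))

print(most_similar("sculpting", ["carving", "running", "driving"]))  # carving
```

<p>Because “sculpting” and “carving” share the concepts “crafting” and “making art,” the lookup prefers “carving” over unrelated actions, which is the behavior the set-completion task relies on.</p>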



<p>To see how the model compared with humans, the researchers asked human subjects to perform the same sets of visual reasoning tasks online. The model performed as well as humans in many scenarios, sometimes with unexpected results. In a variation on the set-completion task, after watching a video of someone wrapping a gift and covering an item in tape, the model suggested a video of someone at the beach burying someone else in the sand.</p>



<p>The model’s limitations include a tendency to overemphasize certain features. In one case, it suggested completing a set of sports videos with a video of a baby and a ball, apparently associating balls with exercise and competition.</p>



<p>A deep learning model that can be trained to “think” more abstractly may be capable of learning from less data, the researchers say. Abstraction also paves the way toward higher-level, more human-like reasoning.</p>
<p>The post <a href="https://www.aiuniverse.xyz/machine-learning-models-can-reason-about-daily-tasks-and-actions/">MACHINE LEARNING MODELS CAN REASON ABOUT DAILY TASKS AND ACTIONS</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/machine-learning-models-can-reason-about-daily-tasks-and-actions/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Counting on AI: The right time for researchers to embrace Artificial Intelligence</title>
		<link>https://www.aiuniverse.xyz/counting-on-ai-the-right-time-for-researchers-to-embrace-artificial-intelligence/</link>
					<comments>https://www.aiuniverse.xyz/counting-on-ai-the-right-time-for-researchers-to-embrace-artificial-intelligence/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 22 Jul 2020 09:02:40 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[COVID-19]]></category>
		<category><![CDATA[deep–learnin]]></category>
		<category><![CDATA[researchers]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=10388</guid>

					<description><![CDATA[<p>Source: dqindia.com While Artificial General Intelligence,&#160;or “singularity” as they call it,&#160;may be decades away, we have already reached a point where AI can significantly&#160;augment our intelligence and&#160;help <a class="read-more-link" href="https://www.aiuniverse.xyz/counting-on-ai-the-right-time-for-researchers-to-embrace-artificial-intelligence/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/counting-on-ai-the-right-time-for-researchers-to-embrace-artificial-intelligence/">Counting on AI: The right time for researchers to embrace Artificial Intelligence</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: dqindia.com</p>



<p>While Artificial General Intelligence,&nbsp;or “singularity” as they call it,&nbsp;may be decades away, we have already reached a point where AI can significantly&nbsp;augment our intelligence and&nbsp;help us achieve&nbsp;better&nbsp;outputs&nbsp;at a&nbsp;faster&nbsp;pace.</p>



<p>As of today, there is no area where AI has not been proven useful. From playing games to flying airplanes and from detecting cancers to automatically cleaning up selfie-portraits, AI has made its presence felt in all domains.</p>



<p>There are over&nbsp;8&nbsp;million&nbsp;active researchers&nbsp;who collectively&nbsp;spend&nbsp;over $1.5&nbsp;trillion&nbsp;on&nbsp;academic research,&nbsp;with&nbsp;the&nbsp;promise&nbsp;of&nbsp;advancing&nbsp;the&nbsp;world’s&nbsp;combined&nbsp;knowledge and&nbsp;intellect.</p>



<p>The&nbsp;right AI–powered tools and techniques can make&nbsp;a&nbsp;significant difference in how research is conducted and how fast results&nbsp;are&nbsp;obtained.</p>



<p>Until a year ago, the general public may not have understood or even paid much heed&nbsp;to&nbsp;the need for speed&nbsp;and&nbsp;accuracy&nbsp;when it comes to research.&nbsp;But&nbsp;because of&nbsp;the recent&nbsp;COVID-19 situation, many are recognizing&nbsp;and feeling the pain of the&nbsp;pace&nbsp;of research&nbsp;in&nbsp;the race&nbsp;to find&nbsp;an antiviral&nbsp;drug&nbsp;or a vaccine.</p>



<p>While some of the results of academic research are celebrated, it is easy to forget the countless steps and processes behind the scenes, which can last many months before any results are achieved; more often than not, the results of research aren’t revolutionary or directly useful.</p>



<p>One of the early stages&nbsp;of the research lifecycle is discovery.&nbsp;On average, researchers spend 4 hours every&nbsp;week searching&nbsp;through&nbsp;research and 5 hours reading&nbsp;articles, with only 50% of the&nbsp;articles&nbsp;being useful. Here, AI can come in to help&nbsp;researchers&nbsp;discover the right articles&nbsp;to read.</p>



<p>There are many tools out there that are powered by natural language processing and search based on machine-learned concepts, which help researchers narrow down their reading and discover the relevant research much faster.</p>



<p>The next&nbsp;stage&nbsp;is the actual research,&nbsp;which consists of gathering&nbsp;data;&nbsp;running experiments based on various hypotheses;&nbsp;collecting, analyzing,&nbsp;and&nbsp;representing the research outputs;&nbsp;and arriving at&nbsp;the&nbsp;conclusions.</p>



<p>For the above steps, many open-source tools, such as Python, R, Pandas, scikit-learn, and Spark, as well as proprietary tools like Mathematica, Matlab, and SAS, can be very useful, especially when directed toward statistical machine learning.</p>



<p>Many research labs are making use of advanced AI streams such as computer vision, robotic arms, IoT, and speech and audio to assist them in the research process.</p>



<p>Finally, the most important stage for researchers is the publication and dissemination of their research: the tedious and time-consuming, albeit critical, final step of the process.</p>



<p>While editing services exist to help with manuscript preparation, formatting, and language correction, there are also many AI tools researchers can use that help with writing manuscripts, correcting grammar and language, and formatting them to target-journal standards, in addition to automated solutions for styling figures, tables, captions, and citations.</p>



<p>Pub-sure.com is an online suite of assistive tools that helps researchers make their manuscripts publication-ready.</p>



<p>Since its inception, Cactus has been partnering with researchers to&nbsp;assist&nbsp;them in their&nbsp;research&nbsp;journey.&nbsp;It&nbsp;has been&nbsp;our constant endeavor to&nbsp;enable&nbsp;researchers and innovators to find analogous concepts and novel ideas from different industries and fields.</p>



<p>We are excited to have entered the AI and deep-learning space as well, as the need of the hour is to develop innovative products for publishers as well as business and tech solutions for stakeholders in the research landscape.</p>



<p>With powerful initiatives like researcher.life, our aim is to put the researcher at the center of research. We have already developed several AI-powered tools that help researchers focus on their main work, the research. As a community, however, we still have a long way to go before AI is fully integrated into the researcher’s ecosystem.</p>
<p>The post <a href="https://www.aiuniverse.xyz/counting-on-ai-the-right-time-for-researchers-to-embrace-artificial-intelligence/">Counting on AI: The right time for researchers to embrace Artificial Intelligence</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/counting-on-ai-the-right-time-for-researchers-to-embrace-artificial-intelligence/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Shiny objects foil robots, but RGB-D holds the key</title>
		<link>https://www.aiuniverse.xyz/shiny-objects-foil-robots-but-rgb-d-holds-the-key/</link>
					<comments>https://www.aiuniverse.xyz/shiny-objects-foil-robots-but-rgb-d-holds-the-key/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 22 Jul 2020 07:44:22 +0000</pubDate>
				<category><![CDATA[Robotics]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[machine vision]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=10379</guid>

					<description><![CDATA[<p>Source: zdnet.com Who doesn&#8217;t love shiny things? Well&#8230; robots for one. The same goes for transparent objects. At least, that&#8217;s long been the case. Machine vision has stumbled when it comes <a class="read-more-link" href="https://www.aiuniverse.xyz/shiny-objects-foil-robots-but-rgb-d-holds-the-key/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/shiny-objects-foil-robots-but-rgb-d-holds-the-key/">Shiny objects foil robots, but RGB-D holds the key</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: zdnet.com</p>



<p>Who doesn&#8217;t love shiny things? Well&#8230; robots for one. The same goes for transparent objects.</p>



<p>At least, that&#8217;s long been the case. Machine vision has stumbled when it comes to shiny or reflective surfaces, and that&#8217;s limited use cases for automation even as advances in the field push robots into more and more new spaces.</p>



<p>Now, researchers at robotics powerhouse Carnegie Mellon report success with a new technique to identify and grasp objects with troublesome surfaces. Rather than relying on expensive new sensor technology or intensive modeling and training via AI, the system instead goes back to basics, relying on a simple color camera.</p>



<p>To understand why, it&#8217;s necessary to understand how robots currently sense objects prior to grasping. Cutting-edge computer vision systems for pick-and-place applications often rely on infrared cameras, which are great for sensing and precisely measuring the depth of an object &#8212; useful data for a robot devising a grasping strategy &#8212; but fall short when it comes to visual quirks like transparency. Infrared light passes right through clear objects and is reflected and scattered by reflective surfaces.</p>



<p>Color cameras, however, can detect both. Just look at any color photo and you&#8217;ll clearly discern a glass on a table or a shiny metal railing, each with lots of rich detail. That was the vital clue. The CMU researchers built on this observation and developed a color camera system capable of recognizing shapes using color and, crucially, sensing transparent or reflective surfaces. </p>



<p>&#8220;We do sometimes miss,&#8221; David Held, an assistant professor in CMU&#8217;s Robotics Institute, acknowledged, &#8220;but for the most part it did a pretty good job, much better than any previous system for grasping transparent or reflective objects.&#8221;</p>



<p>That the solution is low-cost and the sensors battle-tested give it a tremendous leg up when it comes to the potential for adoption. The researchers point out that other attempts at robotic grasping of transparent objects have relied on training systems based on trial and error or on expensive human labeling of objects.</p>



<p>In the end, it&#8217;s not new sensors but new strategies for using them that may give robots the powers they need to function in everyday life.</p>
<p>The post <a href="https://www.aiuniverse.xyz/shiny-objects-foil-robots-but-rgb-d-holds-the-key/">Shiny objects foil robots, but RGB-D holds the key</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/shiny-objects-foil-robots-but-rgb-d-holds-the-key/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Google Open-Sources Computer Vision Model Big Transfer</title>
		<link>https://www.aiuniverse.xyz/google-open-sources-computer-vision-model-big-transfer/</link>
					<comments>https://www.aiuniverse.xyz/google-open-sources-computer-vision-model-big-transfer/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 10 Jun 2020 07:27:35 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[deep-learning]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[researchers]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9422</guid>

					<description><![CDATA[<p>Source: infoq.com Google Brain has released the pre-trained models and fine-tuning code for Big Transfer (BiT), a deep-learning computer vision model. The models are pre-trained on publicly-available generic image datasets <a class="read-more-link" href="https://www.aiuniverse.xyz/google-open-sources-computer-vision-model-big-transfer/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/google-open-sources-computer-vision-model-big-transfer/">Google Open-Sources Computer Vision Model Big Transfer</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: infoq.com</p>



<p>Google Brain has released the pre-trained models and fine-tuning code for Big Transfer (BiT), a deep-learning computer vision model. The models are pre-trained on publicly-available generic image datasets and can meet or exceed state-of-the-art performance on several vision benchmarks after fine-tuning on just a few samples.</p>



<p>Paper co-authors Lucas Beyer and Alexander Kolesnikov gave an overview of their work in a recent blog post. To help advance the performance of deep-learning vision models, the team investigated large-scale pre-training and the effects of model size, dataset size, training duration, normalization strategy, and hyperparameter choice. As a result of this work, the team developed a &#8220;recipe&#8221; of components and training heuristics that achieves strong performance on a variety of benchmarks, including an &#8220;unprecedented top-5 accuracy of 80.0%&#8221; on the ObjectNet dataset. Beyer and Kolesnikov claim,</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>[Big Transfer] will allow anyone to reach state-of-the-art performance on their task of interest, even with just a handful of labeled images per class.</p></blockquote>



<p>Deep-learning models have made great strides in computer vision, particularly in recognizing objects in images. One key to this success has been the availability of large-scale labeled datasets: collections of images with corresponding text descriptions of the objects they contain. These datasets must be created manually, with human workers applying a label to each of thousands of images; the popular ImageNet dataset, for example, contains over 14 million labeled images spanning 21k different object classes. However, the images are usually generic, showing commonplace objects such as people, pets, or household items. Creating a dataset of similar scale for a specialized task, say for an industrial robot, might be prohibitively expensive or time-consuming.</p>



<p>In this situation, AI engineers often apply transfer learning, a strategy that has become popular with large-scale natural-language processing (NLP) models. A neural network is first <em>pre-trained</em> on a large generic dataset until it achieves a certain level of performance on a test dataset. The model is then fine-tuned on a smaller task-specific dataset, sometimes with only a single example per task-specific object. Large NLP models routinely set new state-of-the-art performance levels using transfer learning.</p>
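<p>The pre-train-then-fine-tune recipe can be sketched in miniature. The following is a toy illustration, not the BiT code: the hypothetical <code>backbone</code> function stands in for a frozen network pre-trained on a large generic dataset, and only a small logistic-regression head is fitted on a handful of task-specific examples.</p>

```python
import math

def backbone(x):
    # Stand-in for a frozen pre-trained network: maps raw input to features.
    return [x[0] + x[1], x[0] - x[1]]

def train_head(samples, labels, lr=0.1, steps=200):
    # Logistic-regression head trained on top of the frozen features;
    # only these few parameters are updated during "fine-tuning".
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        for x, y in zip(samples, labels):
            f = backbone(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [w[i] - lr * g * f[i] for i in range(2)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = backbone(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# "Few-shot" fine-tuning: just two labeled examples per class.
X = [(0.0, 1.0), (0.2, 0.9), (1.0, 0.0), (0.9, 0.1)]
y = [0, 0, 1, 1]
w, b = train_head(X, y)
print(predict(w, b, (0.8, 0.2)))  # -> 1 (same class as the (1, 0)-like examples)
```

<p>The point of the sketch is the division of labor: the expensive generic pre-training is done once, and adapting to a new task only requires fitting the small head.</p>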



<p>For BiT, the Google researchers used a ResNet-v2 neural architecture. To investigate the effects of pre-training dataset size, the team replicated their experiments on three groups of models pre-trained with different datasets: BiT-S models pre-trained on 1.28M images from ILSVRC-2012, BiT-M models pre-trained on 14.2M images from ImageNet-21k, and BiT-L models pre-trained on 300M images from JFT-300M. The models were then fine-tuned and evaluated on several common benchmarks: ILSVRC-2012, CIFAR-10/100, Oxford-IIIT Pet, and Oxford Flowers-102.</p>



<p>The team noted several findings from their experiments. First, the benefits from increasing model size diminish on smaller datasets, and there is little benefit in pre-training smaller models on larger datasets. Second, the large models performed better using group normalization compared to batch normalization. Finally, to avoid an expensive hyperparameter search during fine-tuning, the team developed a heuristic called BiT-HyperRule, where all hyperparameters are fixed except &#8220;training schedule length, resolution, and whether to use MixUp regularization.&#8221;</p>
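<p>The shape of such a heuristic can be sketched as follows. This is an illustrative approximation of the BiT-HyperRule idea, not the published rule: the exact thresholds and resolutions here are made up for the example, and the real values are given in the paper.</p>

```python
# Illustrative BiT-HyperRule-style heuristic: choose the few free
# fine-tuning hyperparameters from simple dataset statistics instead
# of running a hyperparameter search. Thresholds are illustrative only.

def hyper_rule(num_examples, image_size):
    # Longer training schedules for bigger task datasets.
    if num_examples < 20_000:
        schedule_steps, use_mixup = 500, False   # small: short schedule, no MixUp
    elif num_examples < 500_000:
        schedule_steps, use_mixup = 10_000, True  # medium
    else:
        schedule_steps, use_mixup = 20_000, True  # large
    # Upscale small images before fine-tuning; use a larger resolution otherwise.
    resolution = 128 if image_size <= 96 else 480
    return {"steps": schedule_steps, "resolution": resolution, "mixup": use_mixup}

print(hyper_rule(num_examples=5_000, image_size=32))
# -> {'steps': 500, 'resolution': 128, 'mixup': False}
```

<p>The design choice is the trade-off the team describes: a fixed, cheap rule replaces a per-task hyperparameter search, at the cost of occasionally suboptimal settings.</p>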



<p>Google has released the best-performing pre-trained models from the BiT-S and BiT-M groups. However, they have not released any of the BiT-L models based on the JFT-300M dataset. Commenters on Hacker News pointed out that no model trained on JFT-300M has ever been released. One commenter pointed to several models released by Facebook which were pre-trained on an even larger dataset. Another said:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>I&#8217;ve wondered if legal/copyright issues block any release: there&#8217;s always someone who tries to argue that a model is a derived work, and nothing in the JFT-300M papers mentions having licenses covering public redistribution.</p></blockquote>



<p>The code for fine-tuning and tutorials for using the released pre-trained models are available on GitHub.</p>
<p>The post <a href="https://www.aiuniverse.xyz/google-open-sources-computer-vision-model-big-transfer/">Google Open-Sources Computer Vision Model Big Transfer</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/google-open-sources-computer-vision-model-big-transfer/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Deep learning and AI drives ‘computer vision’ market</title>
		<link>https://www.aiuniverse.xyz/deep-learning-and-ai-drives-computer-vision-market/</link>
					<comments>https://www.aiuniverse.xyz/deep-learning-and-ai-drives-computer-vision-market/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Tue, 26 May 2020 06:48:46 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[deep learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9019</guid>

					<description><![CDATA[<p>Source: gadget.co.za Computer vision (CV) is aiding numerous promising applications and revolutionizing the way people do things. According to new research from Omdia, the proof-of-concepts that started <a class="read-more-link" href="https://www.aiuniverse.xyz/deep-learning-and-ai-drives-computer-vision-market/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deep-learning-and-ai-drives-computer-vision-market/">Deep learning and AI drives ‘computer vision’ market</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: gadget.co.za</p>



<p>Computer vision (CV) is aiding numerous promising applications and revolutionizing the way people do things. According to new research from Omdia, the proof-of-concepts that started a few years ago are now going into production, with deployments across a wide range of applications.</p>



<p>Many businesses are realizing the value of vision and are starting to determine how to use CV technology for commercial benefit. Technology is evolving to address these needs. According to Omdia principal analyst Anand Joshi, “Deep learning has become the technology of choice for computer vision applications replacing classic computer vision techniques. This, in turn, is driving the need for new chipsets and software for CV applications.” Global CV market revenue is expected to grow from $2.9bn in 2018 to $33.5bn by 2025.</p>
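<p>That forecast implies a steep compound growth rate. A quick back-of-the-envelope check, using the dollar figures from the article and the standard CAGR formula:</p>

```python
# Compound annual growth rate implied by the Omdia forecast:
# $2.9bn (2018) -> $33.5bn (2025), i.e. seven years of growth.
start, end, years = 2.9, 33.5, 2025 - 2018
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # roughly 42% per year
```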



<p>Omdia’s report, “Computer Vision Technologies and Market,” analyzes the market trends and technology issues surrounding CV. The study examines the industries and use cases and provides profiles of key industry players. Global market forecasts, segmented by region, industry, and use case, extend through 2025. An Executive Summary of the report is available for free download on the firm’s website.</p>
<p>The post <a href="https://www.aiuniverse.xyz/deep-learning-and-ai-drives-computer-vision-market/">Deep learning and AI drives ‘computer vision’ market</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deep-learning-and-ai-drives-computer-vision-market/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What’s the point: Google AI advances, Git security updates, and API gateway Gloo</title>
		<link>https://www.aiuniverse.xyz/whats-the-point-google-ai-advances-git-security-updates-and-api-gateway-gloo/</link>
					<comments>https://www.aiuniverse.xyz/whats-the-point-google-ai-advances-git-security-updates-and-api-gateway-gloo/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 18 Apr 2020 08:45:27 +0000</pubDate>
				<category><![CDATA[Google AI]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=8255</guid>

					<description><![CDATA[<p>Source: devclass.com Google’s AI teams used the comparatively quiet post-easter days to get ML practitioners up to speed with their latest research in reinforcement learning, natural language <a class="read-more-link" href="https://www.aiuniverse.xyz/whats-the-point-google-ai-advances-git-security-updates-and-api-gateway-gloo/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/whats-the-point-google-ai-advances-git-security-updates-and-api-gateway-gloo/">What’s the point: Google AI advances, Git security updates, and API gateway Gloo</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: devclass.com</p>



<p>Google’s AI teams used the comparatively quiet post-Easter days to get ML practitioners up to speed with their latest research in reinforcement learning, natural language processing, and computer vision.</p>



<p>In “An optimistic perspective on offline reinforcement learning”, a team of researchers has looked into ways to use a fixed offline dataset of logged interactions to teach agents how to handle themselves in real-world situations. While agents normally learn by getting live feedback from their environment, this approach is meant to be useful in certain robotics use cases or autonomous driving, where enough recorded interaction data is available and other ways of collecting information either seem insufficient or are too expensive to realise.</p>



<p>These setups are usually seen as tricky to implement, since there is no real way of knowing how an agent should be rewarded when it takes an action that differs from those in the provided dataset. To tackle that, Google’s AI team added some supervised learning methods into the mix, which help to improve generalisation and make the whole system more robust. The resulting agents are called Ensemble-DQN and Random Ensemble Mixture.</p>



<p>Meanwhile another team has been busy improving the way objects are detected. The outcome has been dubbed EfficientDet and will be presented at the renowned computer vision conference CVPR in Seattle in June – if COVID refrains from putting a spoke in their wheel. It aims at introducing a “new family of scalable and efficient object detectors” to the computer vision community, building upon earlier work concerning the scaling of neural networks (EfficientNet).</p>



<p>In EfficientDet, EfficientNet is used as a backbone to more effectively extract features from images, while a new bi-directional feature network in combination with a fresh normalised fusion technique is meant to get to image characteristics faster at a lower computation cost.</p>



<p>If you’re more interested in NLP, Google has also been busy setting up a benchmark to make comparing multilingual representations easier. XTREME covers 40 languages from 12 language families and includes nine tasks ranging from sentence classification to question answering to evaluate methods making the most of the shared structures of languages. The project can be found at GitHub.</p>



<h4 class="wp-block-heading">Git pushes out security updates to stop tricksters</h4>



<p>This week, Git maintainer Junio C Hamano has unleashed versions v2.26.1, v2.25.3, v2.24.2, v2.23.2, v2.22.3, v2.21.2, v2.20.3, v2.19.4, v2.18.3, and v2.17.4 of the version control system onto the coding masses. </p>



<p>Updating is strongly advised, since the security fixes mediate an issue which “allowed a crafted URL to trick a Git client to send credential information for a wrong host to the attacker’s site”.&nbsp;</p>



<h4 class="wp-block-heading">Gloo lures admins with new dev portal</h4>



<p>Envoy-based API gateway Gloo hit version 1.3 earlier this week, focusing on performance, stability and extensibility improvements. However, the release also includes a developer portal, so that admins have an easier way of controlling who gets access to which APIs.</p>



<p>Once set up, they can select which interfaces should be shared at all and decide which users and groups get to see them once they’ve logged into the portal. The whole apparatus is designed for self-service, with the Gloo team promising easy integration into continuous delivery processes.</p>
<p>The post <a href="https://www.aiuniverse.xyz/whats-the-point-google-ai-advances-git-security-updates-and-api-gateway-gloo/">What’s the point: Google AI advances, Git security updates, and API gateway Gloo</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/whats-the-point-google-ai-advances-git-security-updates-and-api-gateway-gloo/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>I Wizards- A Complete Solution OF Artificial Intelligence &#038; Computer Vision</title>
		<link>https://www.aiuniverse.xyz/i-wizards-a-complete-solution-of-artificial-intelligence-computer-vision/</link>
					<comments>https://www.aiuniverse.xyz/i-wizards-a-complete-solution-of-artificial-intelligence-computer-vision/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Mon, 23 Mar 2020 07:48:07 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[CCTV cameras]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[wizards solution]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=7654</guid>

					<description><![CDATA[<p>Source: inventiva.co.in There is a great importance of CCTV cameras in every place, be it a public place or a working place or home itself, every workplace <a class="read-more-link" href="https://www.aiuniverse.xyz/i-wizards-a-complete-solution-of-artificial-intelligence-computer-vision/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/i-wizards-a-complete-solution-of-artificial-intelligence-computer-vision/">I Wizards- A Complete Solution OF Artificial Intelligence &#038; Computer Vision</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: inventiva.co.in</p>



<p>CCTV cameras are important in every place, be it a public space, a workplace, or the home itself; virtually every workplace and public place has them installed. CCTV cameras not only ensure safety but also provide peace of mind to the owner when he or she is away from the office. There is no doubt that wherever a CCTV camera is installed, it acts as a strong deterrent to criminals or anyone carrying out illegal activity. In offices, it is also advisable to keep a record of when employees arrive and leave; it likewise helps to keep track of visitors to the office.</p>



<p>It is essential to monitor everything happening in your office; as an entrepreneur, one should always keep a check on what is going on and what has to be done. When it comes to monitoring, records, and operations, it is not always possible for the owner or the employees to check CCTV footage every five to ten minutes, so artificial intelligence can be used to make this work effective and efficient.</p>



<p>Integration Wizards Solutions has made this possible: the organization provides an artificial intelligence platform for CCTV cameras. The company, based in Bangalore, India, offers Computer Vision and Enterprise Mobility based solutions, helping enterprises derive actionable intelligence from images, videos, and data captured live.</p>



<p><strong>What is unique about I Wizards?</strong></p>



<p>Almost all companies have a CCTV network, but only a few know its worth and importance. I Wizards turns this passive asset into an active solution with its flagship product, IRIS, a smart solution that uses the existing CCTV network to provide customized solutions to enterprises. It provides insights into areas that are otherwise opaque to organizations. From retail and logistics to warehouses and solar farms, it can be installed anywhere to tackle various aspects of manufacturing, such as safety compliance and process optimization in warehouses. It is also being deployed for outdoor security solutions and for multiple retail use cases.</p>



<p>Powered by artificial intelligence, the company’s real-time computer vision-based product IRIS analyzes and understands images at an advanced level. This involves everything from object recognition (think security systems that alert authorities after &#8220;seeing&#8221; a fire or an intruder) to navigation mapping (similar to GPS tracking with Google Maps), emotional recognition (no more depending on manual customer feedback!), posture recognition (useful for manual handling in industries), OCR, and much more.</p>



<p><strong>About the CEO and co-founder:</strong></p>



<p>Co-founder and CEO, Kunal Kislay, is a young, determined entrepreneur who, along with two of his colleagues, Saquib Khan and Kumar Raman, bootstrapped a start-up that has been profitable from its inception, and now has an annual revenue of $3.2 million.</p>



<p>Kunal is a B.Tech IIT Mumbai alumnus with over a decade of experience in enterprise mobility, which inspired him to delve further into then-futuristic domains: the Internet of Things, AI, neural networks, and machine learning. He knew the pulse of technology and its changing paradigms. With the insight gained from the vast array of verticals he worked with and the solutions he created, it was easy for him to see that the future lies in the domain of Artificial Intelligence and Computer Vision.</p>



<p>Over the next few years, the company aims to expand its installed base in warehouses, manufacturing premises, and retail outlets.</p>
<p>The post <a href="https://www.aiuniverse.xyz/i-wizards-a-complete-solution-of-artificial-intelligence-computer-vision/">I Wizards- A Complete Solution OF Artificial Intelligence &#038; Computer Vision</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/i-wizards-a-complete-solution-of-artificial-intelligence-computer-vision/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Facebook AI Releases New Computer Vision Library Detectron2</title>
		<link>https://www.aiuniverse.xyz/facebook-ai-releases-new-computer-vision-library-detectron2/</link>
					<comments>https://www.aiuniverse.xyz/facebook-ai-releases-new-computer-vision-library-detectron2/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Tue, 29 Oct 2019 06:59:45 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Research]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=4900</guid>

					<description><![CDATA[<p>Source: infoq.com Facebook AI Research (FAIR) has released Detectron2, a PyTorch-based computer vision library that brings a series of new research and production capabilities to the popular framework. Since its <a class="read-more-link" href="https://www.aiuniverse.xyz/facebook-ai-releases-new-computer-vision-library-detectron2/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/facebook-ai-releases-new-computer-vision-library-detectron2/">Facebook AI Releases New Computer Vision Library Detectron2</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: infoq.com</p>



<p>Facebook AI Research (FAIR) has released Detectron2, a PyTorch-based computer vision library that brings a series of new research and production capabilities to the popular framework.</p>



<p>Since its release in 2018, the original Detectron object detection platform has become one of FAIR’s most widely adopted open-source projects. While the first Detectron was written in Caffe2, Detectron2 is a ground-up rewrite of the original framework in PyTorch, with several new object detection capabilities.</p>



<p>Detectron was, at the time of its initial release, a huge boost for the AI community. It enabled many to quickly and easily build state-of-the-art object detection models. Yet Detectron was stuck with a few limitations — limitations that quickly became deal-breakers for many AI practitioners.</p>



<ul class="wp-block-list"><li>Caffe2 was complicated, so implementing custom object detection models was a big challenge</li><li>From the beginning, Detectron was designed only for object detection, not for other computer vision tasks such as semantic segmentation or pose estimation</li><li>With deploying machine learning models to production becoming a hot topic over the past couple of years, Detectron quickly fell behind, as it lacked the capability to export inference models</li></ul>



<p>Detectron2 was built to tackle those deal-breakers, making for a more robust and modern library. From the Detectron2 team at FAIR:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>We built Detectron2 to meet the research needs of Facebook AI and to provide the foundation for object detection in production use cases at Facebook. We are now using Detectron2 to rapidly design and train the next-generation pose detection models that power Smart Camera, the AI camera system in Facebook’s Portal video-calling devices. By relying on Detectron2 as the unified library for object detection across research and production use cases, we are able to rapidly move research ideas into production models that are deployed at scale.</p></blockquote>



<p>The move to PyTorch aligns with the AI community’s growing need and desire for a flexible yet easy-to-use library. PyTorch itself is modular by design, making it far easier to extend than Caffe2. The vast majority of the AI community already uses just two libraries: TensorFlow and PyTorch.</p>



<p>Detectron2 has expanded to handle computer vision tasks beyond object detection, including semantic segmentation, panoptic segmentation, pose estimation, and DensePose. The authors have made a noticeable effort to add pre-trained state-of-the-art models such as Cascade R-CNN, Panoptic FPN, and TensorMask.</p>



<p>FAIR’s team hinted in their official blog post that they’re planning to release an additional component to the library, Detectron2go, to make it easier to deploy models to production. It’s said to include features like network quantization, model optimization, and formatting for mobile deployment.</p>
<p>The post <a href="https://www.aiuniverse.xyz/facebook-ai-releases-new-computer-vision-library-detectron2/">Facebook AI Releases New Computer Vision Library Detectron2</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/facebook-ai-releases-new-computer-vision-library-detectron2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
