Review: Computer Vision: UAVs and Video Processing

On Monday, April 29, DC ACM was privileged to host Dr. Larry Davis, Professor at the Institute for Advanced Computer Studies and the Department of Computer Science at the University of Maryland. His presentation, “Computer Vision: UAVs and Video Processing,” chronicled how far the field has come since its beginnings in the mid-20th century.

Dr. Davis explained that computer vision was initially motivated by a practical problem: sorting mail for the postal service. Optical character recognition (OCR) was developed for postal applications, which required solving problems of data segmentation, representation, character recognition, and matching. With this technology, postal services were able to sort letters and packages of all kinds automatically by address. Computer vision eventually enabled the recognition of printed characters, cursive and other forms of handwriting, text, logos, and stamps.

Computer vision also achieved successes in other industries. In medicine, retinal scans capable of detecting retinal diseases allowed eye conditions to be caught and treated early. Vision systems succeeded in distinguishing weeds from grass and in picking fruit from trees and sorting it. Food inspection benefited as well: vision technology enabled X-ray machines to scan potatoes and assess their density and degree of spoilage before processing.

In recent years, many companies have launched and flourished by harnessing computer vision. VideoSurf, founded in 2006, applies face recognition to index the characters in movies and was acquired by Microsoft in 2011. PittPatt (Pittsburgh Pattern Recognition), which develops face recognition software for images and video, was acquired by Google the same year. Another company, whose technology allows users to recognize and “like” everyday consumer items via captured images, was acquired by Google a year earlier, in 2010.

The U.S. government has taken the lead in adopting advanced computer vision technology. The Robotic Mule and the BigDog robot can both carry heavy military equipment while navigating rough terrain. Urban-scale 3D mapping based on video or LIDAR (a blend of “light” and “radar”) allows the military to build highly accurate 3D models of cities. UAVs (Unmanned Aerial Vehicles), WAMI (Wide Area Motion Imagery), and the VIRAT (Video and Image Retrieval and Analysis Tool) program have all been used in U.S. military operations. A particularly interesting system from DARPA is ARGUS-IS (Autonomous Real-Time Ground Ubiquitous Surveillance Imaging System). It can identify objects as small as six inches from nearly 20,000 feet in the air, and it can detect still and moving objects, as well as human actions, across an area the size of a large city. This capability rests on a 1.8-gigapixel sensor operating at 10 frames per second.

Despite such advances, computer vision as a whole has progressed more slowly, particularly in image recognition. To address this, researchers have turned to machine learning, internet technology, and peer production systems. Peer production systems in particular have been effective at harnessing human intelligence to improve the capabilities of computer systems, for example by having people label the images that vision algorithms learn from. Outstanding examples of peer production systems for dataset collection include the ESP Game, developed by Luis von Ahn; LabelMe, co-developed by William T. Freeman; and Amazon’s Mechanical Turk.

While acknowledging that many challenges remain, such as accurately analyzing scenes and identifying moving objects in rapidly changing environments, Dr. Davis remains optimistic about the future. He pointed to consumer applications and advertising opportunities, novel smartphone apps such as Leafsnap, and the advent of Google Glass as signs of progress. Other noteworthy potential uses for computer vision include enabling families to remotely monitor elderly parents who live at home; mapping areas damaged by earthquakes, such as those in Haiti and Japan; and estimating the calorie content of a plate of food simply by photographing it.

It’s astounding to think that the visually impaired may soon have access to technology once imagined only through the character of Geordi La Forge in Star Trek: The Next Generation. I am excited to see this technology applied across sectors and disciplines, reaching markets where it is much needed in addition to the more traditional areas of research, defense, and commerce. For instance, in 2010 George Clooney helped launch the Satellite Sentinel Project, a humanitarian effort that uses satellite imagery and computer vision technology to monitor and report on starvation and warfare in Sudan. This is just one of the many applications made possible by computer vision, and a sign of its enormous potential to impact the world around us.