Active Sampling for Computer Vision

Yusuf Duman

doi:10.15126/thesis.900787

Mammalian vision systems do not view an entire scene at once. Instead, rapid eye movements known as saccades point the high-density areas of photoreceptors in the retina toward areas of detail. Consequently, the brain can build a detailed view of the scene from a relatively small amount of information. Integrating imaging in this manner improves the quality of the visual processing found deeper within the brain, as it only has to process the salient details. A scanning pixel camera presents a way of realising this in hardware: a low-cost, low-power sensor system that builds up an image of a scene by rapidly sampling a sensor that sits behind a moveable set of optics. Advances in micro-actuation allow the low-cost optics to be scanned across the scene in a programmable manner. This can produce lens-less zooming effects simply by varying the scan speed or the sample rate. Furthermore, the amount of information that this type of sensor provides can be varied simply by changing the scan pattern. However, a major drawback of this type of sensor system is that it takes a long time to image a full scene compared to a traditional CCD camera. This motivates the work of this thesis: to find a scan pattern that makes the best use of the saccade-like behaviour of a scanning pixel camera. By focusing on scene details relevant to a predefined computer vision task, this thesis demonstrates that it is possible to produce a scan pattern that overcomes this drawback. We provide methods of generating useful sample maps that enhance the abilities of a scanning pixel camera and make it an efficient part of a computer vision pipeline. By actively providing sample patterns to the scanning pixel camera, the sensor becomes an active part of the computer vision system, rather than simply a source of data. This is similar to the purpose of saccades in a mammalian vision system.
In doing this we create another challenge that is addressed in this thesis: the downstream computer vision task has only a partial view of the scene, which may be affected by the different types of artefacts found in scanning pixel cameras. How, therefore, do these tasks need to be adapted to deal with data in this form, both during training and inference? This thesis approaches this problem by first making several assumptions about a scanning pixel camera in order to adapt existing computer vision techniques to find useful sample patterns. These initial assumptions include that the scene is static and is imaged with full knowledge of its contents. They are then used to create a simple model of a scanning pixel camera to establish the best possible way of generating sampling positions for a downstream task. These assumptions are then progressively removed in order to finally reach a method that can be deployed on a real system. The end result is a technique that requires no prior knowledge of the scene, forcing the scanning pixel camera to explore the scene before it knows what it is looking at. The sample maps are designed to generate images for use by a downstream computer vision task, rather than for viewing by a human. To evaluate this, we apply the technique to a variety of computer vision tasks and demonstrate that such a piece of hardware can form a useful part of a computer vision system. These tasks include object classification, tracking and instance segmentation.
