Abstract
The rapid growth of the internet, and of multimedia databases in general, has made a generic and adaptable method for detecting and recognising objects in static images a necessity. Human vision still outperforms computer vision in efficiency, accuracy and depth of understanding, because computerised recognition operates at a contextual level, whereas human vision also draws on semantics and background knowledge. To achieve recognition at a semantic level, a computer vision system must not only recognise objects regardless of changes in appearance, location and action, but also interpret how each object relates to its surrounding environment as it appears in the image. This work reviews state-of-the-art techniques in object recognition and proposes a supervised learning method, based on adaptable models, for detecting thematic categories of objects. Visual features that can distinguish the object are extracted, and the semantic relations embedded within the object's inner parts and its surroundings are analysed. Systematically identifying and measuring the associations between the components that form an object, and between the object and its surrounding environment, allows object models to be generated automatically. The method thus goes beyond classical image indexing and retrieval techniques, which rely on visual features alone, and introduces semantics as a means of recognising and retrieving objects. The ultimate goal of this work is to develop a knowledge-guided object recognition method. The proposed method recognises objects in real-life images with reasonably high accuracy (83-87% for foreground object recognition and 87-94% for background object recognition). More importantly, semantic reasoning is introduced to guide and assist the recognition process.
Instead of relying purely on visual similarity measures, objects are recognised from their key co-observed components and their relations with the surrounding environment. The ability to filter out background objects first allowed foreground objects to be recognised with much higher accuracy (a 10%-15% improvement). It also improves recognition speed dramatically, because recognition is applied only to specific areas of the image.
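The idea of learning associations between an object's components and its surroundings, and then recognising foreground regions only after background regions have been set aside, can be illustrated with a minimal sketch. It assumes images have already been segmented into labelled regions, and that each background region carries a label from an earlier step. The scoring rule is a naive-Bayes-style co-occurrence score with Laplace smoothing, chosen purely for illustration; the abstract does not specify the actual model. All category, component and background labels, and the names `Example`, `build_models`, `score` and `recognise_image`, are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    """One labelled training object (hypothetical structure)."""
    category: str             # thematic object category, e.g. "boat"
    components: list[str]     # inner parts observed inside the object region
    surroundings: list[str]   # background labels found around the object


def build_models(examples):
    """Count how often each component / surrounding label co-occurs with each category."""
    cat_counts = Counter()
    comp_counts = defaultdict(Counter)
    surr_counts = defaultdict(Counter)
    for ex in examples:
        cat_counts[ex.category] += 1
        comp_counts[ex.category].update(set(ex.components))
        surr_counts[ex.category].update(set(ex.surroundings))
    return cat_counts, comp_counts, surr_counts


def score(models, components, surroundings, alpha=1.0):
    """Rank categories by a smoothed log-likelihood of the observed cues.

    Only cues that are present are scored (a simplification of a full
    Bernoulli naive Bayes model, which would also score absent cues).
    """
    cat_counts, comp_counts, surr_counts = models
    total = sum(cat_counts.values())
    scores = {}
    for cat, n in cat_counts.items():
        s = math.log(n / total)  # category prior
        for c in set(components):
            s += math.log((comp_counts[cat][c] + alpha) / (n + 2 * alpha))
        for b in set(surroundings):
            s += math.log((surr_counts[cat][b] + alpha) / (n + 2 * alpha))
        scores[cat] = s
    return sorted(scores, key=scores.get, reverse=True)


def recognise_image(models, regions):
    """Recognise foreground regions only, using the image's background labels as context."""
    background = [r["background"] for r in regions if r["background"]]
    results = []
    for r in regions:
        if r["background"]:
            continue  # already explained as background: skip foreground recognition here
        results.append(score(models, r["components"], background)[0])
    return results


# Hypothetical training data and test image.
train = [
    Example("car", ["wheel", "window", "headlight"], ["road"]),
    Example("car", ["wheel", "door"], ["road", "building"]),
    Example("boat", ["hull", "window"], ["water", "sky"]),
    Example("boat", ["hull", "sail"], ["water"]),
]
models = build_models(train)

regions = [
    {"background": "water", "components": []},
    {"background": None, "components": ["hull", "window"]},
]
print(recognise_image(models, regions))  # ['boat']
```

Here the shared component "window" is ambiguous on its own, but the co-observed "hull" and the surrounding "water" tip the decision towards "boat", and the background region is never passed to the foreground scorer, which is the source of the speed-up described above.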