Abstract
This thesis deals with the problem of estimating the structure in data that arises from semantic relations between data elements, and of leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct set of prototypes from training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge in classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to that category. However, popular algorithms in the literature disregard this structure when computing a visual model. To incorporate this structure into the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy-based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to issues of uncertainty and plausibility in the assignment of descriptors to dictionary elements. To address this issue, novel fuzzy-logic-based dictionary learning and feature encoding algorithms are employed that model the local feature vector distributions and provide performance benefits.
To estimate structure amongst sub-spaces, co-clustering is used with a training descriptor data matrix to compute groups of sub-spaces. A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold. In a similar manner, co-clustering is used with the encoded feature data matrix to compute groups of dictionary elements, referred to as ‘topics’. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the co-clustered groups of sub-spaces and dictionary elements have semantic relevance.
All the methods developed here are viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two matrices that are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the dictionary or coefficient matrices. With regard to sub-space embedding, sparse principal component analysis is one such method; it induces sparsity amongst the sub-spaces selected to represent each descriptor. Similarly, a sparsity-inducing regularization method called Lasso is used for feature encoding, which uses only a subset of dictionary elements to represent each image. While these methods are effective, they disregard structure in the data matrix. To improve on this, structured sparse principal component analysis is used in conjunction with co-clustered groups of sub-spaces to induce sparsity at the group level. The resultant structured sparse sub-manifold dictionary is demonstrated to provide performance benefits. In a similar manner, group Lasso is used with co-clustered groups of dictionary elements to induce sparsity in terms of topics. The structured sparse encoding is demonstrated to improve aggregate performance compared with regular sparse coding.
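
For reference, the matrix factorization view described above can be written compactly. The notation used here (X, D, A, the weight λ and the group set 𝒢) is assumed for illustration and is not taken from the thesis; the penalties are the standard textbook forms of Lasso and group Lasso, not necessarily the exact objectives used in this work.

```latex
% Factorization of a data matrix into a dictionary D and a coefficient matrix A
% (assumed convention: one column of X per encoded item)
X \approx D A, \qquad
X \in \mathbb{R}^{d \times n},\;
D \in \mathbb{R}^{d \times k},\;
A \in \mathbb{R}^{k \times n}

% Lasso encoding: sparsity over individual dictionary elements
\min_{A} \; \tfrac{1}{2}\,\lVert X - D A \rVert_F^2
  + \lambda \sum_{i=1}^{n} \lVert a_i \rVert_1

% Group Lasso encoding: sparsity over groups of dictionary elements
\min_{A} \; \tfrac{1}{2}\,\lVert X - D A \rVert_F^2
  + \lambda \sum_{i=1}^{n} \sum_{g \in \mathcal{G}} \lVert a_i^{(g)} \rVert_2
```

Here a_i denotes the i-th column of A, and a_i^{(g)} the sub-vector of its coefficients belonging to group g. Under this reading each group would correspond to one co-clustered 'topic', so the second penalty switches whole topics on or off instead of individual dictionary elements.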
In conclusion, this thesis estimates structure in the descriptor sub-spaces and the learnt dictionary, uses co-clustering to compute semantically relevant sub-manifolds and a topic dictionary, and finally incorporates the estimated structure into sparse coding methods, demonstrating a performance gain.