Abstract
Object-based video representation is an essential step towards multimedia communications. Using video objects has many advantages, including content-based compression, editing and manipulation. The MPEG-4 standard is a black-box definition for multimedia video: it defines how the video should be coded but does not specify how the object representation is obtained. Accurate video segmentation is a very demanding problem due to the vast number of possible combinations of segmentation criteria and input data. Multimedia applications are also so numerous that any object segmentation system should be robust and use only general constraints drawn from very limited prior knowledge. Motion estimation using robust statistical analysis is used to find object motion that is minimally biased by other objects and noise. A higher-order search is shown to converge on the estimate in fewer iterations than other searches, and a data "reliability" weighted search is proposed to eliminate less meaningful data points as a route to further speed gains. A directional approach to optical flow segmentation, using iterative motion merging via model selection, is used to find objects conforming to a planar facet model. This allows mosaics of objects to be generated for finding occlusions. Novel techniques are proposed to speed up the alignment of images in the mosaic, which is required to deal with the problem of accumulated errors, particularly in longer video sequences. A new shape-adaptive phase correlation technique is proposed to assist with object-based motion estimation involving large displacements. The algorithms and methods developed in this thesis provide a toolbox for producing a multimedia video data structure that fits an MPEG-4 syntax, an essential criterion for acceptance in multimedia communications. The number of arbitrarily set thresholds is minimised to a few insensitive parameters that should be image sequence independent.
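
As an illustration of what "minimally biased by other objects and noise" can mean in practice, the following minimal sketch estimates a one-dimensional displacement from samples that mix a dominant object with a second, differently moving object. It is not the estimator developed in this thesis: the Tukey biweight function, the tuning constant 4.685, the median-based scale and the names `tukey_weight` and `robust_displacement` are all assumptions chosen for the example.

```python
import statistics


def tukey_weight(residual, cutoff):
    """Tukey biweight: the weight falls smoothly to zero at |residual| >= cutoff."""
    if abs(residual) >= cutoff:
        return 0.0
    u = residual / cutoff
    return (1.0 - u * u) ** 2


def robust_displacement(samples, iterations=20, tuning=4.685):
    """Hypothetical robust 1-D displacement estimate.

    Uses iteratively reweighted least squares, starting from the median,
    so that samples far from the dominant motion receive zero weight.
    """
    estimate = statistics.median(samples)
    for _ in range(iterations):
        residuals = [s - estimate for s in samples]
        # Robust scale: median absolute deviation, scaled for Gaussian noise.
        scale = 1.4826 * statistics.median([abs(r) for r in residuals])
        if scale == 0.0:
            break
        weights = [tukey_weight(r, tuning * scale) for r in residuals]
        total = sum(weights)
        if total == 0.0:
            break
        updated = sum(w * s for w, s in zip(weights, samples)) / total
        converged = abs(updated - estimate) < 1e-9
        estimate = updated
        if converged:
            break
    return estimate


# Six samples from a dominant object (about 2 pixels) and two from another object.
samples = [2.0, 2.1, 1.9, 2.05, 1.95, 2.0, 9.0, 9.2]
mean_estimate = sum(samples) / len(samples)
robust_estimate = robust_displacement(samples)
```

In this toy data the plain mean is pulled towards the second object, whereas the reweighted estimate stays close to the dominant displacement, because the two distant samples fall outside the cutoff and are given zero weight.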