Abstract
This paper presents an approach to large lexicon sign recognition that does not require tracking. This avoids the problem of accurately tracking the hands through self-occlusion in unconstrained video by adopting a detection strategy instead, in which patterns of motion are identified. It is demonstrated that detection can be achieved with only a minor loss of accuracy compared to a perfectly tracked sequence using coloured gloves. The approach uses two levels of classification. In the first, a set of viseme classifiers detects the presence of sub-Sign units of activity. The second level then assembles visemes into word-level Sign using Markov chains. The system is able to cope with a large lexicon and is more expandable than traditional word-level approaches. Using as few as 5 training examples, the proposed system achieves classification rates as high as 74.3% on a randomly selected 164-sign vocabulary, performing at a level comparable to other tracking-based systems.
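To illustrate the second level only, the following is a minimal sketch of how per-sign Markov chains could score a sequence of detected visemes. It is not the paper's implementation: the viseme labels, the sign names, the training sequences, the add-one smoothing, and the names `MarkovChainSignModel` and `classify` are all assumptions made for illustration, and the first-level viseme detectors are assumed to have already produced one discrete label per time step.

```python
import math
from collections import defaultdict

# Hypothetical viseme alphabet; the paper's actual viseme set is not given here.
VISEMES = ["up", "down", "left", "right", "together"]


class MarkovChainSignModel:
    """First-order Markov chain over discrete viseme labels for one sign."""

    def __init__(self, visemes, smoothing=1.0):
        self.visemes = list(visemes)
        self.smoothing = smoothing  # additive smoothing (an assumption)
        self.start_counts = defaultdict(float)
        self.trans_counts = defaultdict(lambda: defaultdict(float))
        self.n_sequences = 0

    def fit(self, sequences):
        """Count initial visemes and viseme-to-viseme transitions."""
        for seq in sequences:
            if not seq:
                continue
            self.n_sequences += 1
            self.start_counts[seq[0]] += 1
            for prev, cur in zip(seq, seq[1:]):
                self.trans_counts[prev][cur] += 1
        return self

    def log_likelihood(self, seq):
        """Smoothed log-probability of a viseme sequence under this chain."""
        if not seq:
            raise ValueError("sequence must contain at least one viseme")
        k = len(self.visemes)
        s = self.smoothing
        ll = math.log((self.start_counts[seq[0]] + s) / (self.n_sequences + s * k))
        for prev, cur in zip(seq, seq[1:]):
            row = self.trans_counts[prev]
            total = sum(row.values())
            ll += math.log((row[cur] + s) / (total + s * k))
        return ll


def classify(seq, models):
    """Return the sign whose chain assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))


# Made-up training data: a few viseme sequences per sign.
training = {
    "hello": [["up", "right", "down"], ["up", "right", "right", "down"]],
    "book": [["together", "left", "right"], ["together", "left", "left", "right"]],
}
models = {
    name: MarkovChainSignModel(VISEMES).fit(seqs)
    for name, seqs in training.items()
}

predicted = classify(["up", "right", "down"], models)
```

Because each sign has its own small chain, adding a new word in this sketch only requires fitting one more model from a handful of examples, which mirrors the expandability argument made above.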