Abstract
Durations of real speech segments do not generally follow the exponential distributions that are implicitly modelled by the state transitions of Markov processes. Several duration models were considered for integration within a segmental-HMM recognizer: uniform, exponential, Poisson, normal, gamma and discrete. For silence, the gamma distribution fitted the measured duration distribution best, by an order of magnitude. Evaluations determined an appropriate weighting of the duration models relative to the acoustic models. In tests, gamma and discrete duration models reduced the phone-classification error rate by 2 % absolute (over 6 % relative); exponential models gave a reduction of approximately 1 % absolute, and uniform models gave no significant improvement. These gains in performance support the wider application of explicit duration models.
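The following Python sketch illustrates the contrast drawn above. In discrete time, a single Markov state with self-loop probability `a` implies a geometric duration distribution, the discrete counterpart of the exponential, and its mode is always one frame. A gamma density can instead peak near the typical segment length. The sketch also adds a weighted duration log-probability to an acoustic log-likelihood. Several parts are assumptions made for illustration and are not the method or data of this work: the method-of-moments fit, the log-linear weighting, the function names and the example durations.

```python
import math
from statistics import mean, pvariance


def geometric_duration_pmf(d: int, self_loop: float) -> float:
    """P(duration = d frames) implied by one HMM state with self-loop probability `self_loop`."""
    # Stay d - 1 times, then leave once; this is largest at d = 1 for any self_loop.
    return self_loop ** (d - 1) * (1.0 - self_loop)


def fit_gamma_moments(durations):
    """Method-of-moments gamma fit (an assumed fitting method): shape = m^2 / v, scale = v / m."""
    m = float(mean(durations))
    v = float(pvariance(durations, m))
    return m * m / v, v / m


def gamma_log_pdf(x: float, shape: float, scale: float) -> float:
    """Log of the gamma density x^(k-1) exp(-x/theta) / (Gamma(k) theta^k)."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def segment_score(acoustic_loglik: float, duration: float,
                  shape: float, scale: float, weight: float) -> float:
    """Combined log score; `weight` is a hypothetical duration-vs-acoustic weighting."""
    return acoustic_loglik + weight * gamma_log_pdf(duration, shape, scale)


if __name__ == "__main__":
    durations = [8, 10, 12, 9, 11]           # illustrative frame counts, not measured data
    shape, scale = fit_gamma_moments(durations)
    print(shape, scale)                       # mean 10, variance 2 -> shape 50, scale 0.2
    print((shape - 1.0) * scale)              # gamma mode, about 9.8 frames
    # A self-loop of 0.9 gives the same mean (1 / (1 - 0.9) = 10 frames), but its mode is 1 frame.
    print(geometric_duration_pmf(1, 0.9), geometric_duration_pmf(10, 0.9))
```

Matching means makes the comparison fair. The geometric distribution still puts most of its probability on very short segments, while the fitted gamma concentrates it around the observed lengths.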