Abstract
Audio from broadcast soccer can be used for identifying highlights from the game. We can assume that the basic construction of the auditory scene consists of two additive parallel audio streams, one relating to commentator speech and the other relating to audio captured from the ground level microphones. Audio cues derived from these sources provide valuable information about game events, as can the detection of key words used by the commentators, which are useful for identifying highlights. We investigate word recognition in a connected digit experiment providing additive noise that is present in broadcast soccer audio. A limited set of background soccer noises, extracted from the FIFA World Cup 2006 recordings, were used to create an extension to the Aurora-2 database. The extended data set was tested with various HMM and parallel model combination (PMC) configurations, and compared to the standard baseline, with clean and multi-condition training methods. It was found that incorporating SNR and noise type information into the PMC process was beneficial to recognition performance with a reduction in word error rate from 17.5% to 16.3% over the next best scheme when using the SNR information.Future work will look at non stationary soccer noise types and multiple statenoise models.