Better Transcription of UK Supreme Court Hearings

Hadeel Saadany; Catherine Breslin; Constantin Orăsan; Sophie Walker

doi:10.48550/arxiv.2211.17094

Preprint

arXiv (Cornell University)

29/11/2022

DOI: https://doi.org/10.48550/arxiv.2211.17094

Abstract

Computer Science - Computation and Language

Computer Science - Sound

Transcription of legal proceedings is essential for access to justice, yet speech transcription is an expensive and slow process. In this paper we describe part of a combined research and industrial project to build an automated transcription tool designed specifically for the Justice sector in the UK. We explain the challenges involved in transcribing courtroom hearings and the Natural Language Processing (NLP) techniques we employ to tackle them. We show that fine-tuning a generic off-the-shelf pre-trained Automatic Speech Recognition (ASR) system with an in-domain language model, together with infusing common phrases extracted with a collocation detection model, can not only improve the Word Error Rate (WER) of the transcribed hearings but also avoid critical errors specific to the legal jargon and terminology commonly used in British courts.

Details

Title: Better Transcription of UK Supreme Court Hearings
Creators: Hadeel Saadany - University of Surrey, School of Literature and Languages
Catherine Breslin - Kingfisher Labs Ltd, United Kingdom
Constantin Orăsan - University of Surrey, School of Literature and Languages
Sophie Walker - Just Access, United Kingdom
Publication Details: arXiv (Cornell University)
Identifiers: 99783689802346
Academic Unit: School of Literature and Languages
Language: English
Resource Type: Preprint
