Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- 1 Scaling Up Machine Learning: Introduction
- Part One Frameworks for Scaling Up Machine Learning
- Part Two Supervised and Unsupervised Learning Algorithms
- Part Three Alternative Learning Settings
- Part Four Applications
- 18 Large-Scale Learning for Vision with GPUs
- 19 Large-Scale FPGA-Based Convolutional Networks
- 20 Mining Tree-Structured Data on Multicore Systems
- 21 Scalable Parallelization of Automatic Speech Recognition
- Subject Index
- References
21 - Scalable Parallelization of Automatic Speech Recognition
from Part Four - Applications
Published online by Cambridge University Press: 05 February 2012
Summary
Automatic speech recognition (ASR) allows multimedia content to be transcribed from acoustic waveforms into word sequences. It is an exemplar of a class of machine learning applications where increasing compute capability is enabling new industries such as automatic speech analytics. Speech analytics helps customer service call centers search through recorded content, track service quality, and provide early detection of service issues. Fast and efficient ASR makes it economical to apply a wide range of text-based data analytics to multimedia content, opening the door to many possibilities.
In this chapter, we describe our approach for scalable parallelization of the most challenging component of ASR: the speech inference engine. This component takes as input a sequence of audio features extracted from a speech waveform, compares them iteratively to a speech model, and produces the most likely interpretation of the speech waveform as a word sequence. The speech model is a database of acoustic characteristics, word pronunciations, and phrases from a particular language. Speech models for natural languages are represented as large, irregular graphs consisting of millions of states and arcs. Referencing these models involves accessing an unpredictable data working set guided by "what was said" in the speech input. This inference process is highly challenging to parallelize efficiently.
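The inference described above is, at its core, a Viterbi-style beam search over the speech model graph: at each audio frame, active hypotheses are expanded along outgoing arcs, scored against the acoustics, and pruned against the best hypothesis. The following is a minimal sketch of that idea, assuming a toy graph and a made-up acoustic cost function; it is illustrative only and is not the chapter's implementation (real models have millions of states and arcs):

```python
def beam_search(arcs, acoustic, start, frames, beam=5.0):
    """Return accumulated cost per active state after all frames.

    arcs: dict mapping state -> list of (next_state, transition_cost)
    acoustic: function (frame_index, state) -> acoustic cost (lower is better)
    start: initial state
    frames: number of audio frames to process
    beam: prune hypotheses whose cost exceeds (best cost + beam)
    """
    active = {start: 0.0}  # state -> best accumulated cost so far
    for t in range(frames):
        nxt = {}
        # Expand every active hypothesis along its outgoing arcs.
        for state, cost in active.items():
            for succ, w in arcs.get(state, []):
                c = cost + w + acoustic(t, succ)
                if succ not in nxt or c < nxt[succ]:
                    nxt[succ] = c  # keep only the best path into succ
        if not nxt:
            return {}
        # Beam pruning: discard hypotheses far from the current best.
        best = min(nxt.values())
        active = {s: c for s, c in nxt.items() if c <= best + beam}
    return active


# Toy example: a 3-state left-to-right chain; the acoustic function
# (hypothetical) favors state t at frame t, so the best path advances
# one state per frame.
arcs = {0: [(0, 0.1), (1, 1.0)],
        1: [(1, 0.1), (2, 1.0)],
        2: [(2, 0.1)]}
acoustic = lambda t, s: 0.0 if s == min(t, 2) else 2.0
result = beam_search(arcs, acoustic, start=0, frames=3)
# The lowest-cost surviving state is 2, reached by advancing each frame.
```

The irregular, data-dependent structure of `active` at each frame is exactly what makes this traversal hard to parallelize: which states are expanded depends on "what was said", so the working set cannot be predicted ahead of time.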
We demonstrate that parallelizing an application involves much more than recoding the program in another language: it requires careful consideration of data, task, and runtime concerns to fully exploit an application's parallelization potential.
- Type
- Chapter
- Information
- Book: Scaling Up Machine Learning: Parallel and Distributed Approaches, pp. 446-470
- Publisher: Cambridge University Press
- Print publication year: 2011