8 - Big data
Published online by Cambridge University Press: 05 June 2016
Summary
The emphasis of this book up to now has been on understanding speech, audio and hearing, and using this knowledge to discern rules for handling and processing this type of content. There are many good reasons to take such an approach, not least being that better understanding can lead to better rules and thus better processing. If an engineer is building a speech-based system, it is highly likely that the effectiveness of that system relates to the knowledge of the engineer. Conversely, a lack of understanding on the part of that engineer might lead to eventual problems with the speech system. However, this type of argument holds true only up to a point: it is no longer true if the subtle details of the content (data) become too complex for a human to understand, or when the amount of data that needs to be examined is more extensive than a human can comprehend. To put it another way, given more and more data, of greater and greater complexity, eventually the characteristics of the data exceed the capabilities of human understanding.
It is often said that we live in a data-rich world. This has been driven in part by the enormous decrease in data storage costs over the past few decades (from something like €100,000 per gigabyte in 1980, €10,000 in 1990, €10 in 2000 to €0.1 in 2010), and in part by the rapid proliferation of sensors, sensing devices and networks. Today, every smartphone, almost every computer, most new cars, televisions, medical devices, alarm systems and countless other devices include multiple sensors of different types backed up by the communications technology necessary to disseminate the sensed information.
Sensing data over a wide area can reveal much about the world in general, such as climate change, pollution, human social behaviour and so on. On a smaller scale it can reveal much about the individual – witness targeted website advertisements, sales notifications that are driven by analysis of shopping patterns, credit ratings driven by past financial behaviour or job opportunities lost through an inadvertent online presence. Data relating to the world as a whole, as well as to individuals, is increasingly available, and increasingly being ‘mined’ for hidden value.
- Type: Chapter
- Information: Speech and Audio Processing: A MATLAB-based Approach, pp. 223-266
- Publisher: Cambridge University Press
- Print publication year: 2016