
9 - A Fault-Tolerant Parallel Computer

Published online by Cambridge University Press: 03 October 2009

Yuh-Dauh Lyuu
Affiliation: NEC Research Institute, New Jersey

Summary

Man has more than twice the power

that he needs to support himself

—Leonardo da Vinci

In this final chapter, we briefly review techniques and concepts in fault-tolerant computing. We then sketch the design of a fault-tolerant parallel computer, the HPC (“hypercube parallel computer”), based on the results and ideas of the previous chapters.

Introduction

No fault-free computer, or indeed any fault-free human artifact, has ever been built, and none ever will be. No matter how reliable each component is, there is always a possibility, however small, that it will fail. Statistical principles dictate that, other things being equal, the probability that some component fails increases with the number of components. Such a failure, if not anticipated and safeguarded against, will eventually cause the computer to malfunction, with consequences ranging from minor annoyance and inconvenience to disaster.
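
To make the statistical argument concrete, assume (a simplification not stated in the text) that each of n components fails independently with the same probability p over some period. The probability that at least one of them fails is then 1 - (1 - p)^n, which grows toward 1 as n grows. The short Python sketch below simply evaluates this formula; the helper name prob_any_failure is hypothetical.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n components fails, assuming each
    fails independently with probability p (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "all n components survive".
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Same per-component failure probability, increasingly large machines.
    for n in (1, 1_000, 1_000_000):
        print(n, prob_any_failure(1e-6, n))
```

With p = 10^-6 a single component almost never fails, yet a machine built from a million such components fails with probability close to 1 - e^-1, roughly 0.63.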

Recently, the same enormous decrease in hardware cost that makes parallel computers economically feasible has also made fault tolerance more affordable [297]. In other words, cheap hardware makes possible both a high degree of fault tolerance through redundancy and high performance. Indeed, most fault-tolerant computers today employ multiple processors; see [241, 254, 317] for good surveys.
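
One classic way to turn cheap hardware into fault tolerance is N-modular redundancy with majority voting: run the same computation on k modules and accept the answer given by a strict majority. The sketch below assumes independent module failures, a per-module probability r of producing the correct result, and a perfect voter; it is a generic illustration of redundancy, not necessarily the scheme used in the HPC, and the function name majority_reliability is hypothetical.

```python
from math import comb


def majority_reliability(r: float, k: int) -> float:
    """Probability that a strict majority of k independent replicas is correct,
    each replica being correct with probability r (voter assumed perfect)."""
    if k < 1:
        raise ValueError("k must be positive")
    need = k // 2 + 1  # smallest strict majority of k
    # Sum the binomial probabilities of exactly i correct replicas, i >= need.
    return sum(comb(k, i) * r**i * (1 - r) ** (k - i) for i in range(need, k + 1))


if __name__ == "__main__":
    r = 0.99
    print(majority_reliability(r, 1))  # a single, unreplicated module
    print(majority_reliability(r, 3))  # triple modular redundancy
```

For k = 3 the formula reduces to 3r^2 - 2r^3, so r = 0.99 yields about 0.9997. If r were below 0.5, voting would make matters worse (r = 0.4 and k = 3 give 0.352), so redundancy pays off only when individual modules are already fairly reliable.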

It is in light of this background that we take the extra step of designing a hypercube parallel computer (HPC for short). In the HPC, processors are grouped into logical clusters of physically close processors, and each program execution is replicated at all members of a cluster. Clusters overlap, however. The concept of a cluster, whether logical or physical, introduces a two-level rather than flat organization and can be found in, for example, the Cm [344], Cedar [187], and FTPP (“Fault Tolerant Parallel Processor”) [148] computers.
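
The text does not spell out here how clusters are formed. As a purely hypothetical illustration, label the 2^d processors of a d-dimensional hypercube with d-bit integers, so that two processors are directly linked exactly when their labels differ in one bit, and let the cluster of processor v be v together with its d neighbors. Such clusters consist of physically close processors and overlap, as the text describes. The sketch further assumes that the results of a replicated execution are combined by strict-majority voting; the helper names cluster and vote are illustrative and not taken from the HPC design.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import List, Optional


def hypercube_neighbors(v: int, d: int) -> List[int]:
    """Processors adjacent to v in a d-dimensional hypercube (flip one bit)."""
    return [v ^ (1 << i) for i in range(d)]


def cluster(v: int, d: int) -> List[int]:
    """Hypothetical cluster rule: v together with its physical neighbors."""
    return [v] + hypercube_neighbors(v, d)


def vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


if __name__ == "__main__":
    d = 3
    print(cluster(0, d))                            # processors replicating v = 0
    print(set(cluster(0, d)) & set(cluster(1, d)))  # neighboring clusters overlap
    print(vote([42, 42, 17, 42]))                   # one faulty replica is outvoted
```

Under this rule every processor belongs to d + 1 clusters (its own and those of its d neighbors), so the overlap between clusters is substantial even in small hypercubes.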

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 1993
