Introduction to Parallel Computing

Zbigniew J. Czech

doi:10.1017/9781316795835

References

Ackerman, W. B. 1982. “Dataflow Languages.” IEEE Computer 15 (2): 15–25.

Adiga, N. R., Blumrich, M. A., Chen, D., et al. 2005. “Blue Gene/L Torus Interconnection Network.” IBM Journal of Research and Development 49 (2/3): 265–276.

Adve, S. V. and Boehm, H. J. 2011. “Memory Models.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1107–1110.

Adve, S. V. and Gharachorloo, K. 1996. “Shared Memory Consistency Models: A Tutorial.” IEEE Computer 29 (12): 66–76.

Agarwal, A. 1991. “Limits on Interconnection Network Performance.” IEEE Transactions on Parallel and Distributed Systems 2 (4): 398–412.

Agerwala, T. and Arvind, N. I. 1982. “Data Flow Systems: Guest Editor's Introduction.” Computer 15 (2): 10–13.

Aho, A. V., Hopcroft, J. E., and Ullman, J. D. 1974. The Design and Analysis of Computer Algorithms. Boston, MA: Addison-Wesley.

Ajima, Y., Sumimoto, S., and Shimizu, T. 2009. “A 6D Mesh/Torus Interconnect for Exascale Computers.” Computer 42 (11): 36–40.

Ajtai, M., Komlós, J., and Szemerédi, E. 1983. “Sorting in c log(n) Parallel Steps.” Combinatorica 3: 1–19.

Akers, S. B. and Krishnamurthy, B. 1989. “A Group-theoretic Model for Symmetric Interconnection Networks.” IEEE Transactions on Computers 38 (4): 555–566.

Akl, S. G. 1989. The Design and Analysis of Parallel Algorithms. Englewood Cliffs, NJ: Prentice Hall.

Akl, S. G. 1997. Parallel Computation. Models and Methods. Upper Saddle River, NJ: Prentice Hall.

Alexander, M. and Gardner, W., eds. 2009. Process Algebra for Parallel and Distributed Processing. Boca Raton, FL: Chapman & Hall/CRC.

Alexandrov, A., Ionescu, M. F., Schauser, K. E., and Scheiman, C. 1995. “LogGP: Incorporating Long Messages into the LogP Model.” Proc. 7th ACM Symposium on Parallel Algorithms and Architectures, Santa Barbara, CA, 95–105.

Allen, R. and Kennedy, K. 2002. Optimizing Compilers for Modern Architectures. San Francisco, CA: Morgan Kaufmann.

Alt, H., Hagerup, T., Mehlhorn, K., and Preparata, F. P. 1987. “Simulation of Idealized Parallel Computers on More Realistic Ones.” SIAM Journal on Computing 16 (5): 808–835.

Amdahl, G. 1967. “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities.” AFIPS Conference Proc., vol. 30. Washington D.C.: Thompson Books, 483–485.

Annaratone, M., Arnould, E., Gross, T., et al. 1986. “Warp Architecture and Implementation.” Proc. of 13th Annual International Symposium on Computer Architecture, Computer Science Press, Tokyo, 346–356.

Anderson, D. P., Cobb, J., Korpela, E., et al. 2002. “SETI@home. An Experiment in Public-resource Computing.” Communications of the ACM 45 (11): 56–61.

Anderson, T. E., Culler, D. E., and Patterson, D. 1995. “A Case for NOW (Networks of Workstations).” IEEE Micro 15 (1): 54–56.

Andrews, G. R. 1991. Concurrent Programming: Principles and Practice. Menlo Park, CA: Benjamin/Cummings.

Andrews, G. R. 2000. Foundations of Multithreaded, Parallel, and Distributed Programming. Reading, MA: Addison-Wesley.

Apt, K. R. and Olderog, E-R. 1991. Verification of Sequential and Concurrent Programs. New York: Springer-Verlag.

Arvind, N. I. and Culler, D. E. 1986. “Dataflow Architectures.” Annual Review of Computer Science, vol. 1: 225–253.

Arvind, N. I., Gostelow, K. P., and Plouffe, W. 1978. The ID-Report: An Asynchronous Programming Language and Computing Machine. Technical Report, 114. University of California at Irvine.

Arvind, N. I. and Nikhil, R. S. 1990. “Executing a Program on the MIT Tagged-token Dataflow Architecture.” IEEE Transactions on Computers 39 (3): 300–318.

Attiya, H. and Welch, J. 1998. Distributed Computing: Fundamentals, Simulations and Advanced Topics. London: McGraw-Hill.

Augen, J. 2002. “The Evolving Role of Information Technology in the Drug Discovery Process.” Drug Discovery Today 7 (5): 315–323.

Baase, S. 1988. Computer Algorithms: Introduction to Design and Analysis. Boston, MA: Addison-Wesley.

Bacon, J. and Harris, T. 2003. Operating Systems. Concurrent and Distributed Systems. Harlow, UK: Pearson Education, Addison-Wesley.

Bader, D. A., ed. 2008. Petascale Computing. Algorithms and Applications. Boca Raton, FL: Chapman & Hall/CRC.

Bader, M., Breuer, A., and Schreiber, M. 2013. “Parallel Fully Adaptive Tsunami Simulations.” In Facing the Multicore-challenge III. Aspects of New Paradigms and Technologies in Parallel Computing, Lecture Notes in Computer Science. Vol. 7686, edited by Keller, R., Kramer, D., and Weiss, J-P. (Berlin, Heidelberg: Springer-Verlag), 137–138.

Baer, J-L. 2010. Microprocessor Architecture. Cambridge: Cambridge University Press.

Bahi, J. M. 2008. Parallel Iterative Algorithms. From Sequential to Grid Computing. Boca Raton, FL: Chapman & Hall/CRC.

Barnes, G. H., Brown, R. M., Kato, M., et al. 1968. “The Illiac IV Computer.” IEEE Transactions on Computers 17 (8): 746–757.

Barton, M. L. and Withers, G. R. 1989. “Computing Performance as a Function of the Speed, Quantity and Cost of the Processors.” Supercomputing ’89 Proc., 759–764.

Barz, H. W. 1983. “Implementing Semaphores by Binary Semaphores.” ACM SIGPLAN Notices 18 (2): 39–45.

Batcher, K. E. 1968. “Sorting Networks and Their Applications.” Spring Joint Computer Conference, AFIPS Proc., 32: 307–314.

BBN Advanced Computers Incorporated. 1986. Butterfly Parallel Processor Overview, BBN Report No. 6148, March.

Beecroft, J., Homewood, M., and McLaren, M. 1994. “Meiko CS-2 Interconnect Elan-Elite Design.” Parallel Computing 20 (10–11): 1627–1638.

Bell, G. and Gray, J. 2002. “What's Next in High-performance Computing.” Communications of the ACM 45 (2): 91–95.

Bellman, R. 1957. Dynamic Programming. Princeton, NJ: Princeton University Press.

Ben-Ari, M. 2006. Principles of Concurrent and Distributed Programming, 2nd edn. Boston, MA: Addison-Wesley.

Bharadwaj, V., Ghose, D., Mani, V., and Robertazzi, T. G. 1996. Scheduling Divisible Loads in Parallel and Distributed Systems. Los Alamitos, CA: IEEE Computer Society Press.

Bhatele, A. 2011. “Topology Aware Task Mapping.” In Encyclopedia of Parallel Computing, vol. 4, edited by Padua, D. A. (New York: Springer-Verlag), 2057–2062.

Bilardi, G., Herley, K. T., Pietracaprina, A., Pucci, G., and Spirakis, P. 1996. “BSP vs LogP.” 8th ACM Symposium on Parallel Algorithms and Architectures, Padova, Italy, 25–32.

Bilardi, G., Pietracaprina, A., and Pucci, G. 2008. “Decomposable BSP: A Bandwidth-latency Model for Parallel and Hierarchical Computation.” In Handbook of Parallel Computing. Models, Algorithms and Applications, edited by Rajasekaran, S. and Reif, J. (Boca Raton, FL: Chapman & Hall/CRC), 2-1–2-21.

Bilardi, G. and Pietracaprina, A. 2011. “Models of Computation, Theoretical.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1150–1158.

Bisseling, R. H. 2004. Parallel Scientific Computation. New York: Oxford University Press.

Biswas, R., Aftosmis, M., Kiris, C., and Shen, B-W. 2008. “Petascale Computing: Impact on Future NASA Missions.” In Petascale Computing. Algorithms and Applications, edited by Bader, D. A. (Boca Raton, FL: Chapman & Hall/CRC), 29–46.

Biswas, R., Thigpen, W., Ciotti, R., Mehrotra, P., et al. 2013. “Pleiades: NASA's First Petascale Supercomputer.” In Contemporary High Performance Computing: From Petascale toward Exascale, edited by Vetter, J. S. (Boca Raton, FL: Chapman & Hall/CRC), 309–338.

Bokhari, S. H. 1987. “Multiprocessing the Sieve of Eratosthenes.” Computer, April: 50–58.

Boppana, R. B. 1989. “Optimal Separations between Concurrent-write Parallel Machines.” Proc. of the ACM Symposium on Theory of Computing, 320–326.

Borkar, S., Cohn, R., and Fox, G. 1990. “Supporting Systolic and Memory Communication in iWarp.” Proc. of 17th Annual International Symposium on Computer Architecture, Australia, May 1990, 70–81.

Borodin, A. 1977. “On Relating Time and Space to Size and Depth.” SIAM Journal on Computing 6 (4): 733–744.

Borovska, P., Nakov, O., Markov, S., Ivanova, D., and Filipov, F. 2007. “Performance Evaluation of TOFU System Area Network Design for High-performance Computer Systems.” Proc. 5th European Conference on European Computing Conference, 186–216.

Bovet, D. P. and Crescenzi, P. 1994. Introduction to the Theory of Complexity. Upper Saddle River, NJ: Prentice Hall.

Brent, R. P. 1974. “The Parallel Evaluation of General Arithmetic Expressions.” Journal of the ACM 21 (2): 201–206.

Brinch Hansen, P. 1975. “The Programming Language Concurrent Pascal.” IEEE Transactions on Software Engineering 2: 199–206.

Brooks, E. D. III. 1986. “The Butterfly Barrier.” International Journal of Parallel Programming 15: 295–307.

Brucker, P. 2010. Scheduling Algorithms, 5th edn. Berlin, Heidelberg: Springer-Verlag.

Bruda, S. D. and Zhang, Y. 2009. “Relations between Several Parallel Computational Models.” Scalable Computing: Practice and Experience 10 (2): 163–172.

Burns, A. and Wellings, A. 1998. Concurrency in Ada, 2nd edn. Cambridge: Cambridge University Press.

Buyya, R., Branson, K., Giddy, J., and Abramson, D. 2003. “The Virtual Laboratory: A Toolset to Enable Distributed Molecular Modelling for Drug Design on the World-wide Grid.” Concurrency and Computation: Practice and Experience 15 (1): 1–25.

Carmona, E. A. and Rice, M. D. 1991. “Modeling the Serial and Parallel Fractions of a Parallel Algorithm.” Journal of Parallel and Distributed Computing 13: 286–298.

Carver, R. H. and Tai, K-C. 2006. Modern Multithreading. Implementing, Testing, and Debugging Multi-threaded Java and C++/Pthreads/Win32 Programs. Hoboken, NJ: Wiley-Interscience.

Casanova, H., Legrand, A., and Robert, Y. 2009. Parallel Algorithms. Boca Raton, FL: CRC Press.

Chaderjian, N. M. and Buning, P. G. 2011. “High Resolution Navier-Stokes Simulation of Rotor Wakes.” Proceedings of the American Helicopter Society 67th Annual Forum.

Chaderjian, N. M. and Ahmad, J. U. 2012. “Detached Eddy Simulation of the UH-60 Rotor Wake Using Adaptive Mesh Refinement.” Proceedings of the American Helicopter Society 68th Annual Forum.

Chandra, R., Dagum, L., Kohr, D., et al. 2001. Parallel Programming in OpenMP. San Francisco, CA: Morgan Kaufmann, Academic Press.

Chapman, B., Jost, G., and van der Pas, R. 2008. Using OpenMP. Portable Shared Memory Parallel Programming. Cambridge, MA: MIT Press.

Cheatham, T. E., Fahmy, A., Stefanescu, D., and Valiant, L. 1995. “Bulk Synchronous Parallel Computing – A Paradigm for Transportable Software.” Proc. 28th Annual Hawaii Conference on System Sciences, Vol. II. Hoboken, NJ: IEEE Computer Society Press, 268–275.

Chen, S. S., Price, J. F., Zhao, W., Donelan, M. A., and Walsh, E. J. 2007. “The CBLAST-Hurricane Program and the Next-generation Fully Coupled Atmosphere-wave-ocean Models for Hurricane Research and Prediction.” Bull. Amer. Meteor. Soc. 88 (3): 311–317.

Cheng, J., Grossman, M., and McKercher, T. 2014. Professional CUDA C Programming. New York: John Wiley & Sons, Inc.

Chlebus, B. S., Diks, K., Hagerup, T., and Radzik, T. 1988. “Efficient Simulations between Concurrent-read Concurrent-write PRAM Models.” Proc. of the Symposium on Mathematical Foundations of Computer Science, 231–239.

Close, P. 1988. “The iPSC/2 Node Architecture.” Proc. of the Conference on Hypercube Concurrent Computers and Applications, 43–55.

Cole, R. 1986. “Parallel Merge Sort.” Proc. of the 27th Annual Symposium on Foundations of Computer Science. Hoboken, NJ: IEEE Computer Society Press, 511–516.

Cole, R. 1988. “Parallel Merge Sort.” SIAM Journal on Computing 17 (4): 770–785.

Cole, R. 1993. “Parallel Merge Sort.” In Synthesis of Parallel Algorithms, edited by Reif, J. H. (San Mateo, CA: Morgan Kaufmann), 453–495.

Collins, W. D., Bitz, M. L., Blackmon, M. L., et al. 2006. “The Community Climate System Model version 3 (CCSM3).” Journal of Climate 19: 2122–2143.

Convex Computer Corporation. 1993. Exemplar Architecture. Richardson, TX: Convex Computer Corporation.

Cook, S. A. 1979. “Deterministic CFL's are Accepted Simultaneously in Polynomial Time and Log Squared Space.” Conference Record of the Eleventh Annual ACM Symposium on Theory of Computing, Atlanta, GA, April–May 1979, 338–345.

Cook, S. A., Dwork, C., and Reischuk, R. 1986. “Upper and Lower Time Bounds for Parallel Random Access Machines without Simultaneous Writes.” SIAM Journal on Computing 15: 87–97.

Cormen, T. H., Leiserson, C. E., and Rivest, R. L. 1990. Introduction to Algorithms. Cambridge, MA: MIT Press.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2009. Introduction to Algorithms, 3rd edn. Cambridge, MA: MIT Press.

Coulouris, G., Dollimore, J., and Kindberg, T. 2005. Distributed Systems: Concepts and Design, 4th edn. Boston, MA: Addison-Wesley.

Courtois, P. J., Heymans, F., and Parnas, D. L. 1971. “Concurrent Control with ‘Readers’ and ‘Writers’.” Communications of the ACM 14 (10): 667–668.

Culler, D., Karp, R., Patterson, D., et al. 1993. “LogP: Towards a Realistic Model of Parallel Computation.” 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, May 1993, 1–12.

Culler, D. E., Singh, J. P., and Gupta, A. 1999. Parallel Computer Architecture. San Francisco, CA: Morgan Kaufmann.

Dally, W. J. 1991. “Performance Analysis of k-ary n-cube Interconnection Networks.” IEEE Transactions on Computers 39 (6): 775–785.

Dally, W. J. and Towles, B. 2004. Principles and Practices of Interconnection Networks. San Francisco, CA: Morgan Kaufmann.

Darema-Rogers, F., George, D., Norton, V. A., and Pfister, G. 1984. “VM Parallel Environment.” Proc. of the IBM Kingston Parallel Processing Symposium, November 27–29, 1984 (IBM Confidential).

Darema, F. 2001. “SPMD Model: Past, Present and Future.” Recent Advances in Parallel Virtual Machine and Message Passing Interface, 8th European PVM/MPI Users’ Group Meeting, Santorini/Thera, Greece, LNCS 2131, September 23–26, 2001, p. 1.

Darte, A., Robert, Y., and Vivien, F. 2000. Scheduling and Automatic Parallelization. Boston, MA: Birkhäuser.

Dennis, J. B. 1980. “Dataflow Supercomputers.” IEEE Computer 13: 48–56.

Dennis, J. B. 1983. “Maximum Pipelining of Array Operations on Static Data Flow Machines.” Proc. of the International Conference on Parallel Processing, August 1983, 176–184.

Dennis, J. B. and van Horn, E. C. 1966. “Programming Semantics for Multiprogrammed Computations.” Communications of the ACM 9 (3): 143–155.

Dennis, J. and Loft, R. 2009. “Optimizing High-resolution Climate Variability Experiments on the Cray XT4 and XT5 Systems at NICS and NERSC.” Proceedings of the 51st Cray User Group Conference (CUG), 1–8.

Dijkstra, E. W. 1968. “Cooperating Sequential Processes.” In Programming Languages, edited by Genuys, F. (New York: Academic Press), 43–112.

Dijkstra, E. W. 1971. “Hierarchical Ordering of Sequential Processes.” Acta Informatica 1 (2): 115–138.

Dijkstra, E. W. and Scholten, C. S. 1980. “Termination Detection for Diffusing Computations.” Information Processing Letters 11 (1): 1–4.

Dill, K. A., Ozkan, S. B., Weikl, T. R., Chodera, J. D., and Voelz, V. A. 2007. “The Protein Problem: When Will It Be Solved?” Current Opinion in Structural Biology 17 (3): 342–346.

Domeika, M. 2008. Software Development for Embedded Multi-core Systems. Burlington, MA: Newnes.

Donnellan, A., Mora, P., Matsu'ura, M., and Yin, X-C. 2004. Computational Earthquake Science. Basel: Birkhäuser.

Dongarra, J. 2013. “Visit to the National University for Defense Technology Changsha, China, University of Tennessee, Oak Ridge National Laboratory, June 3, 2013.” http://www.netlib.org/utk/people/JackDongarra/PAPERS/tianhe-2-dongarra-report.pdf.

Dongarra, J., Otto, S. W., Snir, M., and Walker, D. 1995. An Introduction to the MPI standard, University of Tennessee Technical Report, CS-95-274, January 1995.

Dongarra, J., Foster, I., Fox, G., et al., eds. 2003. Sourcebook of Parallel Computing. San Francisco, CA: Morgan Kaufmann.

Dongarra, J., Sterling, T., Simon, H., and Strohmaier, E. 2005. “High-performance Computing: Clusters, Constellations, MPPs, and Future Directions.” Computing in Science & Engineering, March/April: 51–59.

Dongarra, J. and Luszczek, P. 2011. “LINPACK Benchmark.” In Encyclopedia of Parallel Computing, vol. 2, edited by Padua, D. A. (New York: Springer-Verlag), 1033–1035.

Dorband, E. N., Hemsendorf, M., and Merritt, D. 2003. “Systolic and Hyper-systolic Algorithms for the Gravitational N-body Problem, with an Application to Brownian Motion.” J. Comput. Phys. 185: 484–511.

Downey, A. B. 2007. “The Little Book of Semaphores,” v. 2.1.2. http://greenteapress.com/semaphores/.

Drake, J. B., Jones, P. W., Vertenstein, M., White, J. B. III, and Worley, P. H. 2008. “Software Design for Petascale Climate Science.” In Petascale Computing. Algorithms and Applications, edited by Bader, D. A. (Boca Raton, FL: Chapman & Hall/CRC), 125–146.

Drozdowski, M. 2004. “Scheduling Parallel Tasks – Algorithms and Complexity.” In Handbook of Scheduling. Algorithms, Models and Performance Analysis, edited by Leung, J. Y-T. (Boca Raton, FL: Chapman & Hall/CRC), 25-1–25-25.

Dubois, M., Annavaram, M., and Stenström, P. 2012. Parallel Computer Organization and Design. Cambridge: Cambridge University Press.

Dumancas, G. G. 2015. “Applications of Supercomputers in Sequence Analysis and Genome Annotation.” In Research and Applications in Global Supercomputing, edited by Segall, R. S., Cook, J. S. and Zhang, Q. (Hershey, PA: IGI Global), 149–175.

Dutot, P-F., Mounié, G., and Trystram, D. 2004. “Scheduling Parallel Tasks: Approximation Algorithms.” In Handbook of Scheduling. Algorithms, Models and Performance Analysis, edited by Leung, J. Y-T. (Boca Raton, FL: Chapman & Hall/CRC), 26-1–26-24.

Science. 2005. “Editorial: So Much More to Know.” Science 309: 78–102.

El-Ghazawi, T., Carlson, W., Sterling, T., and Yelick, K. 2005. UPC. Distributed Shared Memory Programming. Hoboken, NJ: John Wiley & Sons, Inc.

Endy, D. and Brent, R. 2001. “Modelling Cellular Behaviour.” Nature 409: 391–395.

Fatahalian, K. and Houston, M. 2008. “A Closer Look at GPUs.” Communications of the ACM 51 (10): 50–57.

Feng, T. Y. 1972. “Some Characteristics of Associative/Parallel Processing.” Proc. of the 1972 Sagamore Computing Conference, 5–16.

Feng, T. Y. 1981. “A Survey of Interconnection Networks.” IEEE Computer, December: 12–27.

Feo, J. T., ed. 1993. A Comparative Study of Parallel Programming Languages: The Salishan Problems. Amsterdam, The Netherlands: North-Holland.

Fich, F. E. 1993. “The Complexity of Computation on the Parallel Random Access Machine.” In Synthesis of Parallel Algorithms, edited by Reif, J. H. (San Mateo, CA: Morgan Kaufmann), 843–899.

Fich, F. E., Ragde, P., and Wigderson, A. 1988. “Relations between Concurrent-write Models of Parallel Computation.” SIAM Journal on Computing 17 (3): 606–627.

Fishman, G. S. 1996. Monte Carlo: Concepts, Algorithms and Applications. New York: Springer-Verlag.

Flatt, H. P. and Kennedy, K. 1989. “Performance of Parallel Processors.” Parallel Computing 12: 1–20.

Flynn, M. J. 1966. “Very High Speed Computers.” Proc. IEEE 54: 1901–1909.

Flynn, M. J. 1972. “Some Computer Organizations and Their Effectiveness.” IEEE Transactions on Computers C-21: 948–960.

Flynn, M. J. 2011. “Flynn's Taxonomy.” In Encyclopedia of Parallel Computing, Vols 1–4 (New York: Springer-Verlag), 689–697.

Fortune, S. and Wyllie, J. 1978. “Parallelism in Random Access Machines.” Proc. 10th Symp. Theory Computing. ACM, New York, 114–118.

Foster, I. T. 1995. Designing and Building Parallel Programs. Concepts and Tools for Parallel Software Engineering. Reading, MA: Addison-Wesley. http://www.mcs.anl.gov/~itf/dbpp/.

Foster, I. and Kesselman, C., eds. 2004. The Grid 2: Blueprint for a New Computing Infrastructure, 2nd edn. San Francisco, CA: Elsevier.

Fountain, T. J. 1994. Parallel Computing Principles and Practice. Cambridge: Cambridge University Press.

Fox, G. C., Williams, R. D., and Messina, P. C. 1994. Parallel Computing Works! San Francisco, CA: Morgan Kaufmann.

Francez, N. 1980. “Distributed Termination.” ACM Trans. Program. Lang. Syst. 2 (1): 42–55.

Frank, S., Burkhardt, H., and Rothnie, J. 1993. “The KSR1: Bridging the Gap between Shared Memory and MPPs.” Proc. of the COMPCON Digest of Papers, 285–294.

Furst, M., Saxe, J. B., and Sipser, M. 1984. “Parity, Circuits, and the Polynomial-time Hierarchy.” Mathematical Systems Theory 17: 13–27.

Gabriel, E., Fagg, G. E., Bosilca, G., et al. 2004. “Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation.” Proc. 11th European PVM/MPI Users’ Group Meeting, September 2004, Budapest, Hungary, 97–104.

Gajski, D., Padua, D. A., Kuck, D. J., and Kuhn, R. H. 1982. “A Second Opinion on Data Flow Machines and Languages.” IEEE Computer 15 (2): 58–69.

Galvin, P. B., Gagne, G., and Silberschatz, A. 2013. Operating System Concepts, 9th edn. New York: John Wiley & Sons, Inc.

Gara, A. 2005. “Overview of the Blue Gene/L System Architecture.” IBM Journal of Research and Development 49 (2/3): 195–212.

Gara, A. and Moreira, J. E. 2011. “IBM Blue Gene ‘supercomputer’.” In Encyclopedia of Parallel Computing, vol. 2, edited by Padua, D. A. (New York: Springer-Verlag), 891–900.

Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability. A Guide to the Theory of NP-Completeness. New York: W. H. Freeman and Co.

Garland, M. 2011. “NVIDIA GPU.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1339–1345.

Gaudiot, J. and Bic, L. 1989. Advanced Topics in Data-flow Computing. Englewood Cliffs, NJ: Prentice Hall.

Gebali, F. 2011. Algorithms and Parallel Computing. Hoboken, NJ: John Wiley & Sons, Inc.

Geist, A., Beguelin, A., Dongarra, J., et al. 1994. PVM: Parallel Virtual Machine: A User's Guide and Tutorial for Networked Parallel Computing. Cambridge, MA: The MIT Press.

Geist, A. 2011. “PVM (Parallel Virtual Machine).” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1647–1651.

Gent, P. R., Danabasoglu, G., Donner, L. J., et al. 2011. “The Community Climate System Model Version 4.” Journal of Climate 24 (19): 4973–4991.

Ghosh, S. 2007. Distributed Systems. An Algorithmic Approach. Boca Raton, FL: Chapman & Hall/CRC.

Gibbons, A. 1993. “An Introduction to Distributed Memory Models of Parallel Computation.” In Lectures on Parallel Computation, edited by Gibbons, A. and Spirakis, P. (Cambridge: Cambridge University Press), 197–226.

Gibbons, A. and Rytter, W. 1988. Efficient Parallel Algorithms. Cambridge: Cambridge University Press.

Gibbons, A. and Spirakis, P., eds. 1993. Lectures on Parallel Computation. Cambridge: Cambridge University Press.

Gilge, M. 2012. “IBM System Blue Gene Solution: Blue Gene/Q. Application Development.” March. www.ibm.com/redbooks/.

Glauert, J. A. 1978. “A Single Assignment Language for Dataflow Computing.” Master's Thesis, Manchester, UK: University of Manchester.

Goedecker, S. and Hoisie, A. 2001. Performance Optimization of Numerically Intensive Codes. Philadelphia, PA: SIAM Publishing Company.

Goldschlager, L. M. 1982. “A Universal Interconnection Pattern for Parallel Computers.” Journal of ACM 29: 1073–1086.

Goodman, S. E. and Hedetniemi, S. T. 1977. Introduction to Design and Analysis of Algorithms. New York: McGraw-Hill.

Gottlieb, A., Grishman, R., Kruskal, C. P., et al. 1983. “The NYU Ultracomputer: Designing a MIMD Shared Memory Parallel Computer.” IEEE Transactions on Computers 32 (2): 175–189.

Gottlieb, A. 2011. “Ultracomputer, NYU.” In Encyclopedia of Parallel Computing, vol. 4, edited by Padua, D. A. (New York: Springer-Verlag), 2095–2103.

Graham, R. L., Shipman, G. M., Barrett, B. W., et al. 2006. “Open MPI: A High-performance, Heterogeneous MPI.” Proc. 5th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, September 2006, Barcelona, Spain, 1–9.

Grama, A., Gupta, A., Karypis, G., and Kumar, V. 2003. Introduction to Parallel Computing, 2nd edn. Harlow, UK: Addison-Wesley.

Grama, A. Y., Gupta, A., and Kumar, V. 1993. “Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures.” IEEE Parallel and Distributed Technology 1 (3): 12–21.

Grama, A. and Kumar, V. 2008. “Scalability of Parallel Programs.” In Handbook of Parallel Computing. Models, Algorithms and Applications, edited by Rajasekaran, S. and Reif, J. (Boca Raton, FL: Chapman & Hall/CRC), 43-1–43-16.

Greenlaw, R. 1993. “Polynomial Completeness and Parallel Computation.” In Synthesis of Parallel Algorithms, edited by Reif, J. H. (San Mateo, CA: Morgan Kaufmann), 901–953.

Greenlaw, R., Hoover, H. J., and Ruzzo, W. L. 1995. Limits to Parallel Computation: P-Completeness Theory. Oxford: Oxford University Press. www.cs.armstrong.edu/~greenlaw/research/PARALLEL/.

Gropp, W. 2011. “MPI (Message Passing Interface).” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1184–1190.

Gropp, W., Huss-Lederman, S., Lumsdaine, A., et al. 1998. MPI-The Complete Reference: Vol. 2. The MPI Extensions, 2nd edn. Cambridge, MA: MIT Press.

Gropp, W., Lusk, E., and Skjellum, A. 1999. Using MPI. Portable Parallel Programming with the Message-passing Interface, 2nd edn. Cambridge, MA: MIT Press.

Gropp, W., Lusk, E., and Thakur, R. 1999. Using MPI-2. Advanced Features of the Message-passing Interface, 2nd edn. Cambridge, MA: MIT Press.

Gupta, A. and Kumar, V. 1993. “Performance Properties of Large Scale Parallel Systems.” Journal of Parallel and Distributed Computing 19: 234–244.

Gurd, J. R., Kirkham, C., and Watson, J. 1985. “The Manchester Prototype Dataflow Computer.” Communications of the ACM 28 (1): 36–45.

Gustafson, J. L. 1988. “Reevaluating Amdahl's Law.” Communications of the ACM 31 (5): 532–533.

Gustafson, J. L., Montry, G. R., and Benner, R. E. 1988. “Development of Parallel Methods for a 1024-processor Hypercube.” SIAM Journal on Scientific and Statistical Computing 9 (4): 609–638.

Gustafson, J. L. 1992. “The Consequences of Fixed Time Performance Measurement.” Proc. of the 25th Hawaii International Conference on System Sciences, Vol. III, 113–124.

Gustafson, J. L. 2011. “Brent's Theorem.” In Encyclopedia of Parallel Computing, vol. 1, edited by Padua, D. A. (New York: Springer-Verlag), 182–185.

Gustafson, J. L. 2011. “Moore's Law.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1177–1184.

Hager, G. and Wellein, G. 2011. Introduction to High Performance Computing for Scientists and Engineers. Boca Raton, FL: Chapman & Hall/CRC.

Halfhill, T. R. 2008. “Parallel Processing with CUDA.” Microprocessor Report, January 28: 1–8 (www.MPRonline.com).

Hamacher, V. C., Vranesic, Z. G., and Zaky, S. G. 2001. Computer Organization, 5th edn. New York: McGraw-Hill.

Händler, W. 1977. “The Impact of Classification Schemes on Computer Architecture.” Proc. of the International Conference on Parallel Processing, August, 7–15.

Handy, J. 1998. The Cache Memory Book, 2nd edn. Orlando, FL: Academic Press.

Harris, T. J. 1994. “A Survey of PRAM Simulation Techniques.” ACM Computing Surveys 26: 187–206.

Hennessy, J. L. and Patterson, D. A. 2007. Computer Architecture. A Quantitative Approach, 4th edn. San Francisco, CA: Morgan Kaufmann.

Hensgen, D., Finkel, R., and Manber, U. 1988. “Two Algorithms for Barrier Synchronization.” International Journal of Parallel Programming 17 (1): 1–16.

Herley, K. T. and Bilardi, G. 1988. “Deterministic Simulations of PRAMs on Bounded-degree Networks.” Proc. of 26th Annual Allerton Conference on Communication, Control and Computation, Monticello, IL, 1084–1093.

Herlihy, M. and Shavit, N. 2008. The Art of Multiprocessor Programming. Burlington, MA: Morgan Kaufmann.

Heroux, M. A., Raghavan, P., and Simon, H. D., eds. 2006. Parallel Processing for Scientific Computing. Philadelphia, PA: SIAM Publishing Company.

Hicks, J., Chiou, D., Ang, B., and Arvind. 1992. Performance Studies of the Monsoon Dataflow Processor. CSG Memo 345-2, MIT, October.

Hill, M. 1998. “Multiprocessors Should Support Simple Memory-consistency Models.” IEEE Computer Magazine 31: 28–34.

Hillis, W. D. 1985. The Connection Machine. Cambridge, MA: MIT Press.

Hiraki, K., Nishida, K., Sekiguchi, S., Shimada, T., and Tiba, T. 1987. “The SIGMA-1 Dataflow Supercomputer: A Challenge for New Generation Supercomputing Systems.” Journal of Information Processing 10 (4): 219–226.

Hoare, C. A. R. 1974. “Monitors, an Operating System Structuring Concept.” Communications of the ACM 17: 549–557; “Erratum.” Communications of the ACM 18 (1975): 95.

Hoare, C. A. R. 1978. “Communicating Sequential Processes.” Communications of the ACM 21 (8): 666–677.

Hoffman, F. M. and Hargrove, W. W. 1999. “Multivariate Geographic Clustering Using a Beowulf-style Parallel Computer.” Proc. of the International Conference on Parallel and Distributed Processing Techniques and Applications, June, 1292–1298.

Hromkovič, J. 2003. Algorithmics for Hard Problems. Introduction to Combinatorial Optimization, Randomization, Approximation and Heuristics. Berlin: Springer-Verlag.

Hwang, K. 1993. Advanced Computer Architecture. Parallelism, Scalability, Programmability. New York: McGraw-Hill.

Hwang, K. and Xu, Z. 1998. Scalable Parallel Computing. New York: McGraw-Hill.

Hwang, K., Fox, G. C., and Dongarra, J. J. 2012. Distributed and Cloud Computing. Waltham, MA: Morgan Kaufmann.

Hyndman, Donald and Hyndman, David. 2009. Natural Hazards and Disasters, 2nd edn. Belmont, CA: Brooks/Cole.

Inmos Ltd. 1988. Occam 2 Reference Manual. Englewood Cliffs, NJ: Prentice-Hall.

International Human Genome Sequencing Consortium. 2001. “Initial Sequencing and Analysis of the Human Genome.” Nature 409: 860–921.

International Organization for Standardization, Geneva. 1996. Information Technology-Portable Operating System Interface (POSIX) – Part 1: System Application Program Interface (API) [C Language], December.

JáJá, J. 1992. An Introduction to Parallel Algorithms. Reading, MA: Addison-Wesley.

Jha, S. K. and Jana, P. K. 2011. Study and Design of Parallel Algorithms for Interconnection Networks. Saarbrücken, Germany: Lambert Academic Publishing.

Johnson, M. 1991. Superscalar Microprocessor Design. Upper Saddle River, NJ: Prentice-Hall.

Jones, G. A. and Goldsmith, M. 1989. Programming in Occam 2, 2nd edn. Englewood Cliffs, NJ: Prentice Hall.

Jordan, H. and Alaghband, G. 2003. Fundamentals of Parallel Processing. Upper Saddle River, NJ: Prentice Hall.

Kalos, M. H. and Whitlock, P. A. 2008. Monte Carlo Methods, 2nd edn. Weinheim: Wiley-VCH Verlag.

Kalyanaraman, A., Emrich, S. J., Schnable, P. S., and Aluru, S. 2007. “Assembling Genomes on Large-scale Parallel Computers.” Journal of Parallel and Distributed Computing 67: 1240–1255.

Karniadakis, G. E. and Kirby, R. M. II. 2007. Parallel Scientific Computing in C++ and MPI. A Seamless Approach to Parallel Algorithms and Their Implementation. New York: Cambridge University Press.

Karp, A. H. and Flatt, H. P. 1990. “Measuring Parallel Processor Performance.” Communications of the ACM 33 (5): 539–543.

Karp, R. M. and Ramachandran, V. 1990. “Parallel Algorithms for Shared-memory Machines.” In Handbook of Theoretical Computer Science, vol. A, edited by van Leeuwen, J. (Amsterdam, The Netherlands: Elsevier), 870–941.

Keller, R., Kramer, D., and Weiss, J-P., eds. 2013. Facing the Multicore-challenge III. Aspects of New Paradigms and Technologies in Parallel Computing. Lecture Notes in Computer Science 7686. Berlin, Heidelberg: Springer-Verlag.

Kennedy, K. and Allen, J. R. 2001. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. San Francisco, CA: Morgan Kaufmann Pub.

Kessler, R. E. and Schwarzmeier, J. L.. 1993. “Cray T3D: A New Dimension for Cray Research.” Proc. of the IEEE Computer Society International Conference, February, 176–182.

Kiris, C., Housman, J., Gusman, M., et al. 2011. “Best Practices for Aero-Database CFD Simulations of Ares V Ascent.” In 49th AIAA Aerospace Sciences Meeting, 1–21.

Kirk, D. B. and Hwu, W-M. W.. 2013. Programming Massively Parallel Processors. A Hands-on Approach, 2nd edn. Waltham, MA: Morgan Kaufmann.

Klie, H., Bangerth, W., Gail, X., et al. 2006. “Models, Methods and Middleware for Grid-enabled Multiphysics Oil Reservoir Management.” Engineering with Computers 22 (3–4): 349–370.

Knuth, D. E. 1971. “Optimum Binary Search Trees.” Acta Informatica 1 (1): 14–25.

Knuth, D. E. 1998. The Art of Computer Programming, Vol. 3. Sorting and Searching, 2nd edn. Reading, MA: Addison-Wesley.

Kodama, C., Terai, M., Noda, A. T., et al. 2014. “Scalable Rank-mapping Algorithm for an Icosahedral Grid System on the Massive Parallel Computer with a 3-D Torus Network.” Parallel Computing 40: 362–373.

Koelbel, C. H., Loveman, D. B., Schreiber, R. S., Steele, G. L. Jr., and Zosel, M. E.. 1997. The High Performance Fortran Handbook. Cambridge, MA: MIT Press.

Komornicki, A., Mullen-Schulz, G., and Landon, D., 2009. Roadrunner: Hardware and Software Overview, IBM Technical Support Organization. www.redbooks.ibm.com/redpapers/pdfs/redp4477.pdf.

Kontoghiorghes, E. J. ed. 2006. Handbook of Parallel Computing and Statistics. Boca Raton, FL: Chapman & Hall/CRC.

Kruskal, C. P. and Snir, M.. 1986. “A Unified Theory of Interconnection Network Structure.” Theoretical Computer Science 48 (3): 75–94.

Kshemkalyani, A. D. and Singhal, M.. 2008. Distributed Computing. Cambridge: Cambridge University Press.

Kučera, L. 1982. “Parallel Computation and Conflicts in Memory Access.” Information Processing Letters 14: 93–96.

Kumar, V., Grama, A., Gupta, A., and Karypis, G., 1994. Introduction to Parallel Computing. Design and Analysis of Algorithms. Redwood City, CA: Benjamin/Cummings.

Kumar, V. and Gupta, A.. 1994. “Analyzing Scalability of Parallel Algorithms and Architectures.” Journal of Parallel and Distributed Computing 22: 379–391.

Kumar, V. and Singh, V.. 1991. “Scalability of Parallel Algorithms for the All-pairs Shortest-path Problem.” Journal of Parallel and Distributed Computing 13: 124–138.

Kung, H. T. 1988. VLSI Array Processors. Upper Saddle River, NJ: Prentice Hall.

Kung, H. T. and Leiserson, C. E.. 1978. “Systolic Arrays (for VLSI).” In Sparse Matrix Proceedings, Knoxville, TN, SIAM, Philadelphia, edited by Duff, I. S. and Stewart, G. W. (US: Society for Industrial & Applied Mathematics), 256–282.

Kurzak, J., Bader, D. A., and Dongarra, J., eds. 2011. Scientific Computing with Multicore and Accelerators. Boca Raton, FL: Chapman & Hall/CRC.

Kwok, Y-K. and Ahmad, I.. 1999. “Benchmarking and Comparison of the Task Graph Scheduling Algorithms.” Journal of Parallel and Distributed Computing 59: 381–422.

Ladner, R. E. 1975. “The Circuit Value Problem Is Log Space Complete for P.” SIGACT News 7 (1): 18–20.

Lansdowne, S. T., Cousins, R. E., and Wilkinson, D. C.. 1987. “Reprogramming the Sieve of Eratosthenes.” Computer, August: 90–91.

Lastovetsky, A. L. 2003. Parallel Computing on Heterogeneous Networks. Hoboken, NJ: John Wiley & Sons, Inc.

Laudon, J. P. and Lenoski, D.. 1997. “The SGI Origin: A ccNUMA Highly Scalable Server.” Proc. of the 24th International Symposium on Computer Architecture, 241–251.

Lawrie, D. H. 1975. “Access and Alignment of Data in an Array Processor.” IEEE Transactions on Computers C-24 (12): 1145–1155.

Lea, D. 1997. Concurrent Programming in Java. Design Principles and Patterns. Reading, MA: Addison-Wesley.

Karp, R. M. and Ramachandran, V.. 1990. “Parallel Algorithms for Shared-memory Machines.” In Handbook of Theoretical Computer Science, vol. A, edited by van Leeuwen, J. (Amsterdam, The Netherlands: Elsevier), chap. 17.

Valiant, L. G. 1990. “General Purpose Parallel Architectures.” In Handbook of Theoretical Computer Science, vol. A, edited by van Leeuwen, J. (Amsterdam, The Netherlands: Elsevier), chap. 18.

Leighton, F. T. 1992. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. San Mateo, CA: Morgan Kaufmann.

Leiserson, C. E. 1985. “Fat-trees: Universal Networks for Hardware-efficient Supercomputing.” IEEE Transactions on Computers C-34 (10): 892–901.

Leung, J. Y-T., ed. 2004. Handbook of Scheduling. Algorithms, Models and Performance Analysis. Boca Raton, FL: Chapman & Hall/CRC.

Levesque, J. and Wagenbreth, G.. 2011. High Performance Computing. Programming and Applications. Boca Raton, FL: Chapman & Hall/CRC.

Lewis, B. and Berg, D.. 1998. Multithreaded Programming with Pthreads. Mountain View, CA: Sun Microsystems Press.

Li, K. 1986. “Shared Virtual Memory on Loosely Coupled Multiprocessors.” Ph.D. thesis, Department of Computer Science, Yale University.

Li, K. and Hudak, P.. 1989. “Memory Coherence in Shared Virtual Memory Systems.” ACM Transactions on Computer Systems 7: 321–359.

Lillevik, S. L. 1991. “The Touchstone 30 Gigaflop DELTA Prototype.” DMCC April: 671–677.

Lin, C. and Snyder, L.. 2009. Principles of Parallel Programming. Boston, MA: Addison-Wesley.

Lindholm, E., Nickolls, J., Oberman, S., and Montrym, J.. 2008. “NVIDIA Tesla: A Unified Graphics and Computing Architecture.” IEEE Micro 28 (2): 39–55.

Loft, R., Andersen, A., Bryan, F., et al. 2015. “Yellowstone: A Dedicated Resource for Earth System Science.” In Contemporary High Performance Computing: From Petascale toward Exascale, edited by Vetter, J. S. (Chapman & Hall/CRC, Boca Raton, FL), vol. II, 185–224.

Lynch, N. A. 1996. Distributed Algorithms. San Francisco, CA: Morgan Kaufmann.

Lysne, O. and Sem-Jacobsen, F. O.. 2011. “Networks, Multistage.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1316–1321.

Makino, J. 2002. “An Efficient Parallel Algorithm for O(N²) Direct Summation Method and Its Variations on Distributed-memory Parallel Machines.” New Astron. 7: 373–384.

Manber, U. 1989. Introduction to Algorithms—A Creative Approach. Boston, MA: Addison-Wesley.

Mandelbrot, B. B. 1980. “Fractal Aspects of the Iteration of z → λz(1 − z) for complex λ, z.” Annals of the New York Academy of Sciences 357: 249–259.

Marinescu, D. C. and Rice, J. R.. 1994. “On High Level Characterization of Parallelism.” Journal of Parallel and Distributed Computing 20: 107–113.

Marsh, D. R., Mills, M. J., Kinnison, D. E., et al. 2013. “Climate change from 1850 to 2005 simulated in CESM1 (WACCM).” Journal of Climate, 26(19): 7372–7391.

Matsu'ura, M., Furumura, T., Okuda, H., et al. 2006. “Integrated Predictive Simulation System for Earthquake and Tsunami Disaster.” SIAM 12th Conference on Parallel Processing for Scientific Computing (PP06), San Francisco, 2006, and also: Annual Report of the Earth Simulator Center, April 2005–March 2006, 407–410.

Mattson, T. G. 2003. “How Good Is OpenMP?” Scientific Programming 11: 81–93.

Mattson, T. G., Sanders, B. A., and Massingill, B. L.. 2005. Patterns for Parallel Programming. Boston, MA: Addison-Wesley.

McKee, S. A. and Wisniewski, R. W.. 2011. “Memory Wall.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1110–1116.

Mellor-Crummey, J. M. and Scott, M. L.. 1991. “Algorithms for Scalable Synchronization on Shared-memory Multiprocessors.” ACM Transactions on Computer Systems 9 (1): 21–65.

Message Passing Interface Forum. 1998. “MPI2: A Message Passing Interface Standard.” International Journal of High Performance Computing Applications 12 (1–2): 1–299.

Message Passing Interface Forum. 2012. “MPI: A Message-Passing Interface Standard, Version 3.0.” High Performance Computing Center Stuttgart (HLRS), September 21.

Milano, J. and Lembke, P., 2012. “IBM System Blue Gene Solution: Blue Gene/Q. Hardware Overview and Installation Planning.” March. www.ibm.com/redbooks.

Miller, R. and Boxer, L.. 2005. Algorithms. Sequential and Parallel. A Unified Approach, 2nd edn. Hingham, MA: Charles River Media Inc.

Mizuta, R., Uchiyama, T., Kamiguchi, K., Kitoh, A., and Noda, A.. 2005. “Changes in Extremes Indices over Japan due to Global Warming Projected by a Global 20-km-mesh Atmospheric Model.” Scientific Online Letters on the Atmosphere (SOLA) 1: 153–156. doi: 10.2151/sola.2005-040.

Magoulès, F., Pan, J., Tan, K-A., and Kumar, A.. 2009. Introduction to Grid Computing. Boca Raton, FL: Chapman & Hall/CRC.

Moin, P. and Kim, J.. 1997. “Tackling Turbulence with Supercomputers.” Scientific American 276: 62–68.

Moldovan, D. I. 1993. Parallel Processing from Applications to Systems. San Mateo, CA: Morgan Kaufmann.

Monacelli, G., Sessa, F., and Milite, A.. 2004. “An Integrated Approach to Evaluate Engineering Simulations and Ergonomic Aspects of a New Vehicle in a Virtual Environment: Physical and Virtual Correlation Methods.” FISITA 2004 30th World Automotive Congress, 2004, Barcelona, Spain, 23–27.

Monien, B. and Sudborough, H.. 1988. “Comparing Interconnection Networks.” Lecture Notes in Computer Science 324: 139–153.

Moore, G. E. 1965. “Cramming More Components onto Integrated Circuits.” Electronics Magazine 38 (8): 114–117.

Morse, H. S. 1994. Practical Parallel Computing. Cambridge, MA: AP Professional.

Mukherjee, S. S., Bannon, P., Lang, S., Spink, A., and Webb, D.. 2001. “The Alpha 21364 Network Architecture.” Proc. of the Symposium on Hot Interconnects, August, 113–117.

Nakata, T., Kanoh, Y., Tatsukawa, K., et al. 1998. “Architecture and the Software Environment of Parallel Computer Cenju-4.” NEC Research and Development Journal 39: 385–390.

nCUBE Corporation. 1990. nCUBE Processor Manual.

Nickolls, J. R. 1990. “The Design of the MasPar MP-1: A Cost-effective Massively Parallel Computer.” Proc. COMPCON Digest of Papers, 25–28.

Nicol, D. M. and Willard, F. H.. 1988. “Problem Size, Parallel Architecture, and Optimal Speedup.” Journal of Parallel and Distributed Computing 5: 404–420.

Nikhil, R. S. and Arvind. 1989. “Can Dataflow Subsume von Neumann Computing?” Proc. of the 16th Annual International Symposium on Computer Architecture, 262–272.

Nibhanupudi, M. V., Norton, C. D., and Szymanski, B. K.. 1995. “Plasma Simulation on Networks of Workstations Using the Bulk Synchronous Parallel Model.” Proc. of the Conference on Parallel and Distributed Processing Techniques and Applications, Athens, Georgia, 13–22.

Null, L. and Lobur, J.. 2015. The Essentials of Computer Organization and Architecture, 4th edn. Burlington, MA: Jones & Bartlett Learning.

Nussbaum, D. and Agarwal, A.. 1991. “Scalability of Parallel Machines.” Communications of the ACM 34 (3): 57–61.

Nuth, P. R. and Dally, W. J.. 1992. “The J-machine Network.” Proc. of the International Conference on Computer Design, October 1992, 420–423.

NVIDIA. 2015. CUDA C Programming Guide, PG-02829-001 v7.5, September. http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf.

Nyland, L., Harris, M., and Prins, J.. 2007. “Fast N-body Simulations with CUDA.” In GPU Gems 3 (31), edited by Nguyen, H. (Addison-Wesley, eBook-BBL), 677–695.

Oden, J. T., Belytschko, T., Fish, J., et al. 2006. “Revolutionizing Engineering Science through Simulation.” National Science Foundation Blue Ribbon Panel Report 65: 1–66.

OpenMP Application Program Interface, Version 2.5, May 2005. www.openmp.org.

OpenMP Application Program Interface, Version 3.0, May 2008. www.openmp.org.

OpenMP Application Program Interface, Version 3.1, July 2011. www.openmp.org.

OpenMP Application Program Interface, Version 4.0, July 2013. www.openmp.org.

OpenMP Application Program Interface, Version 4.1, July 2015. www.openmp.org.

Pacheco, P. S. 1997. Parallel Programming with MPI. San Francisco, CA: Morgan Kaufmann.

Pacheco, P. S. 2011. An Introduction to Parallel Programming. Burlington, MA: Morgan Kaufmann.

Padua, D. A. ed. 2011. Encyclopedia of Parallel Computing, Vols 1–4 (New York: Springer-Verlag).

Palmer, J. F. 1986. “The NCUBE Family of Parallel Supercomputers.” Proc. of the International Conference on Computer Design, p. 107.

Papadimitriou, C. H. 1994. Computational Complexity. Reading, MA: Addison-Wesley, chap. 15, “Parallel Computing.”

Parberry, I. 1987. Parallel Complexity Theory. London: Pitman/Wiley.

Parhami, B. 1999. Introduction to Parallel Processing. Algorithms and Architectures. New York: Plenum Press.

Parnas, D. L. 1975. “On a Solution to the Cigarette Smokers’ Problem without Conditional Statements.” Communications of the ACM 18: 181–183.

Paterson, M. S. 1990. “Improved Sorting Networks with O(log N) Depth.” Algorithmica 5 (1–4): 75–92.

Patil, S. 1971. Limitations and Capabilities of Dijkstra's Semaphore Primitives for Coordination among Processes. Technical report, Massachusetts Institute of Technology.

Patterson, D. A. and Hennessy, J. L.. 2013. Computer Organization and Design, 5th edn. Burlington, MA: Morgan Kaufmann.

Peitgen, H.-O. and Richter, P.. 1986. The Beauty of Fractals. Heidelberg: Springer-Verlag.

Pfister, G. F. 1998. In Search of Clusters. 2nd edn. Upper Saddle River, NJ: Prentice Hall.

Pfister, G. F., Brantley, W. C., George, D. A., et al. 1985. “The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture.” Proc. of 1985 International Conference on Parallel Processing, 764–771.

Preparata, F. P. and Vuillemin, J.. 1981. “The Cube-connected Cycles: A Versatile Network for Parallel Computation.” Communications of the ACM 24 (5): 300–309.

President's Information Technology Advisory Committee. 2005. Computational Science: Ensuring America's Competitiveness, June: 1–117.

Quinn, M. J. 1987. Designing Efficient Algorithms for Parallel Computers. New York: McGraw-Hill.

Quinn, M. J. 1994. Parallel Computing. Theory and Practice, 2nd edn. New York: McGraw-Hill.

Quinn, M. J. 2004. Parallel Programming in C with MPI and OpenMP, New York: McGraw-Hill.

Rajasekaran, S. and Reif, J., eds. 2008. Handbook of Parallel Computing. Models, Algorithms and Applications. Boca Raton, FL: Chapman & Hall/CRC.

Rajasekaran, S., Fiondella, L., Ahmed, M., and Ammar, R. A., eds. 2014. Multicore Computing. Boca Raton, FL: Chapman & Hall/CRC.

Ranade, A. G. 1987. “How to Emulate Shared Memory.” Proc. of 28th Annual Symposium on the Foundations of Computer Science, Los Angeles, CA, 1987, 185–192.

Rauber, T. and Rünger, G.. 2010. Parallel Programming for Multicore and Cluster Systems. Berlin: Springer-Verlag.

Reif, J. H., ed. 1993. Synthesis of Parallel Algorithms. San Mateo, CA: Morgan Kaufmann.

Reinders, J. R. 2011. “Systolic Arrays.” In Encyclopedia of Parallel Computing, vol. 4, edited by Padua, D. A. (New York: Springer-Verlag), 2002–2011.

Reinders, J. R. 2011. “Warp and iWarp.” In Encyclopedia of Parallel Computing, vol. 4, edited by Padua, D. A. (New York: Springer-Verlag), 2150–2159.

Reingold, E. M., Nievergelt, J., and Deo, N.. 1977. Combinatorial Algorithms: Theory and Practice. New York: Prentice Hall.

Riesen, R. and Maccabe, A. B.. 2011. “MIMD (Multiple Instruction, Multiple Data) Machines.” In Encyclopedia of Parallel Computing, vol. 3, edited by Padua, D. A. (New York: Springer-Verlag), 1140–1149.

Robert, Y. 2011. “Task Graph Scheduling.” In Encyclopedia of Parallel Computing, vol. 4, edited by Padua, D. A. (New York: Springer-Verlag), 2013–2025.

Roberts, M. J., Vidale, P. L., Mizielinski, M. S., et al. 2015. “Tropical Cyclones in the UPSCALE Ensemble of High-Resolution Global Climate Models.” Journal of Climate 28(2): 574–596.

Rochkind, M. J. 2004. Advanced UNIX Programming, 2nd edn. Boston, MA: Addison-Wesley.

Roosta, S. H. 2000. Parallel Processing and Parallel Algorithms. Theory and Computation. New York: Springer-Verlag.

Roscoe, A. W. 1998. The Theory and Practice of Concurrency. Upper Saddle River, NJ: Prentice Hall.

Rosner, J. 2015. “Methods of Parallelizing Selected Computer Vision Algorithms for Multi-core Graphics Processors.” Ph.D. thesis, Silesian University of Technology, Gliwice, Poland. http://delibra.bg.polsl.pl/dlibra/.

Rumbaugh, J. 1977. “A Dataflow Multiprocessor.” IEEE Transactions on Computers C-26: 1087–1095.

Sakai, S., Kodama, Y., and Yamaguchi, Y.. 1991. “Prototype Implementation of a Highly Parallel Dataflow Machine EM-4.” Proc. of the International Parallel Processing Symposium, 1991, 278–286.

Sanders, J. and Kandrot, E.. 2010. CUDA by Example. An Introduction to General-purpose GPU Programming. Upper Saddle River, NJ: Addison-Wesley.

Satoh, M., Tomita, H., Yashiro, H., et al. 2014. “The Non-hydrostatic Icosahedral Atmospheric Model: Description and Development.” Progress in Earth and Planetary Science, 1(1): 1.

Savage, J. E. 1998. Models of Computation. Reading, MA: Addison-Wesley.

Savitch, W. J. and Stimson, M. J.. 1979. “Time Bounded Random Access Machines with Parallel Processing.” Journal of the ACM 26: 103–118.

Schauser, K. E. and Scheiman, C. J.. 1995. “Experience with Active Messages on the Meiko CS-2.” Proc. 9th International Symposium on Parallel Processing, April 1995, 140–149.

Schulz, M., Reuding, T., and Ertl, T.. 1998. “Analyzing Engineering Simulations in a Virtual Environment.” IEEE Computer Graphics and Applications 18 (6): 46–52.

Schwartz, J. 1983. A Taxonomic Table of Parallel Computers Based on 55 Designs. New York: Courant Institute, New York University, November 1983.

“Science on a Grand Scale.” 2015. Science & Technology Review, Lawrence Livermore National Laboratory, September, 4–11.

Scott, L. R., Clark, T., and Bagheri, B.. 2005. Scientific Parallel Computing. Princeton, NJ: Princeton University Press.

Scott, S. and Thorson, G.. 1996. “The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus.” Proc. of the Symposium on Hot Interconnects, August 1996, 147–156.

Seitz, C. L. 1985. “The Cosmic Cube.” Communications of the ACM 28 (1): 22–33.

Sharp, J. A. 1985. Dataflow Computing. New York: John Wiley & Sons, Inc.

Shimokawabe, T., and Aoki, T.. 2010. “Multi-GPU Computing for Next-generation Weather forecasting – 145.0 TFlops 3990 GPUs on TSUBAME 2.0.” TSUBAME e-Science Journal (ESJ) 2: 11–16.

Shiva, S. G. 2006. Advanced Computer Architectures. Boca Raton, FL: CRC Press.

Shonkwiler, R. W. and Lefton, L.. 2006. An Introduction to Parallel and Vector Scientific Computing. New York: Cambridge University Press.

Sima, D. 1997. “Superscalar Instruction Issue.” IEEE Micro Magazine 17: 28–39.

Singh, J. P., Hennessy, J. L., and Gupta, A.. 1993. “Scaling Parallel Programs for Multiprocessors: Methodology and Examples.” IEEE Computer 26 (7): 42–50.

Sinnen, O. 2007. Task Scheduling for Parallel Systems. Hoboken, NJ: John Wiley & Sons, Inc.

Sipser, M. 2006. Introduction to the Theory of Computation, 2nd edn. Boston, MA: Thomson Course Technology.

Skillicorn, D. B. 1988. “A Taxonomy for Computer Architectures.” IEEE Computer 21 (11): 46–57.

Skillicorn, D. B. 2005. Foundations of Parallel Programming. Cambridge: Cambridge University Press.

Skillicorn, D., Hill, J. M. D., and McColl, W. F.. 1997. “Questions and Answers about BSP.” Scientific Programming 6 (3): 249–274.

Slotnick, D. L., Borck, W. C., and McReynolds, R. C.. 1967. “The Solomon Computer.” Proc. of the AFIPS Spring Joint Computer Conference, 22, New York, 1967, 97–107.

Smith, J. R. 1993. The Design and Analysis of Parallel Algorithms. New York: Oxford University Press.

Snir, M. 1985. “On Parallel Searching.” SIAM Journal on Computing 14 (3): 688–708.

Snir, M., Otto, S. W., Huss-Lederman, S., Walker, D. W., and Dongarra, J.. 1998. MPI: The Complete Reference: Vol. 1. The MPI Core, 2nd edn. Cambridge, MA: MIT Press.

Snir, M. 2011. “Reduce and Scan.” In Encyclopedia of Parallel Computing, vol. 4, edited by Padua, D. A. (New York: Springer-Verlag), 1728–1736.

Solihin, Y. 2016. Fundamentals of Parallel Multicore Architecture. Boca Raton, FL: Chapman & Hall/CRC.

Sottile, M. J., Mattson, T. G., and Rasmussen, C. E.. 2010. Introduction to Concurrency in Programming Languages. Boca Raton, FL: Chapman & Hall/CRC.

Stallings, W. 2013. Computer Organization and Architecture, 9th edn. Upper Saddle River, NJ: Pearson Education.

Stallings, W. 2012. Operating Systems. Internals and Design Principles, 8th edn. Upper Saddle River, NJ: Pearson Education.

van der Steen, A. J. and Dongarra, J. J.. 2006, 2007. Overview of Recent Supercomputers. www.top500.org/.

Sterling, T. L., Salmon, J., Becker, D. J., and Savarese, D. F.. 1999. How to Build a Beowulf. Cambridge, MA: MIT Press.

Stojmenović, I. 1996. “Direct Interconnection Networks.” In Parallel and Distributed Computing Handbook, edited by Zomaya, A. Y. (New York: McGraw-Hill), 537–567.

Sullivan, H. and Bashkow, T. R.. 1977. “A Large Scale, Homogeneous, Fully Distributed Parallel Machine.” Proc. of the International Symposium on Computer Architecture, 1977, 105–124.

Sun, X-H. and Gustafson, J. L.. 1991. “Toward a Better Parallel Performance Metric.” Parallel Computing 17: 1093–1109.

Sun, X-H. and Ni, L. M.. 1990. “Another View of Parallel Speedup.” Supercomputing ’90 Proceedings, 324–333.

Sun, X-H. and Ni, L. M.. 1993. “Scalable Problems and Memory-bounded Speedup.” Journal of Parallel and Distributed Computing 19: 27–37.

Sun, X-H. and Zhu, J.. 1995. “Performance Considerations of Shared Virtual Memory Machines.” IEEE Transactions on Parallel and Distributed Systems 6 (11): 1185–1194.

Sun, X-H. and Rover, D. T.. 1994. “Scalability of Parallel Algorithm-machine Combinations.” IEEE Transactions on Parallel and Distributed Systems 5 (6): 599–613.

Talbi, E-G. 2006. Parallel Combinatorial Optimization. Hoboken, NJ: Wiley-Interscience.

Tanenbaum, A. S. 2006. Structured Computer Organization, 5th edn. Upper Saddle River, NJ: Pearson Education, Prentice Hall.

Tanenbaum, A. S. 2009. Modern Operating Systems, 3rd edn. Upper Saddle River, NJ: Prentice Hall.

Tanenbaum, A. S. and van Steen, M.. 2007. Distributed Systems. Principles and Paradigms, 2nd edn. Upper Saddle River, NJ: Pearson Education.

Taubenfeld, G. 2006. Synchronization Algorithms and Concurrent Programming. Harlow, UK: Pearson Education, Prentice Hall.

Tel, G. 1994. Introduction to Distributed Algorithms. Cambridge: Cambridge University Press.

Thekkath, R., Singh, A. P., Singh, J. P., Hennessy, J., and John, S.. 1997. “An Application-driven Evaluation of the Convex Exemplar SP-1200.” Proc. of the International Parallel Processing Symposium, June 1997, 8–17.

Thinking Machines Corporation. 1990. The CM-2 Technical Summary. Cambridge, MA: Thinking Machines Corporation.

Torán, J. 1993. “P-completeness.” In Lectures on Parallel Computation, edited by Gibbons, A. and Spirakis, P. (Cambridge: Cambridge University Press), 177–196.

Treleaven, P. C. 1985. “Control-driven, Data-driven and Demand-driven Computer Architecture.” Parallel Computing 2 (3): 287–288.

Trono, J. A. and Taylor, W. E.. 2000. “Further comments on ‘A Correct and Unrestrictive Implementation of General Semaphores’.” ACM SIGOPS Operating Systems Review 34 (3): 5–10.

Ungerer, T., Robič, B., and Šilc, J.. 2003. “A Survey of Processors with Explicit Multithreading.” ACM Computing Surveys 35 (1): 29–63.

Valiant, L. G. 1990. “A Bridging Model for Parallel Computation.” Communications of the ACM 33 (8): 103–111.

Valiant, L. G. 1990. “General Purpose Parallel Architectures.” In Handbook of Theoretical Computer Science, vol. A, edited by van Leeuwen, J. (Amsterdam, The Netherlands: Elsevier), 944–971.

Van-Catledge, F. A. 1989. “Towards a General Model for Evaluating the Relative Performance of Computer Systems.” International Journal of Supercomputer Applications 3 (2): 100–108.

van Emde Boas, P. 1990. “Machine Models and Simulations.” In Handbook of Theoretical Computer Science, vol. A, edited by van Leeuwen, J. (Amsterdam, The Netherlands: Elsevier), 1–66.

Vazirani, V. V. 2003. Approximation Algorithms. Berlin: Springer-Verlag.

Venter, J. C., Adams, M. D., Myers, E. W., et al. 2001. “The Sequence of the Human Genome.” Science 291: 1304–1351.

Vishkin, U. 1983. “Implementation of Simultaneous Memory Address Access in Models that Forbid It.” Journal of Algorithms 4: 45–50.

Vishkin, U., Caragea, G. C., and Lee, B. C.. 2008. “Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-on-chip Platform.” In Handbook of Parallel Computing. Models, Algorithms and Applications, edited by Rajasekaran, S. and Reif, J. (Boca Raton, FL: Chapman & Hall/CRC): 5-1–5-60.

Vos, J. B., Rizzi, A., Darracq, D., and Hirschel, E. H.. 2002. “Navier-Stokes Solvers in European Aircraft Design.” Progress in Aerospace Sciences 38: 601–697.

Wah, W. and Akl, S. G.. 1992. “Simulating Multiple Memory Accesses in Logarithmic Time and Linear Space.” The Computer Journal 35: 85–88.

Washington, W. M., Buja, L., and Craig, A.. 2009. “The Computational Future for Climate and Earth System Models: On the Path to Petaflop and Beyond.” Phil. Trans. R. Soc. A 367: 833–846. doi:10.1098/rsta.2008.0219.

Wilkinson, B. and Allen, M.. 1999. Parallel Programming. Techniques and Applications Using Networked Workstations and Parallel Computers. Upper Saddle River, NJ: Prentice Hall.

Wilson, G. V. 1993. “A Glossary of Parallel Computing Terminology.” IEEE Parallel & Distributed Technology February: 52–67.

Wilson, G. V. 1995. Practical Parallel Programming. Cambridge, MA: MIT Press.

Wilson, R. J. 1996. Introduction to Graph Theory, 4th edn. Harlow, UK: Addison Wesley Longman Ltd.

Winter, P. C., Hickey, G. J., and Fletcher, H. L.. 2002. Instant Notes. Genetics, 2nd edn. Milton Park, UK: BIOS Scientific Publishers.

Wolfe, M. 1996. High Performance Compilers for Parallel Computing. Redwood City, CA: Addison-Wesley.

Worley, P. H. 1990. “The Effect of Time Constraints on Scaled Speedup.” SIAM Journal on Scientific and Statistical Computing 11 (5): 838–858.

Wulf, W. A. and Bell, C. G.. 1972. “C.mmp: A Multi-mini-processor.” Proc. of AFIPS Conference, 765–777.

Xue, M., Droegemeier, K. K., and Weber, D.. 2008. “Numerical Prediction of High-impact Local Weather: A Driver for Petascale Computing.” In Petascale Computing. Algorithms and Applications, edited by Bader, D. A. (Boca Raton, FL: Chapman & Hall/CRC), 103–124.

Yokokawa, M., Shoji, F., and Hasegawa, Y.. 2015. “The K Computer.” In Contemporary High Performance Computing: From Petascale toward Exascale, edited by Vetter, J. S. (Chapman & Hall/CRC, Boca Raton, FL), vol. II, 115–139.

Zhou, X. 1989. “Bridging the Gap between Amdahl's Law and Sandia Laboratory's Result.” Communications of the ACM 32 (8): 1014–1015.

Zorbas, J. R., Reble, D. J., and VanKooten, R. E.. 1989. “Measuring the Scalability of Parallel Computer Systems.” Supercomputing ’89 Proc., 832–841.
