Software

High Performance Computing software consists of a few basic software categories as defined and organized on this Web site.
  • The first category is the underlying operating system.
  • The second category is defined as cluster middleware, which includes communication/message passing interface, resource managers, advanced schedulers, workload managers, and other cluster management tools. Additionally, some sites deploy their systems with a package of open source tools which are referred to on this Web site as cluster building kits.
  • The next category is defined as general middleware since its components may also be used in non-HPC environments.
  • Other listings in the software category include information on grid middleware and end-user applications designed to enhance a cluster and perform job or industry specific tasks.


Operating Systems

While Linux is the most popular operating system in HPC environments, Unix-based and Mac OS X-based clusters have a strong presence as well. Windows is currently the least adopted in this space.

Cluster Middleware


Having the proper cluster middleware can make the difference between 20% utilization of your system and 99% utilization.
  • Communication/Message Passing Interface – Communication libraries such as MPICH and other implementations of the Message Passing Interface (MPI) standard are critical to parallel computing. They provide the routines that processes running on the compute nodes use to communicate with one another.
  • Resource Manager – One core piece of software is the resource manager (e.g., LSF, TORQUE, PBS Pro, SGE). Its function is to perform basic node state monitoring, receive job submission requests and execute those requests on the compute nodes. Some resource managers have basic scheduling or policy controls, but none of the current resource managers offer the more complete workload management and advanced scheduling capabilities found in tools of those categories.
  • Advanced Scheduler – Advanced schedulers such as Moab or Maui add valuable efficiency improvements to a resource manager, often improving system utilization by an additional 10 to 35% beyond what a resource manager alone can achieve under any real level of complexity.
  • Workload Manager – Workload managers such as Moab (note: Moab provides both workload management and advanced scheduling) add policy controls, automated event management, unification of multiple points of management, diagnostics and other capabilities that simplify the overall experience for administrators, managers and end users.
  • Other cluster management tools – Cover areas such as hardware monitors, performance analysis tools and billing and allocation managers.
    • Hardware monitors include proprietary tools that hardware vendors provide to manage their own hardware but not that of others, as well as generic open source tools, such as Ganglia and Supermon, that manage multiple types of hardware environments.
    • Performance analysis tools are used to evaluate performance levels and to help diagnose bottlenecks of various types.
    • Billing and allocation managers, such as Gold or QBank, are tools used to manage larger, more complex resource sharing environments. They apply a credit system and a resource allocation rule set that complement workload managers or advanced schedulers such as Moab or Maui with additional tracking and charging mechanisms.
  • Web Portals come in three principal types.
    • The first of these is the generalized job submission portal, which integrates job submission, workload management and advanced scheduling into a single easy-to-use Web-based interface for multiple resource manager types.
    • The second is the resource manager specific job submission portal, which is typically bundled with the resource manager and allows submission to its associated resource manager.
    • The third is the custom job submission portal, which is built manually or integrates a generalized job submission portal or a resource manager specific portal in addition to other site portal capabilities.
  • Cluster Building Kits – Some sites deploy their system with a package of open source tools. These packages are referred to on this Web site as Cluster Building Kits. Kit components usually include all the software needed to install, configure and manage a cluster (e.g., file systems, resource managers, communications, hardware monitors, compilers).
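
The utilization gap between a basic resource manager queue and an advanced scheduler, described in the Resource Manager and Advanced Scheduler items above, can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the `Job` class, the `simulate` function, the four-node cluster and the three-job workload are all hypothetical and do not correspond to any real product's API. The out-of-order policy shown is a simplified form of backfill; production schedulers such as Moab or Maui layer reservations, priorities and site policies on top of this idea.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # run time in arbitrary time units


def simulate(jobs, total_nodes, allow_skip):
    """Run jobs on a toy cluster and return (makespan, utilization).

    allow_skip=False: strict first-come-first-served; the queue stalls
    behind the first job that does not fit on the free nodes.
    allow_skip=True: any queued job that fits may start out of order
    (a simplified, reservation-free form of backfill).
    """
    jobs = list(jobs)
    if not jobs:
        return 0, 0.0
    queue = list(jobs)
    running = []  # min-heap of (finish_time, start_sequence, nodes_held)
    free = total_nodes
    now = 0
    started = 0
    while queue or running:
        # Start every job the policy allows at the current time.
        i = 0
        while i < len(queue):
            job = queue[i]
            if job.nodes <= free:
                heapq.heappush(running, (now + job.runtime, started, job.nodes))
                started += 1
                free -= job.nodes
                queue.pop(i)
            elif allow_skip:
                i += 1   # leave this job queued and look further down
            else:
                break    # strict ordering: nothing may overtake this job
        if not running:
            raise ValueError("a queued job requests more nodes than the cluster has")
        # Advance time to the next job completion and release its nodes.
        finish, _, nodes = heapq.heappop(running)
        now = finish
        free += nodes
    busy = sum(job.nodes * job.runtime for job in jobs)
    return now, busy / (total_nodes * now)


# Hypothetical workload, listed in submission order.
JOBS = [Job("A", 2, 4), Job("B", 4, 2), Job("C", 2, 4)]

for label, skip in (("strict queue", False), ("out-of-order", True)):
    makespan, utilization = simulate(JOBS, total_nodes=4, allow_skip=skip)
    print(f"{label}: makespan={makespan}, utilization={utilization:.0%}")
```

With this particular workload the strict queue leaves two nodes idle while job A runs, because job B needs the whole cluster and blocks job C behind it. Letting C start alongside A fills those nodes without delaying B. The exact percentages are an artifact of the toy workload and should not be read as a measurement of any real scheduler.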

General Middleware
 

General middleware components may also be used in non-HPC environments. These include items such as compilers, debuggers, file systems, data staging tools, license managers, provisioning managers and system management tools.