LinuxHPC.org/Cluster Builder 1.3

TORQUE* Resource Manager

Terascale Open-Source Resource and QUEue Manager (or TORQUE) is an open source resource manager providing control over batch jobs and distributed compute nodes.

TORQUE is maintained by Cluster Resources, Inc., and is based on the source code for OpenPBS version 2.3.12. However, TORQUE is neither affiliated with nor endorsed by Altair Engineering Inc. (the maintainers of OpenPBS).

TORQUE incorporates many scalability, flexibility, fault tolerance, and feature extension patches contributed by NCSA, OSC, USC, the U.S. Department of Energy, Sandia, PNNL, U of Buffalo, Teragrid, and other HPC centers, alongside the enhancements provided by Cluster Resources.

While TORQUE has a built-in scheduler, it is typically used solely as a resource manager, with a separate scheduler making requests to it. Resource managers provide the low-level functionality to start, hold, cancel, and monitor jobs. Without these capabilities, a scheduler alone cannot control jobs.
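
As a rough illustration of the low-level operations named above, the sketch below maps each one to the TORQUE client command that typically performs it (`qrun`, `qhold`, `qdel`, `qstat`). The function name `job_command` and the example job ID are hypothetical, and the code only builds an argument list; it does not contact a server.

```python
from typing import List

# Sketch only: maps the job-control operations described above to TORQUE
# client commands. job_command and the example job ID are illustrative names.
JOB_COMMANDS = {
    "start": ["qrun"],           # ask the server to run a queued job
    "hold": ["qhold"],           # place a hold on a job
    "cancel": ["qdel"],          # delete (cancel) a job
    "monitor": ["qstat", "-f"],  # show full status for a job
}


def job_command(operation: str, job_id: str) -> List[str]:
    """Return the argument list for one job-control operation."""
    if operation not in JOB_COMMANDS:
        raise ValueError(f"unknown operation: {operation}")
    return JOB_COMMANDS[operation] + [job_id]


if __name__ == "__main__":
    for op in ("start", "hold", "cancel", "monitor"):
        print(" ".join(job_command(op, "1234.headnode")))
```

On a host with the TORQUE client commands installed, the returned list could be passed to `subprocess.run`. A real scheduler would more likely talk to the server through its programming interface than through the command line.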


Integration

TORQUE integrates with advanced schedulers and workload managers such as Maui Cluster Scheduler and Moab Workload Manager to improve the overall utilization, scheduling, and administration on a cluster.

TORQUE supports Linux and UNIX platforms, including:
  • AIX
  • BSD (FreeBSD, NetBSD, OpenBSD)
  • HP-UX
  • IRIX
  • Linux
  • Solaris
  • Tru64
TORQUE also supports Mac OS X, and its client commands run on Windows using Cygwin.


Features/Benefits
  • Initiates and manages serial and parallel batch jobs remotely (create, route, execute, modify and/or delete jobs)
  • Defines and implements resource policies that determine how much of each resource can be used by a job
  • Assigns jobs to resources across multiple servers to reduce job completion time
  • Collects information about the nodes within the cluster to determine which are in use and which are available
  • Scalability: in use on systems with over 2,500 CPUs
  • Job Checkpointing
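
To make job creation and resource requests more concrete, the sketch below renders a small batch script using `#PBS` directives (`-N`, `-l`, `-q`), which is how a job usually states what it needs so that the server can apply its limits. The helper `render_job_script` and all example values (job name, node counts, command) are hypothetical.

```python
# Sketch only: builds the text of a batch job script. The helper name and
# every example value below are illustrative, not taken from the article.
def render_job_script(name, nodes, ppn, walltime, command, queue=None):
    """Return a job script requesting nodes, processors per node, and walltime."""
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must be at least 1")
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",                   # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum run time, HH:MM:SS
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")     # destination queue, if given
    lines.append("cd $PBS_O_WORKDIR")        # directory the job was submitted from
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_script("demo", 2, 4, "01:00:00", "./run_sim"), end="")
```

The resulting text would normally be saved to a file and submitted with `qsub`. Whether the request is accepted depends on the limits configured on the server and its queues.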

Improvements Over OpenPBS

TORQUE provides enhancements over standard OpenPBS in the following areas:
  • Fault Tolerance
    • Additional failure conditions checked/handled
    • Node health check script support
  • Scheduling Interface
    • Extended query interface providing the scheduler with additional and more accurate information
    • Extended control interface allowing the scheduler increased control over job behavior and attributes
    • Allows the collection of statistics for completed jobs
  • Scalability
    • Scales to very large clusters and is currently in use in systems with over 2,500 CPUs. This was accomplished by leveraging architectures designed with the US Department of Energy’s Scalable Systems Software Initiative.
    • Significantly improved server-to-MOM communication model
    • Ability to handle larger jobs (over 2000 processors)
    • Ability to support larger server messages
  • Usability
    • Extensive logging additions
    • More human-readable logging (i.e., no more 'error 15038 on command 42')
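
The node health check support listed under Fault Tolerance can be pictured with a small script. The sketch below assumes the convention that the MOM daemon periodically runs a site-provided script (commonly configured with `$node_check_script`) and treats output beginning with `ERROR` as a failed check; verify this against the documentation for your TORQUE version. The function name, path, and threshold are hypothetical.

```python
import shutil


# Sketch only: a disk-space health check. The ERROR-prefix convention is an
# assumption about how the MOM daemon interprets the script's output.
def health_message(path="/tmp", min_free_bytes=1 << 30, usage=shutil.disk_usage):
    """Return an ERROR line if free space on path is below the threshold, else ''."""
    free = usage(path).free
    if free < min_free_bytes:
        return f"ERROR low disk space on {path}: {free} bytes free"
    return ""


if __name__ == "__main__":
    message = health_message()
    if message:
        print(message)  # stay silent when the node looks healthy
```

The `usage` parameter exists only so the check can be exercised without touching a real filesystem. A production script would likely test several conditions, such as memory, mounted filesystems, and stray processes.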

Status

TORQUE is an open-source distribution and is freely available for download. Community patches are currently collected and incorporated into the TORQUE distribution. Efforts are currently focused on further functionality and fault tolerance enhancements, extending the numerous changes already made.

Users who are aware of issues in the current distribution can contribute patches. TORQUE users can subscribe to TORQUE’s mailing list or view the archive for questions, comments or patches. TORQUE is currently in use at many government, academic, and commercial sites throughout the world.

The information used to create this article was provided by Cluster Resources, Inc.