Fault-tolerant parallel scheduling of tasks on a heterogeneous high-performance workstation cluster

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search that refines multiple distinct initial schedules simultaneously on different workstations. The workstations periodically exchange the best solutions they have found so far, which directs the search toward more promising regions of the search space. Machine heterogeneity is exploited through biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant: the workload of a failed workstation is automatically redistributed to the other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our ongoing development of SSI middleware for a Sun workstation cluster.
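The ideas named in the abstract (random neighborhood search, periodic exchange of best solutions, biased partitioning, and redistribution of a failed workstation's workload) can be illustrated with a minimal sketch. This is not the authors' CBS implementation: it simulates the workstations sequentially in a single process, ignores communication costs, and uses a simple "reassign one task" move. All names (`makespan`, `refine`, `cbs_sketch`, `worker_power`, `fail_at`) and all parameter values are assumptions introduced for illustration only.

```python
import random


def makespan(assign, cost, deps, speed):
    """Finish time of the last task. Tasks are assumed numbered in topological order."""
    machine_free = [0.0] * len(speed)
    finish = [0.0] * len(cost)
    for t in range(len(cost)):
        m = assign[t]
        ready = max((finish[p] for p in deps[t]), default=0.0)
        start = max(ready, machine_free[m])
        finish[t] = start + cost[t] / speed[m]
        machine_free[m] = finish[t]
    return max(finish)


def refine(assign, steps, rng, cost, deps, speed):
    """Random neighborhood search: reassign one random task, keep non-worsening moves."""
    best, best_len = list(assign), makespan(assign, cost, deps, speed)
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(len(speed))
        cand_len = makespan(cand, cost, deps, speed)
        if cand_len <= best_len:
            best, best_len = cand, cand_len
    return best, best_len


def cbs_sketch(cost, deps, speed, worker_power, rounds=20, budget=200,
               fail_at=None, seed=0):
    """Simulated search workers; fail_at=(worker, round) is a hypothetical failure."""
    rng = random.Random(seed)
    alive = list(range(len(worker_power)))
    # Each worker starts from its own random initial schedule.
    sols = {w: [rng.randrange(len(speed)) for _ in cost] for w in alive}
    best, best_len = None, float("inf")
    for r in range(rounds):
        if fail_at and r == fail_at[1] and fail_at[0] in alive and len(alive) > 1:
            alive.remove(fail_at[0])  # the failed worker drops out of the search
        # Biased partitioning (simplified): split the per-round step budget in
        # proportion to worker power. After a failure the same budget is shared
        # among the survivors, so the lost workload is redistributed.
        total = sum(worker_power[w] for w in alive)
        for w in alive:
            steps = max(1, round(budget * worker_power[w] / total))
            sols[w], length = refine(sols[w], steps, rng, cost, deps, speed)
            if length < best_len:
                best, best_len = list(sols[w]), length
        # Periodic exchange (simplified): every survivor adopts the shared best.
        for w in alive:
            sols[w] = list(best)
    return best, best_len


if __name__ == "__main__":
    cost = [4, 3, 2, 6, 5, 1]                     # hypothetical task costs
    deps = [[], [0], [0], [1, 2], [2], [3, 4]]    # hypothetical predecessors
    speed = [1.0, 2.0]                            # hypothetical machine speeds
    print(cbs_sketch(cost, deps, speed, worker_power=[1, 2, 3], fail_at=(0, 5)))
```

Having every worker adopt the shared best after each round keeps the sketch short but reduces diversity; a real scheme would likely keep the workers in distinct regions of the search space, as the abstract's biased partitioning suggests.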

Original language: English
Pages (from-to): 299-314
Number of pages: 16
Journal: Journal of Supercomputing
Volume: 19
Issue number: 3
Publication status: Published - Jul 2001
Externally published: Yes

Keywords

  • Cluster computing
  • Fault-tolerant scheduler
  • Heterogeneous systems
  • Neighborhood search
  • Parallel algorithms
  • Task graphs
