Batch Distribution – A Tool to Run a Batch of Computation Jobs Distributed

Christian Hoene
Telecommunication Networks Group
TU-Berlin

June 8th, 2004

Description
Quantitative stochastic simulations are useful tools for studying the performance of stochastic dynamic systems, but they can consume much time and many computing resources. To overcome these limits, parallel or distributed computation is needed. The tool BatchDistribution distributes a batch of computation jobs (e.g. simulations) across multiple computers. BatchDistribution has the following advantages:
  1. It is robust against failures of computers or jobs. If a computer fails to compute a job, another computer starts this job. If a computer or a job has failed multiple times, it is skipped.
  2. It has a graphical user interface to monitor the progress and computation performance, and it predicts the time that is needed to finish all jobs.
  3. It is fast. In the final phase, each unfinished job is assigned to several computers at once. Only the result of the fastest computation of a job is used, and the other computers working on the same job are stopped. This avoids waiting a long time for a slow computer.
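
The failover behaviour in point 1 can be illustrated with a small shell sketch. This is not BatchDistribution's actual implementation, only a minimal illustration of "if one computer fails, try the next". The function name `run_with_failover` and the variable `RUN_REMOTE` are hypothetical; `RUN_REMOTE` defaults to a non-interactive ssh call.

```shell
#!/bin/sh
# Hypothetical sketch, not the tool's own code: run one job on the first
# host that completes it successfully.
# RUN_REMOTE is an assumed hook; by default it is a non-interactive ssh call
# (-n detaches stdin, BatchMode=yes fails instead of asking for a password).
RUN_REMOTE="${RUN_REMOTE:-ssh -n -o BatchMode=yes}"

run_with_failover() {
    job=$1
    shift
    for host in "$@"; do
        # $RUN_REMOTE is unquoted on purpose so that it splits into
        # the command name and its options.
        if $RUN_REMOTE "$host" "$job"; then
            echo "$host"    # report which host finished the job
            return 0
        fi
    done
    return 1                # every host failed; the caller may skip the job
}
```

A call such as `run_with_failover "<command including arguments>" host1 host2` would try host1 first and fall back to host2. Counting repeated failures per host or per job, as the tool does, is left out of this sketch.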
Usage
Download the distribution file from www2.tkn.tu-berlin.de/equipment/bd/release.zip (ZIP, 4.0 KB; version of June 14th, 2004) and unzip it into a directory of your choice. If they are not yet available, install ssh and Java. Create a file that lists all computers you would like to use. Ensure that you can log on to these computers without any password query or warning message. Create a second file containing a list of command lines that describe the jobs. Then start BatchDistribution with:

java -cp <Dir of BatchDistribution/classes> bd.Main <list of hosts> <list of jobs>

A window then pops up (Figure 1) showing status information. After all jobs have finished, the working directory contains log files with the standard output and standard error of the jobs, e.g.

_home_hoene_voip_simus_test2_remote__home_hoene_voip_audio_3_c_m01s42_sw_512309488_on_verleihnix.out.0
_home_hoene_voip_simus_test2_remote__home_hoene_voip_audio_3_c_m01s42_sw_512309488_on_verleihnix.err.0


If no output has been produced, no file is stored.
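
To make the two input files for bd.Main more concrete, the following sketch creates a host list and a job list. The file format is an assumption (plain text, one host name or one command line per line), and the file names `hosts.txt` and `jobs.txt`, the host names, and the simulation command are purely illustrative.

```shell
#!/bin/sh
# Assumed format: plain text, one entry per line.

# Host list: one computer per line (illustrative names).
cat > hosts.txt <<'EOF'
node1.example.org
node2.example.org
EOF

# Job list: one complete command line per job.
# Here: ten runs of a hypothetical simulation with different seeds.
: > jobs.txt
seed=1
while [ "$seed" -le 10 ]; do
    echo "/home/user/sim/run_simulation --seed $seed" >> jobs.txt
    seed=$((seed + 1))
done
```

With these files in place, the tool would be started as `java -cp <Dir of BatchDistribution/classes> bd.Main hosts.txt jobs.txt`.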

If you want to run a single command on every computer, use the following command:

java -cp <Dir of BatchDistribution/classes> bd.RunEverywhere <list of hosts> "<command including arguments>"
Requirements
The tool uses ssh to log on to the other computers. Passwords or passphrases cannot be entered at runtime, so automatic login is required. To allow ssh to connect to another computer without prompting, perform the following steps.

  1. Generate a key pair with ssh-keygen. Normally each user wishing to use SSH with RSA or DSA authentication runs ssh-keygen once to create the authentication key in $HOME/.ssh/identity, $HOME/.ssh/id_dsa or $HOME/.ssh/id_rsa. The public key is stored in a file with the same name but ".pub" appended. The program also asks for a passphrase, which must be left empty.
  2. Append the public key to the file $HOME/.ssh/authorized_keys on each remote computer.
  3. Log in with ssh to every computer that should be used for simulations. Ssh might ask for confirmation the first time it logs in to a computer. The second login will happen without any questions and without any passwords.
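
The three steps above can be sketched as shell helper functions. This is a minimal sketch, assuming an OpenSSH installation that provides `ssh-keygen` and `ssh-copy-id`. The function names and the `KEY` variable are illustrative, not part of BatchDistribution.

```shell
#!/bin/sh
# Sketch of the passwordless SSH setup; helper names are illustrative.
KEY="${KEY:-$HOME/.ssh/id_rsa}"

# Step 1: create an RSA key pair with an empty passphrase,
# unless the key file already exists.
generate_key() {
    [ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY"
}

# Step 2: append the public key to authorized_keys on one remote host.
# ssh-copy-id asks for the account password one last time.
install_key() {
    ssh-copy-id -i "$KEY.pub" "$1"
}

# Step 3: check that a login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking a question.
check_login() {
    ssh -o BatchMode=yes "$1" true
}
```

One would call `generate_key` once, then `install_key <host>` followed by `check_login <host>` for every entry in the host list. A non-zero exit status from `check_login` indicates that the host would still prompt and is not yet usable by the tool.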
Remarks
This software is open source. The software webpage is www2.tkn.tu-berlin.de/equipment/bd/. Please send bug reports and remarks to hoene@tkn.tu-berlin.de.
Figure 1: Graphical Monitor of BatchDistribution