Center for Scientific Computation and Mathematical Modeling

The CSCAMM Research SP system

IBM SP/2 Documentation

IBM RS/6000 SP Dawson cluster configuration

Contents

Announcement
System Information

Other Links

NERSC SP Resource Center
The IBM RS/6000 SP Home Page
The IBM RS/6000 SP Resource Center
IBM Advanced Computing Technology Center

Announcement

Welcome

Welcome to the CSCAMM research Dawson SP system! The CSCAMM Dawson cluster is a distributed-memory parallel system consisting of one frame with a total of 8 wide IBM SP nodes, all running the same version of the AIX operating system (AIX 5.1).

User questions and problems should be directed to CSCAMM system staff by contacting CSCAMM's System Administrator or by email.

    CSCAMM SysAdmin
    RM 4144 CSIC Building
    301-405-8923

     e-mail:

[top]

 

Scheduled Downtime

There is no scheduled downtime for the Dawson cluster at this time.

[top]

 

System Information

Introduction

In September 2001, the Center for Scientific Computation and Mathematical Modeling at the University of Maryland acquired an IBM RS/6000 Scalable POWERparallel System (SP) for research computing. It came with eight 4-CPU POWER3-II wide nodes, one 1-CPU RS/6000 44P Model 170 as the control workstation, and one 2-CPU 44P Model 270 as the front-end machine.

All nodes on the SP frame are interconnected by the SP high-speed switch (SP Switch MX2), which has a peak data throughput of 1.5 GB/second. The SP system is also connected to the Internet through a Gigabit Ethernet network.

All SP nodes are accessible through a secure shell (SSH) client. IBM compilers and several public-domain software packages are available on the SP. In addition, IBM software products designed to use the SP switch for message transfer are installed, including Parallel ESSL, the Distributed Debugger, and Parallel Environment. These products offer tools for compiling, running, monitoring, profiling, tuning, and debugging parallel programs.
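
For example, an MPI program can be compiled on the SP nodes with the thread-safe Parallel Environment compiler drivers (a minimal sketch; the source file names and optimization level are only illustrative):

$ mpcc_r -O2 -o main main.c     # C MPI program
$ mpxlf_r -O2 -o main main.f    # Fortran MPI program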

The SP nodes process batch jobs via the IBM LoadLeveler load-balancing batch queueing system. Both serial and parallel jobs may be submitted to the Dawson SP nodes. Interactive use of the Dawson SP is allowed, but a limit of 20 minutes of elapsed CPU time per interactive process is strictly enforced.

Dawson SP system configuration

[top]


Get Started with Dawson SP

The following section introduces basic information about the Dawson cluster. Please send email to the CSCAMM system administrator (contact information above) if you have additional questions or would like something else to appear here.

  • Request An Account on Dawson SP

Please send email to the CSCAMM system administrator for more information.
Approval of such a request requires a brief research plan that highlights the scope and relevance of the intended use of the SP2 machine, along with supporting documentation from the chair/director of the unit submitting the request for the account.

  • User Home Directory and Other User File Systems

We have 16 x 32 GB SSA disks attached to dawson08 and dawson07 (and also to dawson06 for failover in case dawson08 or dawson07 fails). These disks are divided into 6 RAID-5 arrays that run in parallel under the General Parallel File System (GPFS). All nodes can access /gpfs just as they would a local file system. Files stored in GPFS are striped across each RAID array (which in turn are served by two nodes). The data path is the high-performance IBM SP switch connecting all nodes. It is not uncommon to achieve a sustained data transfer rate of 40 MB/sec or faster for files in GPFS.
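
To verify that /gpfs is mounted and to check its capacity from any node, standard AIX commands can be used (a short sketch; the reported sizes will of course differ):

$ df -k /gpfs    # total, used, and free space of the GPFS file system
$ mount          # list mounted file systems; the GPFS entry appears with type mmfs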

  • Job Scheduling System

The Dawson SP system uses the IBM LoadLeveler job scheduling system to manage batch job submission and execution. Based on the resource requirements of typical user batch jobs, the following job queue classes have been defined for the Dawson SP system.

Class Name         Maximum Nodes    CPU Limit     Restrictions
pa                 4                1 hour        anytime
pb                 4                4 hours       midnight to 8 am
pc                 2                4 hours
pd                 2                8 hours
pe                 1                4 hours
pf                 1                8 hours
sa (restricted)    1 processor      24 hours      8 serial jobs (2 nodes total)
ia                                  20 minutes    interactive jobs
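
The classes defined on the system, together with their current limits and availability, can always be listed with the LoadLeveler llclass command (a brief sketch; see the llclass man page for the full output format):

$ llclass           # one-line summary of every job class
$ llclass -l pa     # detailed (long) listing for a single class, here pa
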
  • Submitting Batch Jobs to LoadLeveler

Before you can submit jobs to LoadLeveler, add /usr/lpp/LoadL/full/bin to your PATH; for the man pages, add /usr/lpp/LoadL/full/man to your MANPATH. All LoadLeveler commands are located under /usr/lpp/LoadL/full/bin.
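
For example, the following lines can be added to your shell startup file, assuming the ksh/sh-style login shell that is the default on AIX (adjust accordingly for csh-style shells):

export PATH=$PATH:/usr/lpp/LoadL/full/bin           # LoadLeveler commands (llsubmit, llq, ...)
export MANPATH=$MANPATH:/usr/lpp/LoadL/full/man     # LoadLeveler man pages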

1. Submitting a job using a batch file:

$ llsubmit batch.cmd

An example batch file follows:

# Run the MPI executable through POE, the Parallel Operating Environment
# @ executable = /usr/bin/poe
# @ arguments = /gpfs/home/username/mpiexample/main
# @ job_type = parallel
# Write stdout and stderr to files tagged with the LoadLeveler job id
# @ output = out.$(jobid)
# @ error = err.$(jobid)
# Request 2 to 4 processors, a 1-minute wall-clock limit, and class pd
# @ min_processors = 2
# @ max_processors = 4
# @ wall_clock_limit = 00:01:00
# @ class = pd
# Queue the job step
# @ queue
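
Once the job has been submitted with llsubmit, it can be monitored and, if necessary, removed with the standard LoadLeveler commands (a short sketch; the job step identifier below is only illustrative):

$ llq                       # list all queued and running jobs
$ llq -u username           # list only the jobs belonging to username
$ llcancel dawson08.42.0    # cancel a job step by its LoadLeveler identifier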


2. Running a POE job interactively:

$ poe ./main -hfile ./nodes -nodes 4 -procs 4 -tasks_per_node 1

where the file nodes contains the hostnames of the nodes you want the job to run on, one per line.
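
For example, a nodes file for the four-way run above could look like the following (the node names here are illustrative; use the Dawson node hostnames you have been assigned):

dawson01
dawson02
dawson03
dawson04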

[top]


University of Maryland    

Maintained by CSCAMM
Direct questions and comments to the CSCAMM system administrator.

CSCAMM is part of the
College of Computer, Mathematical & Natural Sciences (CMNS)