Raven
Raven is the new (late 2012) ARCCA cluster. It has 2000 Sandy Bridge cores with a fast interconnect, plus an additional 800 Westmere cores that are set up for serial jobs.
If you have an account, you can log on with:
ssh ravenlogin.arcca.cf.ac.uk
Software
There is a lot of software available. You can see what there is by typing
module avail
You load a module by typing, e.g.
module load python/2.6.7
and can see what you currently have loaded with
module list
LSC Required Software
There is a set of LSC software that you need to have installed (before even thinking about installing your analysis software) in order to run. It is detailed at this DASWG page. Going through the list:
- pkg-config: This is available by default on Raven (as /usr/bin/pkg-config). The installed version is 0.23, which seems to suffice.
- gsl: This is available as a module, so run
module load gsl
to pick it up. The current version is 1.15, which is newer than the version 1.12 listed on the DASWG page.
- git: This is required to check out the LALSuite software stack. It is available in /usr/bin (version 1.7.1); a newer version (1.7.9) is available with
module load git
- fftw: This should be available as a module, but that isn't working yet, so I have installed it at /home/spxsf2/opt/lscsoft/. Note: the instructions call for FFTW to be built twice, in single (float) and double precision. I'm not sure whether this is still relevant, or whether it can be set up on Raven by default.
- User environment script: This sets up your environment; the instructions are the same as for libframe and friends. It seems more complicated than strictly necessary, and it ends up creating the file lscsoft-user-env.sh. To pick up the software I have installed, run
. /home/spxsf2/opt/lscsoft/lscsoft-user-env.sh
I think that the majority of this is unnecessary. For now, I'll just source it, but if we do try installing our software as modules, then I think we should look at what's really needed here.
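The fftw entry above mentions building FFTW twice. A minimal sketch of that, assuming an unpacked FFTW 3 source tree in a hypothetical directory `$FFTW_SRC` and installing into the prefix quoted above: the first pass is the default double-precision build (libfftw3), the second uses FFTW's `--enable-float` configure option for single precision (libfftw3f). Setting `DRY_RUN=1` only prints the commands. This has not been tested on Raven.

```shell
# Sketch: build FFTW 3 twice into one prefix, double then single precision.
# FFTW_SRC (an unpacked FFTW 3 source tree) is a hypothetical location;
# PREFIX defaults to the install location quoted above.
FFTW_SRC=${FFTW_SRC:-$HOME/src/fftw}
PREFIX=${PREFIX:-/home/spxsf2/opt/lscsoft}

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell, so the cd does not affect the caller.
build_fftw() (
    run cd "$FFTW_SRC" || exit 1
    # "" gives the default double-precision build (libfftw3);
    # --enable-float gives the single-precision build (libfftw3f).
    for extra in "" --enable-float; do
        # $extra is left unquoted so that the empty value adds no argument.
        run ./configure --prefix="$PREFIX" --enable-shared $extra || exit 1
        run make || exit 1
        run make install || exit 1
        # Clean the tree so the next configure starts from scratch.
        run make distclean || exit 1
    done
)
```

Run `DRY_RUN=1 build_fftw` to check the commands, then `build_fftw` to build. Both passes install into the same prefix (one shared fftw3.h, plus libfftw3 and libfftw3f), so one set of environment variables covers both.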
LALSuite
The LALSuite software contains much of the LSC's analysis code, including the CBC analysis. Instructions for getting it are at https://www.lsc-group.phys.uwm.edu/daswg/docs/howto/lal-install.html. The key steps are:
- Check out the software with
git clone albert.einstein@ligo-vcs.phys.uwm.edu:/usr/local/git/lalsuite.git
- Then build it
./00boot
mkdir build_master
cd build_master
../configure --prefix=/home/spxsf2/opt/lalsuite/master
make
make install
- This seems to work, except that compilation fails with
../../../lalinspiral/src/LALInspiralSBankOverlap.c:22:19: fatal error: fftw3.h: No such file or directory
compilation terminated.
I got around this by removing that file from the build. The error means the compiler cannot find the FFTW headers.
- To pick up this install, you need to
. /home/spxsf2/opt/lalsuite/master/etc/lscsoftsrc
I have successfully run a piece of LAL code (lalapps_tmpltbank to be precise), and it seems to have worked!
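The fftw3.h error above means the compiler could not find the FFTW headers. Instead of dropping LALInspiralSBankOverlap.c from the build, it may be enough to point the compiler and pkg-config at the FFTW install before re-running ../configure and make in build_master. This is an untested sketch: it assumes FFTW was installed with the prefix /home/spxsf2/opt/lscsoft (header in include/, libraries in lib/), and `prepend_path` is a hypothetical helper, not part of LALSuite.

```shell
# Sketch: make an FFTW install outside the system paths visible to the
# LALSuite build. FFTW_PREFIX is an assumption; check that
# $FFTW_PREFIX/include/fftw3.h exists and adjust if not.
FFTW_PREFIX=${FFTW_PREFIX:-/home/spxsf2/opt/lscsoft}

# Prepend DIR to the colon-separated variable named VAR (e.g. CPATH),
# unless it is already there, and export the result.
prepend_path() {
    local var=$1 dir=$2 current
    eval "current=\${$var:-}"
    case ":$current:" in
        *":$dir:"*) ;;  # already present: leave unchanged
        *) eval "$var=\"\$dir\${current:+:\$current}\"" ;;
    esac
    export "$var"
}

prepend_path PKG_CONFIG_PATH "$FFTW_PREFIX/lib/pkgconfig"  # fftw3.pc, fftw3f.pc
prepend_path CPATH "$FFTW_PREFIX/include"             # header search path for gcc
prepend_path LIBRARY_PATH "$FFTW_PREFIX/lib"          # link-time search path for gcc
prepend_path LD_LIBRARY_PATH "$FFTW_PREFIX/lib"       # run-time loader path
```

Calling `prepend_path` twice with the same directory leaves the variable unchanged, so it is safe to put these lines in a setup script that may be sourced more than once.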
Grid tools
Instructions to install the LIGO Data Grid Client from source are taken from here.
wget http://www.globus.org/ftppub/gt5/5.2/5.2.0/installers/src/gt5.2.0-all-source-installer.tar.gz
tar xf gt5.2.0-all-source-installer.tar.gz
mkdir gt5.2.0-all
export GLOBUS_LOCATION=~/gt5.2.0-all/
export PATH=/bin:/usr/bin
export FLAVOUR=gcc64dbg
cd gt5.2.0-all-source-installer
./configure --prefix=$GLOBUS_LOCATION --with-flavor=$FLAVOUR
make gsi-openssh
make postinstall
. $GLOBUS_LOCATION/etc/globus-user-env.sh
The VDT Certificate Bundle can be installed using the instructions from the same page (note that the link to the archive there is out of date). I also found it necessary to update the certificates:
wget http://software.grid.iu.edu/pacman/cadist/1.32/osg-certificates-1.32.tar.gz
tar xf osg-certificates-1.32.tar.gz -C $GLOBUS_LOCATION/share
globus-update-certificate-dir
Now copy your Grid certificates into the .globus folder in your home directory and make sure the permissions are correct:
chmod 600 ~/.globus/usercert.pem
chmod 400 ~/.globus/userkey.pem
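A small sketch for checking the permissions above before using the certificates. `check_globus_perms` is a hypothetical helper; it assumes GNU stat (as on Linux), where `stat -c %a` prints the octal mode, and it checks the modes given above rather than any official requirement.

```shell
# Sketch: report Grid credential files whose permissions differ from the
# modes above (600 for usercert.pem, 400 for userkey.pem).
# Relies on GNU stat, where "stat -c %a" prints the octal mode.
check_globus_perms() {
    local dir=${1:-$HOME/.globus} status=0 spec file want have
    for spec in usercert.pem:600 userkey.pem:400; do
        file=$dir/${spec%%:*}   # part before the colon: file name
        want=${spec##*:}        # part after the colon: expected mode
        if [ ! -e "$file" ]; then
            echo "missing: $file"
            status=1
            continue
        fi
        have=$(stat -c %a "$file")
        if [ "$have" != "$want" ]; then
            echo "wrong mode $have (want $want): $file"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_globus_perms && echo "permissions OK"` checks ~/.globus; pass a directory as the first argument to check somewhere else.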
To pick up this install, you need to run:
. ~spxph/gt5.2.0-all/etc/globus-user-env.sh
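Sourcing several setup scripts by hand gets tedious. A possible ~/.bashrc sketch that loads the modules and sources the scripts quoted on this page, skipping (with a warning) any that are not readable. The helper names are hypothetical, and the order in which the scripts are sourced is an assumption that has not been tested.

```shell
# Sketch for ~/.bashrc: pick up the software described on this page.
# The paths are the ones quoted above; the order (modules, then the LSC
# user environment, LALSuite and Globus) is an assumption, not a tested recipe.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "skipping missing setup script: $1" >&2
        return 1
    fi
}

setup_lsc_env() {
    # 'module' is only defined on the cluster, so guard the call.
    if type module >/dev/null 2>&1; then
        module load gsl git
    fi
    source_if_readable /home/spxsf2/opt/lscsoft/lscsoft-user-env.sh
    source_if_readable /home/spxsf2/opt/lalsuite/master/etc/lscsoftsrc
    source_if_readable ~spxph/gt5.2.0-all/etc/globus-user-env.sh
    # A missing script is reported above but does not make this fail.
    return 0
}
```

Call `setup_lsc_env` after these definitions. Keeping everything in one function also makes it easy to leave out, e.g. for BAM work, which uses a different set of modules.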
Data
- We have data at /scratch/LIGO/LDR/. It can be found using, for example:
ligo_data_find --observatory L --url-type file --gps-start-time 832326736 --gps-end-time 832328926 --output L-L1_RDS_C03_L2_CACHE-832326736-2190.lcf --lal-cache --type L1_RDS_C03_L2 --match localhost --server=ldr-arcca.phys.uwm.edu
- I believe that /scratch on Raven is a different file server from /scratch on Merlin/GEO, and I think that LDR is running on GEO. So the data_find query will probably return the correct answer for now, but only as long as we don't move the data around or ask for any new data. We need to get LDR running on a machine that can see the new cluster storage.
- We don't have a clear statement of the data that is available on Raven. It would be nice to know what we had on Merlin that's now on Raven. The only page I could find is this one for coma. I expect Merlin/Raven have about the same data, but it would be good to know for sure.
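Given the concern above about LDR and /scratch, one cheap check is whether the files in the cache returned by ligo_data_find actually exist on Raven. A sketch, assuming the usual five-column LAL cache format (observatory, frame type, GPS start, duration, URL) and file://localhost URLs as produced with `--url-type file`; `check_cache` is a hypothetical helper.

```shell
# Sketch: list files in a LAL cache that are not visible from this machine.
# Assumes the usual five-column LAL cache layout:
#   observatory  frame-type  gps-start  duration  url
# and file://localhost/... URLs, as produced with --url-type file.
check_cache() {
    local cache=$1 missing=0 url path
    while read -r _ _ _ _ url; do
        [ -n "$url" ] || continue          # skip blank lines
        path=${url#file://localhost}
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done < "$cache"
    echo "$missing file(s) missing"
    [ "$missing" -eq 0 ]
}
```

For the query above, `check_cache L-L1_RDS_C03_L2_CACHE-832326736-2190.lcf` lists any frame files that are not visible from the Raven login node and exits non-zero if there are any.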
Workflows
- Raven runs the PBS Pro batch system, whereas our workflows are written to output Condor DAGs.
- We had been running by submitting requests to PBS to reserve nodes for Condor, which then reported back to GEO. This setup should work on the new cluster, provided we have a machine running Condor that can talk to the nodes.
- In the medium term, we might set things up differently, so that Condor talks to the PBS submission machine and gets the right jobs submitted to the PBS queue. This would remove the need for the machine running Condor to talk to the nodes.
- It would be nice to set up something to run a few jobs as proof of principle. Is there any easy way to do this?
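As a possible proof of principle, a two-job Condor DAG whose jobs just run /bin/hostname would show whether jobs reach Raven's nodes and whether DAG dependencies are respected. A sketch that writes such a DAG; `make_test_dag`, the directory and the file names are hypothetical, and it assumes a Condor pool that can run jobs on the nodes, as described above.

```shell
# Sketch: write a two-job Condor DAG as a proof-of-principle test.
# Directory and file names are hypothetical; /bin/hostname is just a
# cheap job that reports which node it ran on.
make_test_dag() {
    local dir=${1:-condor_test}
    mkdir -p "$dir"
    # Quoted 'EOF': $(Cluster) is left for Condor to expand.
    cat > "$dir/hostname.sub" <<'EOF'
universe   = vanilla
executable = /bin/hostname
output     = hostname.$(Cluster).out
error      = hostname.$(Cluster).err
log        = hostname.log
queue
EOF
    # Job B only starts after job A has finished.
    cat > "$dir/test.dag" <<'EOF'
JOB A hostname.sub
JOB B hostname.sub
PARENT A CHILD B
EOF
}
```

Then `make_test_dag && cd condor_test && condor_submit_dag test.dag`; the two .out files should name the nodes the jobs ran on.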
BAM on Raven
- Load these modules:
module load intel/intel
module load bullxmpi/bullxmpi-1.1.17.1
- Use a standard MyConfig for bam; you don't need to point it to specific MPI libraries, just use mpicc as the compiler.
- (Re)compile bam:
make clean && make -j 8 bam
- Sample PBS script (adapted from the Merlin version, with some changes):
#!/bin/bash
#PBS -q workq
#PBS -l select=8:ncpus=16:mpiprocs=16
#PBS -l place=scatter:excl
#PBS -l walltime=1:00:00
#PBS -N R6_PN_64_128
#PBS -o R6_PN_64.out
#PBS -e R6_PN_64.err
#PROJECT=PR37
pardir=/home/spxmp/MachineConfig/ARCCA/
parfile=R6_PN_64.par
bamexe=/home/spxmp/MachineConfig/bam/exe/bam
cd /scratch/spxmp
cp $pardir/$parfile .
mpirun -np 128 $bamexe ./$parfile
- This PBS script copies the parameter file to the Lustre filesystem under /scratch/$USER (here /scratch/spxmp) and runs there. This is where you should run, i.e. *not* in your home directory.
- Don't forget to mirror the data to minion with rsync, since files on /scratch get deleted after some time.
- At the moment we are using the 'workq' queue. There is also a dedicated 'ligo' queue, which we should perhaps use instead.
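The sample script hard-codes 8 nodes with 16 MPI processes each, i.e. 128 ranks, and the mpirun -np value has to match select × mpiprocs. A sketch of a hypothetical generator that derives -np from the node count, assuming 16 cores per node as in the sample; the #PROJECT line is left out, and the queue is fixed to workq (change it to ligo if we move queues).

```shell
# Sketch: generate a PBS script like the sample above for a given node count.
# Assumes 16 cores per node (ncpus=16 in the sample), so mpirun gets
# nodes * 16 ranks. Arguments: nodes, parfile path, bam executable, walltime.
write_bam_pbs() {
    local nodes=$1 parpath=$2 bamexe=$3 walltime=${4:-1:00:00}
    local ppn=16                 # MPI processes per node
    local np=$((nodes * ppn))    # total MPI ranks
    local parfile=${parpath##*/} # file name without directory
    local name=${parfile%.par}   # job name stem, e.g. R6_PN_64
    cat <<EOF
#!/bin/bash
#PBS -q workq
#PBS -l select=$nodes:ncpus=$ppn:mpiprocs=$ppn
#PBS -l place=scatter:excl
#PBS -l walltime=$walltime
#PBS -N ${name}_$np
#PBS -o $name.out
#PBS -e $name.err
# Run on the Lustre scratch space, not in the home directory.
cd /scratch/\$USER
cp $parpath .
mpirun -np $np $bamexe ./$parfile
EOF
}
```

For example, `write_bam_pbs 8 /home/spxmp/MachineConfig/ARCCA/R6_PN_64.par /home/spxmp/MachineConfig/bam/exe/bam > R6_PN_64.pbs` gives the same resource request as the sample, and `qsub R6_PN_64.pbs` submits it.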