Raven
Raven is the new (late 2012) ARCCA cluster. It has 2000 Sandy Bridge cores with a fast interconnect, plus an additional 800 Westmere cores set up for serial jobs.
If you have an account, you can log on with:
ssh ravenlogin.arcca.cf.ac.uk
Software
There is a lot of software available. You can see what there is by typing
module avail
You load a module by typing, e.g.
module load python/2.6.7
and can see what you currently have loaded with
module list
LSC Required Software
There is a set of LSC software that you need to have installed (before even thinking about installing your analysis software) so that you can run. It is detailed at this DASWG page. Running through the list:
- pkg-config: This is available by default on Raven (as /usr/bin/pkg-config). The installed version is 0.23, which seems to suffice.
- gsl: This is available as a module, so run
module load gsl
to pick it up. The current version is 1.15, which is more recent than the 1.12 listed on the DASWG page.
- git: This is required to check out the LALSuite software stack. It should be available as a module, but isn't working yet. To work around the issue, I've installed version 1.8.1 in /home/spxsf2/opt/lscsoft/non-lsc.
- fftw: This should be available as a module, but isn't working yet! I have installed it at /home/spxsf2/opt/lscsoft/. Note: the instructions call for fftw to be built twice, once in single precision (float) and once in double precision (see the sketch after this list). Not sure if this is still relevant and/or whether it can be set up on Raven by default.
- User environment script: This sets up your environment, following the same instructions as for libframe and friends. It seems more complicated than strictly necessary, and it winds up making the file lscsoft-user-env.sh. If you want to pick up the software I have installed, you need to run
. /home/spxsf2/opt/lscsoft/lscsoft-user-env.sh
I think that the majority of this is unnecessary. For now I'll just source it, but if we do try installing our software as modules, then we should look at what's really needed here.
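For reference, here is a minimal sketch of the two-pass fftw build mentioned above (once in double precision, once in single precision); the version number and install prefix are purely illustrative:
# Build fftw twice: the default build is double precision, --enable-float gives single precision.
tar xzf fftw-3.3.3.tar.gz && cd fftw-3.3.3
./configure --prefix=$HOME/opt/lscsoft --enable-shared
make && make install
make distclean
./configure --prefix=$HOME/opt/lscsoft --enable-shared --enable-float
make && make install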
LALSuite
The LALSuite software contains much of the LSC's analysis code, including the CBC analysis. Instructions for getting it are at https://www.lsc-group.phys.uwm.edu/daswg/docs/howto/lal-install.html. The key steps are:
- Check out the software with
git clone albert.einstein@ligo-vcs.phys.uwm.edu:/usr/local/git/lalsuite.git
- Then build it:
./00boot
mkdir build_master
cd build_master
../configure --prefix=/home/spxsf2/opt/lalsuite/master
make
make install
- This seems to work, apart from “../../../lalinspiral/src/LALInspiralSBankOverlap.c:22:19: fatal error: fftw3.h: No such file or directory compilation terminated.”, which I got around by removing this file from the make; a cleaner fix is probably to point configure at the fftw install (see the sketch at the end of this section).
- To pick up this install, you need to
. /home/spxsf2/opt/lalsuite/master/etc/lscsoftsrc
I have successfully run a piece of LAL code (lalapps_tmpltbank to be precise), and it seems to have worked!
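Rather than dropping LALInspiralSBankOverlap.c from the build, it may be enough to point configure at the local fftw install. This is a sketch, assuming fftw ended up under /home/spxsf2/opt/lscsoft (the exact paths may differ):
# Make the locally installed fftw headers and libraries visible to configure.
# The prefix is an assumption based on the fftw install location mentioned above.
export CPPFLAGS="-I/home/spxsf2/opt/lscsoft/include $CPPFLAGS"
export LDFLAGS="-L/home/spxsf2/opt/lscsoft/lib $LDFLAGS"
export PKG_CONFIG_PATH="/home/spxsf2/opt/lscsoft/lib/pkgconfig:$PKG_CONFIG_PATH"
../configure --prefix=/home/spxsf2/opt/lalsuite/master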
Grid tools
I'm trying to install the LIGO Data Grid Client, instructions here. It fails with the following issue:
/usr/bin/gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/home/spxsf2/opt/lscsoft/non-lsc/globus/include/globus -I/home/spxsf2/opt/lscsoft/non-lsc/globus/include/globus/gcc64dbg -g -m64 -Wall -DGLOBUS_PREFIX=\"/home/spxsf2/opt/lscsoft/non-lsc/globus\" -DGLOBUS_LIBDIR=\"/home/spxsf2/opt/lscsoft/non-lsc/globus/lib64\" -c globus_thread.c -fPIC -DPIC -o .libs/globus_thread.o
globus_thread.c:38:18: error: ltdl.h: No such file or directory
globus_thread.c: In function 'globus_i_thread_pre_activate':
globus_thread.c:162: error: 'lt_dlhandle' undeclared (first use in this function)
globus_thread.c:162: error: (Each undeclared identifier is reported only once
globus_thread.c:162: error: for each function it appears in.)
globus_thread.c:162: error: expected ';' before 'impl_lib'
globus_thread.c:172: warning: implicit declaration of function 'lt_dlinit'
globus_thread.c:173: warning: implicit declaration of function 'lt_dladdsearchdir'
globus_thread.c:196: error: 'impl_lib' undeclared (first use in this function)
globus_thread.c:196: warning: implicit declaration of function 'lt_dlopenext'
globus_thread.c:199: warning: implicit declaration of function 'lt_dlerror'
globus_thread.c:199: warning: format '%s' expects type 'char *', but argument 3 has type 'int'
globus_thread.c:204: warning: implicit declaration of function 'lt_dlsym'
globus_thread.c:207: warning: format '%s' expects type 'char *', but argument 2 has type 'int'
globus_thread.c: In function 'globus_thread_exit':
globus_thread.c:1383: warning: cast from pointer to integer of different size
make[3]: *** [globus_thread.lo] Error 1
make[3]: Leaving directory `/home/spxsf2/opt/src/gt5.2.3-all-source-installer/source-trees/common/source/library'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/spxsf2/opt/src/gt5.2.3-all-source-installer/source-trees/common/source'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/spxsf2/opt/src/gt5.2.3-all-source-installer/source-trees/common/source'
make: *** [globus_common-make] Error 2
Any suggestions??
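The key error is the missing ltdl.h, i.e. the libltdl development headers (part of GNU libtool) are not available on the build host. One option is to ask the sysadmins to install the libtool ltdl development package; failing that, a possible workaround (a sketch, untested on Raven, with the version and prefix purely illustrative) is to build libltdl into a local prefix and point the Globus build at it:
# Build GNU libtool, which ships libltdl and ltdl.h, into a local prefix.
tar xzf libtool-2.4.2.tar.gz && cd libtool-2.4.2
./configure --prefix=$HOME/opt/lscsoft/non-lsc
make && make install
# Make the headers and library visible when rebuilding the Globus tools.
export CPPFLAGS="-I$HOME/opt/lscsoft/non-lsc/include $CPPFLAGS"
export LDFLAGS="-L$HOME/opt/lscsoft/non-lsc/lib $LDFLAGS"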
Data
- We have data at /scratch/LIGO/LDR/. This can be found using, for example:
ligo_data_find --observatory L --url-type file --gps-start-time 832326736 --gps-end-time 832328926 --output L-L1_RDS_C03_L2_CACHE-832326736-2190.lcf --lal-cache --type L1_RDS_C03_L2 --match localhost --server=ldr-arcca.phys.uwm.edu
- I believe that /scratch on Raven is a different file-server from /scratch on Merlin/GEO. Also, I think that LDR is running on GEO. So, the data_find query will likely return the correct answer, but this is only going to be true as long as we don't move the data around or ask for any new data. We need to get LDR running on a machine that can see the new cluster storage.
- We don't have a clear statement of the data that is available on Raven. It would be nice to know what we had on Merlin that's now on Raven. The only page I could find is this one for coma. I expect Merlin/Raven have about the same data, but it would be good to know for sure.
Workflows
- Raven is set up to run under PBS Pro. Our workflows are written to output Condor DAGs.
- We had been running by submitting requests to PBS to reserve nodes for Condor, which then reported back to GEO. This setup should work on the new cluster, provided we have a machine running Condor that can talk to the nodes.
- In the medium term, we might set things up differently so that Condor talks to the PBS submission machine and gets the right jobs submitted to the PBS queue. This removes the need for the machine running Condor to talk to the nodes.
- It would be nice to set up something to run a few jobs as proof of principle. Is there any easy way to do this?
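As a first proof of principle, a trivial PBS job that just reports where it ran should be enough to exercise the queue. This is only a sketch; the queue name and resource request are guesses based on the BAM script further down:
#!/bin/bash
# Minimal PBS test job: report which node it landed on, then exit.
#PBS -q workq
#PBS -l select=1:ncpus=1
#PBS -l walltime=0:05:00
#PBS -N pbs_test
#PBS -o pbs_test.out
#PBS -e pbs_test.err
echo "Hello from $(hostname) at $(date)"
Submit it with qsub (e.g. qsub pbs_test.sh) and check progress with qstat -u $USER.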
BAM on Raven
- Load these modules:
module load intel/intel
module load bullxmpi/bullxmpi-1.1.17.1
- Recompile bam:
make clean && make -j 8 bam
- Sample PBS script (adapted from the Merlin version with some changes):
#!/bin/bash
#PBS -q workq
#PBS -l select=8:ncpus=16:mpiprocs=16
#PBS -l place=scatter:excl
#PBS -l walltime=1:00:00
#PBS -N R6_PN_64_128
#PBS -o R6_PN_64.out
#PBS -e R6_PN_64.err
#PROJECT=PR37
mpirun -np 128 /home/spxmp/MachineConfig/bam/exe/bam /home/spxmp/MachineConfig/ARCCA/R6_PN_64.par
At the moment we are using the 'workq' queue. There is also a dedicated 'ligo' queue, and we should maybe use that instead.
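Switching queues would just mean changing the queue directive at the top of the script, assuming the 'ligo' queue accepts the same resource requests:
#PBS -q ligo
and resubmitting with qsub.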