Raven

Raven is the new (late 2012) ARCCA cluster. It has 2000 Sandy Bridge cores with a fast interconnect, plus an additional 800 Westmere cores that are configured for serial jobs.

If you have an account, you can log on with:

ssh ravenlogin.arcca.cf.ac.uk

Software

A lot of software is available. You can see what is installed by typing

 module avail

You load a module by typing, e.g.

 module load python/2.6.7

and can see what you currently have loaded with

 module list

LSC Required Software

There is a set of LSC software that you need to have installed (before even thinking about installing your analysis software) so that you can run anything. It is detailed at this DASWG page. Running through the list:

  • pkg-config: This is available by default on Raven (as /usr/bin/pkg-config). The installed version is 0.23, which seems to suffice.
  • gsl: This is available as a module. So run
     module load gsl

    to pick this up. Current version is 1.15, which is more recent than version 1.12 listed on the DASWG page.

  • git: This is required to check out the LALSuite software stack. It should be available as a module, but isn't working yet. To work around the issue, I've installed version 1.8.1 in /home/spxsf2/opt/lscsoft/non-lsc.
  • fftw: This should be available as a module, but isn't working yet! I have installed it at /home/spxsf2/opt/lscsoft/. Note: the instructions call for fftw to be built twice – float and double. Not sure if this is still relevant and/or can be set up on Raven by default.
  • libframe/metaio/glue: This is LIGO specific software. It is not available as a module, but in the longer term, we should see if we can get it installed. For now, I downloaded the “stable” version from here and have installed (following the instructions here) at /home/spxsf2/opt/lscsoft/.
  • User environment script: This is to set up your environment, same instructions as for libframe and friends. It seems more complicated than strictly necessary. It winds up making the file lscsoft-user-env.sh. If you want to pick up the software I have installed, you need to run
    . /home/spxsf2/opt/lscsoft/lscsoft-user-env.sh

    I think that the majority of this is unnecessary. For now, I'll just source it, but if we do try installing our software as modules, then I think we should look at what's really needed here.
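The pieces above can be collected into a few lines of login-shell setup. This is a sketch under the assumptions stated above (the `gsl` module name and the /home/spxsf2/opt/lscsoft install paths; the non-lsc bin subdirectory is assumed); adjust if the installs move:

```shell
# Sketch: set up the LSC prerequisite environment on Raven in one place.
# Paths are the hand-installed locations described above.

module load gsl                                    # GSL 1.15

# git 1.8.1 and fftw were installed by hand while the modules are broken
# (bin subdirectory assumed)
export PATH=/home/spxsf2/opt/lscsoft/non-lsc/bin:$PATH

# libframe/metaio/glue plus the generated user-environment script
. /home/spxsf2/opt/lscsoft/lscsoft-user-env.sh
```

If we later move these installs into proper modules, this whole block collapses to a few `module load` lines.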

LALSuite

The LALSuite software contains much of the LSC's analysis code, including the CBC analysis. Instructions for getting it are at https://www.lsc-group.phys.uwm.edu/daswg/docs/howto/lal-install.html. The key steps are:

  • Check out the software with
    git clone albert.einstein@ligo-vcs.phys.uwm.edu:/usr/local/git/lalsuite.git
  • Then build it
    ./00boot
    mkdir build_master
    cd build_master
    ../configure --prefix=/home/spxsf2/opt/lalsuite/master
    make
    make install
  • This seems to work, apart from “../../../lalinspiral/src/LALInspiralSBankOverlap.c:22:19: fatal error: fftw3.h: No such file or directory”, which I worked around by removing this file from the build.
  • To pick up this install, you need to
    . /home/spxsf2/opt/lalsuite/master/etc/lscsoftsrc

I have successfully run a piece of LAL code (lalapps_tmpltbank to be precise), and it seems to have worked!
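For reference, the checkout-and-build steps above collected into one script. This is a sketch: the prefix is the install location used above, and the CPPFLAGS workaround for the fftw3.h error is an assumption about where the hand-built fftw headers live.

```shell
#!/bin/bash
# Sketch: check out and build LALSuite, following the steps above.
set -e

git clone albert.einstein@ligo-vcs.phys.uwm.edu:/usr/local/git/lalsuite.git
cd lalsuite
./00boot

mkdir -p build_master
cd build_master

# If LALInspiralSBankOverlap.c fails with "fftw3.h: No such file or
# directory", pointing configure at the hand-built fftw headers may
# avoid having to remove the file (include path assumed):
../configure --prefix=/home/spxsf2/opt/lalsuite/master \
    CPPFLAGS=-I/home/spxsf2/opt/lscsoft/include

make
make install
```

After a successful install, sourcing /home/spxsf2/opt/lalsuite/master/etc/lscsoftsrc picks it up as described above.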

Grid tools

I'm trying to install the LIGO Data Grid Client, instructions here. It fails with the following issue:

/usr/bin/gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/home/spxsf2/opt/lscsoft/non-lsc/globus/include/globus -I/home/spxsf2/opt/lscsoft/non-lsc/globus/include/globus/gcc64dbg -g -m64 -Wall -DGLOBUS_PREFIX=\"/home/spxsf2/opt/lscsoft/non-lsc/globus\" -DGLOBUS_LIBDIR=\"/home/spxsf2/opt/lscsoft/non-lsc/globus/lib64\" -c globus_thread.c  -fPIC -DPIC -o .libs/globus_thread.o
globus_thread.c:38:18: error: ltdl.h: No such file or directory
globus_thread.c: In function 'globus_i_thread_pre_activate':
globus_thread.c:162: error: 'lt_dlhandle' undeclared (first use in this function)
globus_thread.c:162: error: (Each undeclared identifier is reported only once
globus_thread.c:162: error: for each function it appears in.)
globus_thread.c:162: error: expected ';' before 'impl_lib'
globus_thread.c:172: warning: implicit declaration of function 'lt_dlinit'
globus_thread.c:173: warning: implicit declaration of function 'lt_dladdsearchdir'
globus_thread.c:196: error: 'impl_lib' undeclared (first use in this function)
globus_thread.c:196: warning: implicit declaration of function 'lt_dlopenext'
globus_thread.c:199: warning: implicit declaration of function 'lt_dlerror'
globus_thread.c:199: warning: format '%s' expects type 'char *', but argument 3 has type 'int'
globus_thread.c:204: warning: implicit declaration of function 'lt_dlsym'
globus_thread.c:207: warning: format '%s' expects type 'char *', but argument 2 has type 'int'
globus_thread.c: In function 'globus_thread_exit':
globus_thread.c:1383: warning: cast from pointer to integer of different size
make[3]: *** [globus_thread.lo] Error 1
make[3]: Leaving directory `/home/spxsf2/opt/src/gt5.2.3-all-source-installer/source-trees/common/source/library'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/spxsf2/opt/src/gt5.2.3-all-source-installer/source-trees/common/source'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/spxsf2/opt/src/gt5.2.3-all-source-installer/source-trees/common/source'
make: *** [globus_common-make] Error 2

Any suggestions?
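For what it's worth, the first error (“ltdl.h: No such file or directory”) means the compiler cannot find the libltdl headers, which ship with GNU libtool. A sketch of how one might check for and work around this — the private install prefix $HOME/opt/libtool below is purely an assumption:

```shell
# Check whether the libltdl header is visible in the standard location
ls /usr/include/ltdl.h 2>/dev/null || echo "ltdl.h not in /usr/include"

# If libtool/libltdl lives in a private prefix instead ($HOME/opt/libtool
# here is an assumed path), point the Globus build at its headers and
# libraries before re-running make:
export CPPFLAGS="-I$HOME/opt/libtool/include $CPPFLAGS"
export LDFLAGS="-L$HOME/opt/libtool/lib $LDFLAGS"
```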

Data

  • We have data at
    /scratch/LIGO/LDR/

    This can be located using, for example:

ligo_data_find --observatory L --url-type file --gps-start-time 832326736 --gps-end-time 832328926 --output L-L1_RDS_C03_L2_CACHE-832326736-2190.lcf --lal-cache  --type L1_RDS_C03_L2 --match localhost --server=ldr-arcca.phys.uwm.edu
  • I believe that /scratch on Raven is a different file server from /scratch on Merlin/GEO, and I think that LDR is running on GEO. So the data_find query will likely return the correct answer, but only as long as we don't move the data around or ask for any new data. We need to get LDR running on a machine that can see the new cluster's storage.
  • We don't have a clear statement of the data that is available on Raven. It would be nice to know what we had on Merlin that's now on Raven. The only page I could find is this one for coma. I expect Merlin/Raven have about the same data, but it would be good to know for sure.

Workflows

  • Raven is set up to run under PBS Pro, while our workflows are written to output Condor DAGs.
  • We had been running by submitting requests to PBS to reserve nodes for Condor, which then reported back to GEO. This setup should work on the new cluster, provided we have a machine running Condor that can talk to the nodes.
  • In the medium term, we might set things up differently so that Condor talks to the PBS submission machine and submits the right jobs to the PBS queue. This would remove the need for the machine running Condor to talk to the nodes directly.
  • It would be nice to set up something to run a few jobs as proof of principle. Is there any easy way to do this?
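As a proof of principle, the old approach could be sketched as a PBS job that reserves a node and runs a Condor startd reporting back to a central manager. Everything below the #PBS directives is an assumption — the Condor install path and the manager hostname are placeholders, not a working configuration:

```shell
#!/bin/bash
#PBS -q workq
#PBS -l select=1:ncpus=16
#PBS -l walltime=2:00:00
#PBS -N condor_glidein

# Assumed locations: a Condor install visible from the compute node,
# and a central manager (e.g. on GEO) for the startd to report to.
export CONDOR_CONFIG=$HOME/condor/etc/condor_config   # assumed path
export _CONDOR_CONDOR_HOST=geo-manager.example        # assumed hostname

# Run condor_master in the foreground for the length of the PBS
# reservation; jobs matched to this node's slots then execute here.
$HOME/condor/sbin/condor_master -f
```

When the walltime expires, PBS kills the master and the slot disappears from the pool, so this gives exactly the "reserve nodes for Condor" behaviour described above.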

BAM on Raven

  • Load these modules:
module load intel/intel
module load bullxmpi/bullxmpi-1.1.17.1
  • Recompile bam:
make clean && make -j 8 bam
  • Sample pbs script (adapted from the Merlin version with some changes):
#!/bin/bash
 
#PBS -q workq
#PBS -l select=8:ncpus=16:mpiprocs=16
#PBS -l place=scatter:excl
#PBS -l walltime=1:00:00
 
#PBS -N R6_PN_64_128
#PBS -o R6_PN_64.out
#PBS -e R6_PN_64.err
 
#PROJECT=PR37
 
mpirun -np 128 /home/spxmp/MachineConfig/bam/exe/bam /home/spxmp/MachineConfig/ARCCA/R6_PN_64.par

At the moment we are using the 'workq' queue. There is also a dedicated 'ligo' queue, which we should probably use instead.

arcca/raven.1354719467.txt.gz · Last modified: 2012/12/05 14:57 by michael.puerrer@LIGO.ORG