* pkg-config: This is available by default on Raven (as /usr/bin/pkg-config). The installed version is 0.23, which seems to suffice.
* gsl: This is available as a module, so run <code bash> module load gsl</code> to pick it up. The current version is 1.15, which is more recent than version 1.12 listed on the DASWG page.
* git: This is required to check out the LALSuite software stack. It is available in /usr/bin (version 1.7.1); a newer version (1.7.9) is available with <code bash> module load git</code>.
* fftw: This **should** be available as a module, but isn't working yet! I have installed it at /home/spxsf2/opt/lscsoft/. **Note**: the instructions call for fftw to be built twice -- float and double. Not sure if this is still relevant and/or can be set up on Raven by default.
* libframe/metaio/glue: This is LIGO-specific software. It is not available as a module, but in the longer term we should see if we can get it installed. For now, I downloaded the "stable" versions from [[http://www.lsc-group.phys.uwm.edu/daswg/download/software/source/ | here]] and have installed them (following the instructions [[https://www.lsc-group.phys.uwm.edu/daswg/docs/howto/lscsoft-install.html | here]]) at /home/spxsf2/opt/lscsoft/. A sketch of the environment variables needed to pick up these hand-installed packages is given after this list.
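
Since fftw and libframe/metaio/glue are installed under a private prefix rather than as modules, builds need to be pointed at that prefix. This is only a sketch: the directory layout below (bin, lib, lib/pkgconfig) is an assumption and should be adjusted to whatever the lscsoft install instructions actually created.
<code bash>
# Hand-installed prefix from the list above
LSCSOFT_PREFIX=/home/spxsf2/opt/lscsoft

# Assumed layout: binaries, libraries and pkg-config files under the prefix
export PATH=$LSCSOFT_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$LSCSOFT_PREFIX/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=$LSCSOFT_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH
</code>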
===== Grid tools =====

Instructions to install the LIGO Data Grid Client from source are taken from [[https://www.lsc-group.phys.uwm.edu/lscdatagrid/doc/installclient-source.html | here]].

<code bash>
wget http://www.globus.org/ftppub/gt5/5.2/5.2.0/installers/src/gt5.2.0-all-source-installer.tar.gz
tar xf gt5.2.0-all-source-installer.tar.gz
mkdir gt5.2.0-all
export GLOBUS_LOCATION=~/gt5.2.0-all/
export PATH=/bin:/usr/bin
export FLAVOUR=gcc64dbg

cd gt5.2.0-all-source-installer
./configure --prefix=$GLOBUS_LOCATION --with-flavor=$FLAVOUR
make gsi-openssh
make postinstall
. $GLOBUS_LOCATION/etc/globus-user-env.sh
</code>

The VDT Certificate Bundle can be installed using the instructions from [[https://www.lsc-group.phys.uwm.edu/lscdatagrid/doc/installclient-source.html | the same page]]. (Note: the link to the archive is out of date.) It was also found necessary to update the certificates.

<code bash>
wget http://software.grid.iu.edu/pacman/cadist/1.32/osg-certificates-1.32.tar.gz
tar xf osg-certificates-1.32.tar.gz -C $GLOBUS_LOCATION/share
globus-update-certificate-dir
</code>

Now copy your Grid certificates into the ''.globus'' folder in your home directory and make sure the permissions are correct.

<code bash>
chmod 600 ~/.globus/usercert.pem
chmod 400 ~/.globus/userkey.pem
</code>

To set up your environment to use the installation, source the Globus user environment script:
<code bash>
. ~spxph/gt5.2.0-all/etc/globus-user-env.sh
</code>
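
A quick way to confirm the client tools are working is to generate a Grid proxy and inspect it. This is a minimal sketch; the gsissh host is a placeholder, not a recommendation from the instructions above.
<code bash>
# Create a short-lived proxy from the certificates in ~/.globus
grid-proxy-init -valid 12:00

# Show the proxy subject and remaining lifetime
grid-proxy-info

# Placeholder host: replace with an actual LIGO Data Grid login node
gsissh some.ldg.host.example.org
</code>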

===== Data =====

* We have data at <code bash>/scratch/LIGO/LDR/</code>. It can be located using, for example:
<code bash>ligo_data_find --observatory L --url-type file --gps-start-time 832326736 --gps-end-time 832328926 --output L-L1_RDS_C03_L2_CACHE-832326736-2190.lcf --lal-cache --type L1_RDS_C03_L2 --match localhost --server=ldr-arcca.phys.uwm.edu</code>
* I **believe** that /scratch on Raven is a different file server from /scratch on Merlin/GEO, and I think that LDR is running on GEO. So the data_find query will likely return the correct answer, but only as long as we don't move the data around or ask for any new data. We need to get LDR running on a machine that can see the new cluster storage. (A quick check that the files listed in a returned cache are actually readable is sketched after this list.)
* We don't have a clear statement of the data that is available on Raven. It would be nice to know what we had on Merlin that is now on Raven. The only page I could find is [[cardiff:computing:available_data | this one]], for coma. I expect Merlin and Raven have about the same data, but it would be good to know for sure.
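
The following is a minimal sketch for checking that the frames listed in a returned cache actually exist on the Raven filesystem. It assumes the standard five-column LAL cache format with a file://localhost URL in the last column, and uses the cache filename from the example above.
<code bash>
# Strip the file://localhost prefix from the last column and check each path
awk '{print $5}' L-L1_RDS_C03_L2_CACHE-832326736-2190.lcf \
  | sed 's|^file://localhost||' \
  | while read f; do
      [ -r "$f" ] || echo "missing: $f"
    done
</code>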

===== Workflows =====

* Raven is set up to run under PBS Pro, while our workflows are written to output Condor DAGs.
* We had been running by submitting PBS requests to reserve nodes for Condor, which then reported back to GEO. This setup should work on the new cluster, provided we have a machine running Condor that can talk to the nodes. (A sketch of such a PBS job is given after this list.)
* In the medium term, we might set things up differently, so that Condor talks to the PBS submission machine and gets the right jobs submitted to the PBS queue. This removes the need for the machine running Condor to talk to the nodes.
* It would be nice to set up something to run a few jobs as a proof of principle. Is there an easy way to do this?
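
As a rough illustration of the glide-in approach in the second bullet above, a PBS job could start a Condor startd on the reserved node that reports back to an external central manager. This is only a sketch under assumptions: the Condor install path, central-manager hostname, queue and walltime are placeholders, and the real setup (authentication, security, networking between Raven nodes and GEO) is not addressed here.
<code bash>
#!/bin/bash
#PBS -q workq
#PBS -l select=1:ncpus=16
#PBS -l walltime=12:00:00

# Placeholder paths/hosts -- adjust for the real Condor install and central manager
CONDOR_RELEASE=/home/spxsf2/opt/condor       # hypothetical Condor install location
CENTRAL_MANAGER=condor.geo.example.org       # hypothetical central manager (e.g. on GEO)

LOCAL=/scratch/$USER/condor-glidein.$PBS_JOBID
mkdir -p $LOCAL/execute $LOCAL/log $LOCAL/spool

# Minimal Condor configuration: run only a startd that reports to the central manager
cat > $LOCAL/condor_config <<EOF
RELEASE_DIR   = $CONDOR_RELEASE
LOCAL_DIR     = $LOCAL
CONDOR_HOST   = $CENTRAL_MANAGER
DAEMON_LIST   = MASTER, STARTD
START         = TRUE
SUSPEND       = FALSE
PREEMPT       = FALSE
KILL          = FALSE
# Far too permissive for production; restrict in a real setup
ALLOW_WRITE   = *
EOF

export CONDOR_CONFIG=$LOCAL/condor_config

# Run the master in the foreground so the PBS job holds the node until the walltime expires
$CONDOR_RELEASE/sbin/condor_master -f
</code>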

====== BAM on Raven ======
* Load these modules:
<code bash>
module load intel/intel
module load bullxmpi/bullxmpi-1.1.17.1
</code>
* Use a standard MyConfig for bam -- you don't need to point it to specific MPI libraries; just use mpicc as the compiler.
* (Re)compile bam:
<code bash>make clean && make -j 8 bam</code>

* Sample PBS script (adapted from the Merlin version with some changes). The select line requests 8 nodes with 16 cores and 16 MPI ranks each (128 ranks in total), placed on distinct nodes and held exclusively:
<code bash>
#!/bin/bash

#PBS -q workq
#PBS -l select=8:ncpus=16:mpiprocs=16
#PBS -l place=scatter:excl
#PBS -l walltime=1:00:00

#PBS -N R6_PN_64_128
#PBS -o R6_PN_64.out
#PBS -e R6_PN_64.err

#PROJECT=PR37

pardir=/home/spxmp/MachineConfig/ARCCA/
parfile=R6_PN_64.par
bamexe=/home/spxmp/MachineConfig/bam/exe/bam

# run on the Lustre scratch filesystem, not in the home directory
cd /scratch/spxmp
cp $pardir/$parfile .

mpirun -np 128 $bamexe ./$parfile
</code>
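
Assuming the script above is saved as, say, R6_PN_64.pbs (the filename is hypothetical), it can be submitted and monitored with the usual PBS commands:
<code bash>
qsub R6_PN_64.pbs      # submit the job
qstat -u $USER         # check its status in the queue
</code>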

* This PBS script copies the parameter file over to the Lustre filesystem in /scratch/$USER. This is where you should run, i.e. **not** in your home directory.
* Don't forget to mirror the data with rsync to minion, since files on /scratch will get deleted after some time (see the sketch after this list).
* At the moment we are using the 'workq' queue. There is also a dedicated 'ligo' queue, and we should perhaps use that instead.
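
A minimal sketch of the mirroring step mentioned above; the run directory and the destination path on minion are placeholders, and the actual username and layout will differ.
<code bash>
# Placeholder paths: adjust the run directory and the destination on minion
RUNDIR=/scratch/$USER/R6_PN_64
DEST=$USER@minion:/data/$USER/R6_PN_64

# -a preserves permissions/timestamps, -v is verbose; --partial helps with large output files
rsync -av --partial $RUNDIR/ $DEST/
</code>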