Eiger: Difference between revisions

Eiger (view source)

Revision as of 18:32, 27 February 2021

6,058 bytes removed, 27 February 2021

→Less efficient way of processing Eiger data, using conversion to CBF

Kay

Bureaucrats

2,651

edits

@@ Line 3: / Line 3: @@
 == General aspects ==
 # The framecache of XDS uses memory to save on I/O; it saves a frame in RAM after reading it for the first time. By default, each XDS (or mcolspot/mintegrate) job stores NUMBER_OF_IMAGES_IN_CACHE=DELPHI/OSCILLATION_RANGE images in memory, which corresponds to one DELPHI-sized batch of data. This requires (number of pixels)*(number of jobs)*4 bytes per frame, which amounts to 72 MB in the case of the Eiger 16M when running with MAXIMUM_NUMBER_OF_JOBS=1. (If DELPHI=20 and OSCILLATION_RANGE=0.05, your computer thus has to have at least 400*72MB = 29GB of memory for each job!) If memory allocation fails, XDS falls back to the old behaviour of reading each frame three times (instead of once).
-# Dectris provides the ''Neggia'' library ([https://github.com/dectris/neggia source],[https://www.dectris.com/support/downloads/sign-in binary]) for native reading of HDF5 files, which can be loaded into XDS at runtime using the <code>[[LIB]]=</code> [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#LIB= keyword]. With this library (which can also be found at ftp://turn5.biologie.uni-konstanz.de/pub/linux_bin for Linux, and at ftp://turn5.biologie.uni-konstanz.de/pub/mac_bin for MacOS), no conversion to CBF or otherwise is necessary. It is therefore just as fast and efficient to read HDF5 files as any other file format. At Diamond Light Source, a different HDF5 format was developed, and this requires the [https://github.com/DiamondLightSource/durin/releases/latest ''Durin'' plugin]. The latter can also read the HDF5 files written by the Dectris software.
+# Dectris provides the ''Neggia'' library ([https://github.com/dectris/neggia source],[https://www.dectris.com/support/downloads/sign-in binary]) for native reading of HDF5 files, which can be loaded into XDS at runtime using the <code>[[LIB]]=</code> [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#LIB= keyword]. With this library (which can also be found at https://{{SERVERNAME}}/pub/linux_bin for Linux, and at https://{{SERVERNAME}}/pub/mac_bin for MacOS), no conversion to CBF or otherwise is necessary. It is therefore just as fast and efficient to read HDF5 files as any other file format. At Diamond Light Source, a different HDF5 format was developed, and this requires the [https://github.com/DiamondLightSource/durin/releases/latest ''Durin'' plugin]. The latter can also read the HDF5 files written by the Dectris software.
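The memory estimate in the first item above can be reproduced with a few lines of Python. This is only a sketch, not part of XDS: it assumes an Eiger 16M frame of 4150 x 4371 pixels (a detector size not stated on this page) and simply evaluates (number of pixels)*(number of jobs)*4 bytes per frame, multiplied by NUMBER_OF_IMAGES_IN_CACHE=DELPHI/OSCILLATION_RANGE. The function names are hypothetical.

```python
# Sketch: estimate XDS framecache memory from the formula on this page.
# Assumption: Eiger 16M frames of 4150 x 4371 pixels, 4 bytes per pixel in RAM.

EIGER16M_NX, EIGER16M_NY = 4150, 4371  # assumed detector size in pixels


def images_in_cache(delphi: float, oscillation_range: float) -> int:
    """Default NUMBER_OF_IMAGES_IN_CACHE = DELPHI / OSCILLATION_RANGE."""
    return round(delphi / oscillation_range)


def framecache_bytes(nx: int, ny: int, delphi: float, oscillation_range: float,
                     jobs: int = 1, bytes_per_pixel: int = 4) -> int:
    """(number of pixels) * (number of jobs) * 4 bytes per frame, times the cached frames."""
    per_frame = nx * ny * jobs * bytes_per_pixel
    return per_frame * images_in_cache(delphi, oscillation_range)


if __name__ == "__main__":
    per_frame = EIGER16M_NX * EIGER16M_NY * 4
    total = framecache_bytes(EIGER16M_NX, EIGER16M_NY, delphi=20, oscillation_range=0.05)
    # prints: 72.6 MB per frame, 29.0 GB for the cache
    print(f"{per_frame / 1e6:.1f} MB per frame, {total / 1e9:.1f} GB for the cache")
```

Raising MAXIMUM_NUMBER_OF_JOBS scales the result linearly, which is why the text warns about the memory needed per job.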
 A suitable [[XDS.INP]] may have been written by the data collection (beamline) software. If XDS.INP is not available, the latest [[generate_XDS.INP]] (<code>generate_XDS.INP xxx_master.h5</code>) or the [[Eiger#Script by Andreas Förster (Dectris) for generating XDS.INP from master.h5|XDS_from_H5.py script]] can be used.
@@ Line 14: / Line 14: @@
 Update 2016-06-05 (Toine Schreurs): an HDF5 file may be compressed with [https://www.hdfgroup.org/HDF5/docNewFeatures/FileSpace/h5repack.htm h5repack], ''e.g.'' by <code>h5repack -i <in.h5> -o <out.h5> -f GZIP=6</code> (6 is the default compression level of gzip). This should be a good way to reduce the size of master files while keeping them compatible with processing, but this needs to be tested. Whether h5repack uses parallel gzip is not clear from the documentation.
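A minimal Python wrapper around the h5repack call quoted above might look as follows. It is a sketch under stated assumptions: it passes only the options shown in the update (-i, -o, -f GZIP=6), the helper name repack_command and the script name repack.py are hypothetical, and h5repack (from the HDF5 command-line tools) must be on the PATH. As noted above, whether the repacked files remain fully compatible with processing still needs to be tested.

```python
# Sketch: run h5repack with GZIP compression, using only the options quoted above.
# Assumption: h5repack (HDF5 command-line tools) is installed and on the PATH.
import shutil
import subprocess
import sys
from typing import List


def repack_command(src: str, dst: str, level: int = 6) -> List[str]:
    """Build: h5repack -i <in.h5> -o <out.h5> -f GZIP=<level>."""
    return ["h5repack", "-i", src, "-o", dst, "-f", f"GZIP={level}"]


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        print("usage: repack.py <in.h5> <out.h5>", file=sys.stderr)
        return 2
    if shutil.which("h5repack") is None:
        print("h5repack not found in PATH", file=sys.stderr)
        return 1
    cmd = repack_command(argv[0], argv[1])
    print(" ".join(cmd))
    # check=True raises CalledProcessError if h5repack exits with an error
    subprocess.run(cmd, check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```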
-== A benchmark ==
-Any comparisons should be based on a common dataset. From https://www.dectris.com/datasets.html I downloaded their latest dataset
-ftp://dectris.com/EIGER_16M_Nov2015.tar.bz2 (900 frames) and processed it on a single unloaded CentOS7.2 64bit machine with dual Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz, HT enabled (showing 32 processors in /proc/cpuinfo), on a local XFS filesystem (all defaults), with four JOBs and 12 PROCESSORS (the XDS.INP that Dectris provides suggests 8 JOBs of 12 PROCESSORS, but I changed that).
-On multi-socket machines, there are additional considerations having to do with their NUMA architecture - see [[Performance]].
-=== Xeon Phi (Knights Landing, KNL) ===
-The benchmark was run on a single KNL7210 processor (64 cores, 256 threads) set to quadrant mode and using the MCDRAM as cache. '''The environment variable OMP_PROC_BIND was set to false, or KMP_AFFINITY set to none''' (if this is not done, the scheduler seems to put all threads on one core). XDS was compiled with the -xMIC-AVX512 option of ifort. These benchmarks were performed with a "warm" operating system cache, which means that the first run of a given type did not count because it had to read all data from disk.
-Deviating from the Xeon benchmark setup, BACKGROUND_RANGE was set to a more realistic value of 1 50 (instead of 1 9).
-Using the Dectris library that makes use of the <code>[[LIB]]=</code> [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#LIB= option] of XDS:
- INIT:            elapsed wall-clock time       30.4 sec
- COLSPOT:         elapsed wall-clock time       40.7 sec
- INTEGRATE: total elapsed wall-clock time       52.9 sec
-Now additionally running with <code>numactl --preferred=1 xds_par</code> after having modified the forkintegrate script such that it starts mintegrate_par with the same numactl parameters:
- INIT.LP:         elapsed wall-clock time       29.8 sec
- COLSPOT:         elapsed wall-clock time       40.0 sec
- INTEGRATE: total elapsed wall-clock time       51.3 sec
-This was running with an 8GB/8GB split (''hybrid'') MCDRAM. The same run, but with 8 JOBS and 32 PROCESSORS, takes
- INIT.LP:         elapsed wall-clock time       25.3 sec
- COLSPOT:         elapsed wall-clock time       40.1 sec
- INTEGRATE: total elapsed wall-clock time       53.1 sec
-Back to 16 JOBS and 16 PROCESSORS, but with MCDRAM in ''flat'' mode and <code>numactl --preferred=1 xds_par</code> (thus using all 16GB for arrays, and nothing for cache):
- INIT.LP:         elapsed wall-clock time       29.5 sec
- COLSPOT:         elapsed wall-clock time       38.6 sec
- INTEGRATE: total elapsed wall-clock time       53.2 sec
-Now setting the KNL to SNC4 mode, and the MCDRAM to cache (using it in flat mode is impractical because the --preferred option accepts only a single NUMA node, and determining the correct node requires scripting):
- INIT.LP:         elapsed wall-clock time       29.6 sec
- COLSPOT.LP:      elapsed wall-clock time       37.8 sec
- INTEGRATE: total elapsed wall-clock time       49.6 sec
-If the library is compiled with -mtune=knl, all times are about 1 second less.
-Conclusions: since INIT benefits from more PROCESSORs, one could run XDS twice for fastest turnaround: the first run with JOB=XYCORR INIT and a high number of processors (99 is the maximum), the second run with JOB=COLSPOT IDXREF DEFPIX INTEGRATE CORRECT and an optimized JOBS/PROCESSORS combination. The SNC4 mode is fastest in this example; to do better than the cache mode of the MCDRAM, one needs to adapt the forkcolspot and forkintegrate scripts - see [[Performance]]. Other examples (with more frames) confirmed that cache mode is best for both quadrant and SNC4, and showed quadrant mode to be superior to SNC4. To use the latter optimally, one needs to thoroughly understand and properly use the relevant environment variables, in particular KMP_AFFINITY and KMP_PLACE_THREADS.
-For comparison, if these data are stored as CBFs, COLSPOT and INTEGRATE take 34.8 and 45.2 seconds, respectively, in SNC4 mode. However, with a cold cache (i.e. when data are read for the first time), the HDF5 files have an advantage because, owing to their better compression, they are a factor of 2.5 smaller.
 == Troubleshooting ==
 * make sure that master.h5 and the corresponding data.h5 files remain together as collected, and '''don't rename the data.h5 files''' - they are referred to from master.h5.  If you change the names of the data.h5 files or copy them somewhere else, that link is broken unless you fix master.h5.
-== Script for generating XDS.INP from master.h5 ==
+== Script by Andreas Förster (Dectris) for generating XDS.INP from master.h5 ==
 <div class="mw-collapsible mw-collapsed">
 Expand code section below (i.e. click on blue <code>[Expand]</code> at the end of this line if there is no code visible), download it and save as XDS_from_H5.py.
@@ Line 663: / Line 623: @@
 * Set MAXIMUM_NUMBER_OF_JOBS= and MAXIMUM_NUMBER_OF_PROCESSORS= to similar values whose product is slightly smaller than the total number of threads on your system.
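The rule of thumb in the last item can be turned into a small heuristic. The sketch below is an assumption, not an XDS feature: it takes the number of logical CPUs reported by os.cpu_count() as "the total number of threads", and picks two values that differ by at most one and whose product is the largest possible value below that number. The function name is hypothetical.

```python
# Sketch: suggest MAXIMUM_NUMBER_OF_JOBS / MAXIMUM_NUMBER_OF_PROCESSORS values
# that are similar and whose product is slightly below the thread count.
import math
import os
from typing import Tuple


def suggest_jobs_processors(n_threads: int) -> Tuple[int, int]:
    """Return (jobs, processors) with |jobs - processors| <= 1 and the largest
    product strictly below n_threads; falls back to (1, 1) on tiny machines."""
    best = (1, 1)
    for jobs in range(1, math.isqrt(max(n_threads, 1)) + 1):
        for procs in (jobs, jobs + 1):
            product = jobs * procs
            if product < n_threads and product > best[0] * best[1]:
                best = (jobs, procs)
    return best


if __name__ == "__main__":
    threads = os.cpu_count() or 1  # logical CPUs, i.e. hardware threads
    jobs, procs = suggest_jobs_processors(threads)
    print(f"MAXIMUM_NUMBER_OF_JOBS={jobs}")
    print(f"MAXIMUM_NUMBER_OF_PROCESSORS={procs}")
```

On a machine showing 32 processors in /proc/cpuinfo this suggests 5 JOBS and 6 PROCESSORS; treat the result as a starting point for tuning rather than an optimum.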
-= Less efficient way of processing Eiger data, using conversion to CBF=
+= Less efficient way of processing Eiger data, using conversion to CBF =
-Since the release of Neggia, a plugin for XDS that parallelizes the reading of images from HDF5 data, conversion to H5ToXds should no longer required in most usage scenarios. The sections below nevertheless describe this possibility, since preliminary experience with some less common network file systems (apparently GPFS, but not NFS) seems to indicate low performance of Neggia.
+Since the release of Neggia, a plugin for XDS that parallelizes the reading of images from HDF5 data, conversion by H5ToXds should no longer be required in most usage scenarios. The sections below nevertheless describe this possibility, since preliminary experience with some less common network file systems (apparently GPFS, but not NFS) seems to indicate low performance of Neggia.
-Conversion program options: Dectris provides [https://www.dectris.com/news.html?page=2 H5ToXds] (Linux only!). That program converts (as the name indicates) the HDF5 files to CBF files; however, it does not write the geometry and other information into the CBF header (therefore, [[generate_XDS.INP]] or MOSFLM does not work with these files). Alternatives are GlobalPhasing's hdf2mini-cbf program (needs autoPROC license) or, from http://www.mrc-lmb.cam.ac.uk/harry/imosflm/ver721/downloads, the eiger2cbf-osx or eiger2cbf-linux program written by T. Nakane. The latter programs do write a useful CBF header.
+Conversion program options: Dectris provides [https://www.dectris.com/news.html?page=2 H5ToXds] (Linux only!). That program converts (as the name indicates) the HDF5 files to CBF files; however, it does not write the geometry and other information into the CBF header (therefore, [[generate_XDS.INP]] or MOSFLM does not work with these files). Alternatives are GlobalPhasing's hdf2mini-cbf program (does ''not'' need an autoPROC license) or, from http://www.mrc-lmb.cam.ac.uk/harry/imosflm/ver721/downloads, the eiger2cbf-osx or eiger2cbf-linux program written by T. Nakane. The latter programs do write a useful CBF header.
-For faster processing, the [[Eiger#A_script_for_faster_XDS_processing_of_CBF-converted Eiger data|shell script]] below should be copied to /usr/local/bin/H5ToXds and made executable (<code>chmod a+rx /usr/local/bin/H5ToXds*</code>). The binary H5ToXds then should be named e.g. /usr/local/bin/H5ToXds.bin - note the .bin filename extension! The script ''also'' uses RAM to speed up processing; it uses it for fast storage of the temporary CBF file that H5ToXds/eiger2cbf/hdf2mini-cbf writes, and that each parallel thread ("processor") of XDS reads. The amount of additional RAM this requires is modest (about (number of pixels)*(number of threads) bytes).
+H5ToXds and eiger2cbf-osx / eiger2cbf-linux do not work with files produced at Diamond Light Source.
-== Benchmark using H5ToXds ==
+== A script for faster XDS processing of CBF-converted Eiger data (this is only shown out of historic interest) ==
-This was run on a single unloaded CentOS7.2 64bit machine with dual Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz, HT enabled (showing 32 processors in /proc/cpuinfo), on a local XFS filesystem (all defaults), with four JOBs and 12 PROCESSORS. The numbers below refer to the H5ToXds binary as used in the script below.
-The timing, using XDS (BUILT=20151231), is on the first run:
+For faster processing, the [[Eiger#A script for faster XDS processing of CBF-converted Eiger data (this is only shown out of historic interest)|shell script]] below should be copied to /usr/local/bin/H5ToXds and made executable (<code>chmod a+rx /usr/local/bin/H5ToXds*</code>). The binary H5ToXds should then be named e.g. /usr/local/bin/H5ToXds.bin - note the .bin filename extension! The script ''also'' uses RAM to speed up processing; it uses it for fast storage of the temporary CBF file that H5ToXds/eiger2cbf/hdf2mini-cbf writes, and that each parallel thread ("processor") of XDS reads. The amount of additional RAM this requires is modest (about (number of pixels)*(number of threads) bytes).
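The RAM estimate in the last sentence can be made concrete with a short Python sketch. For illustration only, it assumes that the number of threads equals MAXIMUM_NUMBER_OF_JOBS times MAXIMUM_NUMBER_OF_PROCESSORS and that an Eiger 16M frame has 4150 x 4371 pixels; the function name is hypothetical.

```python
# Sketch: extra RAM used for temporary CBF files, (number of pixels)*(number of threads) bytes.
# Assumptions: threads = jobs * processors; Eiger 16M frame = 4150 x 4371 pixels.


def temp_cbf_ram_bytes(nx: int, ny: int, jobs: int, processors: int) -> int:
    """One byte per pixel for each XDS thread reading a temporary CBF file."""
    return nx * ny * jobs * processors


if __name__ == "__main__":
    # 4 JOBs and 12 PROCESSORS, as in the benchmark on this page
    ram = temp_cbf_ram_bytes(4150, 4371, jobs=4, processors=12)
    print(f"about {ram / 1e9:.2f} GB")
```

With 4 JOBs and 12 PROCESSORS this comes to about 0.87 GB, which is small compared with the framecache estimate for DELPHI-sized batches.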
- INIT:  elapsed wall-clock time       12.0 sec
- COLSPOT: elapsed wall-clock time       44.9 sec
- INTEGRATE: total elapsed wall-clock time       65.1 sec
- CORRECT: elapsed wall-clock time        2.9 sec
- Total elapsed wall-clock time for XDS      133.6 sec
-When I repeat this, I get
- Total elapsed wall-clock time for XDS      128.3 sec
-Repeat once again:
- Total elapsed wall-clock time for XDS      129.3 sec
-So a bit of cache-warming helps, but not much. This machine has 64GB RAM. From the output of "top", the highest memory usage occurs during INTEGRATE, when each of the mintegrate_par processes consumes up to 7.4% of the memory. In other words, in this way less than 20GB of total memory are used. "top" shows a CPU consumption around (on average) 4 times 650%.
-The number of JOBs and PROCESSORs could be optimized. I tried 6 JOBs and got
- Total elapsed wall-clock time for XDS      120.1 sec
-so there's still some room for improvement.
-With program versions as of 2016-03-10, eiger2cbf-linux is practically as fast as the H5ToXds binary; hdf2mini-cbf is somewhat slower.
-When unpacking the .h5 files to .cbf files and processing those, I get on the same machine and with same processing parameters:
- Total elapsed wall-clock time for XDS       96.3 sec
-which indicates a 24% overhead due to the HDF5-to-CBF conversion. However, one has to add to this the time for the HDF5-to-CBF conversion, which is (with 18 parallel H5ToXds jobs each converting 50 frames) 34.2 sec, so overall the "on-the-fly" route using the script below is faster than the "pre-conversion" route, at least on this machine.
-== A script for faster XDS processing of CBF-converted Eiger data ==
 <pre>
 #!/bin/bash
