Eiger: Difference between revisions

561 bytes added, 22 February 2017
Line 48:
=== Xeon Phi (Knights Landing, KNL) ===


The benchmark was run on a single KNL7210 processor (64 cores, 256 hardware threads) set to quadrant mode, with the MCDRAM used as cache. The environment variable OMP_PROC_BIND was set to false (otherwise the scheduler seems to put all threads on one core). XDS was compiled with the -xMIC-AVX512 option of ifort. These benchmarks were performed with a "warm" operating system cache: the first run of each type was not counted because it had to read all data from disk.


Deviating from the Xeon benchmark setup (above), BACKGROUND_RANGE was set to a more realistic value of 1 50 (instead of 1 9). The INIT timings are therefore not comparable between the two setups.
This gives
  COLSPOT:        elapsed wall-clock time      48.3 sec
Line 60:
  COLSPOT:        elapsed wall-clock time      49.3 sec
  INTEGRATE: total elapsed wall-clock time      59.8 sec
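
To compare runs, timing lines like the ones above can be collected programmatically. The following is a minimal sketch, assuming the lines have exactly the format shown here; <code>parse_timings</code> is a hypothetical helper name and not part of XDS.

```python
import re

# Matches lines such as
#   "COLSPOT:        elapsed wall-clock time      49.3 sec"
#   "INTEGRATE: total elapsed wall-clock time      59.8 sec"
# (format assumed from the benchmark listings on this page)
TIMING = re.compile(
    r"^\s*(\w+):\s+(?:total\s+)?elapsed wall-clock time\s+([\d.]+)\s+sec"
)


def parse_timings(text):
    """Return a dict mapping XDS step name to wall-clock seconds."""
    timings = {}
    for line in text.splitlines():
        match = TIMING.match(line)
        if match:
            timings[match.group(1)] = float(match.group(2))
    return timings


if __name__ == "__main__":
    sample = (
        "  COLSPOT:        elapsed wall-clock time      49.3 sec\n"
        "  INTEGRATE: total elapsed wall-clock time      59.8 sec\n"
    )
    for step, seconds in parse_timings(sample).items():
        print(f"{step}: {seconds:.1f} s")
```

Keeping the step name as the dictionary key makes it easy to compare the same step across two configurations.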
Using a pre-release library that makes use of the <code>LIB=</code> [http://homes.mpimf-heidelberg.mpg.de/~kabsch/xds/html_doc/xds_parameters.html#LIB= option] of XDS, instead of the H5ToXds script:
  INIT:            elapsed wall-clock time      30.4 sec
  COLSPOT:        elapsed wall-clock time      40.7 sec
Line 82:


Conclusions: since INIT benefits from more PROCESSORs, one could run XDS twice for the fastest turnaround: the first run with JOB=XYCORR INIT and a high number of processors (99 is the maximum), the second run with JOB=COLSPOT IDXREF DEFPIX INTEGRATE CORRECT and an optimized JOBS/PROCESSORS combination. The SNC4 mode is indeed fastest; to do better than the cache mode of the MCDRAM, one needs to adapt the forkcolspot and forkintegrate scripts, see [[Performance]].
For comparison, if these data are stored as CBFs, COLSPOT and INTEGRATE take 34.8 and 45.2 seconds, respectively, in SNC4 mode. However, with a cold cache (i.e. when the data are read for the first time), the HDF5 files have an advantage because they are smaller by a factor of 2.5, owing to their better compression.
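
The two-run strategy from the conclusions can be sketched as a small script that derives two variants of XDS.INP. This is only an illustration under stated assumptions: each keyword sits on its own line in XDS.INP, the helper names <code>set_keyword</code> and <code>two_pass_inputs</code> are hypothetical, and the JOBS/PROCESSORS values for the second run are placeholders that have to be optimized for the machine at hand.

```python
import re


def set_keyword(xds_inp, keyword, value):
    """Return the XDS.INP text with active `keyword=` lines replaced by one new line.

    Assumes one keyword per line; lines commented out with '!' are kept.
    """
    pattern = re.compile(r"^\s*" + re.escape(keyword) + r"\s*=")
    kept = [ln for ln in xds_inp.splitlines() if not pattern.match(ln)]
    return "\n".join([f"{keyword}= {value}"] + kept) + "\n"


def two_pass_inputs(xds_inp, jobs, processors):
    """Build the XDS.INP texts for the two runs.

    `jobs` and `processors` apply to the second run only and are
    placeholders to be tuned, not recommended values.
    """
    # Run 1: XYCORR and INIT with the maximum number of processors (99).
    first = set_keyword(xds_inp, "JOB", "XYCORR INIT")
    first = set_keyword(first, "MAXIMUM_NUMBER_OF_PROCESSORS", 99)

    # Run 2: the remaining steps with a tuned JOBS/PROCESSORS combination.
    second = set_keyword(xds_inp, "JOB", "COLSPOT IDXREF DEFPIX INTEGRATE CORRECT")
    second = set_keyword(second, "MAXIMUM_NUMBER_OF_JOBS", jobs)
    second = set_keyword(second, "MAXIMUM_NUMBER_OF_PROCESSORS", processors)
    return first, second


if __name__ == "__main__":
    template = "JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT\n"
    run1, run2 = two_pass_inputs(template, jobs=8, processors=32)  # placeholder values
    print(run1)
    print(run2)
```

Each returned text would be written to XDS.INP before the corresponding XDS run, with OMP_PROC_BIND set to false in the environment as in the benchmark setup.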


== Troubleshooting ==