HACC Simulation Data Portal


Frequently Asked Questions


Q: Who produced this data? How was it produced?
The HACC team produced the data under major allocations granted by the leadership computing facilities. The resulting datasets are being made available on this site for analysis by the broader cosmology community. HACC, the Hardware/Hybrid Accelerated Cosmology Code (https://press3.mcs.anl.gov/cpac/projects/hacc), has been developed to run efficiently on all currently available HPC platforms. The HACC core team is based at Argonne and has worked with many researchers at other institutions.


Q: Can I transfer data to another institution for analysis?
Yes! The data is intended to be transferred to another site for analysis. The site provides mechanisms for selecting models with particular cosmological parameters and for limiting the selection by redshift or data type, so you can transfer only the data you really need. A smaller selection is easier to transfer, requires less space, and is easier to analyze.

Q: What does the directory structure of the data signify?
Once you have transferred files, the directory structure will take a form such as

MiraTitanU/Grid/M000/L2100/HACC001/analysis/Particles/STEP323/m000.mpicosmo.323
where
  - MiraTitanU is the simulation campaign
  - M000 is the cosmological model
  - L2100 indicates the box size (2.1 Gpc)
  - HACC001 is a specific realization
  - STEP323 indicates the 323rd global timestep

The file named above is a header file that describes the layout of the data files. It is accompanied by N similarly named files, each with a trailing '#' followed by a number; these files hold the actual particle data. This subfiling is the result of optimizing I/O to the filesystem during the simulation. To view the full dataset, you must have the entire set of N files.
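
As a rough illustration of the layout described above, the following Python sketch splits a transferred path into the components named in the list and collects the '#'-suffixed subfiles that sit next to a header file. The function names describe_path and find_subfiles are hypothetical helpers, not part of GenericIO or this site. The sketch assumes the path has the same depth as the example, and it leaves unlabeled the components that the list does not explain (such as Grid).

```python
import glob
import os

def describe_path(path):
    """Label the components of a path shaped like the example above."""
    parts = path.split("/")
    # Work from the end so that any leading download directory is ignored.
    # Assumed shape: campaign/Grid/model/box/realization/analysis/type/step/file
    campaign, _, model, box, realization, _, _, step, filename = parts[-9:]
    return {
        "campaign": campaign,
        "model": model,
        "box": box,
        "realization": realization,
        "step": int(step[len("STEP"):]),  # "STEP323" -> 323
        "header_file": filename,
    }

def find_subfiles(header_path):
    """Return the files named <header>#<suffix> next to the header file."""
    pattern = glob.escape(header_path) + "#*"
    return sorted(glob.glob(pattern))  # sorted by name, not numerically

example = ("MiraTitanU/Grid/M000/L2100/HACC001/analysis/"
           "Particles/STEP323/m000.mpicosmo.323")
info = describe_path(example)
print(info["campaign"], info["model"], info["box"], info["step"])

# An empty list here means that no subfiles were found beside the header.
print(len(find_subfiles(example)), "subfiles found")
```

Checking that find_subfiles returns a non-empty list after a transfer is a quick way to notice that only the header file was copied.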

Q: How is the data formatted? How can I read the data?
The data is stored in GenericIO format. GenericIO is a write-optimized library for producing self-describing scientific data files on large-scale parallel file systems. The GenericIO repository (https://trac.alcf.anl.gov/projects/genericio) provides two ways to work with the data:
  1. C++ code for reading and writing GenericIO-formatted data in parallel
  2. a Python wrapper for reading data serially (suitable for smaller data).
An example of using the Python interface to read a halo properties file follows.

$ python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import genericio as gio
>>> gio.inspect('m000-99.fofproperties')
Number of Elements: 1691
[data type] Variable name
---------------------------------------------
[i 32] fof_halo_count
[i 64] fof_halo_tag
[f 32] fof_halo_mass
[f 32] fof_halo_mean_x
[f 32] fof_halo_mean_y
[f 32] fof_halo_mean_z
[f 32] fof_halo_mean_vx
[f 32] fof_halo_mean_vy
[f 32] fof_halo_mean_vz
[f 32] fof_halo_vel_disp

(i=integer,f=floating point, number bits size)
>>> fof_halo_count = gio.read('m000-99.fofproperties','fof_halo_count')
>>> print(fof_halo_count)
[[624 681  69 ... 184  72  97]]
>>> # Reading several variables returns parallel arrays in input order.
>>> data = gio.read('m000-99.fofproperties',['fof_halo_count','fof_halo_tag','fof_halo_mean_x','fof_halo_mean_y','fof_halo_mean_z'])
>>> data
array([[6.24000000e+02, 6.81000000e+02, 6.90000000e+01, ...,
        1.84000000e+02, 7.20000000e+01, 9.70000000e+01],
       [1.32871000e+05, 1.03333000e+05, 5.48230000e+04, ...,
        1.90935200e+06, 2.05578600e+06, 7.64180000e+04],
       [1.52459240e+00, 1.43878233e+00, 1.36675692e+00, ...,
        1.14827515e+02, 1.27592453e+02, 1.27921860e+02],
       [1.43614788e+01, 3.65754814e+01, 3.79349136e+01, ...,
        6.43497162e+01, 6.44614944e+01, 8.80533829e+01],
       [3.65939808e+00, 3.32679443e+01, 3.58395233e+01, ...,
        1.05952095e+02, 1.08691956e+02, 1.26013718e+02]])
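
Because the multi-variable read returns one row per requested variable, it can be convenient to pair each row with its name. The sketch below does that with a hypothetical helper, columns_by_name, which is not part of GenericIO. So that it runs without GenericIO installed, plain Python lists stand in for the returned array, using the first two values of each row printed in the session above.

```python
def columns_by_name(data, names):
    """Pair each requested variable name with its row of the result."""
    if len(data) != len(names):
        raise ValueError("expected one row per requested variable")
    return dict(zip(names, data))

names = ["fof_halo_count", "fof_halo_tag",
         "fof_halo_mean_x", "fof_halo_mean_y", "fof_halo_mean_z"]

# Stand-in for the array returned by gio.read(filename, names):
# the first two values of each row printed in the session above.
data = [
    [6.24000000e+02, 6.81000000e+02],
    [1.32871000e+05, 1.03333000e+05],
    [1.52459240e+00, 1.43878233e+00],
    [1.43614788e+01, 3.65754814e+01],
    [3.65939808e+00, 3.32679443e+01],
]

halos = columns_by_name(data, names)

# Counts and tags arrive as floats in this form; convert them to integers.
counts = [int(c) for c in halos["fof_halo_count"]]
tags = [int(t) for t in halos["fof_halo_tag"]]

# Position of the first halo, in the units stored in the file.
first_position = (halos["fof_halo_mean_x"][0],
                  halos["fof_halo_mean_y"][0],
                  halos["fof_halo_mean_z"][0])

# Floats cannot distinguish neighboring integers above 2**53.
assert float(2**53 + 1) == float(2**53)
```

In the session above, the multi-variable read prints as a single floating-point array, so the integer fof_halo_tag values appear as floats. If tags can exceed 2**53, reading fof_halo_tag on its own, as was done for fof_halo_count, may be the safer choice. This is an inference from the printed output, not documented GenericIO behavior.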


Q: What are the units?
Length scales are given in Mpc/h, velocities are comoving peculiar velocities in km/s, and halo masses are given in Msun/h.
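
The factors of h in these units mean that a value of the dimensionless Hubble parameter h is needed to express lengths in Mpc or masses in Msun. A minimal sketch of that conversion follows. The function names are hypothetical, and h = 0.7 is only a placeholder: the appropriate value depends on the cosmological model of the simulation you are using.

```python
def length_in_mpc(length_mpc_per_h, h):
    """Convert a length from Mpc/h to Mpc by dividing by h."""
    return length_mpc_per_h / h

def mass_in_msun(mass_msun_per_h, h):
    """Convert a mass from Msun/h to Msun by dividing by h."""
    return mass_msun_per_h / h

# Placeholder value for illustration only; take h from the model's parameters.
H_PLACEHOLDER = 0.7

halo_x = length_in_mpc(70.0, H_PLACEHOLDER)      # a position given in Mpc/h
halo_mass = mass_in_msun(7.0e13, H_PLACEHOLDER)  # a halo mass given in Msun/h
print(halo_x, halo_mass)
```

Velocities are already in km/s and need no factor of h.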

Q: How should I acknowledge use of this data in my publications?
Please include the following papers and acknowledgements in publications that use this data.

Cite in all publications
Habib, S., Pope, A., Finkel, H., Frontiere, N., Heitmann, K., Daniel, D., Fasel, P., Morozov, V., Zagaris, G., Peterka, T., Vishwanath, V., 2016. HACC: Simulating sky surveys on state-of-the-art supercomputing architectures. New Astronomy, 42, pp.49-65.
Heitmann, K., Uram, T., Finkel, H., Frontiere, N., Habib, S., Pope, A., Rangel, E., Hollowed, J., Korytov, D., Larsen, P., Allen, B., Chard, K., Foster, I., 2019. HACC Cosmological Simulations: First Data Release, arXiv:1904.11966.

For MiraTitanU data
Heitmann, K., Bingham, D., Lawrence, E., Bergner, S., Habib, S., Higdon, D., Pope, A., Biswas, R., Finkel, H., Frontiere, N., Bhattacharya, S., 2016. The Mira–Titan Universe: Precision Predictions for Dark Energy Surveys. The Astrophysical Journal, 820(2), p.108.
An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This research also used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

For OuterRim data
Heitmann, K., Finkel, H., Pope, A., Morozov, V., Frontiere, N., Habib, S., Rangel, E., Uram, T., Korytov, D., Child, H., Flender, S., Insley, J., Rizzi, S., 2019. The Outer Rim Simulation: A Path to Many-Core Supercomputers, arXiv:1904.11970.
An award of computer time was provided by the INCITE program. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

For QContinuum data
Heitmann, K., Frontiere, N., Sewell, C., Habib, S., Pope, A., Finkel, H., Rizzi, S., Insley, J., Bhattacharya, S., 2015. The Q continuum simulation: harnessing the power of GPU accelerated supercomputers. The Astrophysical Journal Supplement Series, 219(2), p.34.
An award of computer time was provided by the INCITE program. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.