Analysis Preservation with HEPData and Rivet

The goal is to preserve analyses in such a way that the data is easily accessible and reproducible, while providing an interface for the generator tuning community. This can be achieved by storing the results of the analysis in a standardized, machine-readable format on HEPData, and by implementing the cuts and selections of the analysis in a Rivet plugin that the tuning community can run.

This document describes the process of preserving an analysis, using LHCB-PAPER-2021-010 as an example. It is based on a presentation given by Lars Kolk in the Simulation Meeting.

HEPData

HEPData is an open-access repository used to preserve analyses in high energy physics. It stores selected data points and tables of a published paper in a machine-readable way. The service is provided by the HEP community and is hosted by CERN. The website already offers a large catalogue of preserved analyses and ensures long-term preservation of experimental results. Each analysis is stored in a record, which is a collection of files containing the analysis results. Each record consists of the following files:

  • A submission.yaml file containing the metadata of the analysis. This file also links to tables which store the actual data of the analysis.

  • A .yaml file for each table in the record. These files contain the data of the analysis in a standardized format.

For detailed instructions on how to set up a record, please refer to the official HEPData documentation. It is also good practice to look at existing records and use them as templates for new records. To do so, download the record from HEPData by clicking the Download all button on the record page and then selecting YAML with resource files. This downloads a zip file containing all the files of the record.

[Figure: the YAML with resource files option under Download all on a HEPData record page]

The following snippet shows part of the submission.yaml file from the record of the analysis LHCB-PAPER-2021-010. For the complete files, please refer to the corresponding HEPData record. In this example, the file cross_section_0.yaml stores the differential cross-section of prompt long-lived particles with negative charge, measured in bins of transverse momentum and pseudorapidity. The measured cross-section is stored in a HEPData table containing several 1D histograms. The corresponding correlation matrix is stored in cross_section_correlation_0.yaml as a 2D histogram.

---
additional_resources: 
  - { location: "https://lhcbproject.web.cern.ch/Publications/p/LHCb-PAPER-2021-010.html", description: "LHCb page dedicated to the measurement" }
comment: ''
---
name: "Table 1"
additional_resources: # additional references
  - {
      location: "https://cds.cern.ch/record/2777220",
      description: "web page with related paper and auxiliary material",
    }
location: Supplementary information
description: Double differential cross-sections of prompt inclusive production of long-lived negatively charged particles as a function of transverse momentum and pseudorapidity.
keywords: # used for searching, possibly multiple values for each keyword
  - { name: reactions, values: [p p --> charged X] }
  - { name: observables, values: [DSIG/DETARAP/DPT] }
  - { name: cmenergies, values: [13000] }
  - {
      name: phrases,
      values:
        [
          QCD,
          forward physics,
          particle and resonance production,
          minimum bias,
          cross section,
        ],
    }
data_file: cross_section_0.yaml
---
name: "Table 3"
additional_resources: # additional references
  - {
      location: "https://cds.cern.ch/record/2777220",
      description: "web page with related paper and auxiliary material",
    }

location: Supplementary information
description: Correlation for the uncertainties of the differential cross-section of prompt inclusive production of long-lived charged particles.
keywords: # used for searching, possibly multiple values for each keyword
  - { name: reactions, values: [p p --> charged X] }
  - { name: observables, values: [CORR, DSIG/DETARAP/DPT] }
  - { name: cmenergies, values: [13000] }
  - {
      name: phrases,
      values:
        [
          QCD,
          forward physics,
          particle and resonance production,
          minimum bias,
          cross section,
          correlation,
        ],
    }
data_file: cross_section_correlation_0.yaml

When developing a Rivet plugin, it is preferable to have 1D histograms, as this makes the plugin easier to implement. It is therefore recommended to avoid 2D histograms for data points which will be used in the final Rivet plugin. In this case, the only 2D histogram is the correlation matrix, which is not part of the Rivet plugin, so it does not cause a problem.

To validate your new record, you can use the HEPData validator or the HEPData sandbox. Once your files are validated, you can initiate a submission by contacting the HEPData coordinator responsible for your experiment.

Once you have initiated a submission, you can upload your files, which will then be reviewed by the HEPData team. Your record will be published and available to the public once the review is complete. In the meantime, you can already download the YODA file needed for the Rivet plugin. To do so, click the Download all button on the record page and then select YODA. This downloads a zip file containing the YODA file.

[Figure: the YODA option under Download all on a HEPData record page]

Rivet

Introduction

Rivet is a system for validating Monte Carlo event generators. It is a service provided by the HEP community and already offers a large catalogue of Rivet analyses. The goal of the Rivet project is to provide a convenient way to preserve experimental analyses in a standardized format so that they can be used for generator tuning. The Rivet project also maintains a wishlist of analyses which should be implemented as Rivet plugins. Be sure to check it out, as your analysis may be on the list.

Developing a plugin

In order to develop a Rivet plugin, you first need to install Rivet. To do so, please refer to the official installation instructions. Once Rivet is installed, you can start to develop your plugin.

Each Rivet plugin consists of

  • A <AnalysisName>.cc file containing the analysis code

  • A <AnalysisName>.info file containing the metadata of the analysis

  • A <AnalysisName>.yoda file containing the data of the analysis, which you can get from the HEPData record

  • A <AnalysisName>.plot file containing the plot definitions of the analysis

Here, AnalysisName follows the naming convention <Experiment>_<publication_year>_I<InspireID>, where InspireID is a unique identifier that can be found in the URL of the Inspire record of the publication. For example, the Inspire record of LHCB-PAPER-2021-010 is https://inspirehep.net/literature/1889335. Therefore, the resulting name of this analysis is LHCB_2021_I1889335. If you are unsure about the name of your plugin, you can also check the .yoda file of your HEPData record, as it contains the name of the plugin in the header.
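The naming convention can be written down as a small stand-alone function. This is only an illustrative sketch: makeAnalysisName is a hypothetical helper and not part of Rivet, and the argument types are assumptions.

```cpp
#include <string>

// Hypothetical helper (not part of Rivet): assembles a plugin name following
// the <Experiment>_<publication_year>_I<InspireID> convention described above.
std::string makeAnalysisName(const std::string& experiment, int year, long inspireId)
{
  return experiment + "_" + std::to_string(year) + "_I" + std::to_string(inspireId);
}
```

For the example analysis, makeAnalysisName("LHCB", 2021, 1889335) gives LHCB_2021_I1889335.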

Once you know the name of your plugin, you can either create the four source files manually or use the rivet-mkanalysis <AnalysisName> command to create them automatically.

For detailed instructions on how to develop a Rivet plugin, please refer to the official Rivet documentation. It is also good practice to look at existing plugins and use them as templates for new plugins. Already published Rivet plugins can be found either on the Rivet website or in the Rivet repository.

Analysis Code

In this section, the main components of an analysis code are briefly explained, followed by an explicit example based on the plugin LHCB_2021_I1889335. As implied by the file list in the previous section, the analysis code is stored in a single file, i.e. without a separate header file. The Rivet team chose this approach because analyses are almost never inherited from. Furthermore, it makes the code more compact and thus easier to read. Every Rivet analysis is composed of three parts:

  • A no-argument constructor

  • A minimal hook into the plugin system

  • Three analysis methods: init, analyze and finalize

The minimal hook registers the analysis with the Rivet plugin system. The three analysis methods divide the work as follows: the init method loads the data from the .yoda file and books the histograms, the analyze method is called for each event and fills the histograms, and the finalize method normalizes them.

The following listing shows the source code of the plugin LHCB_2021_I1889335 as an example. Its analysis code is explained below.

// -*- C++ -*-
#include "Rivet/Analysis.hh"
#include "Rivet/Projections/AliceCommon.hh"

namespace Rivet
{

  /// @brief Inelastic section in pp collisions at 13 TeV for charged particles in LHCb acceptance
  class LHCB_2021_I1889335 : public Analysis
  {
  public:
    /// Constructor
    RIVET_DEFAULT_ANALYSIS_CTOR(LHCB_2021_I1889335);

    /// @name Analysis methods
    //@{

    /// Book histograms and initialise projections before the run
    void init()
    {

      // Register projection for primary particles
      declare(ALICE::PrimaryParticles(Cuts::etaIn(ETAMIN, ETAMAX) && Cuts::abscharge > 0), "APRIM");

      {Histo1DPtr tmp; _h_ppInel_neg.add(2.0, 2.5, book(tmp, 1, 1, 1));}
      {Histo1DPtr tmp; _h_ppInel_neg.add(2.5, 3.0, book(tmp, 1, 1, 2));}
      {Histo1DPtr tmp; _h_ppInel_neg.add(3.0, 3.5, book(tmp, 1, 1, 3));}
      {Histo1DPtr tmp; _h_ppInel_neg.add(3.5, 4.0, book(tmp, 1, 1, 4));}
      {Histo1DPtr tmp; _h_ppInel_neg.add(4.0, 4.5, book(tmp, 1, 1, 5));}
      {Histo1DPtr tmp; _h_ppInel_neg.add(4.5, 4.8, book(tmp, 1, 1, 6));}
      
      {Histo1DPtr tmp; _h_ppInel_pos.add(2.0, 2.5, book(tmp, 2, 1, 1));}
      {Histo1DPtr tmp; _h_ppInel_pos.add(2.5, 3.0, book(tmp, 2, 1, 2));}
      {Histo1DPtr tmp; _h_ppInel_pos.add(3.0, 3.5, book(tmp, 2, 1, 3));}
      {Histo1DPtr tmp; _h_ppInel_pos.add(3.5, 4.0, book(tmp, 2, 1, 4));}
      {Histo1DPtr tmp; _h_ppInel_pos.add(4.0, 4.5, book(tmp, 2, 1, 5));}
      {Histo1DPtr tmp; _h_ppInel_pos.add(4.5, 4.8, book(tmp, 2, 1, 6));}
    }

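    /// Perform the per-event analysis
    void analyze(const Event &event)
    {
      // Charged primary particles within the eta acceptance declared in init()
      const Particles cfs = apply<ALICE::PrimaryParticles>(event, "APRIM").particles();

      for (const Particle& myp : cfs)
      {
        // Choose the histogram group by charge sign; the first fill argument
        // selects the eta slice, the second is the pT value that is filled
        if (myp.charge() < 0)
        {
          _h_ppInel_neg.fill(myp.pseudorapidity(), myp.momentum().pT());
        }
        else
        {
          _h_ppInel_pos.fill(myp.pseudorapidity(), myp.momentum().pT());
        }
      }
    }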

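    /// Normalise histograms etc., after the run
    void finalize()
    {
      // Convert summed weights to a cross-section in millibarn
      const double scale_factor = crossSection() / millibarn / sumOfWeights();
      // Widths of the eta slices booked in init(), in the same order
      const std::vector<double> binWidths = {0.5, 0.5, 0.5, 0.5, 0.5, 0.3};
      for (size_t i = 0; i < binWidths.size(); i++)
      {
        // Divide by the eta width to obtain a cross-section differential in eta
        _h_ppInel_neg.histos()[i]->scaleW(scale_factor / binWidths[i]);
        _h_ppInel_pos.histos()[i]->scaleW(scale_factor / binWidths[i]);
      }
    }

    //@}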

    /// @name Histogram
    BinnedHistogram _h_ppInel_neg;
    BinnedHistogram _h_ppInel_pos;

    /// Cut constants
    const double ETAMIN = 2.0, ETAMAX = 4.8;
  };

  RIVET_DECLARE_PLUGIN(LHCB_2021_I1889335);
}

In LHCB_2021_I1889335.cc, the class LHCB_2021_I1889335 is defined, which inherits from Analysis. The default constructor is declared via the RIVET_DEFAULT_ANALYSIS_CTOR macro, while the hook into the plugin system is realized via the RIVET_DECLARE_PLUGIN macro. The histograms _h_ppInel_neg and _h_ppInel_pos, as well as the cut constants ETAMIN and ETAMAX, are declared as public members of the class, alongside the three analysis methods init, analyze and finalize.

The init method is called once at the beginning of the analysis. It is used to book the histograms of the analysis and load the data from the .yoda file. The init method also declares a projection, which is later applied to each event in the analyze method via apply. Projections are used to select certain particles from an event. In this case, the ALICE::PrimaryParticles projection is used to select the primary particles of an event. This projection was chosen because ALICE’s definition of primary particles is identical to this analysis’ definition of prompt particles. In the alternative version of the plugin, the ChargedFinalState projection is used instead, which selects all charged final-state particles of an event. This, however, requires selecting prompt particles by hand in the analyze method.

The analyze method applies the projection declared in the init method and loops over all selected particles in the event. For each particle, it checks the sign of the charge and fills the corresponding histogram with the particle’s pseudorapidity and transverse momentum; the selection of prompt particles is already handled by the projection. While looping over the particles, it is possible to apply additional cuts, for example on the transverse momentum of the particle. It is also possible to apply handwritten selections, such as the selection of prompt particles in the alternative version of the plugin.
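The way a particle is routed to one of the six pseudorapidity slices can be illustrated without Rivet. The sketch below is not the BinnedHistogram implementation: etaSliceIndex and ETA_EDGES are hypothetical names, the edges are copied from the booking calls in init, and half-open intervals (lower edge included, upper edge excluded) are an assumption.

```cpp
#include <cstddef>
#include <vector>

// Pseudorapidity slice edges, copied from the booking calls in init().
const std::vector<double> ETA_EDGES = {2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 4.8};

// Hypothetical stand-alone helper: returns the index of the slice
// [edges[i], edges[i+1]) that contains eta, or -1 if eta lies outside
// the acceptance. Half-open intervals are an assumption of this sketch.
int etaSliceIndex(double eta, const std::vector<double>& edges = ETA_EDGES)
{
  for (std::size_t i = 0; i + 1 < edges.size(); ++i)
  {
    if (eta >= edges[i] && eta < edges[i + 1])
    {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

A particle with a pseudorapidity of 3.25 falls into the slice with index 2, which corresponds to the third histogram booked for its charge sign, while a value outside the range from 2.0 to 4.8 is not assigned to any slice.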

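Similar to the init method, the finalize method is called once at the end of the analysis. It is used to normalize the histograms which were filled in the analyze method. In this plugin, each histogram is scaled by the cross-section in millibarn divided by the sum of event weights, and additionally divided by the width of its pseudorapidity bin.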
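The arithmetic of this normalization can be sketched in plain C++, independent of Rivet. The function normaliseSlices is hypothetical, and for brevity each pseudorapidity slice is represented by a single summed weight rather than a full histogram.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-alone helper mirroring the scaling in finalize():
// every slice is multiplied by crossSection / sumOfWeights and divided
// by the width of its pseudorapidity bin.
std::vector<double> normaliseSlices(const std::vector<double>& rawSumW,
                                    const std::vector<double>& etaWidths,
                                    double crossSectionMb,
                                    double sumOfWeights)
{
  if (rawSumW.size() != etaWidths.size())
  {
    throw std::invalid_argument("one eta width is required per slice");
  }
  if (sumOfWeights <= 0.0)
  {
    throw std::invalid_argument("sum of weights must be positive");
  }

  const double scaleFactor = crossSectionMb / sumOfWeights;
  std::vector<double> result;
  result.reserve(rawSumW.size());
  for (std::size_t i = 0; i < rawSumW.size(); ++i)
  {
    result.push_back(rawSumW[i] * scaleFactor / etaWidths[i]);
  }
  return result;
}
```

In the plugin itself, the same factor is passed to scaleW for each histogram of the two BinnedHistogram members, using the widths 0.5 and 0.3 of the booked pseudorapidity bins.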

Running a plugin

The code can be compiled by running rivet-build AnalysisName.cc, which generates a RivetAnalysis.so file. Once this file is generated, the plugin can be used by running rivet -a AnalysisName <inputfile>.hepmc. Here, inputfile is a HepMC file containing the events to be analysed. Rivet provides example HepMC files which can be used to test the analysis. You can also use the (modified) Monte Carlo generator of your choice to generate events. Running your analysis generates a Rivet.yoda file containing the histograms of the analysis, which can then be plotted by running rivet-mkhtml --errs Rivet.yoda. This generates a Rivet_plots folder containing the plots of the analysis. Remember that these plots can be further customized by modifying the .plot file of the analysis.

Below is an example plot generated by the plugin LHCB_2021_I1889335 on the LHC-13-Minbias.hepmc.gz dataset, which is provided by Rivet. If you do not see a ratio plot, you may have to run export RIVET_ANALYSIS_PATH=$PWD before running your plugin, so that Rivet searches the current directory for the plugin files.

[Figure: example plot generated by the plugin LHCB_2021_I1889335 on the LHC-13-Minbias.hepmc.gz dataset]

Submitting a plugin

Once the plots look good, you can create a Merge Request in LbRivetPlugins for internal review. Once your Merge Request is accepted, your plugin will be included in a Merge Request to the official Rivet repository. As soon as this Merge Request is accepted, your plugin will be published and will be publicly available.