Accelerating RooFit with GPUs

Fitting data distributions with analytical models is a common task in CMS analyses. Depending on the size of the dataset and the number of free parameters in the model, the fitting can take a significant amount of time and become a bottleneck in the analysis.

The standard tool to perform such fits is RooFit, distributed with the ROOT framework. Since ROOT version 6.26, RooFit supports GPU-accelerated fitting using CUDA backend, which allows to speed up the fitting process by up to an order of magnitude.

Purdue Analysis Facility now supports this feature, allowing users to leverage available GPU resources to speed up their RooFit code. The feature is supported in both Jupyter Notebooks and Terminals, and for both C++ ROOT interface and PyROOT.

Pre-requisites

  1. Start your AF session with a GPU.

  2. Load the LCG view with CUDA-enabled ROOT build. LCG “releases” and “views” are software stacks distributed by CERN.

    1. If using a Jupyter Notebook: simply select the LCG_106b_cuda kernel.

    2. If using a Terminal, run the following command:

      source /cvmfs/sft.cern.ch/lcg/views/LCG_106b_cuda/x86_64-el8-gcc11-opt/setup.sh
      

Warning

The CUDA-enabled ROOT build is currently available only via the LCG software stack. It is not available in other kernels such as the “default” Python3 kernel or coffea-latest.

The only supported ROOT version at the moment is 6.32.08.

Enabling CUDA backend in RooFit

To enable CUDA backend in RooFit, the only thing you need to do is to pass the rt.RooFit.EvalBackend.Cuda() argument to fitTo() command in RooFit.

Below is an example of the code that uses CUDA backend for fitting a Z-boson mass spectrum.

import ROOT as rt

inputfile = "workspace_ggh_All_Zfit_no_e_cut_UL_calib_cat5.root"

rt.EnableImplicitMT()

file = rt.TFile.Open(inputfile)

canvas = rt.TCanvas()
canvas.cd()

mass =  rt.RooRealVar("mh_ggh","mass (GeV)",100,85,99)
frame = mass.frame()

# Breit Wigner
bwWidth = rt.RooRealVar("bwz_Width" , "widthZ", 2.5, 0, 30)
bwmZ = rt.RooRealVar("bwz_mZ" , "mZ", 91.2, 90, 92)
sigma = rt.RooRealVar("sigma" , "sigma", 2, 0.0, 5.0)
bwWidth.setConstant(True)
model1_1 = rt.RooBreitWigner("bwz", "BWZ",mass, bwmZ, bwWidth)

# Double Sided Crystal Ball
mean = rt.RooRealVar("mean" , "mean", 0, -10, 10) # mean is mean relative to BW
sigma = rt.RooRealVar("sigma" , "sigma", 2, .2, 4.0)
alpha1 = rt.RooRealVar("alpha1" , "alpha1", 2, 0.01, 45)
n1 = rt.RooRealVar("n1" , "n1", 10, 0.01, 185)
alpha2 = rt.RooRealVar("alpha2" , "alpha2", 2.0, 0.01, 65)
n2 = rt.RooRealVar("n2" , "n2", 25, 0.01, 385)
model1_2 = rt.RooCrystalBall("dcb","dcb",mass, mean, sigma, alpha1, n1, alpha2, n2)

mass.setBins(10000,"cache") # cache is repre of the varibale only used in FFT
mass.setBins(200) # bin to 100 bins otherwise, fitting with FFT conv is gonna take forever
mass.setMin("cache",50.5)
mass.setMax("cache",130.5)
model1 = rt.RooFFTConvPdf("BWxDCB", "BWxDCB", mass, model1_1, model1_2)

# Exponential background
coeff = rt.RooRealVar("coeff", "coeff", 0.001, 0.000001, 1.0)
shift = rt.RooRealVar("shift", "Offset", 91, 75, 105)
shifted_mass = rt.RooFormulaVar("shifted_mass", "@0-@1", rt.RooArgList(mass, shift))
model2 = rt.RooExponential("bkg", "bkg", shifted_mass, coeff)

sigfrac = rt.RooRealVar("sigfrac", "sigfrac", 0.99, 0, 1.0)
model = rt.RooAddPdf("model3", "model3", [model1, model2],sigfrac)

data = file.w.data("data_Zfit_no_e_cut_UL_calib_cat5")

model.fitTo(data, rt.RooFit.Save(), rt.RooFit.EvalBackend.Cuda()) #GPU

data.plotOn(frame)
model.plotOn(frame, rt.RooFit.LineColor(rt.kRed))
model.plotOn(frame, rt.RooFit.Components("BWxDCB"),rt.RooFit.LineColor(rt.kBlue))
model.plotOn(frame, rt.RooFit.Components("bkg"),rt.RooFit.LineColor(rt.kGreen))


frame.Draw()
canvas.Update()
canvas.Draw()

To run this code, you can download the input file workspace_ggh_All_Zfit_no_e_cut_UL_calib_cat5.root from https://cernbox.cern.ch/s/zKjJHZxRbDkADPf.