# Alpaka algorithms and modules in CMSSW

## Introduction

This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the [Alpaka documentation](https://alpaka.readthedocs.io/en/latest/).

### Compilation model

The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is substituted with a concrete, backend-specific namespace name in order to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.

The source files with the `.dev.cc` suffix are compiled with the backend-specific device compiler. The other `.cc` source files are compiled with the host compiler.

The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.
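
As a rough illustration of why backend-specific namespaces keep the symbols apart, the standalone sketch below imitates the effect with a helper macro. The `DEFINE_BACKEND_SKETCH` macro and the `backendName()` function are hypothetical and exist only for this example; in CMSSW the build system compiles the same source once per backend, with `ALPAKA_ACCELERATOR_NAMESPACE` expanding to a different namespace each time.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for this sketch: stamps out the same function in a
// backend-specific namespace, imitating one compilation pass per backend.
#define DEFINE_BACKEND_SKETCH(NS)                    \
  namespace NS {                                     \
    inline std::string backendName() { return #NS; } \
  }

// Two "compilation passes" of the same code, one per backend namespace.
DEFINE_BACKEND_SKETCH(alpaka_serial_sync)
DEFINE_BACKEND_SKETCH(alpaka_cuda_async)

#undef DEFINE_BACKEND_SKETCH
```

Both `alpaka_serial_sync::backendName()` and `alpaka_cuda_async::backendName()` can live in the same process because their fully qualified names differ.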

## Overall guidelines

* Minimize explicit blocking synchronization calls
  * Avoid `alpaka::wait()` and non-cached memory buffer allocations
* If you can, use the `global::EDProducer` base class
  * If you need per-stream storage
    * For a few objects consider using [`edm::StreamCache<T>`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#edm_StreamCacheT) with the global module, or
    * Use `stream::EDProducer`
  * If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
* All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in the `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
  * Alpaka-dependent code that uses templates instead of the namespace macro can be placed in the `Package/SubPackage/interface` directory
* All source files (not headers) using Alpaka device code (such as kernel calls, or functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory
* Any code that `#include`s a header from the framework or from `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix.
  * Some framework headers are allowed to be used in `.dev.cc` files:
    * Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
    * `FWCore/Utilities/interface/Exception.h`
    * `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
    * `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly

## Data formats

Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are
* There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  * The host-only data format must be defined in the `Package/SubPackage/interface/` directory
  * If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    * As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  * For EventSetup data products [the registration macro `TYPELOOKUP_DATA_REG`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToRegisterESData) should be placed in `Package/SubPackage/src/ES_<type name>.cc`.
* The device-side data formats are defined in the `Package/SubPackage/interface/alpaka/` directory
  * The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace.
  * For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
    * Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that; see an example in [`DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h`](../../DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h)
    * This equality is necessary for the [implicit data transfers](#implicit-data-transfers) to function properly
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
    * The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with the `persistent="false"` attribute.
    * The list of `<platform>` values currently includes `cuda` and `rocm`
  * For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
    * Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use the `TYPELOOKUP_ALPAKA_DATA_REG` macro
    * Data products templated over the device type should use the `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
* For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
  * unless the package has something that is really specific to the `serial` backend and is not generally applicable on the host

Note that even though the examples above used the `DataFormats` package for Event data formats, Event data formats are allowed to be defined in other packages too in some circumstances. For full details please see [SWGuideCreatingNewProducts](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).

### Implicit data transfers

Both EDProducers and ESProducers make use of implicit data transfers. In CPU backends these data transfers are omitted, and the host-side and the "device-side" data products are the same.

#### Data copy definitions

The implicit host-to-device and device-to-host copies rely on specializations of the `cms::alpakatools::CopyToDevice` and `cms::alpakatools::CopyToHost` class templates, respectively. These have to be specialized along the lines of
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"

namespace cms::alpakatools {
  template <>
  struct CopyToDevice<TSrc> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
      return ...;
    }
  };
}
```
or
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"

namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
      return ...;
    }
  };
}
```
respectively.

Note that the destination (device-side/host-side) type `TDst` can be different from or the same as the source (host-side/device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.

Both `CopyToDevice` and `CopyToHost` class templates are partially specialized for all `PortableObject` and `PortableCollection` instantiations.
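
To make the specialization pattern concrete without depending on CMSSW or Alpaka, the following standalone sketch mirrors its shape. Everything in it is an assumption made for illustration: `FakeQueue`, `DeviceHits`, `HostHits` and the `sketch` namespace are stand-ins, and the "copy" is a plain `std::vector` copy rather than an asynchronous `alpaka::memcpy()`.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for this sketch only; they are not Alpaka or CMSSW types.
struct FakeQueue {
  int enqueuedCopies = 0;  // counts the copies "enqueued" so far
};
struct DeviceHits {
  std::vector<float> energy;  // pretend this lives in device memory
};
struct HostHits {
  std::vector<float> energy;
};

namespace sketch {
  // Primary template is only declared: a type without a specialization
  // has no copy defined for it.
  template <typename TSrc>
  struct CopyToHost;

  // Full specialization for the device-side type, returning a different
  // host-side type (TSrc != TDst, as in the PortableCollection model).
  template <>
  struct CopyToHost<DeviceHits> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, DeviceHits const& deviceProduct) -> HostHits {
      ++queue.enqueuedCopies;  // a real implementation would enqueue an asynchronous memcpy here
      return HostHits{deviceProduct.energy};
    }
  };
}  // namespace sketch
```

Calling `sketch::CopyToHost<DeviceHits>::copyAsync(queue, product)` returns the host-side object immediately; in the real framework the caller would additionally rely on the framework's synchronization before reading the copied data.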

##### Data products with `memcpy()`ed pointers

If the data product in question contains pointers to memory elsewhere within the data product, then after the `alpaka::memcpy()` calls in `copyAsync()` those pointers still point to device memory, and need to be updated. **Such data products are generally discouraged.** Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization along the lines of (extending the `CopyToHost` example [above](#data-copy-definitions))
```cpp
namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    // copyAsync() definition from above

    static void postCopy(TDst& obj) {
      // modify obj
      // any modifications must be such that the postCopy() can be
      // skipped when the obj originates from the host (i.e. on CPU backends)
    }
  };
}
```
The `postCopy()` is called after the operations enqueued in the `copyAsync()` have finished. The code in `postCopy()` must be such that the call to `postCopy()` can be omitted on CPU backends.

Note that for `CopyToDevice` such `postCopy()` functionality is **not** provided. It should be possible to issue a kernel call from the `CopyToDevice::copyAsync()` function to achieve the same effect.
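
The pointer problem that `postCopy()` addresses can be reproduced on the host alone. In the standalone sketch below, `SelfReferencing`, `rawCopy()` and the free function `postCopy()` are hypothetical names for this example; `std::memcpy()` stands in for the byte-wise device-to-host transfer.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical product holding a pointer into its own storage.
struct SelfReferencing {
  std::array<int, 4> data{};
  int* first = nullptr;  // intended to point at data[0] of the same object
};

// Stand-in for the raw byte copy done by a memcpy-style transfer:
// the pointer value is copied verbatim and keeps referring to the source.
inline void rawCopy(SelfReferencing& dst, SelfReferencing const& src) {
  std::memcpy(static_cast<void*>(&dst), static_cast<void const*>(&src), sizeof(SelfReferencing));
}

// Stand-in for postCopy(): re-point the internal pointer at the
// destination's own storage. Running it on an object that never left the
// host changes nothing, so skipping it there is safe.
inline void postCopy(SelfReferencing& obj) { obj.first = obj.data.data(); }
```

After `rawCopy()` the destination's `first` still holds the source address; only after `postCopy()` does it refer to the destination's own `data`.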

#### EDProducer

In EDProducers a transfer from the device memory space to the host memory space is registered automatically for each device-side data product. The data product is copied only if the job has another EDModule that consumes the host-side data product. For each device-side data product a specialization of `cms::alpakatools::CopyToHost` is required to exist.

In addition, for each host-side data product a transfer from the host memory space to the device memory space is registered automatically **if** a `cms::alpakatools::CopyToDevice` specialization exists. The data product is copied only if the job has another EDModule that consumes the device-side data product.
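
How "registered only if a specialization exists" can be decided at compile time is sketched below with a standard detection idiom. This is an assumption about one possible technique, not a description of how the framework actually does it; the `sketch` namespace, `hasCopyToDevice`, `WithCopy` and `WithoutCopy` are hypothetical.

```cpp
#include <type_traits>

namespace sketch {
  // Declared only; a specialization opts a type in to host-to-device copies.
  template <typename T>
  struct CopyToDevice;

  struct WithCopy {};
  struct WithoutCopy {};

  template <>
  struct CopyToDevice<WithCopy> {};

  // True when CopyToDevice<T> is a complete type, i.e. a specialization
  // is visible at this point; sizeof() of an incomplete type is a
  // substitution failure that selects the false_type fallback.
  template <typename T, typename = void>
  struct hasCopyToDevice : std::false_type {};

  template <typename T>
  struct hasCopyToDevice<T, std::void_t<decltype(sizeof(CopyToDevice<T>))>> : std::true_type {};

  static_assert(hasCopyToDevice<WithCopy>::value, "specialization is detected");
  static_assert(!hasCopyToDevice<WithoutCopy>::value, "missing specialization is detected");
}  // namespace sketch
```

A practical consequence of any such scheme is that the specialization has to be visible where the check is made, which is one reason to keep it in a header next to the data format.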

#### ESProducer

In ESProducers a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically for each host-side data product. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. For each host-side data product a specialization of `cms::alpakatools::CopyToDevice` is required to exist.

### `PortableCollection`

For more information see [`DataFormats/Portable/README.md`](../../DataFormats/Portable/README.md) and [`DataFormats/SoATemplate/README.md`](../../DataFormats/SoATemplate/README.md).

## Modules

### Base classes

The Alpaka-based EDModules should use one of the following base classes (that are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):

* `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
   * A [global EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface) that launches (possibly) asynchronous work
* `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
   * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that launches (possibly) asynchronous work
* `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
   * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
      * The base class uses [`edm::ExternalWork`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_ExternalWork) for the non-blocking synchronization

The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`. Use `stream::SynchronizingEDProducer` only when some data needs to be copied from the device to the host in a way that requires synchronization, for a reason other than copying an Event data product from the device to the host.

New base classes (or other functionality) can be added based on new use cases that come up.

The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`).

Note that both the Alpaka-based EDProducer and ESProducer constructors must pass their `edm::ParameterSet` argument to the constructor of their base class.

Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use [`EmptyESSource`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#EmptyESSource).

### Event, EventSetup, Records

The Alpaka-based modules have a notion of a _host memory space_ and a _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible to non-Alpaka modules, whereas the data products in the device memory space are available only to modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.

The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both the host memory space and the device memory space can be accessed. Data products can also be produced to either memory space. As discussed [above](#edproducer), for each data product produced into the device memory space an implicit data copy from the device memory space to the host memory space is registered, and for each data product produced into the host memory space for which `cms::alpakatools::CopyToDevice` is specialized an implicit data copy from the host memory space to the device memory space is registered. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.

The ESProducer can have two different `produce()` function signatures
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
* If the function has a `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.

### Tokens

The memory spaces of the consumed and (in EDProducer case) produced data products are driven by the tokens. The token types to be used in different cases are summarized below.

|                                                                | Host memory space             | Device memory space              |
|----------------------------------------------------------------|-------------------------------|----------------------------------|
| Access Event data product of type `T`                          | `edm::EDGetTokenT<T>`         | `device::EDGetToken<T>`          |
| Produce Event data product of type `T`                         | `edm::EDPutTokenT<T>`         | `device::EDPutToken<T>`          |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |

With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see
* [`consumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMGetDataFromEvent#consumes)
* [`produces()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts#Producing_the_EDProduct)
* [`esConsumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ED_module)
* [`consumes()` in ESProducers](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ESProducer)
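
A call such as `deviceToken_{produces()}` can deduce the product type from the token it initializes because C++ allows a returned object to convert itself to the requested type. The standalone sketch below shows that general idiom; it is an illustration under assumptions, not the framework's code, and `PutToken`, `ProducesAdapter` and the `sketch` namespace are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {
  // Hypothetical token type carrying the product type T in its own type.
  template <typename T>
  struct PutToken {
    std::string label;
  };

  // Returned by produces(); converts to whichever PutToken<T> is requested,
  // so the caller never has to spell out T in the call itself.
  struct ProducesAdapter {
    std::string label;

    template <typename T>
    operator PutToken<T>() const {
      return PutToken<T>{label};
    }
  };

  inline ProducesAdapter produces(std::string label = "") { return ProducesAdapter{std::move(label)}; }
}  // namespace sketch
```

With these definitions, `sketch::PutToken<int> token = sketch::produces("hits");` compiles without naming `int` in the call; the template argument of the conversion operator is deduced from the left-hand side.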

### `fillDescriptions()`

In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp) function, specifying the module label automatically with [`edm::ConfigurationDescriptions::addWithDefaultLabel()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp#Automatic_module_labels_from_plu) is strongly recommended. Currently a `cfi` file is generated for a module for each Alpaka backend such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the ["module type resolver"](#module-type-resolver-portable) functionality, where the module type has the `@alpaka` postfix.

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.

### Copy e.g. configuration data to all devices in EDProducer

While the EventSetup can be used to handle copying data to all devices of an Alpaka backend, for data used only by one EDProducer a simpler way is to use one of
* `cms::alpakatools::MoveToDeviceCache<TDevice, THostObject>` (recommended)
  * `#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"`
  * Moves the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. On host backends the argument `THostObject` is moved around, but not copied.
  * The `THostObject` must not be copyable
    * This is to avoid easy mistakes with objects that follow the copy semantics of `std::shared_ptr` (that includes Alpaka buffers), which would allow the source memory buffer to be used via another copy during the asynchronous data copy to the device.
  * The `THostObject` object passed to the constructor must not be used afterwards, unless it is initialized again e.g. by assigning another `THostObject` into it.
  * The corresponding device-side object can be obtained with the `get()` member function using either an Alpaka Device or Queue object. It can be used immediately after the constructor returns.
* `cms::alpakatools::CopyToDeviceCache<TDevice, THostObject>` (use only if you **must** use a copyable `THostObject`)
  * `#include "HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h"`
  * Copies the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. Host backends also do a copy.
  * The `THostObject` object passed to the constructor can be used for other purposes immediately after the constructor returns
  * The corresponding device-side object can be obtained with the `get()` member function using either an Alpaka Device or Queue object. It can be used immediately after the constructor returns.

For examples see [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc) and [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc).
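
The "must not be copyable" requirement can be enforced at compile time. The standalone sketch below shows one way to do so with a `static_assert`; it is not the `MoveToDeviceCache` implementation, and `MoveOnlyCache`, `HostConfig` and the `sketch` namespace are hypothetical names that keep the object on the host.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch {
  // Hypothetical cache that only accepts move-only host objects, so no
  // second copy of the source can be in use while the data is transferred.
  template <typename THostObject>
  class MoveOnlyCache {
    static_assert(!std::is_copy_constructible_v<THostObject>, "THostObject must not be copyable");

  public:
    explicit MoveOnlyCache(THostObject&& src) : stored_(std::move(src)) {}

    THostObject const& get() const { return stored_; }

  private:
    THostObject stored_;
  };

  // Hypothetical host object; the std::unique_ptr member makes it move-only.
  struct HostConfig {
    explicit HostConfig(int v) : value(std::make_unique<int>(v)) {}
    std::unique_ptr<int> value;
  };
}  // namespace sketch
```

Constructing `sketch::MoveOnlyCache<sketch::HostConfig>` from `std::move(config)` leaves `config.value` null, which mirrors the rule that the constructor argument must not be used again until it is re-initialized.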

## Guarantees

* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
* All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
* All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
* The EDM Stream does not proceed to the next Event until after all asynchronous work of the current Event has finished.
  * **Note**: this implies that if an EDProducer in its `produce()` function uses the `Event::queue()` or gets a device-side data product, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. will block the CPU thread until the asynchronous work finishes)

## Examples

For concrete examples see the code in [`HeterogeneousCore/AlpakaTest`](../../HeterogeneousCore/AlpakaTest) and [`DataFormats/PortableTestObjects`](../../DataFormats/PortableTestObjects).

### EDProducer

This example shows a mixture of behavior from test code in [`HeterogeneousCore/AlpakaTest/plugins/alpaka/`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/)
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)

// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaProducer : public global::EDProducer<> {
  public:
    ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
        : EDProducer<>(iConfig),
          // NOTE: getTokenHost_, getTokenDevice_ and esGetTokenDevice_ must also be
          // initialized here with consumes() / esConsumes(); omitted for brevity
          //
          // produces() must not specify the product type, it is deduced from deviceToken_
          deviceToken_{produces()},
          size_{iConfig.getParameter<int32_t>("size")} {}

    // device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
    void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
      // get input data products
      auto const& hostInput = iEvent.get(getTokenHost_);
      auto const& deviceInput = iEvent.get(getTokenDevice_);
      auto const& deviceESData = iSetup.getData(esGetTokenDevice_);

      // run the algorithm, potentially asynchronously
      portabletest::TestDeviceCollection deviceProduct{size_, iEvent.queue()};
      algo_.fill(iEvent.queue(), hostInput, deviceInput, deviceESData, deviceProduct);

      // put the asynchronous product into the event without waiting
      // must use EDPutToken with emplace() or put()
      //
      // for a product produced with device::EDPutToken<T> the base class registers
      // a separately scheduled transformation function for the copy to host
      // the transformation function calls
      // cms::alpakatools::CopyToHost<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
      // function
      iEvent.emplace(deviceToken_, std::move(deviceProduct));
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      desc.add<int32_t>("size");
      descriptions.addWithDefaultLabel(desc);
    }

  private:
    // use edm::EDGetTokenT<T> to read from host memory space
    edm::EDGetTokenT<FooProduct> const getTokenHost_;

    // use device::EDGetToken<T> to read from device memory space
    device::EDGetToken<BarProduct> const getTokenDevice_;

    // use device::ESGetToken<T, TRecord> to read from device memory space
    device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;

    // use device::EDPutToken<T> to place the data product in the device memory space
    device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;
    int32_t const size_;

    // implementation of the algorithm
    TestAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(ExampleAlpakaProducer);
```

### ESProducer to reformat an existing ESProduct for use in device

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product to device memory
    // the function calls
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;

      // fill the hostProduct from hostInput

      return hostProduct;
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
```

### ESProducer to derive a new ESProduct from an existing device-side ESProduct

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());

      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return deviceProduct;
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;

    OtherAlgo algo_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
```

## Configuration

There are a few different options for using Alpaka-based modules in the CMSSW configuration.

In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. The `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
```python
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
```

### Explicit module type (non-portable)

The Alpaka modules can be used in the python configuration with their explicit, full type names
```python
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
```
This kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.


### SwitchProducerCUDA (semi-portable)

A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which the only concrete implementation is currently [`SwitchProducerCUDA`](../../HeterogeneousCore/CUDACore/README.md#automatic-switching-between-cpu-and-gpu-modules). The modules for the different Alpaka backends still need to be specified explicitly:
```python
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
    cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
    cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)

# or

process.producer = SwitchProducerCUDA(
    cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
    cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
```
This kind of configuration can be run on any machine that a given CMSSW build supports, but is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). The `SwitchProducer` approach is therefore called "semi-portable" here.
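
As a rough illustration of the switching idea, the following standalone Python sketch picks one case out of several based on the available accelerators. It does not import or reproduce CMSSW code: the function `choose_case`, the `CASES` table, and the `(enabled, priority)` convention are assumptions made for this example, and the real `SwitchProducerCUDA` may behave differently in detail.

```python
# Illustrative model only; not the CMSSW implementation.
def choose_case(cases, available_accelerators):
    """Return the name of the enabled case with the highest priority.

    `cases` maps a case name to a function that takes the list of available
    accelerators and returns an (enabled, priority) pair.
    """
    enabled = []
    for name, probe in cases.items():
        is_enabled, priority = probe(available_accelerators)
        if is_enabled:
            enabled.append((priority, name))
    if not enabled:
        raise RuntimeError("no enabled case")
    # The tuple comparison picks the largest priority.
    return max(enabled)[1]


# Hypothetical probes: the CPU case is always usable, the CUDA case only
# when an NVIDIA GPU is among the available accelerators.
CASES = {
    "cpu": lambda accelerators: (True, 1),
    "cuda": lambda accelerators: ("gpu-nvidia" in accelerators, 2),
}

print(choose_case(CASES, ["cpu"]))
print(choose_case(CASES, ["cpu", "gpu-nvidia"]))
```

In this model both cases must be declared up front, which mirrors why the approach requires all listed backends to be buildable.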

### Module type resolver (portable)

A fully portable way to express a configuration can be achieved with the "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with the `@alpaka` postfix:
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)

# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync")
    )
)
```
The `@alpaka` postfix in the module type tells the system that the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available on the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.

This approach is also portable across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.
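
The relation between the portable and the explicit type names can be sketched in plain Python. The helper `resolve_module_type` below is hypothetical and only mirrors the naming pattern visible in the examples of this section (`Name@alpaka` versus `alpaka_<backend>::Name`); it is not the resolver used by the framework.

```python
ALPAKA_POSTFIX = "@alpaka"


def resolve_module_type(module_type, backend):
    """Turn 'Name@alpaka' plus a backend into 'alpaka_<backend>::Name'.

    Types without the postfix are returned unchanged.
    """
    if not module_type.endswith(ALPAKA_POSTFIX):
        return module_type
    class_name = module_type[: -len(ALPAKA_POSTFIX)]
    return f"alpaka_{backend}::{class_name}"


print(resolve_module_type("ExampleAlpakaProducer@alpaka", "serial_sync"))
print(resolve_module_type("ExampleAlpakaProducer@alpaka", "cuda_async"))
```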


#### Examples on explicitly setting the backend

##### For individual modules

The explicitly set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This per-module setting overrides the `ProcessAcceleratorAlpaka` setting described in the next subsection.

```python
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
    )
)
```

##### For all Alpaka modules

The explicitly set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This `ProcessAcceleratorAlpaka` setting can be further overridden for individual modules as described in the previous subsection.

```python
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
```

##### For entire job (i.e. also for non-Alpaka modules)
```python
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
```
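
The precedence among the three settings above can be summarized with a small standalone model. Everything in it is an assumption made for illustration: the function `select_backend` is hypothetical, the accelerator-to-backend mapping is inferred from the order of the alternatives in the example comments, and the preference for a GPU backend when nothing is set explicitly is a guess rather than documented behaviour.

```python
# Assumed mapping, inferred from the example comments above.
BACKEND_FOR_ACCELERATOR = {
    "cpu": "serial_sync",
    "gpu-nvidia": "cuda_async",
    "gpu-amd": "rocm_async",
}


def select_backend(allowed_accelerators, module_backend=None, global_backend=None):
    """Pick a backend: per-module setting first, then the global one.

    `allowed_accelerators` stands for the accelerators that are both allowed
    by the job-wide setting and available on the machine.
    """
    allowed = [BACKEND_FOR_ACCELERATOR[a] for a in allowed_accelerators]
    explicit = module_backend if module_backend is not None else global_backend
    if explicit is not None:
        if explicit not in allowed:
            raise ValueError(f"backend {explicit!r} is not allowed in this job")
        return explicit
    # Assumption: without an explicit choice, prefer a GPU backend.
    for backend in ("cuda_async", "rocm_async", "serial_sync"):
        if backend in allowed:
            return backend
    raise ValueError("no backend is allowed in this job")


print(select_backend(["cpu", "gpu-nvidia"]))
print(select_backend(["cpu", "gpu-nvidia"], global_backend="serial_sync"))
print(select_backend(["cpu", "gpu-nvidia"],
                     module_backend="cuda_async", global_backend="serial_sync"))
```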

### Blocking synchronization (for testing)

While the general approach is to favor asynchronous operations with non-blocking synchronization, for testing purposes it can be useful to synchronize the EDModule's `acquire()` / `produce()` or the ESProducer's production functions in a blocking way. Such blocking synchronization can be specified for individual modules via the `alpaka` `PSet` as follows:
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        synchronize = cms.untracked.bool(True)
    )
)
```

The blocking synchronization can be specified for all Alpaka modules via the `ProcessAcceleratorAlpaka` as follows:
```python
process.ProcessAcceleratorAlpaka.setSynchronize(True)
```
Note that a per-module parameter, if set, overrides this global setting.
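
The override rule can be stated as a one-line function. This is a sketch only: `effective_synchronize` is a hypothetical name, and the default of `False` for the global setting is an assumption based on the stated preference for asynchronous operation.

```python
def effective_synchronize(module_setting=None, global_setting=False):
    """Return the per-module value when it is set, else the global value."""
    return global_setting if module_setting is None else module_setting


# A module that asks for non-blocking behaviour keeps it even when the
# global setting requests blocking synchronization.
print(effective_synchronize(module_setting=False, global_setting=True))
print(effective_synchronize(module_setting=None, global_setting=True))
```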


## Unit tests

Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary such as
```xml
<bin name="<unique test binary name>" file="<comma-separated list of files>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</bin>
```
or as a command (e.g. `cmsRun` or a shell script) to run, such as

```xml
<test name="<unique name of the test>" command="<command to run>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</test>
```

will be run as part of `scram build runtests` according to the availability of the hardware:
- The `serial_sync` version is always run
- The `cuda_async` version is run if an NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- The `rocm_async` version is run if an AMD GPU is present (i.e. `rocmIsEnabled` returns 0)

Tests for a specific backend (or hardware) can be explicitly requested by setting the `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. In that case tests not depending on the hardware are skipped, and if the corresponding hardware is not available, the tests will fail.
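
The selection rules above can be modelled with a short standalone function. It is a sketch under stated assumptions, not the logic used by `scram`: the name `plan_backend_tests` and the `run`/`skip`/`fail` labels are invented for this example, and treating the other GPU backend as skipped when `USER_UNIT_TESTS` is set is an assumption.

```python
def plan_backend_tests(has_nvidia_gpu, has_amd_gpu, user_unit_tests=None):
    """Map each backend variant of a test to 'run', 'skip' or 'fail'.

    `user_unit_tests` models the USER_UNIT_TESTS environment variable and may
    be None (unset), "cuda" or "rocm".
    """
    hardware = {"cuda_async": has_nvidia_gpu, "rocm_async": has_amd_gpu}
    requested_for = {"cuda": "cuda_async", "rocm": "rocm_async"}

    if user_unit_tests is None:
        # Default: host version always runs, GPU versions follow the hardware.
        plan = {"serial_sync": "run"}
        for backend, present in hardware.items():
            plan[backend] = "run" if present else "skip"
        return plan

    if user_unit_tests not in requested_for:
        raise ValueError(f"unsupported USER_UNIT_TESTS value: {user_unit_tests!r}")

    requested = requested_for[user_unit_tests]
    # Explicit request: hardware-independent tests are skipped, and missing
    # hardware turns the requested tests into failures.
    plan = {"serial_sync": "skip"}
    for backend, present in hardware.items():
        if backend != requested:
            plan[backend] = "skip"  # assumption, see the text above
        else:
            plan[backend] = "run" if present else "fail"
    return plan


print(plan_backend_tests(has_nvidia_gpu=True, has_amd_gpu=False))
print(plan_backend_tests(has_nvidia_gpu=False, has_amd_gpu=False, user_unit_tests="cuda"))
```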