# Alpaka algorithms and modules in CMSSW

## Introduction

This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the [Alpaka documentation](https://alpaka.readthedocs.io/en/latest/).

### Compilation model

The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is substituted with a concrete, backend-specific namespace name to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.

The source files with the `.dev.cc` suffix are compiled with the backend-specific device compiler. The other `.cc` source files are compiled with the host compiler.

The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.

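
As a rough illustration of the namespace substitution, the self-contained sketch below defines the macro by hand and places a function in the resulting backend-specific namespace. In CMSSW the macro is provided by the build rules. The manual `#define`, the `SKETCH_STRINGIZE` helpers and the `backendName()` function are assumptions made only for this example.

```cpp
#include <cassert>
#include <string>

// Assumption of this sketch: the macro is defined by hand. In CMSSW it is
// provided by the build rules, once for each enabled backend.
#ifndef ALPAKA_ACCELERATOR_NAMESPACE
#define ALPAKA_ACCELERATOR_NAMESPACE alpaka_serial_sync
#endif

// Hypothetical helpers that turn the expanded macro value into a string
#define SKETCH_STRINGIZE_IMPL(x) #x
#define SKETCH_STRINGIZE(x) SKETCH_STRINGIZE_IMPL(x)

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Hypothetical function: its fully qualified name differs for every backend,
  // so libraries built for different backends do not clash on this symbol
  inline std::string backendName() { return SKETCH_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE); }
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```

Compiling the same source again with a different macro value would produce a second, independent copy of `backendName()` in another namespace.
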
## Overall guidelines

* Minimize explicit blocking synchronization calls
  * Avoid `alpaka::wait()` and non-cached memory buffer allocations
* If you can, use `global::EDProducer` base class
  * If you need per-stream storage
    * For few objects consider using [`edm::StreamCache<T>`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#edm_StreamCacheT) with the global module, or
    * Use `stream::EDProducer`
  * If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
* All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
  * Alpaka-dependent code that uses templates instead of the namespace macro can be placed in `Package/SubPackage/interface` directory
* All source files (not headers) using Alpaka device code (such as kernel calls, functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory
* Any code that `#include`s a header from the framework or from the `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix.
  * Some framework headers are allowed to be used in `.dev.cc` files:
    * Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
    * `FWCore/Utilities/interface/Exception.h`
    * `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
    * `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly

## Data formats

Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are:
* There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  * The host-only data format must be defined in `Package/SubPackage/interface/` directory
  * If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    * As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  * For EventSetup data products [the registration macro `TYPELOOKUP_DATA_REG`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToRegisterESData) should be placed in `Package/SubPackage/src/ES_<type name>.cc`.
* The device-side data formats are defined in `Package/SubPackage/interface/alpaka/` directory
  * The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace.
  * For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
    * Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that; see an example in [`TestDeviceCollection.h`](../../DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h)
    * This equality is necessary for the [implicit data transfers](#implicit-data-transfers) to function properly
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
    * The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with `persistent="false"` attribute.
    * The list of `<platform>` currently includes: `cuda`, `rocm`
  * For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
    * Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use `TYPELOOKUP_ALPAKA_DATA_REG` macro
    * Data products templated over the device type should use `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
* For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
  * unless the package has something that is really specific for `serial` backend that is not generally applicable on host

Note that even if for Event data formats the examples above used `DataFormats` package, Event data formats are allowed to be defined in other packages too in some circumstances. For full details please see [SWGuideCreatingNewProducts](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).

### Implicit data transfers

Both EDProducers and ESProducers make use of implicit data transfers.

#### EDProducer

In EDProducers for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. The framework code to issue the transfer makes use of the `cms::alpakatools::CopyToHost` class template, which must be specialized along the following lines:
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"

namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
      return ...;
    }
  };
}
```
Note that the destination (host-side) type `TDst` can be different from or the same as the source (device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.

The `CopyToHost` class template is partially specialized for all `PortableCollection` instantiations.

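
To make the shape of such a specialization more concrete, here is a self-contained sketch that compiles without CMSSW. Everything prefixed with `Mock` is hypothetical, and the forward declaration of `CopyToHost` merely stands in for the real header. The point is only that `copyAsync()` constructs and returns the destination object immediately, while the actual copy is enqueued and completes later.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Stand-in for the primary template declared in
// HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h
namespace cms::alpakatools {
  template <typename TSrc>
  struct CopyToHost;
}

// Hypothetical in-order queue: enqueued work runs only when drain() is called
struct MockQueue {
  std::queue<std::function<void()>> tasks;
  void enqueue(std::function<void()> task) { tasks.push(std::move(task)); }
  void drain() {
    while (!tasks.empty()) {
      tasks.front()();
      tasks.pop();
    }
  }
};

// Hypothetical device-side and host-side products (TSrc and TDst differ here)
struct MockDeviceProduct {
  std::shared_ptr<std::vector<int>> buffer;
};
struct MockHostProduct {
  std::shared_ptr<std::vector<int>> buffer;
};

namespace cms::alpakatools {
  template <>
  struct CopyToHost<MockDeviceProduct> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, MockDeviceProduct const& deviceProduct) -> MockHostProduct {
      // construct the destination object right away
      MockHostProduct hostProduct{std::make_shared<std::vector<int>>(deviceProduct.buffer->size())};
      // only enqueue the copy; shared ownership keeps both buffers alive until it runs
      queue.enqueue([src = deviceProduct.buffer, dst = hostProduct.buffer]() { *dst = *src; });
      return hostProduct;
    }
  };
}  // namespace cms::alpakatools
```

In this sketch the host buffer holds the copied values only after `MockQueue::drain()` has run; in CMSSW the corresponding synchronization is handled by the framework, as described above.
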
#### ESProducer

In ESProducers for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. The framework code to issue the transfer makes use of the `cms::alpakatools::CopyToDevice` class template, which must be specialized along the following lines:
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"

namespace cms::alpakatools {
  template <>
  struct CopyToDevice<TSrc> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
      return ...;
    }
  };
}
```
Note that the destination (device-side) type `TDst` can be different from or the same as the source (host-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer (currently the synchronization blocks, but work is ongoing to make it non-blocking).

The `CopyToDevice` class template is partially specialized for all `PortableCollection` instantiations.

### `PortableCollection`

For more information see [`DataFormats/Portable/README.md`](../../DataFormats/Portable/README.md) and [`DataFormats/SoATemplate/README.md`](../../DataFormats/SoATemplate/README.md).

## Modules

### Base classes

The Alpaka-based EDModules should use one of the following base classes (that are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):

* `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
  * A [global EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface) that launches (possibly) asynchronous work
* `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that launches (possibly) asynchronous work
* `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
    * The base class uses the [`edm::ExternalWork`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_ExternalWork) for the non-blocking synchronization

The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except the `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`, with `stream::SynchronizingEDProducer` used only in cases where some data needs to be copied from the device to the host, which requires synchronization, for a reason other than copying an Event data product from the device to the host.

New base classes (or other functionality) can be added based on new use cases that come up.

The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`). Note that the Alpaka-based ESProducer constructor must pass the argument `edm::ParameterSet` object to the constructor of the `ESProducer` base class.

Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use [`EmptyESSource`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#EmptyESSource).

### Event, EventSetup, Records

The Alpaka-based modules have a notion of a _host memory space_ and _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible to non-Alpaka modules, whereas the data products in device memory space are available only to modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.

The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both host memory space and device memory space can be accessed. Data products can also be produced to either memory space. For all data products produced in the device memory space an implicit data copy from the device memory space to the host memory space is registered as discussed above. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.

The ESProducer can have two different `produce()` function signatures:
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
* If the function has `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.

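
The distinction between the two `produce()` signatures can be expressed at compile time. The following self-contained sketch shows one way to do that with a type trait. It is not the framework's actual implementation: `MockRecord`, `IsDeviceRecord`, `needsImplicitCopyToDevice()` and the global `device` namespace are hypothetical stand-ins (in CMSSW, `device::Record` lives inside `ALPAKA_ACCELERATOR_NAMESPACE`).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for a Record type and the device::Record<TRecord> wrapper
struct MockRecord {};
namespace device {
  template <typename TRecord>
  struct Record {};
}  // namespace device

// Trait: is the argument type of produce() a device::Record<...>?
template <typename T>
struct IsDeviceRecord : std::false_type {};
template <typename TRecord>
struct IsDeviceRecord<device::Record<TRecord>> : std::true_type {};

// In this sketch an implicit host-to-device copy is needed only when produce()
// takes the plain record, i.e. when it produces into the host memory space
template <typename TArg>
constexpr bool needsImplicitCopyToDevice() {
  return !IsDeviceRecord<std::remove_cv_t<std::remove_reference_t<TArg>>>::value;
}

static_assert(needsImplicitCopyToDevice<MockRecord const&>());
static_assert(!needsImplicitCopyToDevice<device::Record<MockRecord> const&>());
```
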
### Tokens

The memory spaces of the consumed and (in EDProducer case) produced data products are driven by the tokens. The token types to be used in different cases are summarized below.

|                                                                | Host memory space             | Device memory space              |
|----------------------------------------------------------------|-------------------------------|----------------------------------|
| Access Event data product of type `T`                          | `edm::EDGetTokenT<T>`         | `device::EDGetToken<T>`          |
| Produce Event data product of type `T`                         | `edm::EDPutTokenT<T>`         | `device::EDPutToken<T>`          |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |

With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see
* [`consumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMGetDataFromEvent#consumes)
* [`produces()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts#Producing_the_EDProduct)
* [`esConsumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ED_module)
* [`consumes()` in ESProducers](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ESProducer)

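
For readers unfamiliar with type-deducing registration calls, the self-contained sketch below shows one common C++ technique for them: the call returns a small adaptor object with a templated conversion operator, and the product type is taken from the token the adaptor is assigned to. This only illustrates the idea and makes no claim about how the framework implements it. All `Mock*` names are hypothetical.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical token type, loosely modelled on device::EDPutToken<T>
template <typename T>
struct MockPutToken {
  unsigned int index;
};

// Hypothetical registry that records which product types are produced
struct MockRegistry {
  std::vector<std::type_index> types;

  // Adaptor returned by produces(): the product type is not known yet
  struct Adaptor {
    MockRegistry& registry;

    // The conversion target determines T; the registration happens here
    template <typename T>
    operator MockPutToken<T>() {
      registry.types.emplace_back(typeid(T));
      return MockPutToken<T>{static_cast<unsigned int>(registry.types.size() - 1)};
    }
  };

  Adaptor produces() { return Adaptor{*this}; }
};

struct MockCollection {};

struct MockProducer {
  MockRegistry registry;
  // T is deduced from the type of the data member, not from the call itself
  MockPutToken<MockCollection> token = registry.produces();
};
```

Because the type is written only once, in the declaration of the token, the registration and the token type cannot get out of sync.
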

### `fillDescriptions()`

In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp) function specifying the module label automatically with the [`edm::ConfigurationDescriptions::addWithDefaultLabel()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp#Automatic_module_labels_from_plu) is strongly recommended. Currently a `cfi` file is generated for a module for each Alpaka backend such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the ["module type resolver"](#module-type-resolver-portable) functionality, where the module type has `@alpaka` postfix.

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.

## Guarantees

* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
* All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
* All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
* The EDM Stream does not proceed to the next Event until after all asynchronous work of the current Event has finished.
  * **Note**: this implies that if an EDProducer in its `produce()` function uses the `Event::queue()` or gets a device-side data product, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. will block the CPU thread until the asynchronous work finishes)

## Examples

For concrete examples see code in [`HeterogeneousCore/AlpakaTest`](../../HeterogeneousCore/AlpakaTest) and [`DataFormats/PortableTestObjects`](../../DataFormats/PortableTestObjects).

### EDProducer

This example shows a mixture of behavior from test code in [`HeterogeneousCore/AlpakaTest/plugins/alpaka/`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/)
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)

// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaProducer : public global::EDProducer<> {
  public:
    ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
        // produces() must not specify the product type, it is deduced from deviceToken_
        // (the get tokens must be initialized similarly with the type-deducing
        // consumes() and esConsumes() calls; omitted here for brevity)
        : deviceToken_{produces()}, size_{iConfig.getParameter<int32_t>("size")} {}

    // device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
    void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
      // get input data products
      auto const& hostInput = iEvent.get(getTokenHost_);
      auto const& deviceInput = iEvent.get(getTokenDevice_);
      auto const& deviceESData = iSetup.getData(esGetTokenDevice_);

      // run the algorithm, potentially asynchronously
      portabletest::TestDeviceCollection deviceProduct{size_, iEvent.queue()};
      algo_.fill(iEvent.queue(), hostInput, deviceInput, deviceESData, deviceProduct);

      // put the asynchronous product into the event without waiting
      // must use EDPutToken with emplace() or put()
      //
      // for a product produced with device::EDPutToken<T> the base class registers
      // a separately scheduled transformation function for the copy to host
      // the transformation function calls
      // cms::alpakatools::CopyToHost<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
      // function
      iEvent.emplace(deviceToken_, std::move(deviceProduct));
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      desc.add<int32_t>("size");
      descriptions.addWithDefaultLabel(desc);
    }

  private:
    // use edm::EDGetTokenT<T> to read from host memory space
    edm::EDGetTokenT<FooProduct> const getTokenHost_;

    // use device::EDGetToken<T> to read from device memory space
    device::EDGetToken<BarProduct> const getTokenDevice_;

    // use device::ESGetToken<T, TRecord> to read from device memory space
    device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;

    // use device::EDPutToken<T> to place the data product in the device memory space
    device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;
    int32_t const size_;

    // implementation of the algorithm
    TestAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(ExampleAlpakaProducer);
```

### ESProducer to reformat an existing ESProduct for use in device

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product to the device memory
    // the function calls
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;

      // fill the hostProduct from hostInput

      return std::move(hostProduct);
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
```

### ESProducer to derive a new ESProduct from an existing device-side ESProduct

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());

      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return std::move(deviceProduct);
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;

    OtherAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
```

## Configuration

There are a few different options for using Alpaka-based modules in the CMSSW configuration.

In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. The `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
```python
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
```

### Explicit module type (non-portable)

The Alpaka modules can be used in the python configuration with their explicit, full type names
```python
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
```
This kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.

### SwitchProducerCUDA (semi-portable)

A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which currently the only concrete implementation is [`SwitchProducerCUDA`](../../HeterogeneousCore/CUDACore/README.md#automatic-switching-between-cpu-and-gpu-modules). The modules for different Alpaka backends still need to be specified explicitly
```python
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
    cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
    cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)

# or

process.producer = SwitchProducerCUDA(
    cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
    cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
```
This kind of configuration can be run on any machine that a given CMSSW build supports, but is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). Therefore the `SwitchProducer` approach is here called "semi-portable".

### Module type resolver (portable)

A fully portable way to express a configuration can be achieved with the "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with the `@alpaka` postfix
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)

# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync")
    )
)
```
The `@alpaka` postfix in the module type tells the system that the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available in the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.

This approach is portable also across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.

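
The resolution step can be pictured with the following self-contained sketch. It is not the framework's resolver: the function name `resolveModuleType()`, the assumption that `availableBackends` is ordered by preference, and the error handling are all made up for this illustration. The `alpaka_<backend>::<module>` naming follows the explicit type names used in the examples above.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical resolver. Assumptions of this sketch: availableBackends is ordered
// by preference, and the concrete type is "alpaka_<backend>::<module name>".
inline std::string resolveModuleType(std::string const& type,
                                     std::vector<std::string> const& availableBackends,
                                     std::optional<std::string> const& explicitBackend = std::nullopt) {
  static std::string const postfix = "@alpaka";
  // module types without the postfix are left untouched
  if (type.size() <= postfix.size() ||
      type.compare(type.size() - postfix.size(), postfix.size(), postfix) != 0) {
    return type;
  }
  std::string const name = type.substr(0, type.size() - postfix.size());

  // an explicitly requested backend wins, but it must be among the allowed ones
  if (explicitBackend) {
    if (std::find(availableBackends.begin(), availableBackends.end(), *explicitBackend) ==
        availableBackends.end()) {
      throw std::runtime_error("backend " + *explicitBackend + " is not available");
    }
    return "alpaka_" + *explicitBackend + "::" + name;
  }

  // otherwise pick the most preferred available backend
  if (availableBackends.empty()) {
    throw std::runtime_error("no Alpaka backend is available");
  }
  return "alpaka_" + availableBackends.front() + "::" + name;
}
```

With `{"cuda_async", "serial_sync"}` as the available backends, `"ExampleAlpakaProducer@alpaka"` resolves in this sketch to `alpaka_cuda_async::ExampleAlpakaProducer`, unless `serial_sync` is requested explicitly.
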

#### Examples on explicitly setting the backend

##### For individual modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This setting overrides the `ProcessAcceleratorAlpaka` setting described in the next section.

```python
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
    )
)
```

##### For all Alpaka modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This `ProcessAcceleratorAlpaka` setting can be further overridden for individual modules as described in the previous section.

```python
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
```

##### For entire job (i.e. also for non-Alpaka modules)

```python
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
```

## Unit tests

Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary along the lines of
```xml
<bin name="<unique test binary name>" file="<comma-separated list of files>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</bin>
```
or as a command (e.g. `cmsRun` or a shell script) to run

```xml
<test name="<unique name of the test>" command="<command to run>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</test>
```

will be run as part of `scram build runtests` according to the availability of the hardware:
- `serial_sync` version is run always
- `cuda_async` version is run if NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- `rocm_async` version is run if AMD GPU is present (i.e. `rocmIsEnabled` returns 0)

Tests for a specific backend (or hardware) can be explicitly requested by setting the `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. Tests not depending on the hardware are then skipped. If the corresponding hardware is not available, the tests will fail.