# Alpaka algorithms and modules in CMSSW

## Introduction

This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the [Alpaka documentation](https://alpaka.readthedocs.io/en/latest/).

### Compilation model

The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is replaced with a concrete, backend-specific namespace name in order to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.

The source files with the `.dev.cc` suffix are compiled with the backend-specific device compiler. The other `.cc` source files are compiled with the host compiler.

The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.
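
To make the effect of the namespace macro concrete, the following self-contained sketch emulates two per-backend compilations inside a single translation unit. It does not use the CMSSW build system or headers: the `SKETCH_*` macros and the `whoAmI()` function are invented for this illustration, and the macro is redefined by hand where the real build would compile the file once per backend. The two namespace names are the ones used in the configuration examples on this page.

```cpp
#include <cassert>
#include <string>

#define SKETCH_STRINGIZE_IMPL(x) #x
#define SKETCH_STRINGIZE(x) SKETCH_STRINGIZE_IMPL(x)

// Stand-in for the contents of one source file under an alpaka/ directory.
// The text is identical for every backend; only the macro value differs.
#define SKETCH_SOURCE_FILE                                   \
  namespace ALPAKA_ACCELERATOR_NAMESPACE {                   \
    inline std::string whoAmI() {                            \
      return SKETCH_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE); \
    }                                                        \
  }

// First emulated compilation: host backend.
#define ALPAKA_ACCELERATOR_NAMESPACE alpaka_serial_sync
SKETCH_SOURCE_FILE
#undef ALPAKA_ACCELERATOR_NAMESPACE

// Second emulated compilation: CUDA backend.
#define ALPAKA_ACCELERATOR_NAMESPACE alpaka_cuda_async
SKETCH_SOURCE_FILE
#undef ALPAKA_ACCELERATOR_NAMESPACE
```

Both `alpaka_serial_sync::whoAmI()` and `alpaka_cuda_async::whoAmI()` now exist as distinct symbols, which is the property that lets libraries for several backends be loaded into the same process.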

## Overall guidelines

* Minimize explicit blocking synchronization calls
  * Avoid `alpaka::wait()` and non-cached memory buffer allocations
* If you can, use the `global::EDProducer` base class
  * If you need per-stream storage
    * For few objects consider using [`edm::StreamCache<T>`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#edm_StreamCacheT) with the global module, or
    * Use `stream::EDProducer`
  * If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
* All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in the `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
  * Alpaka-dependent code that uses templates instead of the namespace macro can be placed in the `Package/SubPackage/interface` directory
* All source files (not headers) using Alpaka device code (such as kernel calls or functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory
* Any code that `#include`s a header from the framework or from `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix.
  * Some framework headers are allowed to be used in `.dev.cc` files:
    * Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
    * `FWCore/Utilities/interface/Exception.h`
    * `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
    * `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly

## Data formats

Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are:
* There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  * The host-only data format must be defined in the `Package/SubPackage/interface/` directory
  * If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    * As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  * For EventSetup data products [the registration macro `TYPELOOKUP_DATA_REG`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToRegisterESData) should be placed in `Package/SubPackage/src/ES_<type name>.cc`.
* The device-side data formats are defined in the `Package/SubPackage/interface/alpaka/` directory
  * The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace.
  * For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
    * Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that, see an example in [TestDeviceCollection.h](../../DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h)
    * This equality is necessary for the [implicit data transfers](#implicit-data-transfers) to function properly
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
    * The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with the `persistent="false"` attribute.
    * The list of `<platform>` values currently includes `cuda` and `rocm`
  * For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
    * Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use the `TYPELOOKUP_ALPAKA_DATA_REG` macro
    * Data products templated over the device type should use the `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
* For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
  * unless the package has something that is really specific to the `serial` backend and not generally applicable on the host

Note that even though the examples above used the `DataFormats` package for Event data formats, Event data formats are allowed to be defined in other packages too in some circumstances. For full details please see [SWGuideCreatingNewProducts](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).
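
The requirement that, on host backends, the "device-side" class is the very same type as the host-only class can be pictured with a small standalone sketch. All names below (`HostBackend`, `GpuBackend`, `ExampleHostCollection`, and so on) are hypothetical stand-ins rather than CMSSW types, and the `static_assert` only shows one plausible way to express such a check; the actual `ASSERT_DEVICE_MATCHES_HOST_COLLECTION` macro may be implemented differently.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch {

  // Hypothetical backend tags (not Alpaka types).
  struct HostBackend {};
  struct GpuBackend {};

  // Host-only data format, independent of any backend.
  struct ExampleHostCollection {
    std::vector<float> values;
  };

  // Data format living in accelerator memory (placeholder layout).
  template <typename TBackend>
  struct ExampleGpuCollection {
    float* deviceValues = nullptr;
    int size = 0;
  };

  // "Device-side" type: the host type itself on the host backend,
  // a distinct type on any other backend.
  template <typename TBackend>
  using ExampleDeviceCollection = std::conditional_t<std::is_same_v<TBackend, HostBackend>,
                                                     ExampleHostCollection,
                                                     ExampleGpuCollection<TBackend>>;

  template <typename TBackend>
  inline constexpr bool deviceMatchesHost =
      std::is_same_v<ExampleDeviceCollection<TBackend>, ExampleHostCollection>;

  // Compile-time check in the spirit of the macro mentioned above.
  static_assert(deviceMatchesHost<HostBackend>, "host backend must reuse the host collection type");
  static_assert(!deviceMatchesHost<GpuBackend>, "GPU backend uses a distinct device collection type");

}  // namespace sketch
```

Because the two types coincide on the host backend, a product written by a host-backend module can be consumed directly as the host-side product without any copy.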

### Implicit data transfers

Both EDProducers and ESProducers make use of implicit data transfers.

#### EDProducer

In EDProducers a transfer from the device memory space to the host memory space is registered automatically for each device-side data product. The data product is copied only if the job has another EDModule that consumes the host-side data product. The framework code that issues the transfer makes use of the `cms::alpakatools::CopyToHost` class template, which must be specialized along the lines of
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"

namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
      return ...;
    }
  };
}
```
Note that the destination (host-side) type `TDst` can be different from or the same as the source (device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.

The `CopyToHost` class template is partially specialized for all `PortableCollection` instantiations.
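
The specialization pattern can be tried out without any accelerator using the standalone sketch below. Everything in it is a stand-in: `sketch::FakeQueue` merely records tasks and runs them on `drain()` to imitate asynchronous execution, `DeviceProduct` and `HostProduct` are ordinary host structs, and the `CopyToHost` template is re-declared in a `sketch` namespace instead of using the real header. It only illustrates the shape of `copyAsync()`: allocate the destination, enqueue the copy, return immediately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

namespace sketch {

  // Imitation of an asynchronous queue: work runs only when drain() is called.
  class FakeQueue {
  public:
    void enqueue(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void drain() {
      for (auto& task : tasks_)
        task();
      tasks_.clear();
    }

  private:
    std::vector<std::function<void()>> tasks_;
  };

  struct DeviceProduct {
    std::vector<int> deviceData;  // pretend this lives in device memory
  };
  struct HostProduct {
    std::vector<int> hostData;
  };

  // Primary template, to be specialized per source type.
  template <typename TSrc>
  struct CopyToHost;

  template <>
  struct CopyToHost<DeviceProduct> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, DeviceProduct const& deviceProduct) -> HostProduct {
      std::size_t const n = deviceProduct.deviceData.size();
      HostProduct host{std::vector<int>(n)};
      // The buffer address stays valid when the vector is moved out of this function.
      int* dst = host.hostData.data();
      int const* src = deviceProduct.deviceData.data();
      queue.enqueue([dst, src, n] { std::copy(src, src + n, dst); });
      return host;  // returned before the copy has run
    }
  };

}  // namespace sketch
```

In this sketch the caller plays the role of the framework: it must keep the source alive and wait for the queue (here, call `drain()`) before reading the returned object.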

#### ESProducer

In ESProducers a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically for each host-side data product. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. The framework code that issues the transfer makes use of the `cms::alpakatools::CopyToDevice` class template, which must be specialized along the lines of
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"

namespace cms::alpakatools {
  template <>
  struct CopyToDevice<TSrc> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
      return ...;
    }
  };
}
```
Note that the destination (device-side) type `TDst` can be different from or the same as the source (host-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer (currently the synchronization blocks, but work is ongoing to make it non-blocking).

The `CopyToDevice` class template is partially specialized for all `PortableCollection` instantiations.

#### Data products with `memcpy()`ed pointers

If the data product in question contains pointers to memory elsewhere within the data product, then after the `alpaka::memcpy()` calls in `copyAsync()` those pointers still point to device memory, and need to be updated. **Such data products are generally discouraged.** Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization along the lines of (extending the `CopyToHost` example [above](#edproducer))
```cpp
namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    // copyAsync() definition from above

    static void postCopy(TDst& obj) {
      // modify obj
      // any modifications must be such that the postCopy() can be
      // skipped when the obj originates from the host (i.e. on CPU backends)
    }
  };
}
```
The `postCopy()` function is called after the operations enqueued in `copyAsync()` have finished. The code in `postCopy()` must be such that the call to `postCopy()` can be omitted on CPU backends.

Note that for `CopyToDevice` such `postCopy()` functionality is **not** provided. It should be possible to issue a kernel call (via an intermediate host-side function) from the `CopyToDevice::copyAsync()` function to achieve the same effect.
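
As an illustration of what a `postCopy()` fix-up might look like, the standalone sketch below uses a made-up product type that stores both a pointer into its own buffer and the equivalent position-independent offset. The type `SelfReferencingProduct`, its members, and the `sketch` namespace are assumptions introduced for this example; keeping an offset next to the pointer is one possible design, not something the framework prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

  // Hypothetical product with an internal pointer into its own buffer.
  struct SelfReferencingProduct {
    std::vector<float> buffer;
    std::size_t firstValidOffset = 0;  // position-independent information
    float* firstValid = nullptr;       // stale after a bitwise copy from elsewhere
  };

  template <typename TSrc>
  struct CopyToHost;

  template <>
  struct CopyToHost<SelfReferencingProduct> {
    // copyAsync() omitted in this sketch

    // Re-derive the pointer from the offset. For an object that was built on
    // the host the pointer is already correct, so this call changes nothing
    // and could be skipped, as required for CPU backends.
    static void postCopy(SelfReferencingProduct& obj) {
      obj.firstValid = obj.buffer.data() + obj.firstValidOffset;
    }
  };

}  // namespace sketch
```

The important property is that the function is idempotent and depends only on data that survives the copy unchanged.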

### `PortableCollection`

For more information see [`DataFormats/Portable/README.md`](../../DataFormats/Portable/README.md) and [`DataFormats/SoATemplate/README.md`](../../DataFormats/SoATemplate/README.md).


## Modules

### Base classes

The Alpaka-based EDModules should use one of the following base classes (which are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):

* `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
   * A [global EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface) that launches (possibly) asynchronous work
* `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
   * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that launches (possibly) asynchronous work
* `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
   * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
      * The base class uses [`edm::ExternalWork`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_ExternalWork) for the non-blocking synchronization

The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`, with `stream::SynchronizingEDProducer` used only in cases where some data needs to be copied from the device to the host, which requires synchronization, for a reason other than copying an Event data product from the device to the host.

New base classes (or other functionality) can be added based on new use cases that come up.

The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`). Note that the Alpaka-based ESProducer constructor must pass its `edm::ParameterSet` argument to the constructor of the `ESProducer` base class.

Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use [`EmptyESSource`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#EmptyESSource).


### Event, EventSetup, Records

The Alpaka-based modules have a notion of a _host memory space_ and _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible for non-Alpaka modules, whereas the data products in device memory space are available only for modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.

The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both host memory space and device memory space can be accessed. Data products can also be produced to either memory space. For all data products produced in the device memory space an implicit data copy from the device memory space to the host memory space is registered as discussed above. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.

The ESProducer can have two different `produce()` function signatures
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
* If the function has `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.

### Tokens

The memory spaces of the consumed and (in EDProducer case) produced data products are driven by the tokens. The token types to be used in different cases are summarized below.


|                                                                | Host memory space             | Device memory space              |
|----------------------------------------------------------------|-------------------------------|----------------------------------|
| Access Event data product of type `T`                          | `edm::EDGetTokenT<T>`         | `device::EDGetToken<T>`          |
| Produce Event data product of type `T`                         | `edm::EDPutTokenT<T>`         | `device::EDPutToken<T>`          |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |

With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see
* [`consumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMGetDataFromEvent#consumes)
* [`produces()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts#Producing_the_EDProduct)
* [`esConsumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ED_module)
* [`consumes()` in ESProducers](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ESProducer)
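
For readers wondering how a call such as `produces()` can work without naming the product type, the standalone sketch below shows the general C++ idiom: the call returns a small adapter object whose templated conversion operator performs the registration once the target token type is known. The classes `sketch::Registry`, `sketch::Registry::Adapter` and `sketch::PutToken` are invented for this illustration and are not the framework's actual classes.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <vector>

namespace sketch {

  // Hypothetical token type: remembers the slot assigned at registration.
  template <typename T>
  class PutToken {
  public:
    PutToken() = default;
    explicit PutToken(int index) : index_(index) {}
    int index() const { return index_; }

  private:
    int index_ = -1;
  };

  // Hypothetical module base: records which product types were registered.
  class Registry {
  public:
    // Returned by produces(); the product type is fixed only on conversion.
    class Adapter {
    public:
      explicit Adapter(Registry& registry) : registry_(registry) {}

      template <typename T>
      operator PutToken<T>() {
        registry_.types_.push_back(std::type_index(typeid(T)));
        return PutToken<T>(static_cast<int>(registry_.types_.size()) - 1);
      }

    private:
      Registry& registry_;
    };

    Adapter produces() { return Adapter(*this); }
    std::vector<std::type_index> const& types() const { return types_; }

  private:
    std::vector<std::type_index> types_;
  };

}  // namespace sketch
```

With this idiom, `sketch::PutToken<int> token = registry.produces();` registers an `int` product purely from the declared type of `token`, which mirrors why the member declaration of the token is the single place where the product type is spelled out.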


### `fillDescriptions()`

In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp) function, specifying the module label automatically with [`edm::ConfigurationDescriptions::addWithDefaultLabel()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp#Automatic_module_labels_from_plu) is strongly recommended. Currently a `cfi` file is generated for a module for each Alpaka backend such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the ["module type resolver"](#module-type-resolver-portable) functionality, where the module type has the `@alpaka` postfix.

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.

## Guarantees

* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
* All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
* All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
* The EDM Stream does not proceed to the next Event until after all asynchronous work of the current Event has finished.
  * **Note**: this implies that if an EDProducer in its `produce()` function uses the `Event::queue()` or gets a device-side data product, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. will block the CPU thread until the asynchronous work finishes)

## Examples

For concrete examples see code in [`HeterogeneousCore/AlpakaTest`](../../HeterogeneousCore/AlpakaTest) and [`DataFormats/PortableTestObjects`](../../DataFormats/PortableTestObjects).

### EDProducer

This example shows a mixture of behavior from test code in [`HeterogeneousCore/AlpakaTest/plugins/alpaka/`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/)
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)

// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaProducer : public global::EDProducer<> {
  public:
    ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
        // produces() must not specify the product type, it is deduced from deviceToken_
        // (the consumes() / esConsumes() registrations of the get tokens are not shown here)
        : deviceToken_{produces()}, size_{iConfig.getParameter<int32_t>("size")} {}

    // device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
    void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
      // get input data products
      auto const& hostInput = iEvent.get(getTokenHost_);
      auto const& deviceInput = iEvent.get(getTokenDevice_);
      auto const& deviceESData = iSetup.getData(esGetTokenDevice_);

      // run the algorithm, potentially asynchronously
      portabletest::TestDeviceCollection deviceProduct{size_, iEvent.queue()};
      algo_.fill(iEvent.queue(), hostInput, deviceInput, deviceESData, deviceProduct);

      // put the asynchronous product into the event without waiting
      // must use EDPutToken with emplace() or put()
      //
      // for a product produced with device::EDPutToken<T> the base class registers
      // a separately scheduled transformation function for the copy to host
      // the transformation function calls
      // cms::alpakatools::CopyToHost<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
      // function
      iEvent.emplace(deviceToken_, std::move(deviceProduct));
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      desc.add<int32_t>("size");
      descriptions.addWithDefaultLabel(desc);
    }

  private:
    // use edm::EDGetTokenT<T> to read from host memory space
    edm::EDGetTokenT<FooProduct> const getTokenHost_;

    // use device::EDGetToken<T> to read from device memory space
    device::EDGetToken<BarProduct> const getTokenDevice_;

    // use device::ESGetToken<T, TRecord> to read from device memory space
    device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;

    // use device::EDPutToken<T> to place the data product in the device memory space
    device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;
    int32_t const size_;

    // implementation of the algorithm
    TestAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(ExampleAlpakaProducer);
```

### ESProducer to reformat an existing ESProduct for use in device

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product to device memory
    // the function calls
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;

      // fill the hostProduct from hostInput

      return std::move(hostProduct);
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
```

### ESProducer to derive a new ESProduct from an existing device-side ESProduct

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());

      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return std::move(deviceProduct);
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;

    OtherAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
```

## Configuration

There are a few different options for using Alpaka-based modules in the CMSSW configuration.

In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. The `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
```python
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
```

### Explicit module type (non-portable)

The Alpaka modules can be used in the python configuration with their explicit, full type names
```python
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
```
This kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.


### SwitchProducerCUDA (semi-portable)

A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which currently the only concrete implementation is [`SwitchProducerCUDA`](../../HeterogeneousCore/CUDACore/README.md#automatic-switching-between-cpu-and-gpu-modules). The modules for different Alpaka backends still need to be specified explicitly
```python
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
    cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
    cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)

# or

process.producer = SwitchProducerCUDA(
    cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
    cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
```
This kind of configuration can be run on any machine (that a given CMSSW build supports), but is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). Therefore the `SwitchProducer` approach is here called "semi-portable".

### Module type resolver (portable)

A fully portable way to express a configuration can be achieved with the "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with the `@alpaka` postfix
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)

# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync")
    )
)
```
The `@alpaka` postfix in the module type tells the system that the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available in the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.

This approach is portable also across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.
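
The resolution rule described above can be summarized with the following standalone sketch. It is not the framework's implementation: the function names, the error handling through `std::runtime_error`, and the rule that the first usable accelerator in the list wins are all assumptions made for the sake of a runnable example. The mapping between accelerator labels and backend names pairs up the values that appear in the configuration examples on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

  // Accelerator label -> Alpaka backend name.
  inline std::string backendFor(std::string const& accelerator) {
    if (accelerator == "cpu")
      return "serial_sync";
    if (accelerator == "gpu-nvidia")
      return "cuda_async";
    if (accelerator == "gpu-amd")
      return "rocm_async";
    throw std::runtime_error("unknown accelerator: " + accelerator);
  }

  // accelerators: labels that are both selected for the job and available on
  // the machine, ordered by preference (the ordering rule is an assumption).
  // explicitBackend: value of the module's alpaka.backend parameter, or empty.
  inline std::string resolveModuleType(std::string const& moduleType,
                                       std::vector<std::string> const& accelerators,
                                       std::string const& explicitBackend = "") {
    std::string const postfix = "@alpaka";
    bool const hasPostfix =
        moduleType.size() > postfix.size() &&
        moduleType.compare(moduleType.size() - postfix.size(), postfix.size(), postfix) == 0;
    if (!hasPostfix)
      return moduleType;  // nothing to resolve

    std::vector<std::string> allowed;
    for (auto const& accelerator : accelerators)
      allowed.push_back(backendFor(accelerator));
    if (allowed.empty())
      throw std::runtime_error("no Alpaka backend is usable in this job");

    std::string backend = allowed.front();
    if (!explicitBackend.empty()) {
      // An explicit backend must be one of those allowed job-wide.
      if (std::find(allowed.begin(), allowed.end(), explicitBackend) == allowed.end())
        throw std::runtime_error("backend " + explicitBackend + " is not allowed in this job");
      backend = explicitBackend;
    }
    return "alpaka_" + backend + "::" + moduleType.substr(0, moduleType.size() - postfix.size());
  }

}  // namespace sketch
```

For example, with the accelerators `{"gpu-nvidia", "cpu"}` the type `ExampleAlpakaProducer@alpaka` resolves to `alpaka_cuda_async::ExampleAlpakaProducer` in this sketch, while an explicit `serial_sync` backend yields `alpaka_serial_sync::ExampleAlpakaProducer`.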


#### Examples of explicitly setting the backend

##### For individual modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This setting overrides the `ProcessAcceleratorAlpaka` setting described in the next paragraph.

```python
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
    )
)
```

##### For all Alpaka modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This `ProcessAcceleratorAlpaka` setting can be further overridden for individual modules as described in the previous paragraph.

```python
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
```

##### For entire job (i.e. also for non-Alpaka modules)
```python
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
```


## Unit tests

Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary along the lines of
```xml
<bin name="<unique test binary name>" file="<comma-separated list of files>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</bin>
```
or as a command (e.g. `cmsRun` or a shell script) to run

```xml
<test name="<unique name of the test>" command="<command to run>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</test>
```

will be run as part of `scram build runtests` according to the availability of the hardware:
- the `serial_sync` version is always run
- the `cuda_async` version is run if an NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- the `rocm_async` version is run if an AMD GPU is present (i.e. `rocmIsEnabled` returns 0)

Tests for a specific backend (or hardware) can be explicitly selected by setting the `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. In that case tests not depending on the hardware are skipped, and if the corresponding hardware is not available, the tests will fail.
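
The default selection rule (without `USER_UNIT_TESTS`) can be restated as a few lines of standalone code. This is only a sketch of the decision logic: the `sketch::HardwareProbe` struct and `backendsToRun()` function are hypothetical, and the two integers stand for the exit codes of the `cudaIsEnabled` and `rocmIsEnabled` commands mentioned above, where 0 means the hardware is usable.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

  // Exit codes of the two probe commands; 0 means the hardware is present.
  struct HardwareProbe {
    int cudaIsEnabledExitCode;
    int rocmIsEnabledExitCode;
  };

  // Which backend versions of a test would run on a machine with this hardware.
  inline std::vector<std::string> backendsToRun(HardwareProbe const& probe) {
    std::vector<std::string> backends{"serial_sync"};  // always run
    if (probe.cudaIsEnabledExitCode == 0)
      backends.push_back("cuda_async");
    if (probe.rocmIsEnabledExitCode == 0)
      backends.push_back("rocm_async");
    return backends;
  }

}  // namespace sketch
```

Note the shell convention used by the probes: an exit code of 0 signals success, so a nonzero value disables the corresponding backend.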