# Alpaka algorithms and modules in CMSSW

## Introduction

This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the [Alpaka documentation](https://alpaka.readthedocs.io/en/latest/).

### Compilation model

The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is substituted with a concrete, backend-specific namespace name in order to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.

The source files with the `.dev.cc` suffix are compiled with the backend-specific device compiler. The other `.cc` source files are compiled with the host compiler.

The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.

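The naming and suffix rules above can be modelled outside of CMSSW with a few lines of Python. This is only an illustrative sketch: the helper names `backend_namespace`, `qualified_type` and `compiler_kind` are hypothetical and not part of SCRAM or the framework, and the `alpaka_<backend>` pattern is inferred from the module type names used in the configuration examples of this document.

```python
def backend_namespace(backend: str) -> str:
    """Namespace that ALPAKA_ACCELERATOR_NAMESPACE is assumed to expand to."""
    return f"alpaka_{backend}"


def qualified_type(backend: str, class_name: str) -> str:
    """Fully qualified C++ type name of a class compiled for one backend."""
    return f"{backend_namespace(backend)}::{class_name}"


def compiler_kind(source_file: str) -> str:
    """'.dev.cc' files use the device compiler, other '.cc' files the host compiler."""
    if source_file.endswith(".dev.cc"):
        return "device"
    if source_file.endswith(".cc"):
        return "host"
    raise ValueError(f"not a C++ source file: {source_file}")


if __name__ == "__main__":
    # The same class name yields a distinct symbol for every backend.
    for backend in ("serial_sync", "cuda_async"):
        print(qualified_type(backend, "ExampleAlpakaProducer"))
    print(compiler_kind("TestAlgo.dev.cc"), compiler_kind("TestAlpakaProducer.cc"))
```

Because every backend gets its own namespace, the libraries built from the same source file do not clash when several of them are loaded into one `cmsRun` process.
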
## Overall guidelines

* Minimize explicit blocking synchronization calls
  * Avoid `alpaka::wait()`, non-cached memory buffer allocations
* If you can, use `global::EDProducer` base class
  * If you need per-stream storage
    * For few objects consider using [`edm::StreamCache<T>`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#edm_StreamCacheT) with the global module, or
    * Use `stream::EDProducer`
  * If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
* All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
  * Alpaka-dependent code that uses templates instead of the namespace macro can be placed in `Package/SubPackage/interface` directory
* All source files (not headers) using Alpaka device code (such as kernel calls, functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory
* Any code that `#include`s a header from the framework or from the `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix.
  * Some framework headers are allowed to be used in `.dev.cc` files:
    * Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
    * `FWCore/Utilities/interface/Exception.h`
    * `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
    * `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly

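The base class guidance in the list above can be read as a small decision procedure. The following Python sketch encodes that reading. The function `choose_base_class` and its parameter names are hypothetical, and the check order (host transfer first, then per-stream storage) is an assumption about how the bullets combine.

```python
def choose_base_class(transfers_data_to_host: bool = False,
                      needs_per_stream_storage: bool = False,
                      few_objects: bool = False) -> str:
    """Suggest an Alpaka EDProducer base class following the guidelines."""
    if transfers_data_to_host:
        # Data other than an Event product must come back to the host.
        return "stream::SynchronizingEDProducer<>"
    if needs_per_stream_storage:
        if few_objects:
            # Stay global and keep the per-stream state in a StreamCache.
            return "global::EDProducer<edm::StreamCache<T>>"
        return "stream::EDProducer<>"
    # Default recommendation: the global module.
    return "global::EDProducer<>"


if __name__ == "__main__":
    print(choose_base_class())
    print(choose_base_class(needs_per_stream_storage=True, few_objects=True))
    print(choose_base_class(needs_per_stream_storage=True))
    print(choose_base_class(transfers_data_to_host=True))
```

The point of the ordering is that the global module is the default, and each step away from it should be motivated by a concrete need.
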
## Data formats

Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are:
* There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  * The host-only data format must be defined in `Package/SubPackage/interface/` directory
  * If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    * As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  * For EventSetup data products [the registration macro `TYPELOOKUP_DATA_REG`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToRegisterESData) should be placed in `Package/SubPackage/src/ES_<type name>.cc`.
* The device-side data formats are defined in `Package/SubPackage/interface/alpaka/` directory
  * The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace.
  * For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
    * Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that, see an example in [`TestDeviceCollection.h`](../../DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h)
    * This equality is necessary for the [implicit data transfers](#implicit-data-transfers) to function properly
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
    * The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with `persistent="false"` attribute.
    * The list of `<platform>` includes currently: `cuda`, `rocm`
  * For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
    * Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use `TYPELOOKUP_ALPAKA_DATA_REG` macro
    * Data products templated over the device type should use `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
* For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
  * unless the package has something that is really specific for `serial` backend that is not generally applicable on host

Note that even if for Event data formats the examples above used `DataFormats` package, Event data formats are allowed to be defined in other packages too in some circumstances. For full details please see [SWGuideCreatingNewProducts](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).

### Implicit data transfers

Both EDProducers and ESProducers make use of implicit data transfers.

#### EDProducer

In EDProducers, for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. The framework code to issue the transfer makes use of the `cms::alpakatools::CopyToHost` class template, which must be specialized along the following lines:
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"

namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
      return ...;
    }
  };
}  // namespace cms::alpakatools
```
Note that the destination (host-side) type `TDst` can be different from or the same as the source (device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.

The `CopyToHost` class template is partially specialized for all `PortableCollection` instantiations.

#### ESProducer

In ESProducers, for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. The framework code to issue the transfer makes use of the `cms::alpakatools::CopyToDevice` class template, which must be specialized along the following lines:
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"

namespace cms::alpakatools {
  template <>
  struct CopyToDevice<TSrc> {
    template <typename TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
      return ...;
    }
  };
}  // namespace cms::alpakatools
```
Note that the destination (device-side) type `TDst` can be different from or the same as the source (host-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer (currently the synchronization blocks, but work is ongoing to make it non-blocking).

The `CopyToDevice` class template is partially specialized for all `PortableCollection` instantiations.

### `PortableCollection`

For more information see [`DataFormats/Portable/README.md`](../../DataFormats/Portable/README.md) and [`DataFormats/SoATemplate/README.md`](../../DataFormats/SoATemplate/README.md).


## Modules

### Base classes

The Alpaka-based EDModules should use one of the following base classes (that are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):

* `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
  * A [global EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface) that launches (possibly) asynchronous work
* `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that launches (possibly) asynchronous work
* `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
  * The base class uses the [`edm::ExternalWork`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_ExternalWork) for the non-blocking synchronization

The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`. `stream::SynchronizingEDProducer` should be used only when some data needs to be copied from the device to the host, which requires synchronization, for a reason other than copying an Event data product from the device to the host.

New base classes (or other functionality) can be added based on new use cases that come up.

The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`). Note that the Alpaka-based ESProducer constructor must pass the argument `edm::ParameterSet` object to the constructor of the `ESProducer` base class.

Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use [`EmptyESSource`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#EmptyESSource).


### Event, EventSetup, Records

The Alpaka-based modules have a notion of a _host memory space_ and _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible for non-Alpaka modules, whereas the data products in device memory space are available only for modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.

The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both host memory space and device memory space can be accessed. Data products can also be produced to either memory space. For all data products produced in the device memory space an implicit data copy from the device memory space to the host memory space is registered as discussed above. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.

The ESProducer can have two different `produce()` function signatures:
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
* If the function has `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.

### Tokens

The memory spaces of the consumed and (in EDProducer case) produced data products are driven by the tokens. The token types to be used in different cases are summarized below.


|                                                                | Host memory space             | Device memory space              |
|----------------------------------------------------------------|-------------------------------|----------------------------------|
| Access Event data product of type `T`                          | `edm::EDGetTokenT<T>`         | `device::EDGetToken<T>`          |
| Produce Event data product of type `T`                         | `edm::EDPutTokenT<T>`         | `device::EDPutToken<T>`          |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |

With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see
* [`consumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMGetDataFromEvent#consumes)
* [`produces()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts#Producing_the_EDProduct)
* [`esConsumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ED_module)
* [`consumes()` in ESProducers](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ESProducer)


### `fillDescriptions()`

In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp) function specifying the module label automatically with the [`edm::ConfigurationDescriptions::addWithDefaultLabel()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp#Automatic_module_labels_from_plu) is strongly recommended. Currently a `cfi` file is generated for a module for each Alpaka backend such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the ["module type resolver"](#module-type-resolver-portable) functionality, where the module type has `@alpaka` postfix.

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.

## Guarantees

* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
* All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
* All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
* The EDM Stream does not proceed to the next Event until after all asynchronous work of the current Event has finished.
  * **Note**: this implies if an EDProducer in its `produce()` function uses the `Event::queue()` or gets a device-side data product, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. will block the CPU thread until the asynchronous work finishes)

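The condition in the note on synchronous `produce()` calls can be written as a small predicate. This Python sketch only restates the rule as worded above. The function name `produce_blocks_cpu_thread` is hypothetical, and treating every other combination as non-blocking is an assumption made for illustration.

```python
def produce_blocks_cpu_thread(uses_queue: bool,
                              gets_device_product: bool,
                              produces_device_product: bool) -> bool:
    """True if produce() is expected to wait for the asynchronous work."""
    # The module touches the device if it uses the queue or reads device data.
    touches_device = uses_queue or gets_device_product
    # Without a device-side product there is nothing to hand the pending
    # work over to, so the call has to wait for it to finish.
    return touches_device and not produces_device_product


if __name__ == "__main__":
    print(produce_blocks_cpu_thread(True, False, False))
    print(produce_blocks_cpu_thread(True, True, True))
```

A module that only reads device-side data and writes a host-side result is therefore a candidate for `stream::SynchronizingEDProducer`, which is the base class described earlier for non-blocking synchronization.
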
## Examples

For concrete examples see code in [`HeterogeneousCore/AlpakaTest`](../../HeterogeneousCore/AlpakaTest) and [`DataFormats/PortableTestObjects`](../../DataFormats/PortableTestObjects).

### EDProducer

This example shows a mixture of behavior from test code in [`HeterogeneousCore/AlpakaTest/plugins/alpaka/`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/):
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)

// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaProducer : public global::EDProducer<> {
  public:
    ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
        // produces() must not specify the product type, it is deduced from deviceToken_
        // (the consumes()/esConsumes() calls that initialize the get tokens are omitted in this example)
        : deviceToken_{produces()}, size_{iConfig.getParameter<int32_t>("size")} {}

    // device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
    void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
      // get input data products
      auto const& hostInput = iEvent.get(getTokenHost_);
      auto const& deviceInput = iEvent.get(getTokenDevice_);
      auto const& deviceESData = iSetup.getData(esGetTokenDevice_);

      // run the algorithm, potentially asynchronously
      portabletest::TestDeviceCollection deviceProduct{size_, iEvent.queue()};
      algo_.fill(iEvent.queue(), hostInput, deviceInput, deviceESData, deviceProduct);

      // put the asynchronous product into the event without waiting
      // must use EDPutToken with emplace() or put()
      //
      // for a product produced with device::EDPutToken<T> the base class registers
      // a separately scheduled transformation function for the copy to host
      // the transformation function calls
      // cms::alpakatools::CopyToHost<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
      // function
      iEvent.emplace(deviceToken_, std::move(deviceProduct));
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      desc.add<int32_t>("size");
      descriptions.addWithDefaultLabel(desc);
    }

  private:
    // use edm::EDGetTokenT<T> to read from host memory space
    edm::EDGetTokenT<FooProduct> const getTokenHost_;

    // use device::EDGetToken<T> to read from device memory space
    device::EDGetToken<BarProduct> const getTokenDevice_;

    // use device::ESGetToken<T, TRecord> to read from device memory space
    device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;

    // use device::EDPutToken<T> to place the data product in the device memory space
    device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;
    int32_t const size_;

    // implementation of the algorithm
    TestAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(ExampleAlpakaProducer);
```

### ESProducer to reformat an existing ESProduct for use in device

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product on device memory
    // the function calls
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;

      // fill the hostProduct from hostInput

      return std::move(hostProduct);
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
```

### ESProducer to derive a new ESProduct from an existing device-side ESProduct

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());

      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return std::move(deviceProduct);
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;

    OtherAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
```

## Configuration

There are a few different options for using Alpaka-based modules in the CMSSW configuration.

In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. The `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
```python
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
```

### Explicit module type (non-portable)

The Alpaka modules can be used in the python configuration with their explicit, full type names:
```python
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
```
This kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.


### SwitchProducerCUDA (semi-portable)

A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which currently the only concrete implementation is [`SwitchProducerCUDA`](../../HeterogeneousCore/CUDACore/README.md#automatic-switching-between-cpu-and-gpu-modules). The modules for different Alpaka backends still need to be specified explicitly:
```python
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
    cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
    cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)

# or

process.producer = SwitchProducerCUDA(
    cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
    cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
```
This kind of configuration can be run on any machine that a given CMSSW build supports, but is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). Therefore the `SwitchProducer` approach is here called "semi-portable".

### Module type resolver (portable)

A fully portable way to express a configuration can be achieved with the "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with the `@alpaka` postfix:
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)

# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync")
    )
)
```
The `@alpaka` postfix in the module type tells the system that the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available in the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.

This approach is portable also across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.


#### Examples on explicitly setting the backend

##### For individual modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This setting overrides the `ProcessAcceleratorAlpaka` setting described in the next paragraph.

```python
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
    )
)
```

##### For all Alpaka modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This `ProcessAcceleratorAlpaka` setting can be further overridden for individual modules as described in the previous paragraph.

```python
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
```

##### For entire job (i.e. also for non-Alpaka modules)
```python
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
```


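The precedence between the three levels of backend selection can be summarized with a toy model. This is a pure-Python sketch, not the framework's resolver. The function name `resolve_backend`, the accelerator-to-backend mapping, and the preference for GPU backends over `serial_sync` when several are usable are all assumptions made for illustration.

```python
# Assumed correspondence between accelerator labels and Alpaka backends,
# inferred from the order of the alternatives in the configuration examples.
BACKEND_FOR_ACCELERATOR = {
    "cpu": "serial_sync",
    "gpu-nvidia": "cuda_async",
    "gpu-amd": "rocm_async",
}

# Assumed preference when several backends are usable: accelerators first.
PREFERENCE = ("cuda_async", "rocm_async", "serial_sync")


def resolve_backend(job_accelerators, available_accelerators,
                    process_backend=None, module_backend=None):
    """Pick the backend for one "...@alpaka" module.

    job_accelerators:       value of process.options.accelerators
    available_accelerators: accelerators present on the machine
    process_backend:        ProcessAcceleratorAlpaka.setBackend() value, if any
    module_backend:         the module's own alpaka.backend value, if any
    """
    allowed = {
        BACKEND_FOR_ACCELERATOR[acc]
        for acc in job_accelerators
        if acc in available_accelerators and acc in BACKEND_FOR_ACCELERATOR
    }
    # The per-module setting overrides the process-wide Alpaka setting.
    explicit = module_backend or process_backend
    if explicit is not None:
        if explicit not in allowed:
            raise ValueError(f"backend {explicit!r} is not allowed in this job")
        return explicit
    for backend in PREFERENCE:
        if backend in allowed:
            return backend
    raise ValueError("no usable Alpaka backend")


if __name__ == "__main__":
    machine = ["cpu", "gpu-nvidia"]
    print(resolve_backend(["cpu", "gpu-nvidia"], machine))
    print(resolve_backend(["cpu", "gpu-nvidia"], machine, process_backend="serial_sync"))
    print(resolve_backend(["cpu"], machine))
```

In this model the job-wide accelerator list acts as a filter, and the two explicit settings only choose among what the filter lets through.
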
## Unit tests

Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary along the lines of
```xml
<bin name="<unique test binary name>" file="<comma-separated list of files>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</bin>
```
or as a command (e.g. `cmsRun` or a shell script) to run

```xml
<test name="<unique name of the test>" command="<command to run>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</test>
```

will be run as part of `scram build runtests` according to the availability of the hardware:
- `serial_sync` version is run always
- `cuda_async` version is run if NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- `rocm_async` version is run if AMD GPU is present (i.e. `rocmIsEnabled` returns 0)

Tests for specific backend (or hardware) can be explicitly specified to be run by setting `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. Tests not depending on the hardware are skipped. If the corresponding hardware is not available, the tests will fail.

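The selection rules above can be tabulated with a short Python model. It is a sketch of the described behavior and not the actual SCRAM logic. The function `backend_test_plan` is hypothetical, and it assumes that with `USER_UNIT_TESTS` set every backend version other than the requested one is skipped.

```python
def backend_test_plan(has_nvidia_gpu, has_amd_gpu, user_unit_tests=None):
    """Return {backend: "run" | "skip" | "fail"} for one ALPAKA_BACKENDS test."""
    hardware = {"cuda_async": has_nvidia_gpu, "rocm_async": has_amd_gpu}

    if user_unit_tests is None:
        # Default mode: the host version always runs, GPU versions follow the hardware.
        plan = {"serial_sync": "run"}
        for backend, present in hardware.items():
            plan[backend] = "run" if present else "skip"
        return plan

    requested = f"{user_unit_tests}_async"
    if requested not in hardware:
        raise ValueError(f"unsupported USER_UNIT_TESTS value: {user_unit_tests}")

    # Explicit mode: only the requested backend is exercised, and it fails
    # rather than being skipped when the hardware is missing.
    plan = {"serial_sync": "skip"}
    for backend, present in hardware.items():
        if backend == requested:
            plan[backend] = "run" if present else "fail"
        else:
            plan[backend] = "skip"
    return plan


if __name__ == "__main__":
    print(backend_test_plan(has_nvidia_gpu=True, has_amd_gpu=False))
    print(backend_test_plan(has_nvidia_gpu=False, has_amd_gpu=False, user_unit_tests="cuda"))
```

The main difference between the two modes is how missing hardware is treated: it leads to a skipped test by default, and to a failure when the backend was requested explicitly.
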