# Alpaka algorithms and modules in CMSSW

## Introduction

This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the [Alpaka documentation](https://alpaka.readthedocs.io/en/latest/).

### Compilation model

The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is substituted with a concrete, backend-specific namespace name in order to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.

The source files with the `.dev.cc` suffix are compiled with the backend-specific device compiler. The other `.cc` source files are compiled with the host compiler.

The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.
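
The effect of the namespace macro can be illustrated outside of CMSSW with a minimal sketch. It assumes that the macro is defined by hand (in CMSSW the build provides it for each backend), and `backendName()` as well as the `SKETCH_STRINGIZE` macros are hypothetical helpers introduced only for this illustration.

```cpp
#include <string_view>

// Assumption for this sketch: the macro is defined by hand. In CMSSW the build
// defines it, once for each enabled backend.
#ifndef ALPAKA_ACCELERATOR_NAMESPACE
#define ALPAKA_ACCELERATOR_NAMESPACE alpaka_serial_sync
#endif

// Two-step stringification so that the macro is expanded before being quoted.
#define SKETCH_STRINGIZE_IMPL(x) #x
#define SKETCH_STRINGIZE(x) SKETCH_STRINGIZE_IMPL(x)

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Hypothetical helper: reports the namespace this translation unit was compiled into.
  constexpr std::string_view backendName() { return SKETCH_STRINGIZE(ALPAKA_ACCELERATOR_NAMESPACE); }
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```

Compiling the same file again with, for example, `-DALPAKA_ACCELERATOR_NAMESPACE=alpaka_cuda_async` would place `backendName()` in a different namespace, so the two resulting objects could be loaded together without symbol clashes.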

## Overall guidelines

* Minimize explicit blocking synchronization calls
  * Avoid `alpaka::wait()`, non-cached memory buffer allocations
* If you can, use `global::EDProducer` base class
  * If you need per-stream storage
    * For few objects consider using [`edm::StreamCache<T>`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#edm_StreamCacheT) with the global module, or
    * Use `stream::EDProducer`
  * If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
* All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
  * Alpaka-dependent code that uses templates instead of the namespace macro can be placed in `Package/SubPackage/interface` directory
* All source files (not headers) using Alpaka device code (such as kernel calls, functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory
* Any code that `#include`s a header from the framework or from `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix.
  * Some framework headers are allowed to be used in `.dev.cc` files:
    * Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
    * `FWCore/Utilities/interface/Exception.h`
    * `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
    * `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly

## Data formats

Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are:
* There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  * The host-only data format must be defined in `Package/SubPackage/interface/` directory
  * If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    * As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  * For EventSetup data products [the registration macro `TYPELOOKUP_DATA_REG`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToRegisterESData) should be placed in `Package/SubPackage/src/ES_<type name>.cc`.
* The device-side data formats are defined in `Package/SubPackage/interface/alpaka/` directory
  * The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace.
  * For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
    * Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that, see an example in [`TestDeviceCollection.h`](../../DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h)
    * This equality is necessary for the [implicit data transfers](#implicit-data-transfers) to function properly
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
    * The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with the `persistent="false"` attribute.
    * The list of `<platform>` currently includes: `cuda`, `rocm`
  * For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
    * Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use the `TYPELOOKUP_ALPAKA_DATA_REG` macro
    * Data products templated over the device type should use the `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
* For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
  * unless the package has something that is really specific to the `serial` backend and not generally applicable on host

Note that even though the examples above used the `DataFormats` package for Event data formats, Event data formats are allowed to be defined in other packages too in some circumstances. For full details please see [SWGuideCreatingNewProducts](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).

### Implicit data transfers

Both EDProducers and ESProducers make use of implicit data transfers. On CPU backends these data transfers are omitted, and the host-side and the "device-side" data products are the same.

#### Data copy definitions

The implicit host-to-device and device-to-host copies rely on specializations of the `cms::alpakatools::CopyToDevice` and `cms::alpakatools::CopyToHost` class templates, respectively. These have to be specialized along the lines of
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"

namespace cms::alpakatools {
  template <>
  struct CopyToDevice<TSrc> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
      return ...;
    }
  };
}  // namespace cms::alpakatools
```
or
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"

namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
      // code to construct TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
      return ...;
    }
  };
}  // namespace cms::alpakatools
```
respectively.

Note that, as far as the framework is concerned, the destination (device-side/host-side) type `TDst` can be either different from or the same as the source (host-side/device-side) type `TSrc`. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.

Both `CopyToDevice` and `CopyToHost` class templates are partially specialized for all `PortableObject` and `PortableCollection` instantiations.
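
The specialization pattern itself can be tried out without CMSSW or Alpaka. The following is a minimal sketch, assuming a stand-in `Queue` type and two hypothetical data types (`HostData`, `DeviceData`) in a made-up `sketch_copy` namespace; the "copy" is an ordinary synchronous copy, so only the way the destination type is deduced from the specialization is illustrated, not the asynchronous behavior.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch_copy {
  // Stand-in for an Alpaka queue; purely hypothetical.
  struct Queue {};

  // Hypothetical host-side and device-side types, deliberately different.
  struct HostData {
    std::vector<int> values;
  };
  struct DeviceData {
    std::vector<int> values;
  };

  // Assumption: the primary template is only declared, and every source type
  // provides its own specialization.
  template <typename TSrc>
  struct CopyToDevice;

  template <>
  struct CopyToDevice<HostData> {
    template <typename TQueue>
    static auto copyAsync(TQueue& /*queue*/, HostData const& src) -> DeviceData {
      // A real implementation would allocate device memory and enqueue an
      // asynchronous memcpy into the queue; here the data is copied directly.
      return DeviceData{src.values};
    }
  };

  // A caller deduces the destination type from the specialization.
  template <typename TSrc, typename TQueue>
  auto copyToDevice(TQueue& queue, TSrc const& src) {
    return CopyToDevice<TSrc>::copyAsync(queue, src);
  }
}  // namespace sketch_copy
```

Calling `sketch_copy::copyToDevice(queue, hostData)` returns a `DeviceData`, which shows how `TDst` is determined by the specialization rather than by the caller.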

##### Data products with `memcpy()`ed pointers

If the data product in question contains pointers to memory elsewhere within the data product, after the `alpaka::memcpy()` calls in the `copyAsync()` those pointers still point to device memory, and need to be updated. **Such data products are generally discouraged.** Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization as follows (extending the `CopyToHost` example [above](#data-copy-definitions))
```cpp
namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    // copyAsync() definition from above

    static void postCopy(TDst& obj) {
      // modify obj
      // any modifications must be such that the postCopy() can be
      // skipped when the obj originates from the host (i.e. on CPU backends)
    }
  };
}  // namespace cms::alpakatools
```
The `postCopy()` is called after the operations enqueued in the `copyAsync()` have finished. The code in `postCopy()` must be such that the call to `postCopy()` can be omitted on CPU backends.

Note that for `CopyToDevice` such `postCopy()` functionality is **not** provided. It should be possible to issue a kernel call from the `CopyToDevice::copyAsync()` function to achieve the same effect.
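
Why such pointers go stale can be shown with a small host-only sketch. It assumes a hypothetical, trivially copyable `SelfRef` type, and uses `std::memcpy()` as a stand-in for the byte-wise `alpaka::memcpy()`. The free function `postCopy()` below plays the role of the `CopyToHost<...>::postCopy()` member function.

```cpp
#include <cassert>
#include <cstring>

namespace sketch_postcopy {
  // Hypothetical product whose pointer refers to storage inside the same object.
  struct SelfRef {
    int payload[4];
    int* first;  // expected to point to payload[0] of *this* object
  };

  // Byte-wise copy, standing in for alpaka::memcpy(): the pointer value is
  // copied verbatim and therefore still refers to the source object.
  inline void rawCopy(SelfRef& dst, SelfRef const& src) { std::memcpy(&dst, &src, sizeof(SelfRef)); }

  // Stand-in for postCopy(): re-point the pointer into the destination object.
  // Calling it on an object that never left the host changes nothing, which is
  // why the call can be skipped in that case.
  inline void postCopy(SelfRef& obj) { obj.first = obj.payload; }
}  // namespace sketch_postcopy
```

After `rawCopy(dst, src)` the member `dst.first` still points into `src`, and only after `postCopy(dst)` does it point into `dst` itself.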


#### EDProducer

In EDProducers, for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. For each device-side data product a specialization of `cms::alpakatools::CopyToHost` is required to exist.

In addition, for each host-side data product a transfer from the host memory space to the device memory space is registered automatically **if** a `cms::alpakatools::CopyToDevice` specialization exists. The data product is copied only if the job has another EDModule that consumes the device-side data product.
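
One general C++ technique for acting on "a specialization exists" is to test whether the class template instantiation is a complete type. The sketch below illustrates that technique only; it is an assumption that the primary template is left undefined, the names in the `sketch_detect` namespace are hypothetical, and the framework's actual mechanism may well differ.

```cpp
#include <type_traits>

namespace sketch_detect {
  // Assumption: the primary template is declared but never defined.
  template <typename T>
  struct CopyToDevice;

  struct WithCopy {};     // hypothetical product that has a specialization
  struct WithoutCopy {};  // hypothetical product that has none

  template <>
  struct CopyToDevice<WithCopy> {};  // body omitted in this sketch

  // False by default ...
  template <typename T, typename = void>
  struct HasCopyToDevice : std::false_type {};

  // ... and true when sizeof() can be applied, i.e. when CopyToDevice<T> is a
  // complete type because a specialization has been defined.
  template <typename T>
  struct HasCopyToDevice<T, std::void_t<decltype(sizeof(CopyToDevice<T>))>> : std::true_type {};
}  // namespace sketch_detect
```

With such a trait, the registration of the host-to-device transfer could be guarded by `if constexpr (HasCopyToDevice<T>::value)`. Note that the answer depends on the specialization being visible at the point where the trait is first instantiated.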

#### ESProducer

In ESProducers, for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. For each host-side data product a specialization of `cms::alpakatools::CopyToDevice` is required to exist.

### `PortableCollection`

For more information see [`DataFormats/Portable/README.md`](../../DataFormats/Portable/README.md) and [`DataFormats/SoATemplate/README.md`](../../DataFormats/SoATemplate/README.md).


## Modules

### Base classes

The Alpaka-based EDModules should use one of the following base classes (which are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):

* `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
  * A [global EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface) that launches (possibly) asynchronous work
* `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that launches (possibly) asynchronous work
* `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
  * The base class uses [`edm::ExternalWork`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_ExternalWork) for the non-blocking synchronization

The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`. The `stream::SynchronizingEDProducer` should be used only in cases where some data needs to be copied from the device to the host in a way that requires synchronization, for a reason other than copying an Event data product from the device to the host.

New base classes (or other functionality) can be added based on new use cases that come up.

The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`).

Note that both the Alpaka-based EDProducer and ESProducer constructors must pass their `edm::ParameterSet` argument to the constructor of their base class.

Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use [`EmptyESSource`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#EmptyESSource).


### Event, EventSetup, Records

The Alpaka-based modules have a notion of a _host memory space_ and a _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible to non-Alpaka modules, whereas the data products in the device memory space are available only to modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.

The EDModules get `device::Event` and `device::EventSetup` from the framework, from which data products in both host memory space and device memory space can be accessed. Data products can also be produced to either memory space. As discussed [above](#edproducer), for each data product produced into the device memory space an implicit data copy from the device memory space to the host memory space is registered, and for each data product produced into the host memory space for which `cms::alpakatools::CopyToDevice` is specialized an implicit data copy from the host memory space to the device memory space is registered. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.

The ESProducer can have two different `produce()` function signatures
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
* If the function has a `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.

### Tokens

The memory spaces of the consumed and (in the EDProducer case) produced data products are driven by the tokens. The token types to be used in different cases are summarized below.


|                                                                | Host memory space             | Device memory space              |
|----------------------------------------------------------------|-------------------------------|----------------------------------|
| Access Event data product of type `T`                          | `edm::EDGetTokenT<T>`         | `device::EDGetToken<T>`          |
| Produce Event data product of type `T`                         | `edm::EDPutTokenT<T>`         | `device::EDPutToken<T>`          |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |

With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see
* [`consumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMGetDataFromEvent#consumes)
* [`produces()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts#Producing_the_EDProduct)
* [`esConsumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ED_module)
* [`consumes()` in ESProducers](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ESProducer)


### `fillDescriptions()`

In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp) function, specifying the module label automatically with [`edm::ConfigurationDescriptions::addWithDefaultLabel()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp#Automatic_module_labels_from_plu) is strongly recommended. Currently a `cfi` file is generated for a module for each Alpaka backend such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the ["module type resolver"](#module-type-resolver-portable) functionality, where the module type has the `@alpaka` postfix.

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.

### Copy e.g. configuration data to all devices in EDProducer

While the EventSetup can be used to handle copying data to all devices of an Alpaka backend, for data used only by one EDProducer a simpler way would be to use one of
* `cms::alpakatools::MoveToDeviceCache<TDevice, THostObject>` (recommended)
  * `#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"`
  * Moves the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. On host backends the argument `THostObject` is moved around, but not copied.
  * The `THostObject` must not be copyable
    * This is to avoid easy mistakes with objects that follow the copy semantics of `std::shared_ptr` (that includes Alpaka buffers), which would allow the source memory buffer to be used via another copy during the asynchronous data copy to the device.
  * The `THostObject` object given as the constructor argument must not be used afterwards, unless it is initialized again e.g. by assigning another `THostObject` into it.
  * The corresponding device-side object can be obtained with the `get()` member function using either an Alpaka Device or Queue object. It can be used immediately after the constructor returns.
* `cms::alpakatools::CopyToDeviceCache<TDevice, THostObject>` (use only if you **must** use a copyable `THostObject`)
  * `#include "HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h"`
  * Copies the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. Host backends also do a copy.
  * The `THostObject` object given as the constructor argument can be used for other purposes immediately after the constructor returns
  * The corresponding device-side object can be obtained with the `get()` member function using either an Alpaka Device or Queue object. It can be used immediately after the constructor returns.

For examples see [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc) and [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc).
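
The reasoning behind the "must not be copyable" requirement can be illustrated with a host-only sketch. All names in the `sketch_cache` namespace are hypothetical: `Cache` is only a stand-in that takes ownership of its argument, and it does not model the actual `MoveToDeviceCache` interface or any device transfer.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch_cache {
  // Buffer with std::shared_ptr copy semantics: every copy aliases the same memory.
  using SharedBuffer = std::shared_ptr<int>;

  // Hypothetical move-only buffer: once moved away, no other handle remains.
  class MoveOnlyBuffer {
  public:
    explicit MoveOnlyBuffer(int value) : data_(std::make_unique<int>(value)) {}
    MoveOnlyBuffer(MoveOnlyBuffer&&) = default;
    MoveOnlyBuffer& operator=(MoveOnlyBuffer&&) = default;
    int& value() { return *data_; }

  private:
    std::unique_ptr<int> data_;
  };

  // Stand-in for a cache that takes ownership of the object given to it.
  template <typename T>
  class Cache {
  public:
    explicit Cache(T obj) : obj_(std::move(obj)) {}
    T& get() { return obj_; }

  private:
    T obj_;
  };
}  // namespace sketch_cache
```

With `SharedBuffer`, a copy made before the move still refers to the memory now owned by the cache, so the source could be modified while a transfer is in flight. With `MoveOnlyBuffer` no such alias can exist, which `std::is_copy_constructible_v` confirms at compile time.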

## Guarantees

* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
* All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
* All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
* The EDM Stream does not proceed to the next Event until after all asynchronous work of the current Event has finished.
  * **Note**: this implies that if an EDProducer in its `produce()` function uses the `Event::queue()` or gets a device-side data product, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. will block the CPU thread until the asynchronous work finishes)

## Examples

For concrete examples see code in [`HeterogeneousCore/AlpakaTest`](../../HeterogeneousCore/AlpakaTest) and [`DataFormats/PortableTestObjects`](../../DataFormats/PortableTestObjects).

### EDProducer

This example shows a mixture of behavior from test code in [`HeterogeneousCore/AlpakaTest/plugins/alpaka/`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/)
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)

// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaProducer : public global::EDProducer<> {
  public:
    ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
        : EDProducer<>(iConfig),
          // the input tokens (getTokenHost_, getTokenDevice_, esGetTokenDevice_) also need to be
          // initialized here with consumes() / esConsumes(); that is omitted in this example
          //
          // produces() must not specify the product type, it is deduced from deviceToken_
          deviceToken_{produces()},
          size_{iConfig.getParameter<int32_t>("size")} {}

    // device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
    void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
      // get input data products
      auto const& hostInput = iEvent.get(getTokenHost_);
      auto const& deviceInput = iEvent.get(getTokenDevice_);
      auto const& deviceESData = iSetup.getData(esGetTokenDevice_);

      // run the algorithm, potentially asynchronously
      portabletest::TestDeviceCollection deviceProduct{size_, iEvent.queue()};
      algo_.fill(iEvent.queue(), hostInput, deviceInput, deviceESData, deviceProduct);

      // put the asynchronous product into the event without waiting
      // must use EDPutToken with emplace() or put()
      //
      // for a product produced with device::EDPutToken<T> the base class registers
      // a separately scheduled transformation function for the copy to host
      // the transformation function calls
      // cms::alpakatools::CopyToHost<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
      // function
      iEvent.emplace(deviceToken_, std::move(deviceProduct));
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      desc.add<int32_t>("size");
      descriptions.addWithDefaultLabel(desc);
    }

  private:
    // use edm::EDGetTokenT<T> to read from host memory space
    edm::EDGetTokenT<FooProduct> const getTokenHost_;

    // use device::EDGetToken<T> to read from device memory space
    device::EDGetToken<BarProduct> const getTokenDevice_;

    // use device::ESGetToken<T, TRecord> to read from device memory space
    device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;

    // use device::EDPutToken<T> to place the data product in the device memory space
    device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;
    int32_t const size_;

    // implementation of the algorithm
    TestAlgo algo_;
  };

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(ExampleAlpakaProducer);
```

### ESProducer to reformat an existing ESProduct for use in device

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product to device memory
    // the function calls
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;

      // fill the hostProduct from hostInput

      return hostProduct;
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
```

### ESProducer to derive a new ESProduct from an existing device-side ESProduct

```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());

      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return deviceProduct;
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;

    OtherAlgo algo_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
```

## Configuration

There are a few different options for using Alpaka-based modules in the CMSSW configuration.

In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. The `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
```python
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
```

### Explicit module type (non-portable)

The Alpaka modules can be used in the python configuration with their explicit, full type names
```python
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
```
This kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.


### SwitchProducerCUDA (semi-portable)

A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which currently the only concrete implementation is [`SwitchProducerCUDA`](../../HeterogeneousCore/CUDACore/README.md#automatic-switching-between-cpu-and-gpu-modules). The modules for different Alpaka backends still need to be specified explicitly
```python
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
    cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
    cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)

# or

process.producer = SwitchProducerCUDA(
    cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
    cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
```
This kind of configuration can be run on any machine (that a given CMSSW build supports), but is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). Therefore the `SwitchProducer` approach is here called "semi-portable".

### Module type resolver (portable)

A fully portable way to express a configuration can be achieved with the "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with the `@alpaka` postfix
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)

# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync")
    )
)
```
The `@alpaka` postfix in the module type tells the system that the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available in the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.

This approach is portable also across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.


#### Examples on explicitly setting the backend

##### For individual modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This setting overrides the `ProcessAcceleratorAlpaka` setting described in the next section.

```python
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
    )
)
```

##### For all Alpaka modules

The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This `ProcessAcceleratorAlpaka` setting can be further overridden for individual modules as described in the previous section.

```python
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
```

##### For entire job (i.e. also for non-Alpaka modules)
```python
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
```

### Blocking synchronization (for testing)

While the general approach is to favor asynchronous operations with non-blocking synchronization, for testing purposes it can be useful to synchronize the EDModule's `acquire()` / `produce()` or ESProducer's production functions in a blocking way. Such a blocking synchronization can be specified for individual modules via the `alpaka` `PSet` along the lines of
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        synchronize = cms.untracked.bool(True)
    )
)
```

The blocking synchronization can be specified for all Alpaka modules via the `ProcessAcceleratorAlpaka` along the lines of
```python
process.ProcessAcceleratorAlpaka.setSynchronize(True)
```
Note that the per-module parameter, if set, overrides this global setting.


## Unit tests

Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary along the lines of
```xml
<bin name="<unique test binary name>" file="<comma-separated list of files>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</bin>
```
or as a command (e.g. `cmsRun` or a shell script) to run

```xml
<test name="<unique name of the test>" command="<command to run>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</test>
```

will be run as part of `scram build runtests` according to the availability of the hardware:
- `serial_sync` version is always run
- `cuda_async` version is run if an NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- `rocm_async` version is run if an AMD GPU is present (i.e. `rocmIsEnabled` returns 0)

Tests for a specific backend (or hardware) can be explicitly requested by setting the `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. In that case tests that do not depend on the hardware are skipped. If the corresponding hardware is not available, the tests will fail.
0540