# Alpaka algorithms and modules in CMSSW
## Introduction
This page documents the Alpaka integration within CMSSW. For more information about Alpaka itself see the [Alpaka documentation](https://alpaka.readthedocs.io/en/latest/).
### Compilation model
The code in `Package/SubPackage/{interface,src,plugins,test}/alpaka` is compiled once for each enabled Alpaka backend. The `ALPAKA_ACCELERATOR_NAMESPACE` macro is substituted with a concrete, backend-specific namespace name in order to guarantee different symbol names for all backends, which allows `cmsRun` to dynamically load any set of the backend libraries.
The source files with `.dev.cc` suffix are compiled with the backend-specific device compiler. The other `.cc` source files are compiled with the host compiler.
The `BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="1"/>` to enable the behavior described above.
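As a minimal sketch of what typically goes into a `.dev.cc` file (the file, kernel, and function names are hypothetical, and the work-division helpers from `HeterogeneousCore/AlpakaInterface` are assumed), a kernel and its launcher are defined in `ALPAKA_ACCELERATOR_NAMESPACE`, so the same source is compiled once per enabled backend:
```cpp
// Package/SubPackage/plugins/alpaka/ScaleKernel.dev.cc (hypothetical file)
#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE {

  // Kernel functor, compiled by the backend-specific device compiler
  struct ScaleKernel {
    ALPAKA_FN_ACC void operator()(Acc1D const& acc, float* data, float factor, int32_t size) const {
      // uniform_elements() hides the backend-specific grid-stride loop
      for (auto i : cms::alpakatools::uniform_elements(acc, size)) {
        data[i] *= factor;
      }
    }
  };

  // Host-side launcher, callable from the .cc part of the same package
  void scaleOnDevice(Queue& queue, float* data, float factor, int32_t size) {
    uint32_t const threadsPerBlock = 128;
    auto const workDiv = cms::alpakatools::make_workdiv<Acc1D>(
        cms::alpakatools::divide_up_by(size, threadsPerBlock), threadsPerBlock);
    alpaka::exec<Acc1D>(queue, workDiv, ScaleKernel{}, data, factor, size);
  }

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```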
## Overall guidelines
* Minimize explicit blocking synchronization calls
  * Avoid `alpaka::wait()` and non-cached memory buffer allocations
* If you can, use the `global::EDProducer` base class
  * If you need per-stream storage
    * For a few objects consider using [`edm::StreamCache<T>`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface#edm_StreamCacheT) with the global module, or
    * Use `stream::EDProducer`
* If you need to transfer some data back to host, use `stream::SynchronizingEDProducer`
* All code using `ALPAKA_ACCELERATOR_NAMESPACE` should be placed in the `Package/SubPackage/{interface,src,plugins,test}/alpaka` directory
* Alpaka-dependent code that uses templates instead of the namespace macro can be placed in the `Package/SubPackage/interface` directory
* All source files (not headers) using Alpaka device code (such as kernel calls, or functions called by kernels) must have the suffix `.dev.cc`, and be placed in the aforementioned `alpaka` subdirectory
* Any code that `#include`s a header from the framework or from `HeterogeneousCore/AlpakaCore` must be separated from the Alpaka device code, and have the usual `.cc` suffix
* Some framework headers are allowed to be used in `.dev.cc` files:
  * Any header containing only macros, e.g. `FWCore/Utilities/interface/CMSUnrollLoop.h`, `FWCore/Utilities/interface/stringize.h`
  * `FWCore/Utilities/interface/Exception.h`
  * `FWCore/MessageLogger/interface/MessageLogger.h`, although it is preferred to issue messages only in the `.cc` files
  * `HeterogeneousCore/AlpakaCore/interface/EventCache.h` and `HeterogeneousCore/AlpakaCore/interface/QueueCache.h` can, in principle, be used in `.dev.cc` files, even if there should be little need to use them explicitly
## Data formats
Data formats, for both Event and EventSetup, should be placed following their usual rules. The Alpaka-specific conventions are
* There must be a host-only flavor of the data format that is either independent of Alpaka, or depends only on Alpaka's Serial backend
  * The host-only data format must be defined in the `Package/SubPackage/interface/` directory
  * If the data format is to be serialized (with ROOT), it must be serialized in a way that the on-disk format does not depend on Alpaka, i.e. it can be read without Alpaka
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/classes{.h,_def.xml}`
    * As usual, the `classes_def.xml` should declare the dictionaries for the data product type `T` and `edm::Wrapper<T>`. These data products can be declared as persistent (default) or transient (`persistent="false"` attribute).
  * For EventSetup data products [the registration macro `TYPELOOKUP_DATA_REG`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToRegisterESData) should be placed in `Package/SubPackage/src/ES_<type name>.cc`
* The device-side data formats are defined in the `Package/SubPackage/interface/alpaka/` directory
  * The device-side data format classes should be either templated over the device type, or defined in the `ALPAKA_ACCELERATOR_NAMESPACE` namespace
    * For host backends (`serial`), the "device-side" data format class must be the same as the aforementioned host-only data format class
      * Use the `ASSERT_DEVICE_MATCHES_HOST_COLLECTION(<device collection type>, <host collection type>);` macro to ensure that; see an example in [`TestDeviceCollection.h`](../../DataFormats/PortableTestObjects/interface/alpaka/TestDeviceCollection.h)
      * This equality is necessary for the [implicit data transfers](#implicit-data-transfers) to function properly
  * For Event data products the ROOT dictionary should be defined in `DataFormats/SubPackage/src/alpaka/classes_<platform>{.h,_def.xml}`
    * The `classes_<platform>_def.xml` should declare the dictionaries for the data product type `T`, `edm::DeviceProduct<T>`, and `edm::Wrapper<edm::DeviceProduct<T>>`. All these dictionaries must be declared as transient with the `persistent="false"` attribute.
    * The list of `<platform>` values currently includes `cuda` and `rocm`
  * For EventSetup data products the registration macro should be placed in `Package/SubPackage/src/alpaka/ES_<type name>.cc`
    * Data products defined in `ALPAKA_ACCELERATOR_NAMESPACE` should use the `TYPELOOKUP_ALPAKA_DATA_REG` macro
    * Data products templated over the device type should use the `TYPELOOKUP_ALPAKA_TEMPLATED_DATA_REG` macro
  * For Event data products the `DataFormats/SubPackage/BuildFile.xml` must contain `<flags ALPAKA_BACKENDS="!serial"/>`
    * unless the package has something that is really specific to the `serial` backend and not generally applicable on the host
Note that even though the examples above use the `DataFormats` package for Event data formats, in some circumstances Event data formats are allowed to be defined in other packages too. For full details please see [SWGuideCreatingNewProducts](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).
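As a concrete illustration of these conventions, the sketch below (the `Foo*` names and the exact header locations are illustrative; it mirrors the pattern of `DataFormats/PortableTestObjects`, see `TestDeviceCollection.h` linked above for the complete set of includes) defines a SoA layout, the host-only collection, and the device-side collection in the `alpaka/` subdirectory.
```cpp
// DataFormats/SubPackage/interface/FooSoA.h (hypothetical)
#include "DataFormats/SoATemplate/interface/SoALayout.h"

GENERATE_SOA_LAYOUT(FooSoALayout,
                    SOA_COLUMN(float, x),
                    SOA_COLUMN(float, y),
                    SOA_SCALAR(int32_t, n))
using FooSoA = FooSoALayout<>;

// DataFormats/SubPackage/interface/FooHostCollection.h (hypothetical): host-only flavor
#include "DataFormats/Portable/interface/PortableHostCollection.h"
using FooHostCollection = PortableHostCollection<FooSoA>;

// DataFormats/SubPackage/interface/alpaka/FooDeviceCollection.h (hypothetical): device-side flavor
#include "DataFormats/Portable/interface/alpaka/PortableCollection.h"
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  using FooDeviceCollection = PortableCollection<FooSoA>;
}
// on the host backends FooDeviceCollection must coincide with FooHostCollection
ASSERT_DEVICE_MATCHES_HOST_COLLECTION(FooDeviceCollection, FooHostCollection);
```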
### Implicit data transfers
Both EDProducers and ESProducers make use of implicit data transfers. In CPU backends these data transfers are omitted, and the host-side and the "device-side" data products are the same.
#### Data copy definitions
The implicit host-to-device copies rely on a specialization of the `cms::alpakatools::CopyToDevice` class template, and the device-to-host copies on a specialization of `cms::alpakatools::CopyToHost`. These have to be specialized along the lines of
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
namespace cms::alpakatools {
  template <>
  struct CopyToDevice<TSrc> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& hostProduct) -> TDst {
      // code to construct the TDst object, and launch the asynchronous memcpy from the host to the device of TQueue
      return ...;
    }
  };
}
```
or
```cpp
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToHost.h"
namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, TSrc const& deviceProduct) -> TDst {
      // code to construct the TDst object, and launch the asynchronous memcpy from the device of TQueue to the host
      return ...;
    }
  };
}
```
respectively.
Note that the destination (device-side/host-side) type `TDst` can be different from or the same as the source (host-side/device-side) type `TSrc` as far as the framework is concerned. For example, in the `PortableCollection` model the types are different. The `copyAsync()` member function is easiest to implement as a template over `TQueue`. The framework handles the necessary synchronization between the copy function and the consumer in a non-blocking way.
Both `CopyToDevice` and `CopyToHost` class templates are partially specialized for all `PortableObject` and `PortableCollection` instantiations.
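For products that are not `PortableCollection`s, a concrete specialization typically allocates the destination buffer and enqueues the copy itself. The sketch below is a minimal illustration, assuming hypothetical `FooHostProduct` / `FooDeviceProduct` types that each own a single Alpaka buffer, and using the buffer helpers from `HeterogeneousCore/AlpakaInterface`.
```cpp
#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
#include "HeterogeneousCore/AlpakaInterface/interface/memory.h"

namespace cms::alpakatools {
  template <>
  struct CopyToDevice<FooHostProduct> {
    template <typename TQueue>
      requires alpaka::isQueue<TQueue>
    static auto copyAsync(TQueue& queue, FooHostProduct const& hostProduct) {
      // allocate the destination buffer on the device associated with the queue
      // (FooHostProduct::size() and ::buffer() are hypothetical accessors)
      auto buffer = cms::alpakatools::make_device_buffer<float[]>(queue, hostProduct.size());
      // enqueue the asynchronous host-to-device copy; the framework keeps the
      // source product alive until the work enqueued here has completed
      alpaka::memcpy(queue, buffer, hostProduct.buffer());
      // hand the buffer over to the (hypothetical) device-side product type
      return FooDeviceProduct<alpaka::Dev<TQueue>>(std::move(buffer));
    }
  };
}  // namespace cms::alpakatools
```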
##### Data products with `memcpy()`ed pointers
If the data product in question contains pointers to memory elsewhere within the same data product, those pointers still point to device memory after the `alpaka::memcpy()` calls in `copyAsync()`, and need to be updated. **Such data products are generally discouraged.** Nevertheless, such pointers can be updated without any additional synchronization by implementing a `postCopy()` function in the `CopyToHost` specialization along the lines of (extending the `CopyToHost` example [above](#data-copy-definitions))
```cpp
namespace cms::alpakatools {
  template <>
  struct CopyToHost<TSrc> {
    // copyAsync() definition from above

    static void postCopy(TDst& obj) {
      // modify obj
      // any modifications must be such that the postCopy() can be
      // skipped when the obj originates from the host (i.e. on CPU backends)
    }
  };
}
```
The `postCopy()` is called after the operations enqueued in the `copyAsync()` have finished. The code in `postCopy()` must be such that the call to `postCopy()` can be omitted on CPU backends.
Note that for `CopyToDevice` such `postCopy()` functionality is **not** provided. It should be possible to issue a kernel call from the `CopyToDevice::copyAsync()` function to achieve the same effect.
#### EDProducer
In EDProducers for each device-side data product a transfer from the device memory space to the host memory space is registered automatically. The data product is copied only if the job has another EDModule that consumes the host-side data product. For each device-side data product a specialization of `cms::alpakatools::CopyToHost` is required to exist.
In addition, for each host-side data product a transfer from the host memory space to the device memory space is registered automatically **if** a `cms::alpakatools::CopyToDevice` specialization exists. The data product is copied only if the job has another EDModule that consumes the device-side data product.
#### ESProducer
In ESProducers for each host-side data product a transfer from the host memory space to the device memory space (of the backend of the ESProducer) is registered automatically. The data product is copied only if the job has another ESProducer or EDModule that consumes the device-side data product. For each host-side data product a specialization of `cms::alpakatools::CopyToDevice` is required to exist.
### `PortableCollection`
For more information see [`DataFormats/Portable/README.md`](../../DataFormats/Portable/README.md) and [`DataFormats/SoATemplate/README.md`](../../DataFormats/SoATemplate/README.md).
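As a quick orientation before diving into those READMEs, the sketch below (reusing the hypothetical `FooSoA` / `FooDeviceCollection` from the data format example above; kernel and function names are illustrative) shows how a kernel typically receives the SoA `View` of a `PortableCollection` and accesses its columns and scalars.
```cpp
#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/workdivision.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE {

  struct FooFillKernel {
    ALPAKA_FN_ACC void operator()(Acc1D const& acc, FooSoA::View view) const {
      // scalars are typically written by a single thread
      if (cms::alpakatools::once_per_grid(acc)) {
        view.n() = view.metadata().size();
      }
      // column access goes through the view, one logical element per iteration
      for (auto i : cms::alpakatools::uniform_elements(acc, view.metadata().size())) {
        view[i].x() = 0.f;
        view[i].y() = static_cast<float>(i);
      }
    }
  };

  void fillFoo(Queue& queue, FooDeviceCollection& collection) {
    uint32_t const threadsPerBlock = 128;
    auto const workDiv = cms::alpakatools::make_workdiv<Acc1D>(
        cms::alpakatools::divide_up_by(collection.view().metadata().size(), threadsPerBlock), threadsPerBlock);
    alpaka::exec<Acc1D>(queue, workDiv, FooFillKernel{}, collection.view());
  }

}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```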
## Modules
### Base classes
The Alpaka-based EDModules should use one of the following base classes (that are defined in the `ALPAKA_ACCELERATOR_NAMESPACE`):
* `global::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"`)
  * A [global EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkGlobalModuleInterface) that launches (possibly) asynchronous work
* `stream::EDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/EDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that launches (possibly) asynchronous work
* `stream::SynchronizingEDProducer<...>` (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/stream/SynchronizingEDProducer.h"`)
  * A [stream EDProducer](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface) that may launch (possibly) asynchronous work, and synchronizes the asynchronous work on the device with the host
  * The base class uses [`edm::ExternalWork`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/FWMultithreadedFrameworkStreamModuleInterface#edm_ExternalWork) for the non-blocking synchronization
The `...` can in principle be any of the module abilities listed in the linked TWiki pages, except `edm::ExternalWork`. The majority of the Alpaka EDProducers should be `global::EDProducer` or `stream::EDProducer`; `stream::SynchronizingEDProducer` should be used only when some data needs to be copied from the device to the host, and therefore synchronized, for a reason other than copying an Event data product from the device to the host.
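A minimal skeleton of a `stream::SynchronizingEDProducer` is sketched below, assuming that the `acquire()`/`produce()` signatures follow the same `device::Event` / `device::EventSetup` pattern as the other base classes; the class, member, helper, and parameter names are all hypothetical. The idea is that the device-to-host copy is enqueued in `acquire()`, and `produce()` runs only after the enqueued work has completed.
```cpp
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  class ExampleSynchronizingProducer : public stream::SynchronizingEDProducer<> {
  public:
    ExampleSynchronizingProducer(edm::ParameterSet const& iConfig)
        : SynchronizingEDProducer<>(iConfig),
          getToken_{consumes(iConfig.getParameter<edm::InputTag>("source"))},  // illustrative parameter name
          putToken_{produces()} {}

    // signatures assumed to mirror the device::Event / device::EventSetup pattern
    void acquire(device::Event const& iEvent, device::EventSetup const& iSetup) override {
      auto const& deviceData = iEvent.get(getToken_);
      // enqueue the device-to-host copy into the Event's queue; hostBuffer_ is a
      // host-side member that will be valid by the time produce() is called
      hostBuffer_ = copyToHostAsync(iEvent.queue(), deviceData);  // hypothetical helper
    }

    void produce(device::Event& iEvent, device::EventSetup const& iSetup) override {
      // the framework guarantees that the work enqueued in acquire() has finished
      iEvent.emplace(putToken_, makeSummary(hostBuffer_));  // hypothetical helper
    }

  private:
    device::EDGetToken<FooDeviceCollection> getToken_;
    edm::EDPutTokenT<FooSummary> putToken_;
    FooHostBuffer hostBuffer_;  // hypothetical host-side type
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```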
New base classes (or other functionality) can be added based on new use cases that come up.
The Alpaka-based ESProducers should use the `ESProducer` base class (`#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESProducer.h"`).
Note that the constructors of both the Alpaka-based EDProducers and ESProducers must pass the `edm::ParameterSet` argument to the constructor of their base class.
Note that currently Alpaka-based ESSources are not supported. If you need to produce EventSetup data products into a Record for which there is no ESSource yet, use [`EmptyESSource`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#EmptyESSource).
### Event, EventSetup, Records
The Alpaka-based modules have a notion of a _host memory space_ and _device memory space_ for the Event and EventSetup data products. The data products in the host memory space are accessible for non-Alpaka modules, whereas the data products in device memory space are available only for modules of the specific Alpaka backend. The host backend(s) use the host memory space directly.
The EDModules get a `device::Event` and a `device::EventSetup` from the framework, from which data products in both the host memory space and the device memory space can be accessed. Data products can also be produced into either memory space. As discussed [above](#edproducer), for each data product produced into the device memory space an implicit data copy from the device memory space to the host memory space is registered, and for each data product produced into the host memory space for which `cms::alpakatools::CopyToDevice` is specialized an implicit data copy from the host memory space to the device memory space is registered. The `device::Event::queue()` returns the Alpaka `Queue` object into which all work in the EDModule must be enqueued.
The ESProducer can have two different `produce()` function signatures
* If the function has the usual `TRecord const&` parameter, the function can read an ESProduct from the host memory space, and produce another product into the host memory space. An implicit copy of the data product from the host memory space to the device memory space (of the backend of the ESProducer) is registered as discussed above.
* If the function has `device::Record<TRecord> const&` parameter, the function can read an ESProduct from the device memory space, and produce another product into the device memory space. No further copies are made by the framework. The `device::Record<TRecord>::queue()` gives the Alpaka `Queue` object into which all work in the ESProducer must be enqueued.
### Tokens
The memory spaces of the consumed and (in EDProducer case) produced data products are driven by the tokens. The token types to be used in different cases are summarized below.
| | Host memory space | Device memory space |
|----------------------------------------------------------------|-------------------------------|----------------------------------|
| Access Event data product of type `T` | `edm::EDGetTokenT<T>` | `device::EDGetToken<T>` |
| Produce Event data product of type `T` | `edm::EDPutTokenT<T>` | `device::EDPutToken<T>` |
| Access EventSetup data product of type `T` in Record `TRecord` | `edm::ESGetToken<T, TRecord>` | `device::ESGetToken<T, TRecord>` |
With the device memory space tokens the type-deducing `consumes()`, `produces()`, and `esConsumes()` calls must be used (i.e. do not specify the data product type as part of the function call). For more information on these registration functions see
* [`consumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMGetDataFromEvent#consumes)
* [`produces()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts#Producing_the_EDProduct)
* [`esConsumes()` in EDModules](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ED_module)
* [`consumes()` in ESProducers](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideHowToGetDataFromES#In_ESProducer)
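A brief sketch (product, record, and parameter names are hypothetical) of how these tokens are typically declared and initialized in an EDProducer constructor; the full EDProducer example in the Examples section below shows them in context.
```cpp
// member declarations (types from the table above)
edm::EDGetTokenT<FooHostProduct> hostGetToken_;                  // host memory space
device::EDGetToken<BarDeviceProduct> deviceGetToken_;            // device memory space
device::ESGetToken<BazDeviceObject, BazRecord> deviceEsToken_;   // device memory space
device::EDPutToken<FooDeviceCollection> devicePutToken_;         // device memory space

// member-initializer list in the constructor
ExampleProducer::ExampleProducer(edm::ParameterSet const& iConfig)
    : EDProducer<>(iConfig),
      // host memory space: the product type may be spelled out explicitly
      hostGetToken_{consumes<FooHostProduct>(iConfig.getParameter<edm::InputTag>("srcHost"))},
      // device memory space: type-deducing calls, the type comes from the token member
      deviceGetToken_{consumes(iConfig.getParameter<edm::InputTag>("srcDevice"))},
      deviceEsToken_{esConsumes()},
      devicePutToken_{produces()} {}
```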
### `fillDescriptions()`
In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp) function specifying the module label automatically with the [`edm::ConfigurationDescriptions::addWithDefaultLabel()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp#Automatic_module_labels_from_plu) is strongly recommended. Currently a `cfi` file is generated for a module for each Alpaka backend such that the backend namespace is explicitly used in the module definition. An additional `cfi` file is generated for the ["module type resolver"](#module-type-resolver-portable) functionality, where the module type has `@alpaka` postfix.
Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.
### Copy e.g. configuration data to all devices in EDProducer
While the EventSetup can be used to handle copying data to all devices of an Alpaka backend, for data used only by one EDProducer a simpler way is to use one of
* `cms::alpakatools::MoveToDeviceCache<TDevice, THostObject>` (recommended)
  * `#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"`
  * Moves the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. On host backends the argument `THostObject` is moved around, but not copied.
  * The `THostObject` must not be copyable
    * This is to avoid easy mistakes with objects that follow the copy semantics of `std::shared_ptr` (which includes Alpaka buffers), which would allow the source memory buffer to be used via another copy during the asynchronous data copy to the device.
  * The `THostObject` object given as the constructor argument may not be used afterwards, unless it is initialized again, e.g. by assigning another `THostObject` into it.
  * The corresponding device-side object can be obtained with the `get()` member function using either an Alpaka Device or Queue object. It can be used immediately after the constructor returns.
* `cms::alpakatools::CopyToDeviceCache<TDevice, THostObject>` (use only if you **must** use a copyable `THostObject`)
  * `#include "HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h"`
  * Copies the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. Host backends also perform a copy.
  * The `THostObject` object given as the constructor argument can be used for other purposes immediately after the constructor returns
  * The corresponding device-side object can be obtained with the `get()` member function using either an Alpaka Device or Queue object. It can be used immediately after the constructor returns.
For examples see [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc) and [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc).
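As a rough sketch of the recommended option (all `Foo*` names and the `makeParams()` helper are hypothetical), a producer can hold the cache as a member and fetch the device-side copy with the Event's queue; see the test producers linked above for complete, working versions.
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  class ExampleCachingProducer : public global::EDProducer<> {
  public:
    ExampleCachingProducer(edm::ParameterSet const& iConfig)
        : EDProducer<>(iConfig),
          putToken_{produces()},
          size_{iConfig.getParameter<int32_t>("size")},
          // makeParams() (hypothetical) builds a non-copyable host-side object from the
          // configuration; the cache moves it to every device of this backend using
          // cms::alpakatools::CopyToDevice<FooParamsHost>
          paramsCache_{makeParams(iConfig)} {}

    void produce(edm::StreamID, device::Event& iEvent, device::EventSetup const&) const override {
      // get() returns the object already resident on the device associated with the queue
      auto const& deviceParams = paramsCache_.get(iEvent.queue());
      FooDeviceCollection output{size_, iEvent.queue()};
      // ... launch kernels using deviceParams to fill output ...
      iEvent.emplace(putToken_, std::move(output));
    }

  private:
    device::EDPutToken<FooDeviceCollection> const putToken_;
    int32_t const size_;
    cms::alpakatools::MoveToDeviceCache<Device, FooParamsHost> const paramsCache_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```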
## Guarantees
* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
* All Event data products in the host memory space are guaranteed to be accessible for all operations (after the data product has been obtained from the `edm::Event` or `device::Event`).
* All EventSetup data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed via the `device::EventSetup` (ED modules), or by `device::Record<TRecord>::queue()` when accessed via the `device::Record<TRecord>` (ESProducers).
* The EDM Stream does not proceed to the next Event until after all asynchronous work of the current Event has finished.
  * **Note**: this implies that if an EDProducer uses `device::Event::queue()` or gets a device-side data product in its `produce()` function, and does not produce any device-side data products, the `produce()` call will be synchronous (i.e. it will block the CPU thread until the asynchronous work finishes)
## Examples
For concrete examples see code in [`HeterogeneousCore/AlpakaTest`](../../HeterogeneousCore/AlpakaTest) and [`DataFormats/PortableTestObjects`](../../DataFormats/PortableTestObjects).
### EDProducer
This example shows a mixture of behavior from test code in [`HeterogeneousCore/AlpakaTest/plugins/alpaka/`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/)
```cpp
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EDPutToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ESGetToken.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/Event.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/EventSetup.h"
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/global/EDProducer.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
// + usual #includes for the used framework components, data format(s), record(s)
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaProducer : public global::EDProducer<> {
  public:
    ExampleAlpakaProducer(edm::ParameterSet const& iConfig)
        : EDProducer<>(iConfig),
          // type-deducing consumes() and esConsumes() calls: the product types are
          // deduced from the token members (the parameter names are illustrative)
          getTokenHost_{consumes(iConfig.getParameter<edm::InputTag>("sourceHost"))},
          getTokenDevice_{consumes(iConfig.getParameter<edm::InputTag>("sourceDevice"))},
          esGetTokenDevice_{esConsumes()},
          // produces() must not specify the product type, it is deduced from deviceToken_
          deviceToken_{produces()},
          size_{iConfig.getParameter<int32_t>("size")} {}

    // device::Event and device::EventSetup are defined in ALPAKA_ACCELERATOR_NAMESPACE as well
    void produce(edm::StreamID sid, device::Event& iEvent, device::EventSetup const& iSetup) const override {
      // get input data products
      auto const& hostInput = iEvent.get(getTokenHost_);
      auto const& deviceInput = iEvent.get(getTokenDevice_);
      auto const& deviceESData = iSetup.getData(esGetTokenDevice_);

      // run the algorithm, potentially asynchronously
      portabletest::TestDeviceCollection deviceProduct{size_, iEvent.queue()};
      algo_.fill(iEvent.queue(), hostInput, deviceInput, deviceESData, deviceProduct);

      // put the asynchronous product into the event without waiting
      // must use EDPutToken with emplace() or put()
      //
      // for a product produced with device::EDPutToken<T> the base class registers
      // a separately scheduled transformation function for the copy to host
      // the transformation function calls the
      // cms::alpakatools::CopyToHost<portabletest::TestDeviceCollection>::copyAsync(Queue&, portabletest::TestDeviceCollection const&)
      // function
      iEvent.emplace(deviceToken_, std::move(deviceProduct));
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      desc.add<edm::InputTag>("sourceHost");
      desc.add<edm::InputTag>("sourceDevice");
      desc.add<int32_t>("size");
      descriptions.addWithDefaultLabel(desc);
    }

  private:
    // use edm::EDGetTokenT<T> to read from the host memory space
    edm::EDGetTokenT<FooProduct> const getTokenHost_;
    // use device::EDGetToken<T> to read from the device memory space
    device::EDGetToken<BarProduct> const getTokenDevice_;
    // use device::ESGetToken<T, TRecord> to read from the device memory space
    device::ESGetToken<TestProduct, TestRecord> const esGetTokenDevice_;
    // use device::EDPutToken<T> to place the data product in the device memory space
    device::EDPutToken<portabletest::TestDeviceCollection> const deviceToken_;

    int32_t const size_;

    // implementation of the algorithm
    TestAlgo algo_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/MakerMacros.h"
DEFINE_FWK_ALPAKA_MODULE(ExampleAlpakaProducer);
```
### ESProducer to reformat an existing ESProduct for use in device
```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaESProducer : public ESProducer {
  public:
    ExampleAlpakaESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    // return type can be
    // - std::optional<T> (T is cheap to move),
    // - std::unique_ptr<T> (T is not cheap to move),
    // - std::shared_ptr<T> (allows sharing between IOVs)
    //
    // the base class registers a separately scheduled function to copy the product to device memory
    // the function calls the
    // cms::alpakatools::CopyToDevice<SimpleProduct>::copyAsync(Queue&, SimpleProduct const&)
    // function
    std::optional<SimpleProduct> produce(TestRecord const& iRecord) {
      // get input data
      auto const& hostInput = iRecord.get(token_);

      // allocate data product on the host memory
      SimpleProduct hostProduct;
      // fill the hostProduct from hostInput

      return hostProduct;
    }

  private:
    edm::ESGetToken<TestProduct, TestRecord> token_;
  };
} // namespace ALPAKA_ACCELERATOR_NAMESPACE
#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaESProducer);
```
### ESProducer to derive a new ESProduct from an existing device-side ESProduct
```cpp
// Module must be defined in ALPAKA_ACCELERATOR_NAMESPACE
namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Base class is defined in ALPAKA_ACCELERATOR_NAMESPACE as well (note, no edm:: prefix!)
  class ExampleAlpakaDeriveESProducer : public ESProducer {
  public:
    ExampleAlpakaDeriveESProducer(edm::ParameterSet const& iConfig) : ESProducer(iConfig) {
      // register the production function
      auto cc = setWhatProduced(this);
      // register consumed ESProduct(s)
      token_ = cc.consumes();
    }

    static void fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
      // All backends must have exactly the same fillDescriptions() content!
      edm::ParameterSetDescription desc;
      descriptions.addWithDefaultLabel(desc);
    }

    std::optional<OtherProduct> produce(device::Record<TestRecord> const& iRecord) {
      // get input data in the device memory space
      auto const& deviceInput = iRecord.get(token_);

      // allocate data product on the device memory
      OtherProduct deviceProduct(iRecord.queue());
      // run the algorithm, potentially asynchronously
      algo_.fill(iRecord.queue(), deviceInput, deviceProduct);

      // return the product without waiting
      return deviceProduct;
    }

  private:
    device::ESGetToken<SimpleProduct, TestRecord> token_;
    OtherAlgo algo_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE

#include "HeterogeneousCore/AlpakaCore/interface/alpaka/ModuleFactory.h"
DEFINE_FWK_EVENTSETUP_ALPAKA_MODULE(ExampleAlpakaDeriveESProducer);
```
## Configuration
There are a few different options for using Alpaka-based modules in the CMSSW configuration.
In all cases the configuration must load the necessary `ProcessAccelerator` objects (see below). For accelerators used in production, these are aggregated in `Configuration.StandardSequences.Accelerators_cff`. `runTheMatrix.py` handles the loading of this `Accelerators_cff` automatically. The HLT menus also load the necessary `ProcessAccelerator`s.
```python
## Load explicitly
# One ProcessAccelerator for each accelerator technology, plus a generic one for Alpaka
process.load("Configuration.StandardSequences.Accelerators_cff")
```
### Explicit module type (non-portable)
The Alpaka modules can be used in the python configuration with their explicit, full type names
```python
process.producerCPU = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...)
process.producerGPU = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
```
Obviously this kind of configuration can be run only on machines that provide the necessary hardware. The configuration is thus explicitly non-portable.
### SwitchProducerCUDA (semi-portable)
A step towards a portable configuration is to use the `SwitchProducer` mechanism, for which currently the only concrete implementation is [`SwitchProducerCUDA`](../../HeterogeneousCore/CUDACore/README.md#automatic-switching-between-cpu-and-gpu-modules). The modules for the different Alpaka backends still need to be specified explicitly
```python
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
process.producer = SwitchProducerCUDA(
    cpu = cms.EDProducer("alpaka_serial_sync::ExampleAlpakaProducer", ...),
    cuda = cms.EDProducer("alpaka_cuda_async::ExampleAlpakaProducer", ...)
)
# or
process.producer = SwitchProducerCUDA(
    cpu = cms.EDAlias(producerCPU = cms.EDAlias.allProducts()),
    cuda = cms.EDAlias(producerGPU = cms.EDAlias.allProducts())
)
```
This kind of configuration can be run on any machine that a given CMSSW build supports, but it is limited to CMSSW builds where the modules for all the Alpaka backends declared in the configuration can be built (`alpaka_serial_sync` and `alpaka_cuda_async` in this example). Therefore the `SwitchProducer` approach is here called "semi-portable".
### Module type resolver (portable)
A fully portable way to express a configuration can be achieved with "module type resolver" approach. The module is specified in the configuration without the backend-specific namespace, and with `@alpaka` postfix
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka", ...)
# backend can also be set explicitly
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync")
    )
)
```
The `@alpaka` postfix in the module type tells the system the module's exact class type should be resolved at run time. The type (or backend) is set according to the value of `process.options.accelerators` and the set of accelerators available in the machine. If the backend is set explicitly in the module's `alpaka` PSet, the module of that backend will be used.
This approach is portable also across CMSSW builds that support different sets of accelerators, as long as only the host backends (if any) are specified explicitly in the `alpaka` PSet.
#### Examples on explicitly setting the backend
##### For individual modules
The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This setting overrides the `ProcessAcceleratorAlpaka` setting described in the next paragraph.
```python
process.producerCPU = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        backend = cms.untracked.string("serial_sync") # or "cuda_async" or "rocm_async"
    )
)
```
##### For all Alpaka modules
The explicitly-set backend must be one of those allowed by the job-wide `process.options.accelerators` setting. This `ProcessAcceleratorAlpaka` setting can be further overridden for individual modules as described in the previous paragraph.
```python
process.ProcessAcceleratorAlpaka.setBackend("serial_sync") # or "cuda_async" or "rocm_async"
```
##### For entire job (i.e. also for non-Alpaka modules)
```python
process.options.accelerators = ["cpu"] # or "gpu-nvidia" or "gpu-amd"
```
### Blocking synchronization (for testing)
While the general approach is to favor asynchronous operations with non-blocking synchronization, for testing purposes it can be useful to synchronize the EDModule's `acquire()` / `produce()` or the ESProducer's production functions in a blocking way. Such blocking synchronization can be specified for individual modules via the `alpaka` `PSet` along the lines of
```python
process.producer = cms.EDProducer("ExampleAlpakaProducer@alpaka",
    ...
    alpaka = cms.untracked.PSet(
        synchronize = cms.untracked.bool(True)
    )
)
```
The blocking synchronization can be specified for all Alpaka modules via the `ProcessAcceleratorAlpaka` along the lines of
```python
process.ProcessAcceleratorAlpaka.setSynchronize(True)
```
Note that the possible per-module parameter overrides this global setting.
## Unit tests
Unit tests that depend on Alpaka and define `<flags ALPAKA_BACKENDS="1"/>`, e.g. as a binary along the lines of
```xml
<bin name="<unique test binary name>" file="<comma-separated list of files>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</bin>
```
or as a command (e.g. `cmsRun` or a shell script) to run
```xml
<test name="<unique name of the test>" command="<command to run>">
  <use name="alpaka"/>
  <flags ALPAKA_BACKENDS="1"/>
</test>
```
will be run as part of `scram build runtests` according to the
availability of the hardware:
- the `serial_sync` version is always run
- the `cuda_async` version is run if an NVIDIA GPU is present (i.e. `cudaIsEnabled` returns 0)
- the `rocm_async` version is run if an AMD GPU is present (i.e. `rocmIsEnabled` returns 0)
Tests for a specific backend (or hardware) can be explicitly requested to run by setting the `USER_UNIT_TESTS=cuda` or `USER_UNIT_TESTS=rocm` environment variable. In that case tests that do not depend on that hardware are skipped, and if the corresponding hardware is not available, the tests will fail.
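As a minimal sketch of a backend-agnostic test binary (assuming the device enumeration helper from `HeterogeneousCore/AlpakaInterface`), the same source is built once per backend, and the test exits gracefully when the corresponding hardware is absent:
```cpp
#include <cstdlib>
#include <iostream>

#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaInterface/interface/config.h"
#include "HeterogeneousCore/AlpakaInterface/interface/devices.h"

int main() {
  // enumerate the devices of this backend's platform
  auto const& devices = cms::alpakatools::devices<ALPAKA_ACCELERATOR_NAMESPACE::Platform>();
  if (devices.empty()) {
    std::cout << "No devices available for this backend, the test is being skipped" << std::endl;
    return EXIT_SUCCESS;
  }
  ALPAKA_ACCELERATOR_NAMESPACE::Queue queue{devices.front()};
  // ... allocate buffers, launch kernels, and check the results here ...
  alpaka::wait(queue);
  return EXIT_SUCCESS;
}
```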