# Structure of arrays (SoA) generation

The two header files [`SoALayout.h`](SoALayout.h) and [`SoAView.h`](SoAView.h) define preprocessor macros that
generate SoA classes. The SoA classes generate multiple aligned columns from a memory buffer. The memory
buffer is allocated separately by the user, and can be located in a memory space different from the local one (for
example, an SoA located in GPU device memory can be fully pre-defined on the host and the resulting structure
passed to the GPU kernel).

This columnar storage allows efficient memory access by GPU kernels (coalesced access on cache line aligned data)
and possibly vectorization.

Additionally, templating the layout and view classes allows compile-time variations of accesses and checks:
verification of alignment and corresponding compiler hinting, cache strategy (non-coherent, streaming with immediate
invalidation), range checking.

Macro generation produces code that gives clear and concise access to the data at the point of use. The code
generation uses the Boost Preprocessor library.

## Layout

`SoALayout` is a macro-generated class template that subdivides a provided buffer into a collection of columns,
Eigen columns and scalars. The buffer is expected to be aligned with a selectable alignment defaulting to the CUDA
GPU cache line (128 bytes). All columns and scalars within a `SoALayout` are individually aligned, leaving
padding at the end of each if necessary. Eigen columns have each component of the vector or matrix properly aligned
in an individual column (by defining the stride between components). Only compile-time sized Eigen vectors and matrices
are supported. Scalar members are members of the layout with one element, irrespective of the size of the layout.

Static utility functions automatically compute the byte size of a layout, taking into account all its columns and
alignment.

## View

`SoAView` is a macro-generated class template allowing access to columns defined in one or multiple `SoALayout`s or
`SoAView`s. The view can be generated in constant and non-constant flavors. All view flavors provide the same
interface, where scalar elements are accessed with an `operator()`: `soa.scalar()`, while columns (Eigen or not) are
accessed via an array of structures (AoS)-like syntax: `soa[index].x()`. The "struct" object returned by `operator[]`
can be used as a shortcut: `auto si = soa[index]; si.z() = si.x() + si.y();`

A view can be instantiated by being passed the corresponding layout.

Layout classes also define `View` and `ConstView` subclasses that provide access to each column and
scalar of the layout. In addition to those fully parametrized templates, two other levels of parametrization are
provided: `ViewTemplate`, `ViewTemplateFreeParams` and respectively `ConstViewTemplate`,
`ConstViewTemplateFreeParams`. The parametrization of those templates is explained in the [Template
parameters section](#template-parameters).

It is also possible to build a generic `View` or `ConstView` starting from the [Metarecords subclass](#metarecords-subclass).
This view can point to data belonging to different SoAs and thus not contiguous in memory.

## Descriptor

The nested class `ConstDescriptor` can only be instantiated by passing a `View` or a `ConstView` and provides access to
each column through a `std::tuple<std::span<T>...>`. This class should be considered an internal implementation detail,
used solely by the SoA and EDM frameworks for performing heterogeneous memory operations. It is used to implement the
`deepCopy` from a `View` referencing different memory buffers, as shown in
[`PortableHostCollection<T>`](../../DataFormats/Portable/README.md#portablehostCollection)
and [`PortableDeviceCollection<T, TDev>`](../../DataFormats/Portable/README.md#portabledeviceCollection) sections.
It should likely not be used for other purposes.

## Metadata subclass

In order not to clutter the namespace of the generated class, a subclass named `Metadata` is generated. It is
instantiated with the `metadata()` member function and contains various utility functions, like `size()` (number
of elements in the SoA), `byteSize()`, `byteAlignment()`, `data()` (a pointer to the buffer). A `nextByte()`
function computes the first byte of a structure right after a layout, allowing a single buffer to be used for multiple
layouts.

## Metarecords subclass

The nested type `Metarecords` describes the elements of the SoA. It can be instantiated by the `records()` member
function of a `View` or `ConstView`. Every object contains the address of the first element of the column, the number
of elements per column, and the stride for the Eigen columns. These are used to validate the column sizes at run time
and to build a generic `View` as described in [View](#view).

## Customized methods

It is possible to generate methods inside the `element` and `const_element` nested structs using the `SOA_ELEMENT_METHODS`
and `SOA_CONST_ELEMENT_METHODS` macros. Each of these macros can be called only once, and can define multiple methods.
[An example is shown below.](#examples)

## ROOT serialization and de-serialization

Layouts can be serialized and de-serialized with ROOT. In order to generate the ROOT dictionary, separate
`classes_def.xml` and `classes.h` should be prepared. `classes.h` ensures the inclusion of the proper header files to
get the definition of the serialized classes, and `classes_def.xml` needs to define the fixed list of members that
ROOT should ignore, plus the list of all the columns. [An example is provided below.](#examples)

Serialization of Eigen data is not yet supported.

## Template parameters

The template parameters shared by layouts and views are:
- Byte alignment (defaulting to the NVIDIA GPU cache line size (128 bytes))
- Alignment enforcement (`relaxed` or `enforced`). When enforced, the alignment is checked at construction
  time.~~, and the accesses are done with compiler hinting (using the widely supported `__builtin_assume_aligned`
  intrinsic).~~ It turned out that hinting `nvcc` for alignment removed the benefit of the more important `__restrict__`
  hinting. The `__builtin_assume_aligned` intrinsic is hence currently not used.

In addition, the views also provide access parameters:
- Restrict qualify: add restrict hints to read accesses, so that the compiler knows it can relax accesses to the
  data and assume it will not change. On NVIDIA GPUs, this leads to the generation of instructions using the faster
  non-coherent cache.
- Range checking: add index checking on each access. As this is a compile-time parameter, the feature has no
  run-time cost when turned off. When turned on, the accesses are slowed down by the checks. Upon error detection,
  an exception is thrown (on the CPU side) or the kernel is made to crash (on the GPU side). This feature can help
  debugging index issues at run time, but of course requires a recompilation.

The trivial view subclasses come in a variety of parametrization levels: `View` uses the same byte
alignment and alignment enforcement as the layout, and defaults (off) for restrict qualifying and range checking.
`ViewTemplate` allows setting restrict qualifying and range checking, while
`ViewTemplateFreeParams` allows full re-customization of the template parameters.

## Using SoA layouts and views with GPUs

Instantiation of views and layouts is preferably done on the CPU side. The view object is lightweight, with only one
pointer per column, plus the global number of elements. Extra view classes can be generated to restrict this number of
pointers to the strict minimum in scenarios where only a subset of columns is used in a given GPU kernel.

## Examples

A layout can be defined as:

```C++
#include "DataFormats/SoALayout.h"

GENERATE_SOA_LAYOUT(SoA1LayoutTemplate,
  // predefined static scalars
  // size_t size;
  // size_t alignment;

  // columns: one value per element
  SOA_COLUMN(double, x),
  SOA_COLUMN(double, y),
  SOA_COLUMN(double, z),
  SOA_EIGEN_COLUMN(Eigen::Vector3d, a),
  SOA_EIGEN_COLUMN(Eigen::Vector3d, b),
  SOA_EIGEN_COLUMN(Eigen::Vector3d, r),
  SOA_COLUMN(uint16_t, color),
  SOA_COLUMN(int32_t, value),
  SOA_COLUMN(double *, py),
  SOA_COLUMN(uint32_t, count),
  SOA_COLUMN(uint32_t, anotherCount),

  // scalars: one value for the whole structure
  SOA_SCALAR(const char *, description),
  SOA_SCALAR(uint32_t, someNumber)
);

// Default template parameters are <
//   size_t ALIGNMENT = cms::soa::CacheLineSize::defaultSize,
//   bool ALIGNMENT_ENFORCEMENT = cms::soa::AlignmentEnforcement::relaxed
// >
using SoA1Layout = SoA1LayoutTemplate<>;

using SoA1LayoutAligned = SoA1LayoutTemplate<cms::soa::CacheLineSize::defaultSize, cms::soa::AlignmentEnforcement::enforced>;
```

It is possible to declare methods that operate on the SoA elements:

```C++
#include <cmath>

#include "DataFormats/SoALayout.h"

GENERATE_SOA_LAYOUT(SoATemplate,
  SOA_COLUMN(double, x),
  SOA_COLUMN(double, y),
  SOA_COLUMN(double, z),

  // methods operating on const_element
  SOA_CONST_ELEMENT_METHODS(
    // Euclidean norm of the (x, y, z) triplet
    auto norm() const {
      return sqrt(x()*x() + y()*y() + z()*z());
    }
  ),

  // methods operating on element
  SOA_ELEMENT_METHODS(
    void scale(float arg) {
      x() *= arg;
      y() *= arg;
      z() *= arg;
    }
  ),

  SOA_SCALAR(int, detectorType)
);

using SoA = SoATemplate<>;
using SoAView = SoA::View;
using SoAConstView = SoA::ConstView;
```

The buffer of the proper size is allocated, and the layout is populated with:

```C++
#include <cstddef>
#include <cstdlib>
#include <memory>

// Allocation of an aligned buffer, released with std::free
size_t elements = 100;
using AlignedBuffer = std::unique_ptr<std::byte, decltype(std::free) *>;
AlignedBuffer h_buf(
    reinterpret_cast<std::byte*>(
        aligned_alloc(SoA1LayoutAligned::alignment, SoA1LayoutAligned::computeDataSize(elements))),
    std::free);
SoA1LayoutAligned soaLayout(h_buf.get(), elements);
```

A view will derive its column types from one or multiple layouts. The macro generating the view takes a list of layouts or views it
gets its data from as a first parameter, and the selection of the columns the view will give access to as a second parameter.

```C++
// A 1 to 1 view of the layout (except for unsupported types).
GENERATE_SOA_VIEW(SoA1ViewTemplate,
  SOA_VIEW_LAYOUT_LIST(
    SOA_VIEW_LAYOUT(SoA1Layout, soa1)
  ),
  SOA_VIEW_VALUE_LIST(
    SOA_VIEW_VALUE(soa1, x),
    SOA_VIEW_VALUE(soa1, y),
    SOA_VIEW_VALUE(soa1, z),
    SOA_VIEW_VALUE(soa1, color),
    SOA_VIEW_VALUE(soa1, value),
    SOA_VIEW_VALUE(soa1, py),
    SOA_VIEW_VALUE(soa1, count),
    SOA_VIEW_VALUE(soa1, anotherCount),
    SOA_VIEW_VALUE(soa1, description),
    SOA_VIEW_VALUE(soa1, someNumber)
  )
);

using SoA1View = SoA1ViewTemplate<>;

SoA1View soaView(soaLayout);

for (size_t i = 0; i < soaLayout.metadata().size(); ++i) {
  auto si = soaView[i];
  si.x() = si.y() = i;
  soaView.someNumber() += i;
}
```

The mutable and const views with the exact same set of columns and their parametrized variants are provided from the layout as:

```C++
// (Pseudo-code)
struct SoA1Layout::View;

template<bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ViewTemplate;

template<size_t ALIGNMENT = cms::soa::CacheLineSize::defaultSize,
         bool ALIGNMENT_ENFORCEMENT = cms::soa::AlignmentEnforcement::relaxed,
         bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ViewTemplateFreeParams;

struct SoA1Layout::ConstView;

template<bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ConstViewTemplate;

template<size_t ALIGNMENT = cms::soa::CacheLineSize::defaultSize,
         bool ALIGNMENT_ENFORCEMENT = cms::soa::AlignmentEnforcement::relaxed,
         bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ConstViewTemplateFreeParams;
```

## Current status and further improvements

### Available features

- The layout and views support scalars and columns, alignment and alignment enforcement and hinting (linked).
- Automatic `__restrict__` compiler hinting is supported and can be enabled where appropriate.
- Automatic creation of trivial views and const views derived from a single layout.
- Cache access style, which was explored, was abandoned as this not-yet-used feature interferes with `__restrict__`
  support (which is already in use in existing code). It could be made available as a separate tool that can be used
  directly by the module developer, orthogonally from SoA.
- Optional (compile time) range checking validates the index of every column access, throwing an exception on the
  CPU side and forcing a segmentation fault to halt kernels. When not enabled, it has no impact on performance (code
  not compiled).
- Eigen columns are also supported, with both const and non-const flavors.
- ROOT serialization and deserialization is supported. In CMSSW, it is planned to be used through the memory
  managing `PortableCollection` family of classes.
- An `operator<<()` is provided to print the layout of an SoA to standard streams.