# Structure of arrays (SoA) generation

The two header files [`SoALayout.h`](SoALayout.h) and [`SoAView.h`](SoAView.h) define preprocessor macros that
allow generating SoA classes. The SoA classes generate multiple aligned columns from a memory buffer. The memory
buffer is allocated separately by the user, and can be located in a memory space different from the local one (for
example, an SoA located in GPU device memory can be fully defined on the host, and the resulting structure is then
passed to the GPU kernel).

This columnar storage allows efficient memory access by GPU kernels (coalesced access to cache-line-aligned data)
and possibly vectorization.

Additionally, templating the layout and view classes allows compile-time variations of accesses and checks:
verification of alignment and corresponding compiler hinting, cache strategy (non-coherent, streaming with immediate
invalidation), and range checking.

Macro generation produces code that gives clear and concise access to the data. The code generation uses the
Boost.Preprocessor library.

## Layout

`SoALayout` is a macro-generated templated class that subdivides a provided buffer into a collection of columns,
Eigen columns and scalars. The buffer is expected to be aligned with a selectable alignment, defaulting to the CUDA
GPU cache line size (128 bytes). All columns and scalars within a `SoALayout` are individually aligned, with
padding at the end of each if necessary. Eigen columns store each component of the vector or matrix in its own
properly aligned column (by defining the stride between components). Only compile-time sized Eigen vectors and
matrices are supported. Scalar members hold a single element, irrespective of the size of the layout.
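
To illustrate the Eigen column storage, here is a minimal host-side sketch of strided component access, assuming a column of fixed-size vectors with `N` components stored as `N` consecutive component columns, each padded to the alignment. `EigenColumnSketch` and its members are hypothetical names used for illustration only; the code generated by the macros may differ.

```C++
#include <cassert>
#include <cstddef>

// Hypothetical accessor for a column of fixed-size vectors with N components.
// Component k of element i is stored at base[k * stride + i], where stride is the
// padded length of one component column, counted in scalars.
template <typename Scalar, std::size_t N>
struct EigenColumnSketch {
  Scalar* base;
  std::size_t stride;

  // Reference to component k of element i.
  Scalar& component(std::size_t i, std::size_t k) { return base[k * stride + i]; }

  // Gather the N components of element i, one strided load per component.
  void load(std::size_t i, Scalar (&out)[N]) const {
    for (std::size_t k = 0; k < N; ++k) out[k] = base[k * stride + i];
  }
};
```

For 100 `double` elements and a 128-byte alignment, each component column occupies 896 bytes (800 rounded up to a multiple of 128), so the stride is 112 elements.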

Static utility functions automatically compute the byte size of a layout, taking into account all its columns and
their alignment.
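
The byte-size computation can be sketched as follows, assuming each column is padded independently to a multiple of the alignment, an Eigen column counts as one column per component, and a scalar takes a single padded slot. The member list mirrors part of the layout in the [Examples](#examples) section; `alignUp` and `layoutByteSize` are hypothetical helpers, not the generated static functions.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round a byte count up to the next multiple of alignment.
constexpr std::size_t alignUp(std::size_t bytes, std::size_t alignment) {
  return ((bytes + alignment - 1) / alignment) * alignment;
}

// Hypothetical byte size of a layout made of
//   SOA_COLUMN(double, x), SOA_COLUMN(double, y),
//   SOA_EIGEN_COLUMN(Eigen::Vector3d, a), SOA_COLUMN(uint16_t, color),
//   SOA_SCALAR(uint32_t, someNumber)
constexpr std::size_t layoutByteSize(std::size_t elements, std::size_t alignment = 128) {
  std::size_t size = 0;
  size += alignUp(elements * sizeof(double), alignment);         // x
  size += alignUp(elements * sizeof(double), alignment);         // y
  size += 3 * alignUp(elements * sizeof(double), alignment);     // a: one column per component
  size += alignUp(elements * sizeof(std::uint16_t), alignment);  // color
  size += alignUp(sizeof(std::uint32_t), alignment);             // someNumber: a single element
  return size;
}
```

With 100 elements, the two `double` columns take 896 bytes each, the three components of `a` another 3 × 896 bytes, `color` 256 bytes and the scalar 128 bytes, for a total of 4864 bytes.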

## View

`SoAView` is a macro-generated templated class allowing access to columns defined in one or multiple `SoALayout`s or
`SoAView`s. The view can be generated in constant and non-constant flavors. All view flavors provide the same
interface, where scalar elements are accessed with `operator()`: `soa.scalar()`, while columns (Eigen or not) are
accessed via an array of structures (AoS)-like syntax: `soa[index].x()`. The "struct" object returned by `operator[]`
can be used as a shortcut: `auto si = soa[index]; si.z() = si.x() + si.y();`
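
The AoS-like syntax over columnar storage can be sketched with a small proxy object, assuming the view holds one pointer per column and one pointer per scalar. `SoAViewSketch` and its nested `Element` are hypothetical; they only illustrate the idea, not the code generated by `SoAView.h`.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view over two double columns and one scalar.
struct SoAViewSketch {
  double* x_;
  double* y_;
  std::uint32_t* someNumber_;
  std::size_t size_;

  // Scalars are accessed through a named member function: one value for the whole SoA.
  std::uint32_t& someNumber() { return *someNumber_; }

  // Proxy returned by operator[]: it remembers the view and the index, and each
  // accessor returns a reference into the matching column.
  struct Element {
    SoAViewSketch& view;
    std::size_t index;
    double& x() { return view.x_[index]; }
    double& y() { return view.y_[index]; }
  };

  Element operator[](std::size_t index) { return Element{*this, index}; }
};
```

Because the proxy only stores a reference and an index, `auto si = soa[i];` is cheap, and writes through `si.x()` land directly in the `x` column.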

A view can be instantiated by passing it the layout(s) and view(s) it is defined against, or column by column.

Layout classes also define `View` and `ConstView` subclasses that provide access to each column and
scalar of the layout. In addition to those fixed-parameter classes, two other levels of parametrization are
provided: `ViewTemplate` and `ViewTemplateFreeParams`, and respectively `ConstViewTemplate` and
`ConstViewTemplateFreeParams`. The parametrization of those templates is explained in the [Template parameters section](#template-parameters).

## Metadata subclass

In order not to clutter the namespace of the generated class, a subclass named `Metadata` is generated. It is
instantiated with the `metadata()` member function and contains various utility functions, like `size()` (number
of elements in the SoA), `byteSize()`, `byteAlignment()` and `data()` (a pointer to the buffer). A `nextByte()`
function computes the first byte of a structure right after a layout, allowing a single buffer to be used for
multiple layouts.
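
A minimal sketch of chaining layouts in one buffer, assuming that `nextByte()` returns the buffer start plus the layout's byte size. Since every column and scalar is padded to the alignment, the byte size is a multiple of the alignment, so the next layout also starts aligned. `LayoutMetadataSketch` is a hypothetical stand-in for the generated `Metadata` class.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the generated Metadata: where the layout starts and how
// many bytes it occupies, padding included.
struct LayoutMetadataSketch {
  std::byte* data;
  std::size_t byteSize;

  // First byte after this layout, i.e. where the next layout can start.
  std::byte* nextByte() const { return data + byteSize; }
};
```

A second layout is then placed at `first.nextByte()` inside the same allocation, whose total size is the sum of both byte sizes.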

## ROOT serialization and de-serialization

Layouts can be serialized and de-serialized with ROOT. In order to generate the ROOT dictionary, separate
`classes_def.xml` and `classes.h` files should be prepared. `classes.h` ensures the inclusion of the proper header files to
get the definition of the serialized classes, and `classes_def.xml` needs to define the fixed list of members that
ROOT should ignore, plus the list of all the columns. [An example is provided below.](#examples)

Serialization of Eigen data is not yet supported.

## Template parameters

The template parameters shared by layouts and views are:
- Byte alignment (defaulting to the NVIDIA GPU cache line size, 128 bytes).
- Alignment enforcement (`relaxed` or `enforced`). When enforced, the alignment will be checked at construction
  time.~~, and the accesses are done with compiler hinting (using the widely supported `__builtin_assume_aligned`
  intrinsic).~~ It turned out that hinting `nvcc` for alignment removed the benefit of the more important
  `__restrict__` hinting. `__builtin_assume_aligned` is hence currently not used.
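
A minimal sketch of what an enforced alignment check at construction time can look like, assuming the check compares the buffer address modulo the alignment and throws on the host. `AlignedLayoutSketch` is hypothetical; the actual error handling in `SoALayout.h` may differ.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical layout constructor: when ALIGNMENT_ENFORCEMENT is true, the buffer
// address must be a multiple of ALIGNMENT, otherwise construction throws.
template <std::size_t ALIGNMENT, bool ALIGNMENT_ENFORCEMENT>
struct AlignedLayoutSketch {
  std::byte* data;

  explicit AlignedLayoutSketch(std::byte* buffer) : data(buffer) {
    if constexpr (ALIGNMENT_ENFORCEMENT) {
      // Only compiled in when enforcement is requested.
      if (reinterpret_cast<std::uintptr_t>(buffer) % ALIGNMENT != 0)
        throw std::runtime_error("SoA buffer is not aligned to the requested alignment");
    }
  }
};
```

With `relaxed` enforcement the same misaligned buffer is accepted silently, which is why the enforced variant is useful while validating allocation code.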

In addition, the views also provide access parameters:
- Restrict qualify: add restrict hints to read accesses, so that the compiler knows it can relax accesses to the
  data and assume it will not change. On NVIDIA GPUs, this leads to the generation of instructions using the faster
  non-coherent cache.
- Range checking: add index checking on each access. As this is a compile-time parameter, the feature has zero
  run-time cost when turned off. When turned on, the accesses are slowed down by the checks. Upon error detection,
  an exception is thrown (on the CPU side) or the kernel is made to crash (on the GPU side). This feature can help
  the debugging of index issues at runtime, but of course requires a recompilation.
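
The zero run-time cost comes from making the check a compile-time template parameter. A host-side sketch, assuming `if constexpr` (C++17) and an `std::out_of_range` exception; `ColumnSketch` is a hypothetical name, and the GPU-side behavior (crashing the kernel) is not modeled here.

```C++
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical column accessor with compile-time range checking.
template <typename T, bool RANGE_CHECKING>
struct ColumnSketch {
  T* data;
  std::size_t size;

  T& operator[](std::size_t index) {
    if constexpr (RANGE_CHECKING) {
      // Discarded at compile time when RANGE_CHECKING is false.
      if (index >= size)
        throw std::out_of_range("SoA index out of range");
    }
    return data[index];
  }
};
```

With `RANGE_CHECKING` set to `false`, `operator[]` reduces to a plain indexed load, so enabling the check for a debugging build only requires changing the template argument and recompiling.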

The trivial view subclasses come in a variety of parametrization levels: `View` uses the same byte
alignment and alignment enforcement as the layout, and the defaults (off) for restrict qualifying and range checking.
`ViewTemplate` allows setting restrict qualifying and range checking, while
`ViewTemplateFreeParams` allows full re-customization of the template parameters.
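
The three parametrization levels can be sketched with alias templates, assuming a layout with 128-byte alignment and relaxed enforcement, and the defaults (off) stated above for the access parameters. These are hypothetical stand-ins: the generated `View`, `ViewTemplate` and `ViewTemplateFreeParams` are nested in the layout class and may be defined differently.

```C++
#include <cstddef>
#include <type_traits>

// Hypothetical fully parametrized view (stand-in for ViewTemplateFreeParams).
template <std::size_t ALIGNMENT, bool ALIGNMENT_ENFORCEMENT, bool RESTRICT_QUALIFY, bool RANGE_CHECKING>
struct ViewFreeParamsSketch {
  static constexpr std::size_t alignment = ALIGNMENT;
  static constexpr bool alignmentEnforcement = ALIGNMENT_ENFORCEMENT;
  static constexpr bool restrictQualify = RESTRICT_QUALIFY;
  static constexpr bool rangeChecking = RANGE_CHECKING;
};

// Layout parameters fixed (assumed 128 bytes, relaxed), access parameters free
// (stand-in for ViewTemplate).
template <bool RESTRICT_QUALIFY = false, bool RANGE_CHECKING = false>
using ViewTemplateSketch = ViewFreeParamsSketch<128, false, RESTRICT_QUALIFY, RANGE_CHECKING>;

// Every parameter fixed, access checks off (stand-in for View).
using ViewSketch = ViewTemplateSketch<>;
```

Each level only fixes the parameters of the one below it, so all three name the same underlying type when the arguments match.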

## Using SoA layouts and views with GPUs

Instantiation of views and layouts is preferably done on the CPU side. The view object is lightweight, with only one
pointer per column, plus the global number of elements. Extra view classes can be generated to restrict the number of
pointers to the strict minimum in scenarios where only a subset of columns is used in a given GPU kernel.
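
A sketch of why a reduced view helps, assuming a kernel that only reads `x` and `y`: the reduced view carries two pointers and a size instead of one pointer per column of the layout, so less data is copied as kernel arguments. `FullViewSketch`, `KernelViewSketch` and `sumXY` are hypothetical; `sumXY` is a plain host function standing in for a kernel.

```C++
#include <cassert>
#include <cstddef>

// Hypothetical full view of a layout with four columns: one pointer per column
// plus the number of elements.
struct FullViewSketch {
  const double* x;
  const double* y;
  const double* z;
  const int* color;
  std::size_t size;
};

// Hypothetical reduced view exposing only the columns a given kernel reads.
struct KernelViewSketch {
  const double* x;
  const double* y;
  std::size_t size;
};

// Stand-in for a kernel: the view is taken by value, as a kernel argument would be.
inline double sumXY(KernelViewSketch view) {
  double sum = 0;
  for (std::size_t i = 0; i < view.size; ++i) sum += view.x[i] + view.y[i];
  return sum;
}
```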

## Examples

A layout can be defined as:

```C++
#include <cstdint>

#include <Eigen/Core>

#include "DataFormats/SoALayout.h"

GENERATE_SOA_LAYOUT(SoA1LayoutTemplate,
  // predefined static scalars
  // size_t size;
  // size_t alignment;

  // columns: one value per element
  SOA_COLUMN(double, x),
  SOA_COLUMN(double, y),
  SOA_COLUMN(double, z),
  SOA_EIGEN_COLUMN(Eigen::Vector3d, a),
  SOA_EIGEN_COLUMN(Eigen::Vector3d, b),
  SOA_EIGEN_COLUMN(Eigen::Vector3d, r),
  SOA_COLUMN(uint16_t, color),
  SOA_COLUMN(int32_t, value),
  SOA_COLUMN(double *, py),
  SOA_COLUMN(uint32_t, count),
  SOA_COLUMN(uint32_t, anotherCount),

  // scalars: one value for the whole structure
  SOA_SCALAR(const char *, description),
  SOA_SCALAR(uint32_t, someNumber)
);

// Default template parameters are <
//   size_t ALIGNMENT = cms::soa::CacheLineSize::defaultSize,
//   bool ALIGNMENT_ENFORCEMENT = cms::soa::AlignmentEnforcement::relaxed
// >
using SoA1Layout = SoA1LayoutTemplate<>;

using SoA1LayoutAligned = SoA1LayoutTemplate<cms::soa::CacheLineSize::defaultSize, cms::soa::AlignmentEnforcement::enforced>;
```

A buffer of the proper size is allocated, and the layout is constructed on top of it with:

```C++
#include <cstddef>
#include <cstdlib>
#include <memory>

// Allocation of an aligned buffer, released with std::free
size_t elements = 100;
using AlignedBuffer = std::unique_ptr<std::byte, decltype(std::free) *>;
AlignedBuffer h_buf(reinterpret_cast<std::byte*>(std::aligned_alloc(SoA1LayoutAligned::alignment,
                                                                    SoA1LayoutAligned::computeDataSize(elements))),
                    std::free);
// The layout carves its columns out of the user-provided buffer
SoA1LayoutAligned soaLayout(h_buf.get(), elements);
```

A view derives its column types from one or multiple layouts. The macro generating the view takes the list of layouts or views it
gets its data from as a first parameter, and the selection of the columns the view will give access to as a second parameter.

```C++
// A 1 to 1 view of the layout (except for unsupported types).
GENERATE_SOA_VIEW(SoA1ViewTemplate,
  SOA_VIEW_LAYOUT_LIST(
    SOA_VIEW_LAYOUT(SoA1LayoutAligned, soa1)
  ),
  SOA_VIEW_VALUE_LIST(
    SOA_VIEW_VALUE(soa1, x),
    SOA_VIEW_VALUE(soa1, y),
    SOA_VIEW_VALUE(soa1, z),
    SOA_VIEW_VALUE(soa1, color),
    SOA_VIEW_VALUE(soa1, value),
    SOA_VIEW_VALUE(soa1, py),
    SOA_VIEW_VALUE(soa1, count),
    SOA_VIEW_VALUE(soa1, anotherCount),
    SOA_VIEW_VALUE(soa1, description),
    SOA_VIEW_VALUE(soa1, someNumber)
  )
);

using SoA1View = SoA1ViewTemplate<>;

SoA1View soaView(soaLayout);

for (size_t i = 0; i < soaLayout.metadata().size(); ++i) {
  auto si = soaView[i];
  si.x() = si.y() = i;
  soaView.someNumber() += i;
}
```

The mutable and const views with the exact same set of columns and their parametrized variants are provided from the layout as:

```C++
// (Pseudo-code)
struct SoA1Layout::View;

template<bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ViewTemplate;

template<size_t ALIGNMENT = cms::soa::CacheLineSize::defaultSize,
         bool ALIGNMENT_ENFORCEMENT = cms::soa::AlignmentEnforcement::relaxed,
         bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ViewTemplateFreeParams;

struct SoA1Layout::ConstView;

template<bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ConstViewTemplate;

template<size_t ALIGNMENT = cms::soa::CacheLineSize::defaultSize,
         bool ALIGNMENT_ENFORCEMENT = cms::soa::AlignmentEnforcement::relaxed,
         bool RESTRICT_QUALIFY = cms::soa::RestrictQualify::enabled,
         bool RANGE_CHECKING = cms::soa::RangeChecking::disabled>
struct SoA1Layout::ConstViewTemplateFreeParams;
```

## Current status and further improvements

### Available features

- The layout and views support scalars and columns, alignment, and alignment enforcement and hinting (linked).
- Automatic `__restrict__` compiler hinting is supported and can be enabled where appropriate.
- Automatic creation of trivial views and const views derived from a single layout.
- Cache access style, which was explored, was abandoned as this not-yet-used feature interferes with `__restrict__`
  support (which is already in use in existing code). It could be made available as a separate tool that can be used
  directly by the module developer, independently of the SoA.
- Optional (compile-time) range checking validates the index of every column access, throwing an exception on the
  CPU side and forcing a segmentation fault to halt kernels. When not enabled, it has no impact on performance (the
  checking code is not compiled).
- Eigen columns are also supported, with both const and non-const flavors.
- ROOT serialization and deserialization is supported. In CMSSW, it is planned to be used through the memory
  managing `PortableCollection` family of classes.
- An `operator<<()` is provided to print the layout of an SoA to standard streams.