Day 4 — Kokkos on CPU
June 25, 2026 · Speakers: Paul Zehner, Juan-José Silva Cuevas & Thomas Padioleau (Maison de la Simulation, CEA) · Marcel Vivargent Auditorium + satellite sites (including CINERI).
The hands-on lives in GrayScott2026/day-4/exercises/ and is CPU only: the gpu* exercises wait for Day 9. Rare bonus: the lecture itself is in the repo (courses/kokkos_cpu.tex, Gray-Scott beamer theme).
Morning session — from sequential to the first kernel
1. Why Kokkos
The lecture opens on the promises, and they are precise. Kokkos targets the 3Ps: Performance (the most out of a given piece of hardware), Portability (the same code on different hardware) and Productivity (write, maintain and extend code quickly), plus maturity (production-ready, not a research product), community, longevity and interoperability (I/O, linear algebra, ML).
| Approach | Runs on | You write |
|---|---|---|
| Raw CUDA / HIP | one vendor's GPU | low-level kernels |
| OpenMP / OpenACC | CPU (+ some GPU) | directives on loops |
| std::execution::par | compiler-dependent | standard algorithms |
| Kokkos | CPU + NVIDIA / AMD / Intel GPU | C++ patterns + Views |
2. The thesis: one code, many backends
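The thesis can be illustrated with a toy model in plain C++. This is a sketch under stated assumptions, not how Kokkos is implemented: the kernel is written once as a lambda, and the "backend" is only a type parameter that decides how the index range is executed. SerialSpace, ThreadsSpace and axpy are hypothetical names; in Kokkos the corresponding role is played by execution spaces such as Kokkos::Serial or Kokkos::OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Toy "backends": hypothetical stand-ins for execution spaces, not Kokkos types.
struct SerialSpace {
    template <class F>
    static void run(std::size_t n, F&& f) {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

struct ThreadsSpace {
    // Splits [0, n) into one contiguous chunk per hardware thread.
    template <class F>
    static void run(std::size_t n, F&& f) {
        const std::size_t nt = std::max<std::size_t>(1, std::thread::hardware_concurrency());
        const std::size_t chunk = (n + nt - 1) / nt;
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < nt; ++t) {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(n, begin + chunk);
            if (begin >= end) break; // no work left for this or later threads
            workers.emplace_back([begin, end, &f] {
                for (std::size_t i = begin; i < end; ++i) f(i);
            });
        }
        for (auto& w : workers) w.join();
    }
};

// Application code, written once: y += a * x. Each index is written by exactly
// one worker, so no synchronization is needed. The backend is a type parameter.
template <class Space>
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    Space::run(x.size(), [&](std::size_t i) { y[i] += a * x[i]; });
}
```

Calling axpy<SerialSpace>(2.0, x, y) or axpy<ThreadsSpace>(2.0, x, y) yields the same result: only the dispatch changes, never the kernel body.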
3. The hands-on starts: hello_world → sequential
The repo progression is built to isolate each idea. hello_world checks the installation
(init/finalize via Kokkos::ScopeGuard); sequential provides the reference Gray-Scott solver,
the baseline every variant is compared against. Shared infrastructure lives in
common/ (CLI11 parameters, output writer, helpers).
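To make "reference" concrete, here is a minimal sequential sketch of one Gray-Scott time step in plain C++. It assumes the textbook form of the model (∂u/∂t = Du∇²u − uv² + F(1 − u) and ∂v/∂t = Dv∇²v + uv² − (F + k)v), a 5-point Laplacian with unit grid spacing and an explicit Euler update. The repository's exact stencil and parameter names may differ; GrayScottParams, Field and gray_scott_step are hypothetical, as are the default parameter values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle: names and values are illustrative only.
struct GrayScottParams {
    double Du = 0.16, Dv = 0.08; // diffusion rates
    double F = 0.035, k = 0.06;  // feed and kill rates
    double dt = 1.0;             // time step
};

// Row-major 2D field stored in one contiguous buffer.
struct Field {
    std::size_t rows, cols;
    std::vector<double> data;
    Field(std::size_t r, std::size_t c, double init) : rows(r), cols(c), data(r * c, init) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// One explicit Euler step on the interior points [1, rows-1) x [1, cols-1).
// The boundary ring of u_next / v_next is not written: initialise them as
// copies of u / v so the fixed boundary carries over.
void gray_scott_step(const Field& u, const Field& v, Field& u_next, Field& v_next,
                     const GrayScottParams& p) {
    assert(u.rows == v.rows && u.cols == v.cols);
    for (std::size_t i = 1; i + 1 < u.rows; ++i) {
        for (std::size_t j = 1; j + 1 < u.cols; ++j) {
            // 5-point Laplacians with unit grid spacing
            const double lap_u = u(i - 1, j) + u(i + 1, j) + u(i, j - 1) + u(i, j + 1) - 4.0 * u(i, j);
            const double lap_v = v(i - 1, j) + v(i + 1, j) + v(i, j - 1) + v(i, j + 1) - 4.0 * v(i, j);
            const double uvv = u(i, j) * v(i, j) * v(i, j); // reaction term u*v^2
            u_next(i, j) = u(i, j) + p.dt * (p.Du * lap_u - uvv + p.F * (1.0 - u(i, j)));
            v_next(i, j) = v(i, j) + p.dt * (p.Dv * lap_v + uvv - (p.F + p.k) * v(i, j));
        }
    }
}
```

Every parallel variant of the day is expected to reproduce what a loop like this computes, and its interior range is the same one the MDRangePolicy of section 5 iterates over.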
Afternoon session — Views, parallelism, SIMD
4. View — the container that abstracts memory
Why an abstracted container? The lecture answers: no more manual allocation, unified CPU/GPU memory semantics, vendor-specific allocation hidden, advanced capabilities (abstracted layout, subarrays, multidimensionality) and safety (compile-time and runtime checks).
This closes the loop with Days 2-3: layout was the biggest lever there, and Kokkos turns it into a type parameter whose default fits the hardware.
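What "layout as a type parameter" means can be shown in a few lines of plain C++. The sketch relies only on the documented meaning of Kokkos::LayoutRight (row-major: the rightmost index is contiguous) and Kokkos::LayoutLeft (column-major: the leftmost index is contiguous); the View2D class and its tag types are hypothetical stand-ins, not the Kokkos implementation. In Kokkos one writes Kokkos::View<double**, Kokkos::LayoutRight>, and when the layout is omitted the execution space picks it (LayoutRight on CPU backends, LayoutLeft on CUDA).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tags mimicking the meaning of Kokkos::LayoutRight / LayoutLeft.
struct LayoutRight { // row-major: j (rightmost index) is contiguous
    static std::size_t offset(std::size_t i, std::size_t j, std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};
struct LayoutLeft { // column-major: i (leftmost index) is contiguous
    static std::size_t offset(std::size_t i, std::size_t j, std::size_t n0, std::size_t /*n1*/) {
        return j * n0 + i;
    }
};

// Toy 2D view: the indexing syntax v(i, j) is the same for every layout.
template <class T, class Layout>
class View2D {
public:
    View2D(std::size_t n0, std::size_t n1) : n0_(n0), n1_(n1), data_(n0 * n1) {}
    T& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_); // bounds check, as a debug build would do
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    const T* data() const { return data_.data(); }
    std::size_t extent(int dim) const { return dim == 0 ? n0_ : n1_; }

private:
    std::size_t n0_, n1_;
    std::vector<T> data_;
};
```

Code that indexes v(i, j) does not change when the layout tag changes; only the address arithmetic moves, and with it the loop order that walks memory contiguously.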
5. parallel_for and parallel_reduce
The kernel becomes a lambda handed to a parallel pattern, with an iteration policy:
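```cpp
// u and u_temp are 2D Kokkos::Views; rows, cols and dt are defined earlier.
// MDRangePolicy bounds are [begin, end): only interior points are updated.
Kokkos::parallel_for("compute",
    Kokkos::MDRangePolicy<Kokkos::Rank<2>>({1, 1}, {rows - 1, cols - 1}),
    KOKKOS_LAMBDA (int i, int j) {
        u_temp(i, j) = u(i, j) + dt * (/* … the stencil … */);
    });
```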
The reduction (field checksum) follows the same pattern with parallel_reduce, the parallel,
race-free version of accumulation: each thread accumulates privately and Kokkos combines the partial results.
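In Kokkos the checksum would be written roughly as Kokkos::parallel_reduce("checksum", policy, KOKKOS_LAMBDA (int i, int j, double& acc) { acc += u(i, j); }, sum);, where policy and sum are placeholder names. Why this is safe is easiest to see in a sequential model: each worker owns a private partial sum and the partials are combined only at the end, so no two workers ever update the same variable. The chunked_sum helper below is a hypothetical illustration of that idea, not Kokkos code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of a parallel reduction (hypothetical helper):
// worker w sums its own slice into partial[w], then the partials are joined.
double chunked_sum(const std::vector<double>& x, std::size_t nchunks) {
    assert(nchunks > 0);
    const std::size_t chunk = (x.size() + nchunks - 1) / nchunks;
    std::vector<double> partial(nchunks, 0.0); // one private accumulator per worker
    for (std::size_t w = 0; w < nchunks; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(x.size(), begin + chunk);
        for (std::size_t i = begin; i < end; ++i) partial[w] += x[i];
    }
    double total = 0.0; // the "join" step
    for (double p : partial) total += p;
    return total;
}
```

With integer-valued data the result is identical for any number of chunks; with general floating-point data the grouping changes the rounding, which is why a parallel checksum can differ from the sequential one in the last bits.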
6. Switching to the CPU backend
The cpu exercise enables OpenMP (-DKokkos_ENABLE_OPENMP=ON): same Views, same lambdas,
all the cores. Repo honesty: the file carries an explicit warning that this version "only
runs on CPU, it is not yet portable Kokkos". What is missing for the GPU (layouts, data transfers,
fences) is exactly Day 9's program.
| Backend | Flag |
|---|---|
| Serial | -DKokkos_ENABLE_SERIAL=ON |
| OpenMP | -DKokkos_ENABLE_OPENMP=ON |
| Threads | -DKokkos_ENABLE_THREADS=ON |
7. SIMD — explicit vectorization, Kokkos style
The cpu_simd exercise goes one level down: with the Kokkos::Experimental::simd types, each
operation processes simd_width = SimdType::size() values at once, loaded from the Views
with simd_flag_default. It is the same in-core parallelism as on Day 1, and a direct preview
of EVE on Day 6.
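The mechanics of such a SIMD loop (full packs of simd_width values, then the leftover elements) can be sketched without Kokkos. Pack<W> below is a hypothetical stand-in for a SIMD type, with W playing the role of SimdType::size(); a real simd type maps the loop over lanes onto vector instructions, which this toy leaves to the compiler. How the repository handles the remainder is not stated here; this sketch assumes a scalar tail loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy fixed-width pack (hypothetical, not Kokkos::Experimental::simd).
template <std::size_t W>
struct Pack {
    std::array<double, W> lane{};
    static Pack load(const double* p) {
        Pack r;
        for (std::size_t l = 0; l < W; ++l) r.lane[l] = p[l];
        return r;
    }
    void store(double* p) const {
        for (std::size_t l = 0; l < W; ++l) p[l] = lane[l];
    }
    friend Pack operator+(Pack a, const Pack& b) {
        for (std::size_t l = 0; l < W; ++l) a.lane[l] += b.lane[l];
        return a;
    }
    friend Pack operator*(double s, Pack a) {
        for (std::size_t l = 0; l < W; ++l) a.lane[l] *= s;
        return a;
    }
};

// y = a * x + y, W values per step, then a scalar loop for the remainder.
template <std::size_t W>
void axpy_packed(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const std::size_t n_packed = n - n % W; // largest multiple of W not above n
    for (std::size_t i = 0; i < n_packed; i += W) {
        const Pack<W> px = Pack<W>::load(&x[i]);
        const Pack<W> py = Pack<W>::load(&y[i]);
        (a * px + py).store(&y[i]);
    }
    for (std::size_t i = n_packed; i < n; ++i) y[i] += a * x[i]; // scalar tail
}
```

With 10 elements and W = 4, two full packs cover indices 0 to 7 and the tail loop handles 8 and 9.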
8. Verify, always
scripts/check_outcome.sh replays every implementation on the 10 × 10 case and compares
checksums; scripts/run_all.sh runs all the variants of a build. Day 1's rule
(sequential/parallel equivalence) is now backed by tooling.
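Checksums of parallel variants are often compared with a tolerance rather than bit for bit, because floating-point addition is not associative: a different summation order, as in a parallel reduction, can change the last bits. The helper below is a hypothetical sketch of such a comparison; whether check_outcome.sh applies a tolerance, and which one, is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when two checksums agree to a relative tolerance.
bool checksums_match(double reference, double candidate, double rel_tol = 1e-12) {
    assert(rel_tol >= 0.0);
    const double scale = std::max(std::fabs(reference), std::fabs(candidate));
    return std::fabs(reference - candidate) <= rel_tol * scale;
}

// Same three terms, two groupings: the results differ in the last bit.
double sum_left() { return (0.1 + 0.2) + 0.3; }
double sum_right() { return 0.1 + (0.2 + 0.3); }
```

A bit-exact comparison would flag sum_left() and sum_right() as different even though both are correct summations of the same data; a tight relative tolerance accepts them while still catching a genuinely wrong kernel.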
The hands-on — GrayScott2026/day-4/
Dependencies: CMake ≥ 3.28, Kokkos ≥ 5.1.1, HDF5 (C++), CLI11. Two paths:
```bash
# 1) dependencies handled by CMake (except HDF5)
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_DOWNLOAD_FALLBACK=ON -DHDF5_ROOT=…
cmake --build build --parallel $(nproc)
# 2) check an implementation
bash exercises/scripts/check_outcome.sh build/cpu/gray_scott_cpu
```
Official Docker images (CPU: interactive, jupyter, vscode, code_server) are listed in the repo README.
Sources & official material
- The course repository (exercises + the LaTeX lecture in courses/): github.com/Maison-de-la-Simulation/gray-scott-kokkos
- The day's slides (PDF, school GitLab wiki): kokkos_cpu.pdf
- Kokkos: kokkos.org · github.com/kokkos/kokkos
- Video replays (YouTube): Gray Scott Thursdays
- School website: GrayScott2026