Day 3 — Fortran on CPU
June 24, 2026 · Speaker: Vincent Lafage (IJCLab) · Marcel Vivargent Auditorium + satellite sites (including CINERI) · at lunch, visit of the LAPP Museum. The hands-on lives in
GrayScott2026/day-3-a/— CPU only: the repo'sGPU/andCUDA/folders will serve on Day 8.
Morning session — Modern Fortran, the language of arrays
1. Fortran is alive and well
The course opens with a proof by example: the Annexe/ folder holds the same program in
FORTRAN 66 (obact66.f) and Fortran 95 (obact95.f90), plus a genuine historical spaghetti
dish (spaghetti.f) with its untangled version (unspaghetti.f). The 1957 language has
become a modern one — modules, generic interfaces, coarrays — without losing its
specialty: numerical arrays.
2. Column-major — the trap coming from C
The first structural difference with C/C++: memory order.
Everything Day 2 taught about memory walks still holds — but reversed: in Fortran the inner loop must walk the first dimension. A C reflex copied as-is halves the throughput.
3. The solver in four modules
The hands-on Gray-Scott is cleanly split (src/):
| Module | Role |
|---|---|
floating_point_precision.f90 | the pr kind — the whole solver's precision in one line |
gray_scott_settings.f90 | physical parameters + namelist (config without recompiling) |
hdf5_utils.f90 | HDF5 image output |
gray_scott.f90 | the program: init → evolve → output |
The Laplacian stencil is picked among four variants (isotropy vs accuracy):
| Stencil | Center | Note |
|---|---|---|
| Central difference | −4 | the classic 5-point |
| Weights-pretty (default) | −8 | isotropic, used here |
| Oono-Puri (1988) | −12/4 | better isotropy |
| Patra-Karttunen | −20/6 | 4th-order accurate |
The center weighs −8, its eight neighbors +1.
4. Telling the compiler your intent
The modern keywords are promises the compiler can exploit: pure / elemental (no side
effects), contiguous (dense memory), do concurrent (independent iterations — the key to
Days 8-9), and whole-array operations (sum(stencil * U(i-1:i+1, j-1:j+1))) that state what
to compute, not how.
Afternoon session — Precision, flags, variants
5. Precision in one line
The hands-on precision module is three lines long — on purpose:
module floating_point_precision
use, intrinsic :: iso_fortran_env, only: pr => REAL32
end module
Changing REAL32 to REAL64 switches the whole solver. What you trade:
6. Compensated summation
Summation order changes the floating-point result. The hands-on implements the four classic
remedies (Solutions/compensated_summation.f90 — generic 1D/2D interfaces):
| Method | Idea |
|---|---|
sum_pairwise | recursive divide-and-conquer summation |
sum_Kahan | carry a compensation term |
sum_Neumaier | robust when the next term is larger |
sum_KBN | Kahan-Babuška-Neumaier, the practical default |
7. The flags exercise — same sources, ×6 to ×8
The hands-on script run_flags_exo.sh runs the demonstration end to end, measurements
included:
- Detect the CPU's SIMD (
/proc/cpuinfo: AVX2? AVX-512?); - C:
speedcbrtf.ccompiled scalar (-O2 -fno-tree-vectorize) then native (-Ofast -march=native -ftree-vectorize -mrecip …) — countingymmregisters in the binary (objdump | grep -c ymm) to prove vectorization; - Fortran:
gfortran -fopt-info-vecprints the vectorized loops, and-fopt-info-vec-missedexplains why a loop was refused.
cd GrayScott2026/day-3-a
pixi run bash run_flags_exo.sh # "same sources, ×6-8 depending on flags"
This is act one of "the cube-root story" — whose final word comes at Day 9's closing. To go
further: maqao cqa --fct-loops=loop_cbrt analyzes the loop at assembly level
(OptimisationCbrtC/ and OptimisationCbrtFortran/ folders).
8. The repo variants — one solver, ten experiments
Solutions/ declines the same solver to isolate each idea:
| Variant | What it isolates |
|---|---|
preopenmp → openmp → openmp_bis (+ _ptr) | !$omp parallelization step by step |
blocking_ptr | Day 2's blocking, in Fortran |
RK / CK | Runge-Kutta / Cash-Karp integrators — temporal accuracy |
stdpar / openacc / openmp_offload | the GPU back-ends — see you on Day 8 |
| Variant | Mechanism | Target |
|---|---|---|
openmp | !$omp directives | CPU cores |
stdpar | do concurrent | multicore CPU or GPU |
openacc / openmp_offload | directives | GPU (Day 8) |
9. Tested, or it does not count
Solutions/test_stencil.f90 checks the stencil's invariants (Laplacian multiple,
normalization) with tolerance — Day 1's rules applied directly. And Annexe/ shows pFUnit
tests (test_roman_numeral.pf, test_divisor.pf): the unit-testing framework of the Fortran
world.
The Annexe — the cabinet of curiosities
Vincent Lafage's pedagogical signature: santa_claus_problem_caf.f90 (the Santa Claus
problem… in coarrays, Fortran 2008's native parallelism), roman_numeral.F90 (+ its
tests), physconst.f90, decimal_interface.f90, and the spaghetti.f / unspaghetti.f duo
that sums up 60 years of the language.
On video — the official replays
Sources & official material
- The course repository: gitlab.in2p3.fr/lafage/GrayScottFortranTuto
- The day's slides (PDF, school GitLab wiki): FortranFurious — IJC dual GS 2026
- pFUnit (Fortran unit tests): github.com/Goddard-Fortran-Ecosystem/pFUnit_demos
- Video replays (YouTube): Gray Scott Thursdays
- Course containers (podman/apptainer,
grayscott_fortran_*images): container list · school website
Day 2 — C++ on CPU
Two sessions on June 23: C++ 17/20/23 on CPU in the morning, advanced optimization (blocking & Pyramid) in the afternoon. Measure, understand the stencil, exploit the cache.
Day 4 — Kokkos on CPU
June 25, with Paul Zehner, Juan-José Silva Cuevas and Thomas Padioleau: Kokkos on CPU — one C++ source, backends chosen at compile time, Views, parallel_for and SIMD.