The Gray Scott School

Day 3 — Fortran on CPU

June 24, with Vincent Lafage: Fortran 2018 on CPU all day — the language of arrays, floating-point precision, the flags exercise, and the Gray-Scott solver in modern Fortran.

June 24, 2026 · Speaker: Vincent Lafage (IJCLab) · Marcel Vivargent Auditorium + satellite sites (including CINERI) · at lunch, visit of the LAPP Museum. The hands-on lives in GrayScott2026/day-3-a/ — CPU only: the repo's GPU/ and CUDA/ folders will serve on Day 8.

Morning session — Modern Fortran, the language of arrays

1. Fortran is alive and well

The course opens with a proof by example: the Annexe/ folder holds the same program in FORTRAN 66 (obact66.f) and Fortran 95 (obact95.f90), plus a genuine historical spaghetti dish (spaghetti.f) with its untangled version (unspaghetti.f). The 1957 language has become a modern one — modules, generic interfaces, coarrays — without losing its specialty: numerical arrays.

2. Column-major — the trap coming from C

The first structural difference with C/C++: memory order.

The same 2D array, two memory orders: in Fortran the inner loop must walk the FIRST dimension — the opposite of C

Everything Day 2 taught about memory walks still holds — but reversed: in Fortran the inner loop must walk the first dimension. A C reflex copied as-is halves the throughput.

3. The solver in four modules

The hands-on Gray-Scott is cleanly split (src/):

Module	Role
`floating_point_precision.f90`	the `pr` kind — the whole solver's precision in one line
`gray_scott_settings.f90`	physical parameters + namelist (config without recompiling)
`hdf5_utils.f90`	HDF5 image output
`gray_scott.f90`	the program: init → `evolve` → output

The Laplacian stencil is picked among four variants (isotropy vs accuracy):

Stencil	Center	Note
Central difference	−4	the classic 5-point
Weights-pretty (default)	−8	isotropic, used here
Oono-Puri (1988)	−12/4	better isotropy
Patra-Karttunen	−20/6	4th-order accurate

Laplacian stencil — Weights-pretty (default)

-8

The center weighs −8, its eight neighbors +1.

4. Telling the compiler your intent

The modern keywords are promises the compiler can exploit: pure / elemental (no side effects), contiguous (dense memory), do concurrent (independent iterations — the key to Days 8-9), and whole-array operations (sum(stencil * U(i-1:i+1, j-1:j+1))) that state what to compute, not how.

Afternoon session — Precision, flags, variants

5. Precision in one line

The hands-on precision module is three lines long — on purpose:

module floating_point_precision
  use, intrinsic :: iso_fortran_env, only: pr => REAL32
end module

Changing REAL32 to REAL64 switches the whole solver. What you trade:

IEEE 754: precision is the width of the mantissa. The hands-on floating_point_precision module switches the whole solver in one line

6. Compensated summation

Summation order changes the floating-point result. The hands-on implements the four classic remedies (Solutions/compensated_summation.f90 — generic 1D/2D interfaces):

Method	Idea
`sum_pairwise`	recursive divide-and-conquer summation
`sum_Kahan`	carry a compensation term
`sum_Neumaier`	robust when the next term is larger
`sum_KBN`	Kahan-Babuška-Neumaier, the practical default

Kahan summation: carry a compensation term c that recovers, at every step, what rounding just lost

7. The flags exercise — same sources, ×6 to ×8

The hands-on script run_flags_exo.sh runs the demonstration end to end, measurements included:

Detect the CPU's SIMD (/proc/cpuinfo: AVX2? AVX-512?);
C: speedcbrtf.c compiled scalar (-O2 -fno-tree-vectorize) then native (-Ofast -march=native -ftree-vectorize -mrecip …) — counting ymm registers in the binary (objdump | grep -c ymm) to prove vectorization;
Fortran: gfortran -fopt-info-vec prints the vectorized loops, and -fopt-info-vec-missed explains why a loop was refused.

cd GrayScott2026/day-3-a
pixi run bash run_flags_exo.sh    # "same sources, ×6-8 depending on flags"

This is act one of "the cube-root story" — whose final word comes at Day 9's closing. To go further: maqao cqa --fct-loops=loop_cbrt analyzes the loop at assembly level (OptimisationCbrtC/ and OptimisationCbrtFortran/ folders).

8. The repo variants — one solver, ten experiments

Solutions/ declines the same solver to isolate each idea:

Variant	What it isolates
`preopenmp` → `openmp` → `openmp_bis` (+ `_ptr`)	`!$omp` parallelization step by step
`blocking_ptr`	Day 2's blocking, in Fortran
`RK` / `CK`	Runge-Kutta / Cash-Karp integrators — temporal accuracy
`stdpar` / `openacc` / `openmp_offload`	the GPU back-ends — see you on Day 8

Variant	Mechanism	Target
`openmp`	`!$omp` directives	CPU cores
`stdpar`	`do concurrent`	multicore CPU or GPU
`openacc` / `openmp_offload`	directives	GPU (Day 8)

9. Tested, or it does not count

Solutions/test_stencil.f90 checks the stencil's invariants (Laplacian multiple, normalization) with tolerance — Day 1's rules applied directly. And Annexe/ shows pFUnit tests (test_roman_numeral.pf, test_divisor.pf): the unit-testing framework of the Fortran world.

The Annexe — the cabinet of curiosities

Vincent Lafage's pedagogical signature: santa_claus_problem_caf.f90 (the Santa Claus problem… in coarrays, Fortran 2008's native parallelism), roman_numeral.F90 (+ its tests), physconst.f90, decimal_interface.f90, and the spaghetti.f / unspaghetti.f duo that sums up 60 years of the language.

On video — the official replays

Replay — Fortran On CPU (Gray Scott Thursdays)

Replay — Computing Precision: floating-point in practice (Gray Scott Thursdays)

Sources & official material

The course repository: gitlab.in2p3.fr/lafage/GrayScottFortranTuto
The day's slides (PDF, school GitLab wiki): FortranFurious — IJC dual GS 2026
pFUnit (Fortran unit tests): github.com/Goddard-Fortran-Ecosystem/pFUnit_demos
Video replays (YouTube): Gray Scott Thursdays
Course containers (podman/apptainer, grayscott_fortran_* images): container list · school website

Edit this pageorReport an issue

Day 2 — C++ on CPU

Two sessions on June 23: C++ 17/20/23 on CPU in the morning, advanced optimization (blocking & Pyramid) in the afternoon. Measure, understand the stencil, exploit the cache.

Day 4 — Kokkos on CPU

June 25, with Paul Zehner, Juan-José Silva Cuevas and Thomas Padioleau: Kokkos on CPU — one C++ source, backends chosen at compile time, Views, parallel_for and SIMD.