The Gray Scott School

Day 3 — Fortran on CPU

June 24, with Vincent Lafage: Fortran 2018 on CPU all day — the language of arrays, floating-point precision, the flags exercise, and the Gray-Scott solver in modern Fortran.

June 24, 2026 · Speaker: Vincent Lafage (IJCLab) · Marcel Vivargent Auditorium + satellite sites (including CINERI) · at lunch, visit of the LAPP Museum. The hands-on lives in GrayScott2026/day-3-a/CPU only: the repo's GPU/ and CUDA/ folders will serve on Day 8.

Morning session — Modern Fortran, the language of arrays

1. Fortran is alive and well

The course opens with a proof by example: the Annexe/ folder holds the same program in FORTRAN 66 (obact66.f) and Fortran 95 (obact95.f90), plus a genuine historical spaghetti dish (spaghetti.f) with its untangled version (unspaghetti.f). The 1957 language has become a modern one — modules, generic interfaces, coarrays — without losing its specialty: numerical arrays.

2. Column-major — the trap coming from C

The first structural difference with C/C++: memory order.

Fortran — column-major1591317261014183711151948121620A(i, j): i varies fastest → walk down the columnsC/C++ — row-major1234567891011121314151617181920A[i][j]: j varies fastest → walk across the rowsdo j … do i … A(i, j) ⇔ for i … for j … A[i][j]
The same 2D array, two memory orders: in Fortran the inner loop must walk the FIRST dimension — the opposite of C

Everything Day 2 taught about memory walks still holds — but reversed: in Fortran the inner loop must walk the first dimension. A C reflex copied as-is halves the throughput.

3. The solver in four modules

The hands-on Gray-Scott is cleanly split (src/):

ModuleRole
floating_point_precision.f90the pr kind — the whole solver's precision in one line
gray_scott_settings.f90physical parameters + namelist (config without recompiling)
hdf5_utils.f90HDF5 image output
gray_scott.f90the program: init → evolve → output

The Laplacian stencil is picked among four variants (isotropy vs accuracy):

StencilCenterNote
Central difference−4the classic 5-point
Weights-pretty (default)−8isotropic, used here
Oono-Puri (1988)−12/4better isotropy
Patra-Karttunen−20/64th-order accurate
Laplacian stencil — Weights-pretty (default)
1
1
1
1
-8
1
1
1
1

The center weighs −8, its eight neighbors +1.

4. Telling the compiler your intent

The modern keywords are promises the compiler can exploit: pure / elemental (no side effects), contiguous (dense memory), do concurrent (independent iterations — the key to Days 8-9), and whole-array operations (sum(stencil * U(i-1:i+1, j-1:j+1))) that state what to compute, not how.

Afternoon session — Precision, flags, variants

5. Precision in one line

The hands-on precision module is three lines long — on purpose:

module floating_point_precision
  use, intrinsic :: iso_fortran_env, only: pr => REAL32
end module

Changing REAL32 to REAL64 switches the whole solver. What you trade:

REAL32 — ~7 significant digits1exponent · 8mantissa · 23 bitsREAL64 — ~16 significant digitsexponent · 11mantissa · 52 bitsuse iso_fortran_env, only: pr => REAL32 ! one line switches everything
IEEE 754: precision is the width of the mantissa. The hands-on floating_point_precision module switches the whole solver in one line

6. Compensated summation

Summation order changes the floating-point result. The hands-on implements the four classic remedies (Solutions/compensated_summation.f90 — generic 1D/2D interfaces):

MethodIdea
sum_pairwiserecursive divide-and-conquer summation
sum_Kahancarry a compensation term
sum_Neumaierrobust when the next term is larger
sum_KBNKahan-Babuška-Neumaier, the practical default
for each x in the arrayy = x − ct = s + yc = (t − s) − ys = tthe bits lost by the addition, captured in c
Kahan summation: carry a compensation term c that recovers, at every step, what rounding just lost

7. The flags exercise — same sources, ×6 to ×8

The hands-on script run_flags_exo.sh runs the demonstration end to end, measurements included:

  1. Detect the CPU's SIMD (/proc/cpuinfo: AVX2? AVX-512?);
  2. C: speedcbrtf.c compiled scalar (-O2 -fno-tree-vectorize) then native (-Ofast -march=native -ftree-vectorize -mrecip …) — counting ymm registers in the binary (objdump | grep -c ymm) to prove vectorization;
  3. Fortran: gfortran -fopt-info-vec prints the vectorized loops, and -fopt-info-vec-missed explains why a loop was refused.
cd GrayScott2026/day-3-a
pixi run bash run_flags_exo.sh    # "same sources, ×6-8 depending on flags"

This is act one of "the cube-root story" — whose final word comes at Day 9's closing. To go further: maqao cqa --fct-loops=loop_cbrt analyzes the loop at assembly level (OptimisationCbrtC/ and OptimisationCbrtFortran/ folders).

8. The repo variants — one solver, ten experiments

Solutions/ declines the same solver to isolate each idea:

VariantWhat it isolates
preopenmpopenmpopenmp_bis (+ _ptr)!$omp parallelization step by step
blocking_ptrDay 2's blocking, in Fortran
RK / CKRunge-Kutta / Cash-Karp integrators — temporal accuracy
stdpar / openacc / openmp_offloadthe GPU back-ends — see you on Day 8
VariantMechanismTarget
openmp!$omp directivesCPU cores
stdpardo concurrentmulticore CPU or GPU
openacc / openmp_offloaddirectivesGPU (Day 8)

9. Tested, or it does not count

Solutions/test_stencil.f90 checks the stencil's invariants (Laplacian multiple, normalization) with tolerance — Day 1's rules applied directly. And Annexe/ shows pFUnit tests (test_roman_numeral.pf, test_divisor.pf): the unit-testing framework of the Fortran world.

The Annexe — the cabinet of curiosities

Vincent Lafage's pedagogical signature: santa_claus_problem_caf.f90 (the Santa Claus problem… in coarrays, Fortran 2008's native parallelism), roman_numeral.F90 (+ its tests), physconst.f90, decimal_interface.f90, and the spaghetti.f / unspaghetti.f duo that sums up 60 years of the language.

On video — the official replays

Replay — Fortran On CPU (Gray Scott Thursdays)
Replay — Computing Precision: floating-point in practice (Gray Scott Thursdays)

Sources & official material

Copyright © 2026