[{"data":1,"prerenderedAt":837},["ShallowReactive",2],{"navigation_docs_en":3,"-en-gray-scott-school-jour-2":66,"-en-gray-scott-school-jour-2-surround":832},[4,50,60],{"title":5,"path":6,"stem":7,"children":8},"The Gray Scott School","/en/gray-scott-school","en/1.gray-scott-school/01.index",[9,10,14,18,22,26,30,34,38,42,46],{"title":5,"path":6,"stem":7},{"title":11,"path":12,"stem":13},"CINERI Presentation","/en/gray-scott-school/presentation-cineri","en/1.gray-scott-school/02.presentation-cineri",{"title":15,"path":16,"stem":17},"Day 1 — Foundations","/en/gray-scott-school/jour-1","en/1.gray-scott-school/03.jour-1",{"title":19,"path":20,"stem":21},"Day 2 — C++ on CPU","/en/gray-scott-school/jour-2","en/1.gray-scott-school/04.jour-2",{"title":23,"path":24,"stem":25},"Day 3 — Fortran on CPU","/en/gray-scott-school/jour-3","en/1.gray-scott-school/05.jour-3",{"title":27,"path":28,"stem":29},"Day 4 — Kokkos on CPU","/en/gray-scott-school/jour-4","en/1.gray-scott-school/06.jour-4",{"title":31,"path":32,"stem":33},"Day 5 — Python on CPU","/en/gray-scott-school/jour-5","en/1.gray-scott-school/07.jour-5",{"title":35,"path":36,"stem":37},"Day 6 — SIMD with EVE + GPU architecture","/en/gray-scott-school/jour-6","en/1.gray-scott-school/08.jour-6",{"title":39,"path":40,"stem":41},"Day 7 — Python on GPU","/en/gray-scott-school/jour-7","en/1.gray-scott-school/09.jour-7",{"title":43,"path":44,"stem":45},"Day 8 — Fortran on GPU","/en/gray-scott-school/jour-8","en/1.gray-scott-school/10.jour-8",{"title":47,"path":48,"stem":49},"Day 9 — Kokkos on 
GPU","/en/gray-scott-school/jour-9","en/1.gray-scott-school/11.jour-9",{"title":51,"path":52,"stem":53,"children":54},"Projects","/en/projets","en/2.projets/1.index",[55,56],{"title":51,"path":52,"stem":53},{"title":57,"path":58,"stem":59},"SenLand","/en/projets/senland","en/2.projets/2.senland",{"title":61,"path":62,"stem":63,"children":64},"About","/en/a-propos","en/3.a-propos/1.index",[65],{"title":61,"path":62,"stem":63},{"id":67,"title":19,"badge":68,"body":69,"category":68,"description":820,"extension":821,"links":822,"meta":827,"navigation":829,"path":20,"seo":830,"stem":21,"tags":68,"__hash__":831},"docs_en/en/1.gray-scott-school/04.jour-2.md",null,{"type":70,"value":71,"toc":803},"minimark",[72,108,113,118,133,201,208,212,219,222,229,233,251,254,260,264,278,316,330,334,365,408,411,418,424,428,432,442,445,451,456,460,478,481,504,510,515,534,540,551,653,679,683,690,696,701,705,799],[73,74,75],"blockquote",{},[76,77,78,82,83,86,87,90,91,94,95,98,99,103,104,107],"p",{},[79,80,81],"strong",{},"June 23, 2026"," · Speakers: ",[79,84,85],{},"Sébastien Valat"," & ",[79,88,89],{},"Pierre Aubert"," (LAPP) · Marcel\nVivargent Auditorium + satellite sites (including CINERI). Two distinct sessions: in the\nmorning, ",[79,92,93],{},"C++ 17/20/23 on CPU","; in the afternoon, ",[79,96,97],{},"advanced optimization"," (blocking &\nPyramid). The hands-on lives in ",[100,101,102],"code",{},"GrayScott2026/day-2/"," — ",[79,105,106],{},"the GPU is not on today's menu",".",[109,110,112],"h2",{"id":111},"morning-session-c-172023-on-cpu","Morning session — C++ 17/20/23 on CPU",[114,115,117],"h3",{"id":116},"_1-the-golden-rule-measure-first","1. The golden rule: measure first",[76,119,120,121,124,125,128,129,132],{},"No optimization is kept without a number. 
So the hands-on starts by building the measurement\ntooling itself, in three steps (",[100,122,123],{},"1-FirstPerformanceTest"," → ",[100,126,127],{},"2-BenchmarkFunction"," →\n",[100,130,131],{},"3-FunctionTimer","):",[134,135,136,152],"table",{},[137,138,139],"thead",{},[140,141,142,146,149],"tr",{},[143,144,145],"th",{},"Module",[143,147,148],{},"What you build",[143,150,151],{},"The lesson",[153,154,155,171,186],"tbody",{},[140,156,157,162,168],{},[158,159,160],"td",{},[100,161,123],{},[158,163,164,165],{},"a minimal ",[100,166,167],{},"timer.cpp",[158,169,170],{},"measuring is code too",[140,172,173,177,183],{},[158,174,175],{},[100,176,127],{},[158,178,179,180],{},"repetitions + statistics + ",[100,181,182],{},"pin_thread_to_core",[158,184,185],{},"a single measurement lies; pinning the thread stabilizes the timings",[140,187,188,192,198],{},[158,189,190],{},[100,191,131],{},[158,193,194,195],{},"the reusable course ",[100,196,197],{},"FunctionTimer",[158,199,200],{},"the tool we keep for the whole school",[76,202,203,204,207],{},"A detail that matters: ",[100,205,206],{},"pin_thread_to_core.cpp"," — without pinning, the scheduler walks the\nthread from core to core and noise drowns the effect you are measuring.",[114,209,211],{"id":210},"_2-the-stencil-and-why-it-is-memory-bound","2. The stencil — and why it is memory-bound",[76,213,214,215,218],{},"Gray-Scott's discrete Laplacian is a ",[79,216,217],{},"3×3 stencil",": each output point is a weighted sum of\nthe nine cells in its 3×3 neighborhood (itself and its eight neighbors).",[220,221],"d2-stencil",{},[76,223,224,225,228],{},"A dozen flops for nine reads: ",[79,226,227],{},"arithmetic intensity is low",". On Day 1's roofline this kernel\nlives under the sloped roof — it waits for memory, not compute. The whole day follows from\nthis observation.",[114,230,232],{"id":231},"_3-data-layout-the-single-biggest-lever","3. 
Data layout — the single biggest lever",[76,234,235,236,239,240,243,244,247,248,107],{},"Module ",[100,237,238],{},"5-DataLayout",": the same kernel with two memory walks (",[100,241,242],{},"layout_efficient"," vs\n",[100,245,246],{},"layout_swap_axis","). Memory is a 1D ribbon; the cache loads whole ",[79,249,250],{},"lines",[252,253],"d2-layout",{},[76,255,256,257],{},"Walking the array in storage order feeds the cache; swapping the axes starves it on every\naccess. ",[79,258,259],{},"Before any compute trick, fix the memory walk.",[114,261,263],{"id":262},"_4-vectorization-a-conversation-with-the-compiler","4. Vectorization — a conversation with the compiler",[76,265,235,266,269,270,273,274,277],{},[100,267,268],{},"6-Vectorization",": no hand-written SIMD today — we ",[79,271,272],{},"let the compiler"," do it, provided\nwe hand it a clean loop (pure, contiguous, no aliasing; ",[100,275,276],{},"__restrict__"," promises no aliasing).",[134,279,280,290],{},[137,281,282],{},[140,283,284,287],{},[143,285,286],{},"Target",[143,288,289],{},"Flags",[153,291,292,304],{},[140,293,294,299],{},[158,295,296],{},[100,297,298],{},"naive_gray_scott_O3",[158,300,301],{},[100,302,303],{},"-O3",[140,305,306,311],{},[158,307,308],{},[100,309,310],{},"autovec_gray_scott_O3",[158,312,313],{},[100,314,315],{},"-O3 -march=native -mtune=native -ftree-vectorize -funroll-loops",[76,317,318,321,322,325,326,329],{},[100,319,320],{},"-march=native"," unlocks the widest SIMD of the CPU (AVX2 → ×8 floats); the\n",[100,323,324],{},"autovectorization3x3"," module specializes the kernel for the 3×3 stencil. Check what the\ncompiler actually did: ",[100,327,328],{},"make helpoption"," lists the hands-on build variants.",[114,331,333],{"id":332},"_5-assembling-the-simulation","5. 
Assembling the simulation",[76,335,336,337,340,341,344,345,348,349,352,353,356,357,360,361,364],{},"Modules ",[100,338,339],{},"9-Simulation"," (the assembled solver, from ",[100,342,343],{},"very_naive"," to ",[100,346,347],{},"autovec","),\n",[100,350,351],{},"10-FullHDSimulation"," (scaling to 1920×1080), ",[100,354,355],{},"7-DataOutput"," (",[79,358,359],{},"HDF5"," output) and\n",[100,362,363],{},"8-ImagePlotting"," (image conversion):",[366,367,372],"pre",{"className":368,"code":369,"language":370,"meta":371,"style":371},"language-bash shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","time ./9-Simulation/autovec_gray_scott_O3 -n 10 -e 30 -r 1080 -c 1920\nmkdir pics && time ./8-ImagePlotting/gray_scott_image -i output.h5 -o pics/\n","bash","",[100,373,374,387],{"__ignoreMap":371},[375,376,379,383],"span",{"class":377,"line":378},"line",1,[375,380,382],{"class":381},"sbssI","time",[375,384,386],{"class":385},"sTEyZ"," ./9-Simulation/autovec_gray_scott_O3 -n 10 -e 30 -r 1080 -c 1920\n",[375,388,390,394,398,402,405],{"class":377,"line":389},2,[375,391,393],{"class":392},"sBMFI","mkdir",[375,395,397],{"class":396},"sfazB"," pics",[375,399,401],{"class":400},"sMK4o"," &&",[375,403,404],{"class":381}," time",[375,406,407],{"class":385}," ./8-ImagePlotting/gray_scott_image -i output.h5 -o pics/\n",[76,409,410],{},"The Turing patterns appear — the visual reward of the day:",[76,412,413],{},[414,415],"img",{"alt":416,"src":417},"Gray-Scott simulation output (course frame)","/school/day-2/gray-scott-frame.png",[76,419,420],{},[421,422,423],"em",{},"Frame produced by the course simulation — figure from the official material, © Pierre Aubert (LAPP).",[109,425,427],{"id":426},"afternoon-session-advanced-optimization-blocking-pyramid","Afternoon session — Advanced optimization: blocking & Pyramid",[114,429,431],{"id":430},"_6-blocking-cache-tiling","6. 
Blocking (cache tiling)",[76,433,235,434,437,438,441],{},[100,435,436],{},"14-Blocking",". On a Full HD grid, sweeping whole rows overflows the cache: every value\nis evicted before being reused. Blocking splits the domain into ",[79,439,440],{},"tiles"," sized for a cache\nlevel, and finishes each tile before moving on.",[443,444],"d2-blocking",{},[76,446,447],{},[414,448],{"alt":449,"src":450},"Official domain decomposition into blocks with halos (PerformanceWithStencil course)","/school/day-2/block-decomposition.png",[76,452,453],{},[421,454,455],{},"The course block decomposition: four families of blocks, each with its read halo — figure from\nthe official material, © Pierre Aubert (LAPP).",[114,457,459],{"id":458},"_7-the-pyramid-space-time-tiling","7. The Pyramid — space-time tiling",[76,461,235,462,465,466,469,470,473,474,477],{},[100,463,464],{},"15-AdvancedBlocking",", the summit of the day. Blocking tiles space; the ",[79,467,468],{},"Pyramid","\ntiles space ",[79,471,472],{},"and time",": the cached tile absorbs several consecutive time steps (the halo\nshrinks by one cell per step — hence the pyramid shape), and the ",[79,475,476],{},"anti-pyramid"," fills the\ngaps between pyramids.",[479,480],"d2-pyramid",{},[76,482,483,484,487,488,491,492,495,496,499,500,503],{},"In the hands-on it is a real little library (",[100,485,486],{},"151-PyramidLib","): ",[100,489,490],{},"PyramidIterator",",\n",[100,493,494],{},"PyramidIdx",", ",[100,497,498],{},"AntiPyramidIdx"," — then ",[100,501,502],{},"153-SimplePyramid"," plugs it into Gray-Scott.",[76,505,506],{},[414,507],{"alt":508,"src":509},"Naive iteration (row by row) vs pyramid iteration — course figure","/school/day-2/pyramid-iteration.png",[76,511,512],{},[421,513,514],{},"Left: naive iteration sweeps the whole domain at every step; right: the actual pyramid order\nfrom the course — figure from the official material, © Pierre Aubert (LAPP).",[76,516,517,518,521,522,525,526,529,530,533],{},"The folder\neven 
ships ",[79,519,520],{},"auto-tuning",": ",[100,523,524],{},"scriptFindBestPyramid.sh"," sweeps pyramid sizes and keeps the best\none for ",[421,527,528],{},"your"," machine. The day's final lesson: memory traffic is amortized ",[79,531,532],{},"across\niterations",", not just across space.",[109,535,537,538],{"id":536},"the-hands-on-grayscott2026day-2","The hands-on — ",[100,539,102],{},[76,541,542,543,546,547,550],{},"The environment is pinned with ",[79,544,545],{},"pixi"," (channels ",[100,548,549],{},"prefix.dev/phoenix"," + conda-forge — gcc,\ncmake, HDF5, TBB, Phoenix libs), hence reproducible without a container:",[366,552,554],{"className":368,"code":553,"language":370,"meta":371,"style":371},"cd GrayScott2026/day-2\npixi install                      # the whole toolchain, pinned\npixi shell\nmkdir -p build && cd build\ncmake .. $(phoenixcmake-config --cmake) && make -j$(nproc)\nmake plot_all                     # runs all the performance measurements\n",[100,555,556,565,576,584,603,641],{"__ignoreMap":371},[375,557,558,562],{"class":377,"line":378},[375,559,561],{"class":560},"s2Zo4","cd",[375,563,564],{"class":396}," GrayScott2026/day-2\n",[375,566,567,569,572],{"class":377,"line":389},[375,568,545],{"class":392},[375,570,571],{"class":396}," install",[375,573,575],{"class":574},"sHwdD","                      # the whole toolchain, pinned\n",[375,577,579,581],{"class":377,"line":578},3,[375,580,545],{"class":392},[375,582,583],{"class":396}," shell\n",[375,585,587,589,592,595,597,600],{"class":377,"line":586},4,[375,588,393],{"class":392},[375,590,591],{"class":396}," -p",[375,593,594],{"class":396}," build",[375,596,401],{"class":400},[375,598,599],{"class":560}," cd",[375,601,602],{"class":396}," build\n",[375,604,606,609,612,615,618,621,624,626,629,632,635,638],{"class":377,"line":605},5,[375,607,608],{"class":392},"cmake",[375,610,611],{"class":396}," ..",[375,613,614],{"class":400}," 
$(",[375,616,617],{"class":392},"phoenixcmake-config",[375,619,620],{"class":396}," --cmake",[375,622,623],{"class":400},")",[375,625,401],{"class":400},[375,627,628],{"class":392}," make",[375,630,631],{"class":396}," -j",[375,633,634],{"class":400},"$(",[375,636,637],{"class":392},"nproc",[375,639,640],{"class":400},")\n",[375,642,644,647,650],{"class":377,"line":643},6,[375,645,646],{"class":392},"make",[375,648,649],{"class":396}," plot_all",[375,651,652],{"class":574},"                     # runs all the performance measurements\n",[76,654,655,656,659,660,663,664,667,668,495,671,674,675,678],{},"Official alternative: the course ",[79,657,658],{},"apptainer containers","\n(",[100,661,662],{},"performancewithstencil_cpu_job","). The repo's ",[100,665,666],{},"GPU/"," folder exists but is not on today's\nprogram; ",[100,669,670],{},"TBB/",[100,672,673],{},"27-Deliverable"," and ",[100,676,677],{},"29-DistributedComputing"," are exploited later in the\nschool.",[109,680,682],{"id":681},"on-video-the-official-replays","On video — the official replays",[76,684,685,686,689],{},"Two episodes of the ",[79,687,688],{},"Gray Scott Thursdays"," (the school's webinar series) cover exactly\ntoday's material:",[691,692],"yt-embed",{"caption":693,"id":694,"title":695},"Replay — Modern C++ CPU computing with std::algorithm (Gray Scott Thursdays)","HwxGGAOpUAo","Modern C++ CPU computing with std::algorithm",[691,697],{"caption":698,"id":699,"title":700},"Replay — Memory Allocations: the real cost of memory (Gray Scott Thursdays)","NX23_VRoMXw","Memory Allocations",[109,702,704],{"id":703},"sources-official-material","Sources & official material",[706,707,708,722,749,765,779],"ul",{},[709,710,711,714,715],"li",{},[79,712,713],{},"The online course"," (CPU chapters 1 → 20: measurement, layout, vectorization, blocking,\nPyramid, valgrind/kcachegrind, OpenMP, 
TBB):\n",[716,717,721],"a",{"href":718,"rel":719},"https://cta-lapp.pages.in2p3.fr/COURS/PerformanceWithStencil/",[720],"nofollow","cta-lapp.pages.in2p3.fr/COURS/PerformanceWithStencil",[709,723,724,727,728,733,734,733,739,733,744],{},[79,725,726],{},"The day's slides"," (PDF, school GitLab wiki):\n",[716,729,732],{"href":730,"rel":731},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/-/wikis/uploads/GrayScottDay-2/3-lecture_presentation_gray_scott_cpp.pdf",[720],"morning C++ lecture"," ·\n",[716,735,738],{"href":736,"rel":737},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/-/wikis/uploads/GrayScottDay-2/4-lecture_blocking.pdf",[720],"blocking",[716,740,743],{"href":741,"rel":742},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/-/wikis/uploads/GrayScottDay-2/5-lecture_simpler_advanced_blocking.pdf",[720],"simpler advanced blocking",[716,745,748],{"href":746,"rel":747},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/-/wikis/uploads/GrayScottDay-2/2026-06-gray-scott-blocking.pdf",[720],"gray-scott-blocking (June 2026)",[709,750,751,754,755,733,760],{},[79,752,753],{},"GitLab repositories",":\n",[716,756,759],{"href":757,"rel":758},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/PerformanceWithStencil",[720],"PerformanceWithStencil (hands-on code)",[716,761,764],{"href":762,"rel":763},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026",[720],"GrayScott2026 (the school)",[709,766,767,754,770,733,774],{},[79,768,769],{},"Video replays (YouTube)",[716,771,688],{"href":772,"rel":773},"https://www.youtube.com/playlist?list=PLiZttWgOMudb6PsUoWtxY3G4Gv8f2lurG",[720],[716,775,778],{"href":776,"rel":777},"https://www.youtube.com/watch?v=ILu6hCSGEMY&list=PLiZttWgOMudYvZkFakaN47nL2RqNR5TvT",[720],"2025 replays",[709,780,781,754,784,733,789,733,794],{},[79,782,783],{},"The 
environment",[716,785,788],{"href":786,"rel":787},"https://prefix.dev/phoenix",[720],"phoenix pixi channel",[716,790,793],{"href":791,"rel":792},"https://cta-lapp.pages.in2p3.fr/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/redirection.html?label=partContainerListTimeTable",[720],"course containers",[716,795,798],{"href":796,"rel":797},"https://cta-lapp.pages.in2p3.fr/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/index.html",[720],"school website",[800,801,802],"style",{},"html pre.shiki code .sbssI, html code.shiki .sbssI{--shiki-light:#F76D47;--shiki-default:#F78C6C;--shiki-dark:#F78C6C}html pre.shiki code .sTEyZ, html code.shiki .sTEyZ{--shiki-light:#90A4AE;--shiki-default:#EEFFFF;--shiki-dark:#BABED8}html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html pre.shiki code .sMK4o, html code.shiki .sMK4o{--shiki-light:#39ADB5;--shiki-default:#89DDFF;--shiki-dark:#89DDFF}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span 
{color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s2Zo4, html code.shiki .s2Zo4{--shiki-light:#6182B8;--shiki-default:#82AAFF;--shiki-dark:#82AAFF}html pre.shiki code .sHwdD, html code.shiki .sHwdD{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#546E7A;--shiki-default-font-style:italic;--shiki-dark:#676E95;--shiki-dark-font-style:italic}",{"title":371,"searchDepth":389,"depth":389,"links":804},[805,812,816,818,819],{"id":111,"depth":389,"text":112,"children":806},[807,808,809,810,811],{"id":116,"depth":578,"text":117},{"id":210,"depth":578,"text":211},{"id":231,"depth":578,"text":232},{"id":262,"depth":578,"text":263},{"id":332,"depth":578,"text":333},{"id":426,"depth":389,"text":427,"children":813},[814,815],{"id":430,"depth":578,"text":431},{"id":458,"depth":578,"text":459},{"id":536,"depth":389,"text":817},"The hands-on — GrayScott2026/day-2/",{"id":681,"depth":389,"text":682},{"id":703,"depth":389,"text":704},"Two sessions on June 23: C++ 17/20/23 on CPU in the morning, advanced optimization (blocking & Pyramid) in the afternoon. Measure, understand the stencil, exploit the cache.","md",[823],{"label":824,"icon":825,"to":718,"target":826},"Online course","i-lucide-graduation-cap","_blank",{"icon":828},"lucide:square-code",true,{"title":19,"description":820},"jEbMSoQPJ_vKmi60efBqSkItx2y67fu--K80KLlRRZA",[833,835],{"title":15,"path":16,"stem":17,"description":834,"children":-1},"The vocabulary and principles. 
Optimizing means understanding how the hardware works and locating where time is lost.",{"title":23,"path":24,"stem":25,"description":836,"children":-1},"June 24, with Vincent Lafage: Fortran 2018 on CPU all day — the language of arrays, floating-point precision, the flags exercise, and the Gray-Scott solver in modern Fortran.",1783172490753]