[{"data":1,"prerenderedAt":682},["ShallowReactive",2],{"navigation_docs_en":3,"-en-gray-scott-school-jour-5":66,"-en-gray-scott-school-jour-5-surround":677},[4,50,60],{"title":5,"path":6,"stem":7,"children":8},"The Gray Scott School","/en/gray-scott-school","en/1.gray-scott-school/01.index",[9,10,14,18,22,26,30,34,38,42,46],{"title":5,"path":6,"stem":7},{"title":11,"path":12,"stem":13},"CINERI Presentation","/en/gray-scott-school/presentation-cineri","en/1.gray-scott-school/02.presentation-cineri",{"title":15,"path":16,"stem":17},"Day 1 — Foundations","/en/gray-scott-school/jour-1","en/1.gray-scott-school/03.jour-1",{"title":19,"path":20,"stem":21},"Day 2 — C++ on CPU","/en/gray-scott-school/jour-2","en/1.gray-scott-school/04.jour-2",{"title":23,"path":24,"stem":25},"Day 3 — Fortran on CPU","/en/gray-scott-school/jour-3","en/1.gray-scott-school/05.jour-3",{"title":27,"path":28,"stem":29},"Day 4 — Kokkos on CPU","/en/gray-scott-school/jour-4","en/1.gray-scott-school/06.jour-4",{"title":31,"path":32,"stem":33},"Day 5 — Python on CPU","/en/gray-scott-school/jour-5","en/1.gray-scott-school/07.jour-5",{"title":35,"path":36,"stem":37},"Day 6 — SIMD with EVE + GPU architecture","/en/gray-scott-school/jour-6","en/1.gray-scott-school/08.jour-6",{"title":39,"path":40,"stem":41},"Day 7 — Python on GPU","/en/gray-scott-school/jour-7","en/1.gray-scott-school/09.jour-7",{"title":43,"path":44,"stem":45},"Day 8 — Fortran on GPU","/en/gray-scott-school/jour-8","en/1.gray-scott-school/10.jour-8",{"title":47,"path":48,"stem":49},"Day 9 — Kokkos on 
GPU","/en/gray-scott-school/jour-9","en/1.gray-scott-school/11.jour-9",{"title":51,"path":52,"stem":53,"children":54},"Projects","/en/projets","en/2.projets/1.index",[55,56],{"title":51,"path":52,"stem":53},{"title":57,"path":58,"stem":59},"SenLand","/en/projets/senland","en/2.projets/2.senland",{"title":61,"path":62,"stem":63,"children":64},"About","/en/a-propos","en/3.a-propos/1.index",[65],{"title":61,"path":62,"stem":63},{"id":67,"title":31,"badge":68,"body":69,"category":68,"description":666,"extension":667,"links":668,"meta":673,"navigation":492,"path":32,"seo":675,"stem":33,"tags":68,"__hash__":676},"docs_en/en/1.gray-scott-school/07.jour-5.md",null,{"type":70,"value":71,"toc":647},"minimark",[72,115,120,130,159,163,190,194,212,216,220,234,238,276,279,305,309,323,327,337,405,408,416,420,430,437,440,517,539,543,549,553,643],[73,74,75],"blockquote",{},[76,77,78,82,83,86,87,86,90,93,94,97,98,101,102,106,107,110,111,114],"p",{},[79,80,81],"strong",{},"June 26, 2026"," · Speakers: ",[79,84,85],{},"Alice Faure",", ",[79,88,89],{},"Jean-Marc Colley",[79,91,92],{},"Sébastien Valat"," &\n",[79,95,96],{},"Nabil Garroum"," · Marcel Vivargent Auditorium + satellite sites (including CINERI). The\nhands-on is a series of ",[79,99,100],{},"six Jupyter notebooks"," (",[103,104,105],"code",{},"GrayScott2026/day-5/CPU/tutorial/",",\nsolutions included) — the ",[103,108,109],{},"GPU/"," folder waits for ",[79,112,113],{},"Day 7",".",[116,117,119],"h2",{"id":118},"morning-session-measure-then-vectorize","Morning session — measure, then vectorize",[121,122,124,125,129],"h3",{"id":123},"_1-profile-first-time-and-memory","1. 
Profile first — time ",[126,127,128],"em",{},"and"," memory",[76,131,132,133,136,137,86,140,143,144,150,151,154,155,158],{},"True to Day 1's rule, the first notebook (",[103,134,135],{},"1_Optimization",") optimizes nothing: it measures.\nTiming (",[103,138,139],{},"timeit",[103,141,142],{},"cProfile","), then — the day's specialty, the slides are literally called\n",[126,145,146,147],{},"gray-scott-python-",[79,148,149],{},"mem"," — ",[79,152,153],{},"memory profiling"," with ",[103,156,157],{},"tracemalloc",": in Python every\ntemporary array is an allocation, and the naive Gray-Scott creates several per time step.",[121,160,162],{"id":161},"_2-the-gil-why-threads-do-not-save-python","2. The GIL — why threads do not save Python",[76,164,165,166,169,170,173,174,177,178,181,182,185,186,189],{},"The ",[79,167,168],{},"Global Interpreter Lock"," serializes ",[126,171,172],{},"pure"," Python threads. Real CPU parallelism goes\nthrough libraries that ",[79,175,176],{},"release the GIL"," during native computation — NumPy, Numba's\n",[103,179,180],{},"prange",", XLA under JAX — or through ",[103,183,184],{},"multiprocessing",". None of today's speedups fights the\nGIL: they all dive ",[126,187,188],{},"below"," it, into compiled code.",[121,191,193],{"id":192},"_3-from-loops-to-arrays","3. From loops to arrays",[76,195,196,197,200,201,204,205,101,208,211],{},"Notebooks ",[103,198,199],{},"2_Numpy"," and ",[103,202,203],{},"3_Python_Implementation",": the Laplacian is written with NumPy\n",[79,206,207],{},"slices",[103,209,210],{},"u[:-2, 1:-1] + u[2:, 1:-1] + … − 4*u[1:-1, 1:-1]",") — zero Python loops, the\niterations move into NumPy's C loops. This is the day's benchmark baseline.",[116,213,215],{"id":214},"afternoon-session-compiling-python","Afternoon session — compiling Python",[121,217,219],{"id":218},"_4-numba-compile-the-loop-you-already-have","4. 
Numba — compile the loop you already have",[76,221,222,223,226,227,230,231,233],{},"Notebook ",[103,224,225],{},"4_Numba_Implementation",": keep the explicit loop, add ",[103,228,229],{},"@njit",", and LLVM compiles the\nfunction on its first call. Ideal when the algorithm is naturally a loop (stencils!) — and\n",[103,232,180],{}," parallelizes it.",[121,235,237],{"id":236},"_5-jax-trace-then-let-xla-fuse","5. JAX — trace, then let XLA fuse",[76,239,222,240,243,244,247,248,251,252,255,256,150,259,86,262,86,265,268,269,271,272,275],{},[103,241,242],{},"5_JAX",", the day's centerpiece. JAX demands discipline in exchange for speed: ",[79,245,246],{},"immutable","\narrays (",[103,249,250],{},"u.at[i, j].set(v)"," instead of assignment), ",[79,253,254],{},"no index checking"," (silent errors\nlurk), and above all ",[79,257,258],{},"composable transformations",[103,260,261],{},"jit",[103,263,264],{},"vmap",[103,266,267],{},"grad"," — that only work\non ",[79,270,172],{}," functions. The machinery behind ",[103,273,274],{},"jax.jit",":",[277,278],"d5-trace",{},[76,280,281,282,285,286,289,290,86,293,296,297,300,301,304],{},"The notebook details the constraints: static arguments to declare (",[103,283,284],{},"static_argnums","), fixed\nshapes (every new shape re-traces), debugging via ",[103,287,288],{},"jax.debug",", and the control-flow operators\n(",[103,291,292],{},"lax.cond",[103,294,295],{},"lax.fori_loop",") that replace ",[103,298,299],{},"if","/",[103,302,303],{},"for"," inside traced code.",[121,306,308],{"id":307},"_6-porting-gray-scott-to-jax","6. Porting Gray-Scott to JAX",[76,310,222,311,314,315,318,319,322],{},[103,312,313],{},"6_JAX_Implementation",": two competing versions — the ",[79,316,317],{},"generic stencil"," (any 3×3\nconvolution) and the ",[79,320,321],{},"specialized 3×3 stencil"," (the nine terms written by hand, which XLA\nfuses into a single kernel). 
They are the benchmark's two JAX columns.",[121,324,326],{"id":325},"_7-the-verdict-one-gray-scott-four-speeds","7. The verdict — one Gray-Scott, four speeds",[76,328,329,330,333,334,275],{},"Official numbers from the repo (",[103,331,332],{},"CPU/Benchmarks.md","), ",[79,335,336],{},"32×1000 iterations",[338,339,340,363],"table",{},[341,342,343],"thead",{},[344,345,346,350,354,357,360],"tr",{},[347,348,349],"th",{},"CPU",[347,351,353],{"align":352},"center","NumPy",[347,355,356],{"align":352},"Numba",[347,358,359],{"align":352},"JAX (generic)",[347,361,362],{"align":352},"JAX (3×3)",[364,365,366,386],"tbody",{},[344,367,368,372,375,378,381],{},[369,370,371],"td",{},"Intel Xeon Silver 4210R",[369,373,374],{"align":352},"7800 s",[369,376,377],{"align":352},"3257 s",[369,379,380],{"align":352},"1031 s",[369,382,383],{"align":352},[79,384,385],{},"377 s",[344,387,388,391,394,397,400],{},[369,389,390],{},"AMD EPYC 7313",[369,392,393],{"align":352},"2545 s",[369,395,396],{"align":352},"1219 s",[369,398,399],{"align":352},"386 s",[369,401,402],{"align":352},[79,403,404],{},"141 s",[406,407],"d5-ladder",{},[409,410],"gs-bar-chart",{":categories":411,":series":412,"note":413,"title":414,"unit":415},"[\"NumPy\",\"Numba\",\"JAX (generic)\",\"JAX (3×3)\"]","[{\"name\":\"Xeon 4210R\",\"values\":[7800,3257,1031,377]},{\"name\":\"EPYC 7313\",\"values\":[2545,1219,386,141]}]","Official numbers from the course repo (CPU/Benchmarks.md). Shorter = better.","Gray-Scott Python: NumPy / Numba / JAX (32×1000 iterations)"," s",[121,417,419],{"id":418},"_8-the-bridge-to-the-gpu","8. The bridge to the GPU",[76,421,422,423,425,426,429],{},"This implementation is the reference that ",[79,424,113],{}," ports to the accelerator: JAX replays the\nsame code on GPU, joined by CuPy and cuNumeric. 
It is also exactly the approach of the\n",[427,428,57],"a",{"href":58}," project — profile PyTorch, port to JAX, compare honestly.",[116,431,433,434],{"id":432},"the-hands-on-grayscott2026day-5cpu","The hands-on — ",[103,435,436],{},"GrayScott2026/day-5/CPU/",[76,438,439],{},"Three ways to follow, your pick:",[441,442,447],"pre",{"className":443,"code":444,"language":445,"meta":446,"style":446},"language-bash shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","# 1) locally, environment pinned by pixi\ngit clone https://gitlab.in2p3.fr/alice.faure/gray-scott-python.git\npixi run jupyter-lab           # opens the tutorial/ notebooks\n\n# 2) on the MUST cluster (LAPP): https://jupyter.must-dc.cloud\n#    → \"Gray-Scott Revolutions\" → \"Python CPU\"\n\n# 3) in a container (apptainer / podman / docker): the course vscode image\n","bash","",[103,448,449,458,472,487,494,500,506,511],{"__ignoreMap":446},[450,451,454],"span",{"class":452,"line":453},"line",1,[450,455,457],{"class":456},"sHwdD","# 1) locally, environment pinned by pixi\n",[450,459,461,465,469],{"class":452,"line":460},2,[450,462,464],{"class":463},"sBMFI","git",[450,466,468],{"class":467},"sfazB"," clone",[450,470,471],{"class":467}," https://gitlab.in2p3.fr/alice.faure/gray-scott-python.git\n",[450,473,475,478,481,484],{"class":452,"line":474},3,[450,476,477],{"class":463},"pixi",[450,479,480],{"class":467}," run",[450,482,483],{"class":467}," jupyter-lab",[450,485,486],{"class":456},"           # opens the tutorial/ notebooks\n",[450,488,490],{"class":452,"line":489},4,[450,491,493],{"emptyLinePlaceholder":492},true,"\n",[450,495,497],{"class":452,"line":496},5,[450,498,499],{"class":456},"# 2) on the MUST cluster (LAPP): https://jupyter.must-dc.cloud\n",[450,501,503],{"class":452,"line":502},6,[450,504,505],{"class":456},"#    → \"Gray-Scott Revolutions\" → \"Python 
CPU\"\n",[450,507,509],{"class":452,"line":508},7,[450,510,493],{"emptyLinePlaceholder":492},[450,512,514],{"class":452,"line":513},8,[450,515,516],{"class":456},"# 3) in a container (apptainer / podman / docker): the course vscode image\n",[76,518,519,522,523,526,527,530,531,534,535,538],{},[103,520,521],{},"tutorial/"," sets the exercises, ",[103,524,525],{},"solutions/"," corrects them, ",[103,528,529],{},"scripts/gray_scott_utils.py","\nprovides shared I/O, and ",[103,532,533],{},"results/"," ships a reference simulation (",[103,536,537],{},"simulation.h5"," + video).",[116,540,542],{"id":541},"on-video-the-official-replay","On video — the official replay",[544,545],"yt-embed",{"caption":546,"id":547,"title":548},"Replay — Python On CPU (Gray Scott Thursdays)","ldWlh6r0bOw","Python On CPU",[116,550,552],{"id":551},"sources-official-material","Sources & official material",[554,555,556,569,580,591,601,623,633],"ul",{},[557,558,559,562,563],"li",{},[79,560,561],{},"The course repository"," (notebooks + solutions + benchmarks):\n",[427,564,568],{"href":565,"rel":566},"https://gitlab.in2p3.fr/alice.faure/gray-scott-python",[567],"nofollow","gitlab.in2p3.fr/alice.faure/gray-scott-python",[557,570,571,574,575],{},[79,572,573],{},"The day's slides"," (PDF, school GitLab wiki):\n",[427,576,579],{"href":577,"rel":578},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/-/wikis/uploads/GrayScottDay-5/2026-06-gray-scott-python-mem.pdf",[567],"2026-06-gray-scott-python-mem.pdf",[557,581,582,585,586],{},[79,583,584],{},"The MUST cluster Jupyter platform",":\n",[427,587,590],{"href":588,"rel":589},"https://jupyter.must-dc.cloud",[567],"jupyter.must-dc.cloud",[557,592,593,585,596],{},[79,594,595],{},"The school's base container",[427,597,600],{"href":598,"rel":599},"https://gitlab.in2p3.fr/CTA-LAPP/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScottBaseContainer",[567],"GrayScottBaseContainer",[557,602,603,585,606,611,612,617,618],{},[79,604,605],{},"The 
libraries",[427,607,610],{"href":608,"rel":609},"https://numpy.org/",[567],"numpy.org"," · ",[427,613,616],{"href":614,"rel":615},"https://numba.pydata.org/",[567],"numba.pydata.org"," ·\n",[427,619,622],{"href":620,"rel":621},"https://docs.jax.dev/",[567],"docs.jax.dev",[557,624,625,585,628],{},[79,626,627],{},"Video replays (YouTube)",[427,629,632],{"href":630,"rel":631},"https://www.youtube.com/playlist?list=PLiZttWgOMudb6PsUoWtxY3G4Gv8f2lurG",[567],"Gray Scott Thursdays",[557,634,635,585,638],{},[79,636,637],{},"School website",[427,639,642],{"href":640,"rel":641},"https://cta-lapp.pages.in2p3.fr/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/index.html",[567],"GrayScott2026",[644,645,646],"style",{},"html pre.shiki code .sHwdD, html code.shiki .sHwdD{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#546E7A;--shiki-default-font-style:italic;--shiki-dark:#676E95;--shiki-dark-font-style:italic}html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: 
var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":446,"searchDepth":460,"depth":460,"links":648},[649,655,662,664,665],{"id":118,"depth":460,"text":119,"children":650},[651,653,654],{"id":123,"depth":474,"text":652},"1. Profile first — time and memory",{"id":161,"depth":474,"text":162},{"id":192,"depth":474,"text":193},{"id":214,"depth":460,"text":215,"children":656},[657,658,659,660,661],{"id":218,"depth":474,"text":219},{"id":236,"depth":474,"text":237},{"id":307,"depth":474,"text":308},{"id":325,"depth":474,"text":326},{"id":418,"depth":474,"text":419},{"id":432,"depth":460,"text":663},"The hands-on — GrayScott2026/day-5/CPU/",{"id":541,"depth":460,"text":542},{"id":551,"depth":460,"text":552},"June 26, with Alice Faure, Jean-Marc Colley, Sébastien Valat and Nabil Garroum: profile Python, vectorize with NumPy, compile with Numba, then trace with JAX — up to ×20 without leaving Python.","md",[669],{"label":670,"icon":671,"to":565,"target":672},"Course repository","i-lucide-git-branch","_blank",{"icon":674},"lucide:braces",{"title":31,"description":666},"-WrAFrGhf4xVbEtUuQ2iu-q8f2XYMTP_-ffmyGqnPCY",[678,680],{"title":27,"path":28,"stem":29,"description":679,"children":-1},"June 25, with Paul Zehner, Juan-José Silva Cuevas and Thomas Padioleau: Kokkos on CPU — one C++ source, backends chosen at compile time, Views, parallel_for and SIMD.",{"title":35,"path":36,"stem":37,"description":681,"children":-1},"June 29, two sessions: Joël Falcou opens the week with EVE and Kiwaku (explicit, 
portable C++20 SIMD), Pierre Aubert follows with the GPU architecture that carries the last three days.",1783172490754]