[{"data":1,"prerenderedAt":629},["ShallowReactive",2],{"navigation_docs_en":3,"-en-gray-scott-school-jour-7":66,"-en-gray-scott-school-jour-7-surround":624},[4,50,60],{"title":5,"path":6,"stem":7,"children":8},"The Gray Scott School","/en/gray-scott-school","en/1.gray-scott-school/01.index",[9,10,14,18,22,26,30,34,38,42,46],{"title":5,"path":6,"stem":7},{"title":11,"path":12,"stem":13},"CINERI Presentation","/en/gray-scott-school/presentation-cineri","en/1.gray-scott-school/02.presentation-cineri",{"title":15,"path":16,"stem":17},"Day 1 — Foundations","/en/gray-scott-school/jour-1","en/1.gray-scott-school/03.jour-1",{"title":19,"path":20,"stem":21},"Day 2 — C++ on CPU","/en/gray-scott-school/jour-2","en/1.gray-scott-school/04.jour-2",{"title":23,"path":24,"stem":25},"Day 3 — Fortran on CPU","/en/gray-scott-school/jour-3","en/1.gray-scott-school/05.jour-3",{"title":27,"path":28,"stem":29},"Day 4 — Kokkos on CPU","/en/gray-scott-school/jour-4","en/1.gray-scott-school/06.jour-4",{"title":31,"path":32,"stem":33},"Day 5 — Python on CPU","/en/gray-scott-school/jour-5","en/1.gray-scott-school/07.jour-5",{"title":35,"path":36,"stem":37},"Day 6 — SIMD with EVE + GPU architecture","/en/gray-scott-school/jour-6","en/1.gray-scott-school/08.jour-6",{"title":39,"path":40,"stem":41},"Day 7 — Python on GPU","/en/gray-scott-school/jour-7","en/1.gray-scott-school/09.jour-7",{"title":43,"path":44,"stem":45},"Day 8 — Fortran on GPU","/en/gray-scott-school/jour-8","en/1.gray-scott-school/10.jour-8",{"title":47,"path":48,"stem":49},"Day 9 — Kokkos on GPU","/en/gray-scott-school/jour-9","en/1.gray-scott-school/11.jour-9",{"title":51,"path":52,"stem":53,"children":54},"Projects","/en/projets","en/2.projets/1.index",[55,56],{"title":51,"path":52,"stem":53},{"title":57,"path":58,"stem":59},"SenLand","/en/projets/senland","en/2.projets/2.senland",{"title":61,"path":62,"stem":63,"children":64},"About","/en/a-propos","en/3.a-propos/1.index",[65],{"title":61,"path":62,"stem":63},{"id":67,"title":39,"badge":68,"body":69,"category":68,"description":612,"extension":613,"links":614,"meta":619,"navigation":621,"path":40,"seo":622,"stem":41,"tags":68,"__hash__":623},"docs_en/en/1.gray-scott-school/09.jour-7.md",null,{"type":70,"value":71,"toc":600},"minimark",[72,111,116,123,126,147,151,154,158,173,208,215,219,237,241,251,306,317,321,332,376,384,394,400,495,510,514,520,524,596],[73,74,75],"blockquote",{},[76,77,78,82,83,86,87,86,90,93,94,97,98,101,102,106,107,110],"p",{},[79,80,81],"strong",{},"June 30, 2026"," · Speakers: ",[79,84,85],{},"Alice Faure",", ",[79,88,89],{},"Jean-Marc Colley",[79,91,92],{},"Sébastien Valat"," &\n",[79,95,96],{},"Nabil Garroum"," · four Python-GPU sessions in a row (CuPy → cuPyNumeric → ",[79,99,100],{},"JAX at\n2 pm"," → wrap-up) · Marcel Vivargent Auditorium + satellites (including CINERI). The\nhands-on lives in ",[103,104,105],"code",{},"GrayScott2026/day-5/GPU/"," — three tutorials + solutions, and the\n",[79,108,109],{},"official A100 benchmarks",".",[112,113,115],"h2",{"id":114},"_1-what-really-changes-the-pcie-toll-booth","1. What really changes: the PCIe toll booth",[76,117,118,119,122],{},"Day 5's array-first code transposes almost as-is — what changes is the ",[79,120,121],{},"memory geography",".\nThe GPU computes at ~2 TB/s but is fed through a ~32 GB/s pipe:",[124,125],"d7-transfer",{},[76,127,128,129,132,133,136,137,140,141,136,144,146],{},"The whole day applies the same rule: ",[103,130,131],{},"cp.asarray"," / ",[103,134,135],{},"jax.device_put"," ",[79,138,139],{},"once"," at the start,\nthe whole time loop on the device, ",[103,142,143],{},"asnumpy",[79,145,139],{}," at the end.",[112,148,150],{"id":149},"_2-three-routes-to-the-same-gpu","2. Three routes to the same GPU",[152,153],"d7-stack",{},[112,155,157],{"id":156},"cupy-session-numpy-on-cuda-no-rewrite","CuPy session — NumPy on CUDA, no rewrite",[76,159,160,161,164,165,168,169,172],{},"Tutorial ",[103,162,163],{},"3_Python_GPU_Cupy.md",": CuPy mirrors the NumPy API on CUDA — swapping ",[103,166,167],{},"numpy"," for\n",[103,170,171],{},"cupy"," runs the same stencil on the GPU.",[174,175,180],"pre",{"className":176,"code":177,"language":178,"meta":179,"style":179},"language-python shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","import cupy as cp\nu = cp.asarray(u_host)      # host → device, ONCE\n# … same stencil expressions as NumPy …\nu_host = cp.asnumpy(u)      # device → host, only when needed\n","python","",[103,181,182,190,196,202],{"__ignoreMap":179},[183,184,187],"span",{"class":185,"line":186},"line",1,[183,188,189],{},"import cupy as cp\n",[183,191,193],{"class":185,"line":192},2,[183,194,195],{},"u = cp.asarray(u_host)      # host → device, ONCE\n",[183,197,199],{"class":185,"line":198},3,[183,200,201],{},"# … same stencil expressions as NumPy …\n",[183,203,205],{"class":185,"line":204},4,[183,206,207],{},"u_host = cp.asnumpy(u)      # device → host, only when needed\n",[76,209,210,211,214],{},"Hands-on bonus: CuPy is the only version with ",[79,212,213],{},"parallel HDF5 I/O"," implemented — writing the\n1000 images while the GPU computes saves ~4 s (12 s total on A100).",[112,216,218],{"id":217},"cupynumeric-session-distributed-numpy","cuPyNumeric session — distributed NumPy",[76,220,160,221,224,225,228,229,232,233,236],{},[103,222,223],{},"2_Python_GPU_cuPyNumeric.md",": cuPyNumeric (NVIDIA, ",[79,226,227],{},"Legate"," engine) runs NumPy\ncode on several GPUs and several nodes ",[79,230,231],{},"without MPI and without rewriting"," — the same\nscript, a bigger machine. The price of generality shows in the benchmark: the generic\n",[103,234,235],{},"convolve"," version is the slowest of the field (128 s).",[112,238,240],{"id":239},"jax-session-2-pm-day-5s-code-re-jitted","JAX session (2 pm) — Day 5's code, re-jitted",[76,242,160,243,246,247,250],{},[103,244,245],{},"1_Python_GPU_JAX.md",": Day 5's JAX Gray-Scott replays ",[79,248,249],{},"unchanged"," — XLA compiles\nthe traced stencil into a CUDA kernel, and JAX places arrays on the device by default. The\nhands-on solutions show the three tools that make the difference:",[252,253,254,267],"table",{},[255,256,257],"thead",{},[258,259,260,264],"tr",{},[261,262,263],"th",{},"Hands-on solution",[261,265,266],{},"What it teaches",[268,269,270,281,296],"tbody",{},[258,271,272,278],{},[273,274,275],"td",{},[103,276,277],{},"jax_vmap_solutions.py",[273,279,280],{},"vectorize a function over a whole axis (batch)",[258,282,283,288],{},[273,284,285],{},[103,286,287],{},"jax_fori_loop_solutions.py",[273,289,290,291,295],{},"fuse the time loop ",[292,293,294],"em",{},"inside"," the compiled graph",[258,297,298,303],{},[273,299,300],{},[103,301,302],{},"jax_scan_solutions.py",[273,304,305],{},"accumulate states without returning to Python between steps",[76,307,308,309,312,313,316],{},"This is exactly the toolbox of ",[310,311,57],"a",{"href":58},"'s JAX port (",[103,314,315],{},"lax.fori_loop"," to\nfuse steps, device-resident batches).",[112,318,320],{"id":319},"the-verdict-official-a100-numbers","The verdict — official A100 numbers",[76,322,323,324,327,328,331],{},"The repo's ",[103,325,326],{},"GPU/Benchmarks.md",": ",[79,329,330],{},"32×1000 iterations, 1920×1080 grid in float32",":",[252,333,334,354],{},[255,335,336],{},[258,337,338,342,345,348,351],{},[261,339,341],{"align":340},"center","cuPyNumeric (convolve)",[261,343,344],{"align":340},"JAX (generic)",[261,346,347],{"align":340},"JAX (3×3)",[261,349,350],{"align":340},"CuPy",[261,352,353],{"align":340},"PyTorch",[268,355,356],{},[258,357,358,361,364,369,373],{},[273,359,360],{"align":340},"128 s",[273,362,363],{"align":340},"47 s",[273,365,366],{"align":340},[79,367,368],{},"18 s",[273,370,371],{"align":340},[79,372,368],{},[273,374,375],{"align":340},"22 s",[377,378],"gs-bar-chart",{":categories":379,":series":380,"note":381,"title":382,"unit":383},"[\"cuPyNumeric\",\"JAX (generic)\",\"JAX (3×3)\",\"CuPy\",\"PyTorch\"]","[{\"name\":\"A100\",\"values\":[128,47,18,18,22]}]","Official numbers from the repo (GPU/Benchmarks.md). CuPy + parallel HDF5 I/O: 12 s.","Gray-Scott Python on A100 (32×1000 iterations, 1920×1080 float32)"," s",[76,385,386,387,390,391,393],{},"The loop closes: Day 5's best-CPU ",[79,388,389],{},"377 s"," drop to ",[79,392,368],{}," on A100 — ×21, still in Python.\nAnd the ranking echoes the week's lessons: stencil specialization (Day 5) and data residency\n(today) weigh more than the library choice.",[112,395,397,398],{"id":396},"the-hands-on-grayscott2026day-5gpu","The hands-on — ",[103,399,105],{},[174,401,405],{"className":402,"code":403,"language":404,"meta":179,"style":179},"language-bash shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","# Locally (NVIDIA, CUDA ≥ 12, Python 3.10-3.12)\ngit clone https://gitlab.in2p3.fr/alice.faure/gray-scott-python.git\npython -m venv gpu-env && source gpu-env/bin/activate\npip install h5py opencv-python numpy matplotlib scipy \\\n            \"jax[cuda12]\" cupy-cuda12x nvidia-cupynumeric\n","bash",[103,406,407,413,426,450,477],{"__ignoreMap":179},[183,408,409],{"class":185,"line":186},[183,410,412],{"class":411},"sHwdD","# Locally (NVIDIA, CUDA ≥ 12, Python 3.10-3.12)\n",[183,414,415,419,423],{"class":185,"line":192},[183,416,418],{"class":417},"sBMFI","git",[183,420,422],{"class":421},"sfazB"," clone",[183,424,425],{"class":421}," https://gitlab.in2p3.fr/alice.faure/gray-scott-python.git\n",[183,427,428,430,433,436,439,443,447],{"class":185,"line":198},[183,429,178],{"class":417},[183,431,432],{"class":421}," -m",[183,434,435],{"class":421}," venv",[183,437,438],{"class":421}," gpu-env",[183,440,442],{"class":441},"sMK4o"," &&",[183,444,446],{"class":445},"s2Zo4"," source",[183,448,449],{"class":421}," gpu-env/bin/activate\n",[183,451,452,455,458,461,464,467,470,473],{"class":185,"line":204},[183,453,454],{"class":417},"pip",[183,456,457],{"class":421}," install",[183,459,460],{"class":421}," h5py",[183,462,463],{"class":421}," opencv-python",[183,465,466],{"class":421}," numpy",[183,468,469],{"class":421}," matplotlib",[183,471,472],{"class":421}," scipy",[183,474,476],{"class":475},"sTEyZ"," \\\n",[183,478,480,483,486,489,492],{"class":185,"line":479},5,[183,481,482],{"class":441},"            \"",[183,484,485],{"class":421},"jax[cuda12]",[183,487,488],{"class":441},"\"",[183,490,491],{"class":421}," cupy-cuda12x",[183,493,494],{"class":421}," nvidia-cupynumeric\n",[76,496,497,498,501,502,505,506,509],{},"Official alternatives: the course ",[79,499,500],{},"Docker"," image, or ",[79,503,504],{},"apptainer on the MUST cluster","\n(",[103,507,508],{},"Install_satellite_sites.md"," for satellite sites like CINERI). AMD: CuPy and JAX have\nexperimental ROCm routes — cuPyNumeric does not. On a small local card (GTX 1650, 4 GB),\nshrink the grid: the data-residency lesson stays identical.",[112,511,513],{"id":512},"on-video-the-official-replay","On video — the official replay",[515,516],"yt-embed",{"caption":517,"id":518,"title":519},"Replay — Python On GPU (Gray Scott Thursdays)","4RsXXTCHzLo","Python On GPU",[112,521,523],{"id":522},"sources-official-material","Sources & official material",[525,526,527,544,566,576,586],"ul",{},[528,529,530,533,534,537,538],"li",{},[79,531,532],{},"The course repository"," (",[103,535,536],{},"GPU/tutorial/"," tutorials, solutions, A100 benchmarks):\n",[310,539,543],{"href":540,"rel":541},"https://gitlab.in2p3.fr/alice.faure/gray-scott-python",[542],"nofollow","gitlab.in2p3.fr/alice.faure/gray-scott-python",[528,545,546,549,550,555,556,555,561],{},[79,547,548],{},"The libraries",":\n",[310,551,554],{"href":552,"rel":553},"https://docs.cupy.dev/",[542],"docs.cupy.dev"," ·\n",[310,557,560],{"href":558,"rel":559},"https://docs.nvidia.com/cupynumeric/latest/",[542],"docs.nvidia.com/cupynumeric",[310,562,565],{"href":563,"rel":564},"https://docs.jax.dev/",[542],"docs.jax.dev",[528,567,568,549,571],{},[79,569,570],{},"The MUST platform",[310,572,575],{"href":573,"rel":574},"https://jupyter.must-dc.cloud",[542],"jupyter.must-dc.cloud",[528,577,578,549,581],{},[79,579,580],{},"Video replays (YouTube)",[310,582,585],{"href":583,"rel":584},"https://www.youtube.com/playlist?list=PLiZttWgOMudb6PsUoWtxY3G4Gv8f2lurG",[542],"Gray Scott Thursdays",[528,587,588,549,591],{},[79,589,590],{},"School website",[310,592,595],{"href":593,"rel":594},"https://cta-lapp.pages.in2p3.fr/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/index.html",[542],"GrayScott2026",[597,598,599],"style",{},"html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .sHwdD, html code.shiki .sHwdD{--shiki-light:#90A4AE;--shiki-light-font-style:italic;--shiki-default:#546E7A;--shiki-default-font-style:italic;--shiki-dark:#676E95;--shiki-dark-font-style:italic}html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html pre.shiki code .sMK4o, html code.shiki .sMK4o{--shiki-light:#39ADB5;--shiki-default:#89DDFF;--shiki-dark:#89DDFF}html pre.shiki code .s2Zo4, html code.shiki .s2Zo4{--shiki-light:#6182B8;--shiki-default:#82AAFF;--shiki-dark:#82AAFF}html pre.shiki code .sTEyZ, html code.shiki .sTEyZ{--shiki-light:#90A4AE;--shiki-default:#EEFFFF;--shiki-dark:#BABED8}",{"title":179,"searchDepth":192,"depth":192,"links":601},[602,603,604,605,606,607,608,610,611],{"id":114,"depth":192,"text":115},{"id":149,"depth":192,"text":150},{"id":156,"depth":192,"text":157},{"id":217,"depth":192,"text":218},{"id":239,"depth":192,"text":240},{"id":319,"depth":192,"text":320},{"id":396,"depth":192,"text":609},"The hands-on — GrayScott2026/day-5/GPU/",{"id":512,"depth":192,"text":513},{"id":522,"depth":192,"text":523},"June 30, four sessions with Alice Faure, Jean-Marc Colley, Sébastien Valat and Nabil Garroum: CuPy, cuPyNumeric and JAX port Day 5's Gray-Scott to the accelerator — official A100 numbers included.","md",[615],{"label":616,"icon":617,"to":540,"target":618},"Course repository","i-lucide-git-branch","_blank",{"icon":620},"lucide:zap",true,{"title":39,"description":612},"cbZp0QA8QBzJLbuaUuU6uRO7198cUkwOQPPEV7ddaao",[625,627],{"title":35,"path":36,"stem":37,"description":626,"children":-1},"June 29, two sessions: Joël Falcou opens the week with EVE and Kiwaku (explicit, portable C++20 SIMD), Pierre Aubert follows with the GPU architecture that carries the last three days.",{"title":43,"path":44,"stem":45,"description":628,"children":-1},"Standard Fortran on the GPU via do concurrent, compared with OpenACC and OpenMP target from a single source — plus the closing polyglot session.",1783172490754]