[{"data":1,"prerenderedAt":294},["ShallowReactive",2],{"navigation_docs_en":3,"-en-gray-scott-school-jour-1":66,"-en-gray-scott-school-jour-1-surround":289},[4,50,60],{"title":5,"path":6,"stem":7,"children":8},"The Gray Scott School","/en/gray-scott-school","en/1.gray-scott-school/01.index",[9,10,14,18,22,26,30,34,38,42,46],{"title":5,"path":6,"stem":7},{"title":11,"path":12,"stem":13},"CINERI Presentation","/en/gray-scott-school/presentation-cineri","en/1.gray-scott-school/02.presentation-cineri",{"title":15,"path":16,"stem":17},"Day 1 — Foundations","/en/gray-scott-school/jour-1","en/1.gray-scott-school/03.jour-1",{"title":19,"path":20,"stem":21},"Day 2 — C++ on CPU","/en/gray-scott-school/jour-2","en/1.gray-scott-school/04.jour-2",{"title":23,"path":24,"stem":25},"Day 3 — Fortran on CPU","/en/gray-scott-school/jour-3","en/1.gray-scott-school/05.jour-3",{"title":27,"path":28,"stem":29},"Day 4 — Kokkos on CPU","/en/gray-scott-school/jour-4","en/1.gray-scott-school/06.jour-4",{"title":31,"path":32,"stem":33},"Day 5 — Python on CPU","/en/gray-scott-school/jour-5","en/1.gray-scott-school/07.jour-5",{"title":35,"path":36,"stem":37},"Day 6 — SIMD with EVE + GPU architecture","/en/gray-scott-school/jour-6","en/1.gray-scott-school/08.jour-6",{"title":39,"path":40,"stem":41},"Day 7 — Python on GPU","/en/gray-scott-school/jour-7","en/1.gray-scott-school/09.jour-7",{"title":43,"path":44,"stem":45},"Day 8 — Fortran on GPU","/en/gray-scott-school/jour-8","en/1.gray-scott-school/10.jour-8",{"title":47,"path":48,"stem":49},"Day 9 — Kokkos on 
GPU","/en/gray-scott-school/jour-9","en/1.gray-scott-school/11.jour-9",{"title":51,"path":52,"stem":53,"children":54},"Projects","/en/projets","en/2.projets/1.index",[55,56],{"title":51,"path":52,"stem":53},{"title":57,"path":58,"stem":59},"SenLand","/en/projets/senland","en/2.projets/2.senland",{"title":61,"path":62,"stem":63,"children":64},"About","/en/a-propos","en/3.a-propos/1.index",[65],{"title":61,"path":62,"stem":63},{"id":67,"title":15,"badge":68,"body":69,"category":68,"description":282,"extension":283,"links":68,"meta":284,"navigation":286,"path":16,"seo":287,"stem":17,"tags":68,"__hash__":288},"docs_en/en/1.gray-scott-school/03.jour-1.md",null,{"type":70,"value":71,"toc":272},"minimark",[72,77,86,91,94,98,101,104,116,120,127,130,133,137,192,195,198,202,218,225,228,236,240,247,250,255,259,262],[73,74,76],"h2",{"id":75},"_1-the-cpu","1. The CPU",[78,79,80,81,85],"p",{},"A processor is made of several independent ",[82,83,84],"strong",{},"cores",". Clock frequency has stagnated since\n~2005 (thermal limit); performance now comes from adding cores, which makes parallelism\nunavoidable.",[78,87,88],{},[82,89,90],{},"Performance comes from the number of cores, not raw frequency.",[92,93],"d1-cpu",{},[73,95,97],{"id":96},"_2-compilation","2. Compilation",[78,99,100],{},"Compilation translates C++ into machine code ahead of execution, with optimization. Its four\nstages: preprocessor, compilation (assembly + optimization), assembling, linking. Optimization\nand vectorization are born at compile time.",[102,103],"d1-compile",{},[78,105,106,107,111,112,115],{},"Recommended flags: ",[108,109,110],"code",{},"-O3 -march=native",". Without optimization (",[108,113,114],{},"-O0","), code can be an order of\nmagnitude slower.",[73,117,119],{"id":118},"_3-vectorization-simd","3. Vectorization (SIMD)",[78,121,122,126],{},[123,124,125],"em",{},"Single Instruction, Multiple Data",": one instruction processes several values at once. 
The\ncompiler generates SIMD code automatically when the loop is regular (contiguous access, no\nloop-carried dependencies).",[128,129],"d1-simd",{},[78,131,132],{},"Typical gain: ×4 to ×16 for 32-bit floats, depending on register width (SSE 128-bit, AVX2 256-bit, AVX-512 512-bit).",[73,134,136],{"id":135},"_4-concurrency-vs-parallelism","4. Concurrency vs parallelism",[138,139,140,155],"table",{},[141,142,143],"thead",{},[144,145,146,149,152],"tr",{},[147,148],"th",{},[147,150,151],{},"Concurrency",[147,153,154],{},"Parallelism",[156,157,158,170,181],"tbody",{},[144,159,160,164,167],{},[161,162,163],"td",{},"Goal",[161,165,166],{},"hide waiting (I/O, network)",[161,168,169],{},"speed up computation",[144,171,172,175,178],{},[161,173,174],{},"Hardware",[161,176,177],{},"possible on a single core",[161,179,180],{},"requires several cores",[144,182,183,186,189],{},[161,184,185],{},"Domain",[161,187,188],{},"servers, async I/O",[161,190,191],{},"compute-intensive work",[193,194],"d1-conc",{},[78,196,197],{},"Concurrency is a structuring tool; parallelism is the result when the hardware allows it.",[73,199,201],{"id":200},"_5-memory-bound-and-compute-bound","5. Memory-bound and compute-bound",[203,204,205,212],"ul",{},[206,207,208,211],"li",{},[82,209,210],{},"Compute-bound",": limited by the compute units → adding cores helps.",[206,213,214,217],{},[82,215,216],{},"Memory-bound",": limited by bandwidth; cores wait for data → adding cores barely helps.",[78,219,220,221,224],{},"Arithmetic intensity (operations per byte loaded) sets the regime. Stencils have low arithmetic\nintensity and are usually memory-bound. 
The reference analysis tool is the ",[82,222,223],{},"roofline model",".",[226,227],"d1-roofline",{},[229,230],"gs-bar-chart",{":categories":231,":series":232,"note":233,"title":234,"unit":235},"[\"Register\",\"L1\",\"L2\",\"L3\",\"RAM\"]","[{\"name\":\"Latency\",\"values\":[0.3,1,4,15,100]}]","Typical orders of magnitude: a RAM access costs ~100× an L1 hit.","The memory wall: cost of one access by level"," ns",[73,237,239],{"id":238},"_6-purity","6. Purity",[78,241,242,243,246],{},"A function is ",[82,244,245],{},"pure"," when its output depends only on its inputs, with no side effects. A pure\ncomputation has no hidden dependency: it is parallelizable without data races. A double-buffer\nscheme (read one array, write another) keeps each update pure: every output cell depends only on the\nread-only input array, so the loop is trivially parallel.",[248,249],"d1-buffers",{},[78,251,252],{},[82,253,254],{},"Purity is what makes parallelization safe.",[73,256,258],{"id":257},"_7-unit-tests-in-scientific-computing","7. Unit tests in scientific computing",[78,260,261],{},"Floating-point rounding rules out exact comparison. Three rules guide validation:",[263,264,270],"pre",{"className":265,"code":267,"language":268,"meta":269},[266],"language-text","1.  Compare with tolerance    |a − b| \u003C ε   (never strict equality)\n2.  Check invariants          energy, symmetry, no NaN, analytic case\n3.  Seq. / parallel equality  a divergence reveals a data race\n","text","",[108,271,267],{"__ignoreMap":269},{"title":269,"searchDepth":273,"depth":273,"links":274},2,[275,276,277,278,279,280,281],{"id":75,"depth":273,"text":76},{"id":96,"depth":273,"text":97},{"id":118,"depth":273,"text":119},{"id":135,"depth":273,"text":136},{"id":200,"depth":273,"text":201},{"id":238,"depth":273,"text":239},{"id":257,"depth":273,"text":258},"The vocabulary and principles. 
Optimizing means understanding how the hardware works and locating where time is lost.","md",{"icon":285},"lucide:cpu",true,{"title":15,"description":282},"y0ca-wZVaTLSA-ez53lKZZqkB_U6y75XaixaItq3EcQ",[290,292],{"title":11,"path":12,"stem":13,"description":291,"children":-1},"CINERI presented live to the whole Gray Scott School 2026: special session of June 25, broadcast on the official live stream.",{"title":19,"path":20,"stem":21,"description":293,"children":-1},"Two sessions on June 23: C++17/20/23 on CPU in the morning, advanced optimization (blocking & Pyramid) in the afternoon. Measure, understand the stencil, exploit the cache.",1783172490753]