[{"data":1,"prerenderedAt":914},["ShallowReactive",2],{"navigation_docs_en":3,"-en-projets-senland":66,"-en-projets-senland-surround":909},[4,50,60],{"title":5,"path":6,"stem":7,"children":8},"The Gray Scott School","/en/gray-scott-school","en/1.gray-scott-school/01.index",[9,10,14,18,22,26,30,34,38,42,46],{"title":5,"path":6,"stem":7},{"title":11,"path":12,"stem":13},"CINERI Presentation","/en/gray-scott-school/presentation-cineri","en/1.gray-scott-school/02.presentation-cineri",{"title":15,"path":16,"stem":17},"Day 1 — Foundations","/en/gray-scott-school/jour-1","en/1.gray-scott-school/03.jour-1",{"title":19,"path":20,"stem":21},"Day 2 — C++ on CPU","/en/gray-scott-school/jour-2","en/1.gray-scott-school/04.jour-2",{"title":23,"path":24,"stem":25},"Day 3 — Fortran on CPU","/en/gray-scott-school/jour-3","en/1.gray-scott-school/05.jour-3",{"title":27,"path":28,"stem":29},"Day 4 — Kokkos on CPU","/en/gray-scott-school/jour-4","en/1.gray-scott-school/06.jour-4",{"title":31,"path":32,"stem":33},"Day 5 — Python on CPU","/en/gray-scott-school/jour-5","en/1.gray-scott-school/07.jour-5",{"title":35,"path":36,"stem":37},"Day 6 — SIMD with EVE + GPU architecture","/en/gray-scott-school/jour-6","en/1.gray-scott-school/08.jour-6",{"title":39,"path":40,"stem":41},"Day 7 — Python on GPU","/en/gray-scott-school/jour-7","en/1.gray-scott-school/09.jour-7",{"title":43,"path":44,"stem":45},"Day 8 — Fortran on GPU","/en/gray-scott-school/jour-8","en/1.gray-scott-school/10.jour-8",{"title":47,"path":48,"stem":49},"Day 9 — Kokkos on 
GPU","/en/gray-scott-school/jour-9","en/1.gray-scott-school/11.jour-9",{"title":51,"path":52,"stem":53,"children":54},"Projects","/en/projets","en/2.projets/1.index",[55,56],{"title":51,"path":52,"stem":53},{"title":57,"path":58,"stem":59},"SenLand","/en/projets/senland","en/2.projets/2.senland",{"title":61,"path":62,"stem":63,"children":64},"About","/en/a-propos","en/3.a-propos/1.index",[65],{"title":61,"path":62,"stem":63},{"id":67,"title":57,"badge":68,"body":69,"category":68,"description":896,"extension":897,"links":898,"meta":904,"navigation":906,"path":58,"seo":907,"stem":59,"tags":68,"__hash__":908},"docs_en/en/2.projets/2.senland.md",null,{"type":70,"value":71,"toc":879},"minimark",[72,77,98,104,108,115,119,122,141,145,155,160,164,178,185,191,197,203,207,214,263,275,278,282,309,312,327,331,338,344,368,378,382,393,417,425,429,436,442,446,484,550,553,582,590,594,609,622,698,703,709,712,749,762,766,773,871,874],[73,74,76],"h2",{"id":75},"senland-senegals-land-cover-with-deep-learning","SenLand — Senegal's land cover with deep learning",[78,79,80,81,85,86,89,90,93,94,97],"p",{},"A real AI project, actually run — ",[82,83,84],"strong",{},"entirely on a single laptop",", with the techniques taught at\nthe Gray Scott School (CINERI). The same neural network maps land cover — water, cropland,\nforest, built-up, mangroves — from open satellite imagery. 
The engineering question that shapes\nthe whole project: run this code first on the ",[82,87,88],{},"CPU",", then on the ",[82,91,92],{},"GPU"," of the same machine,\nand compare the two compute engines honestly — their architecture, how you exploit them, their\nlimits, and the ",[82,95,96],{},"measured"," results.",[99,100],"media-video",{"caption":101,"poster":102,"src":103},"SenLand in motion — from satellite imagery to a land-cover map.","/projets/senland/poster.jpg","/projets/senland/senland-song.mp4",[73,105,107],{"id":106},"the-problem","The problem",[78,109,110,111,114],{},"Knowing ",[82,112,113],{},"where"," cropland, water, forests and cities are — and how they change year to year —\nis a national question: agriculture, water resources, urbanization, climate. Manual mapping does\nnot scale to a country. A neural network learns to read satellite imagery and produces the map\nautomatically, everywhere, at the same resolution.",[73,116,118],{"id":117},"the-pipeline-end-to-end","The pipeline, end to end",[120,121],"sen-land-pipeline",{},[78,123,124,125,128,129,132,133,136,137,140],{},"Four stages: read the ",[82,126,127],{},"open data"," (imagery + labels), ",[82,130,131],{},"tile"," it into small patches, a\n",[82,134,135],{},"U-Net"," segments each patch, and the ",[82,138,139],{},"land-cover map"," is recomposed.",[73,142,144],{"id":143},"the-data-100-open","The data (100% open)",[78,146,147,150,151,154],{},[82,148,149],{},"Sentinel-2"," imagery (cloudless, 10 m) pixel-aligned with ",[82,152,153],{},"ESA WorldCover"," labels (10 m).\nFour deliberately different landscapes, read live from public servers — no private data, no bulk\ndownload.",[156,157],"fig-grid",{":cols":158,":images":159},"2","[{\"src\":\"/projets/senland/lac-de-guiers_preview.png\",\"caption\":\"Lac de Guiers — lake + irrigated river valley: water, cropland, wetlands\"},{\"src\":\"/projets/senland/dakar_preview.png\",\"caption\":\"Dakar — peninsula: dense built-up, ocean, bare 
soil\"},{\"src\":\"/projets/senland/casamance_preview.png\",\"caption\":\"Casamance — forest, mangroves, rivers, cropland\"},{\"src\":\"/projets/senland/sine-saloum_preview.png\",\"caption\":\"Sine-Saloum — the delta and its great mangrove belt\"}]",[73,161,163],{"id":162},"results-the-maps","Results — the maps",[78,165,166,167,169,170,173,174,177],{},"For each area: the ",[82,168,149],{}," image, the ",[82,171,172],{},"ground truth"," (WorldCover) and the model's\n",[82,175,176],{},"prediction",", side by side. The lake, the ocean and the large agricultural structures are\nfaithfully reconstructed.",[78,179,180],{},[181,182],"img",{"alt":183,"src":184},"Lac de Guiers — Sentinel-2, ground truth, prediction","/projets/senland/segmentation_lac-de-guiers.png",[78,186,187],{},[181,188],{"alt":189,"src":190},"Sine-Saloum — Sentinel-2, ground truth, prediction","/projets/senland/segmentation_sine-saloum.png",[78,192,193],{},[181,194],{"alt":195,"src":196},"Casamance — Sentinel-2, ground truth, prediction","/projets/senland/segmentation_casamance.png",[78,198,199],{},[181,200],{"alt":201,"src":202},"Dakar — Sentinel-2, ground truth, prediction","/projets/senland/segmentation_dakar.png",[73,204,206],{"id":205},"metrics","Metrics",[78,208,209,210,213],{},"Segmentation is evaluated on a ",[82,211,212],{},"spatial holdout"," (a validation area kept apart, for an honest\nmeasure of generalization).",[215,216,217,233],"table",{},[218,219,220],"thead",{},[221,222,223,227,230],"tr",{},[224,225,226],"th",{},"Task",[224,228,229],{},"Metric",[224,231,232],{},"Value",[234,235,236,250],"tbody",{},[221,237,238,242,245],{},[239,240,241],"td",{},"Classification (EuroSAT, 10 classes, from scratch)",[239,243,244],{},"validation accuracy",[239,246,247],{},[82,248,249],{},"91.8%",[221,251,252,255,258],{},[239,253,254],{},"Segmentation (4 areas, spatial holdout)",[239,256,257],{},"mean IoU",[239,259,260],{},[82,261,262],{},"0.62",[78,264,265,266,269,270,274],{},"Adding the Saloum delta to 
training lifts mangroves from ",[82,267,268],{},"3% to 83%"," IoU — the demonstration\nthat ",[271,272,273],"em",{},"the right data beats a bigger model",".",[156,276],{":cols":158,":images":277},"[{\"src\":\"/projets/senland/seg_curves.png\",\"caption\":\"Segmentation learning — mean IoU & loss\"},{\"src\":\"/projets/senland/per_class_iou.png\",\"caption\":\"Per-class IoU — water and built-up lead, mangroves follow\"},{\"src\":\"/projets/senland/training_curves.png\",\"caption\":\"Classification warm-up (EuroSAT) — accuracy & loss\"},{\"src\":\"/projets/senland/confusion_matrix.png\",\"caption\":\"Confusion matrix (classification)\"}]",[73,279,281],{"id":280},"the-architecture-one-code-two-engines","The architecture — one code, two engines",[78,283,284,285,287,288,291,292,295,296,299,300,304,305,308],{},"The model is a ",[82,286,135],{}," (ResNet-34 encoder) for segmentation, preceded by a ResNet-18 for the\nclassification warm-up, all in ",[82,289,290],{},"PyTorch",". The core of the project reuses the ",[82,293,294],{},"Kokkos"," idea\nfrom Day 4: ",[271,297,298],{},"one source, two backends",". It is exactly the same code that runs on the CPU or the\nGPU — a hardware introspection layer (",[301,302,303],"code",{},"hw.py",") picks the device and ",[82,306,307],{},"declares it honestly"," in\nevery figure, never a CPU run labeled \"GPU\".",[310,311],"sen-land-engine",{},[78,313,314,315,318,319,322,323,326],{},"On CPU, parallelism goes through ",[82,316,317],{},"OpenMP intra-op threads"," (Day 1 / TBB); on GPU, through the\n",[82,320,321],{},"massive SIMT parallelism"," of CUDA cores (Day 2 / 3). No line of science changes: only the\n",[82,324,325],{},"iteration throughput"," does.",[73,328,330],{"id":329},"on-cpu","On CPU",[78,332,333,334,337],{},"No GPU required: the same code uses all cores via OpenMP threads. Strong scaling measured on the\nlaptop (Intel i5-10300H, 1 → 8 threads). 
Efficiency drops ",[82,335,336],{},"beyond 4 threads",": the machine has\nonly 4 physical cores, and Hyper-Threading doesn't help dense compute.",[78,339,340],{},[181,341],{"alt":342,"src":343},"Multicore CPU scaling — throughput & efficiency, 1 → 8 threads","/projets/senland/cpu_scaling.png",[345,346,347,362],"ul",{},[348,349,350,353,354,357,358,361],"li",{},[82,351,352],{},"Pros"," — available everywhere (no specialized hardware); plenty of RAM, ideal for geo I/O\n(",[301,355,356],{},"/vsicurl"," reads of Sentinel-2 + WorldCover); simple, reproducible debugging; real strong\nscaling (",[82,359,360],{},"×2.2"," from 1 to 4 cores, 54 → 117 img/s).",[348,363,364,367],{},[82,365,366],{},"Cons"," — caps at 4 physical cores (Hyper-Threading adds nothing: 117 → 109 img/s); ≈ 4× slower\nthan the same machine's modest GPU; an epoch in ~25 s vs ~5 s on GPU.",[369,370,371],"blockquote",{},[78,372,373,374,377],{},"Same weights, same model quality — only slower. The CPU produces exactly the ",[82,375,376],{},"same science",",\nat a lower throughput.",[73,379,381],{"id":380},"on-gpu","On GPU",[78,383,384,385,388,389,392],{},"The GPU applies ",[82,386,387],{},"massive parallelism (SIMT)",": thousands of CUDA cores process the batch at\nonce. 
On the laptop's GTX 1650, throughput reaches ",[82,390,391],{},"466 img/s",", iteration becomes fluid — which\nenabled the long training runs (120 segmentation epochs).",[345,394,395,408],{},[348,396,397,399,400,403,404,407],{},[82,398,352],{}," — ≈ ",[82,401,402],{},"×4"," faster than the best CPU, ",[82,405,406],{},"×8.6"," vs a single core; fluid iteration\n(5.5 s/epoch); architecture built for dense convolutions — SIMT fits the model.",[348,409,410,412,413,416],{},[82,411,366],{}," — limited VRAM (4 GB), constraining batch and tile size; mixed precision (AMP fp16)\n",[82,414,415],{},"diverges to NaN"," on this consumer card — disabled locally (an honest limit); host→device\ntransfer overhead.",[369,418,419],{},[78,420,421,422,274],{},"Strictly the same model and final accuracy as on CPU — the GPU doesn't change the result, it\ndelivers it ~4× faster. The gain is entirely ",[82,423,424],{},"engineer time recovered",[73,426,428],{"id":427},"head-to-head-measured-throughput","Head to head — measured throughput",[78,430,431,432,435],{},"Same problem, same code, three engines of the same machine. Everything is ",[82,433,434],{},"measured here",", on\nthe laptop — nothing is projected.",[78,437,438],{},[181,439],{"alt":440,"src":441},"CPU 1 core · multicore CPU · GPU — measured throughput (log scale)","/projets/senland/device_ladder.png",[73,443,445],{"id":444},"the-jax-port-comparing-frameworks-not-just-engines","The JAX port — comparing frameworks, not just engines",[78,447,448,449,452,453,456,457,460,461,464,465,468,469,472,473,476,477,480,481,274],{},"A direct extension of the school's ",[82,450,451],{},"Days 5 and 7"," (Python on CPU, then on GPU): the same\nsegmentation task is ",[82,454,455],{},"reimplemented in JAX/Flax"," (",[301,458,459],{},"src/senland_jax/",") to compare ",[271,462,463],{},"frameworks","\non top of ",[271,466,467],{},"engines"," — same Senegal areas, same IoU yardstick. 
The model is a ",[82,470,471],{},"from-scratch","\nU-Net (GroupNorm, 7.8 M parameters, no ImageNet pretraining); the training step is a ",[82,474,475],{},"pure\nfunction"," compiled with ",[301,478,479],{},"jax.jit"," and differentiated with ",[301,482,483],{},"jax.value_and_grad",[215,485,486,502],{},[218,487,488],{},[221,489,490,493,496,499],{},[224,491,492],{},"Framework",[224,494,495],{},"Engine",[224,497,498],{},"mIoU",[224,500,501],{},"Throughput",[234,503,504,520,536],{},[221,505,506,512,515,517],{},[239,507,508,509],{},"PyTorch — U-Net ResNet-34, ",[82,510,511],{},"ImageNet",[239,513,514],{},"GPU GTX 1650",[239,516,262],{},[239,518,519],{},"470 patch/s",[221,521,522,528,530,533],{},[239,523,524,525],{},"JAX — U-Net GroupNorm, ",[82,526,527],{},"from scratch",[239,529,514],{},[239,531,532],{},"0.57",[239,534,535],{},"211 patch/s",[221,537,538,541,544,547],{},[239,539,540],{},"JAX — same model",[239,542,543],{},"CPU i5-10300H",[239,545,546],{},"—",[239,548,549],{},"13 patch/s",[78,551,552],{},"Reading these numbers honestly:",[345,554,555,565],{},[348,556,557,560,561,564],{},[82,558,559],{},"mIoU 0.57 vs 0.62"," — the gap is the ",[82,562,563],{},"ImageNet-pretrained"," encoder on the PyTorch side;\nthe JAX model trains from scratch. 
Per-class IoU still tracks closely (permanent water\n0.90 vs 0.91, mangroves 0.77 vs 0.78).",[348,566,567,568,571,572,575,576,579,580,274],{},"The \"",[82,569,570],{},"one code, two engines","\" story holds in JAX too: the same jitted code runs\n",[82,573,574],{},"≈ ×16 faster on GPU than on CPU"," — here via device placement (",[301,577,578],{},"JAX_PLATFORMS",") rather\nthan ",[301,581,303],{},[583,584],"gs-bar-chart",{":categories":585,":series":586,"note":587,"title":588,"unit":589},"[\"GPU · GTX 1650\",\"CPU · i5-10300H\"]","[{\"name\":\"Throughput\",\"values\":[211,13]}]","Same JAX code, device placement via JAX_PLATFORMS — ≈ ×16 on GPU.","JAX — the same jitted code, two engines (patch/s)"," patch/s",[73,591,593],{"id":592},"the-fair-benchmark-identical-model-in-both-frameworks","The fair benchmark — identical model in both frameworks",[78,595,596,597,600,601,604,605,608],{},"Comparing a pretrained ResNet-34 to a small hand-written U-Net is not a fair race.\n",[301,598,599],{},"scripts/bench_unet.py"," therefore builds the ",[82,602,603],{},"strictly identical"," U-Net (GroupNorm, 11\nclasses) in both frameworks and times the ",[82,606,607],{},"full training step"," (forward + CE+Dice + backward + AdamW).",[345,610,611],{},[348,612,613,614,617,618,621],{},"The step is timed on a device-resident batch, isolating the ",[271,615,616],{},"framework/compiler"," from the\n",[271,619,620],{},"architecture",". 
GTX 1650, fp32:",[215,623,624,638],{},[218,625,626],{},[221,627,628,631,635],{},[224,629,630],{},"Mode",[224,632,634],{"align":633},"right","batch 8",[224,636,637],{"align":633},"batch 16",[234,639,640,654,664,678],{},[221,641,642,649,652],{},[239,643,644,645,648],{},"JAX — naive (per-step ",[301,646,647],{},"jit",")",[239,650,651],{"align":633},"246",[239,653,546],{"align":633},[221,655,656,659,662],{},[239,657,658],{},"PyTorch — eager",[239,660,661],{"align":633},"334",[239,663,546],{"align":633},[221,665,666,672,675],{},[239,667,668,669],{},"PyTorch — ",[301,670,671],{},"torch.compile",[239,673,674],{"align":633},"356",[239,676,677],{"align":633},"460",[221,679,680,688,693],{},[239,681,682],{},[82,683,684,685,648],{},"JAX — fused multi-step (",[301,686,687],{},"lax.fori_loop",[239,689,690],{"align":633},[82,691,692],{},"371",[239,694,695],{"align":633},[82,696,697],{},"463",[78,699,700],{},[271,701,702],{},"(patches/s; higher is better)",[583,704],{":categories":705,":series":706,"note":707,"title":708,"unit":589},"[\"JAX naive (per-step jit)\",\"PyTorch eager\",\"torch.compile\",\"JAX fused (fori_loop)\"]","[{\"name\":\"Throughput\",\"values\":[246,334,356,371]}]","Identical GroupNorm U-Net in both frameworks, full training step, GTX 1650 fp32.","Identical model, batch 8 — the framework alone (patch/s)",[78,710,711],{},"Takeaways:",[345,713,714,740],{},[348,715,716,719,720,722,723,726,727,456,730,732,733,739],{},[82,717,718],{},"Naive"," JAX (one dispatch per step) is the slowest. Once its real levers are engaged —\n",[301,721,647],{},", ",[301,724,725],{},"donate_argnums"," (buffer reuse), device-resident inputs, and above all ",[82,728,729],{},"step\nfusion",[301,731,687],{},": 20 steps for the cost of one dispatch) — JAX ",[82,734,735,736,738],{},"edges past\n",[301,737,671],{}," at batch 8 (+4%)"," and ties it at batch 16.",[348,741,742,745,746,748],{},[82,743,744],{},"At an identical model, fully-tuned JAX ≈ fully-tuned PyTorch"," on this card. 
The earlier\n~2× gap was the ",[271,747,620],{}," (smp ResNet-34 vs a hand-written U-Net), not the framework.\nXLA's advantage would widen on TPUs, larger batches/models, or more fusable graphs — none of\nwhich a 4 GB consumer GPU exercises.",[369,750,751],{},[78,752,753,754,757,758,761],{},"Honest reproduction pitfall: ",[301,755,756],{},"jax[cuda12]"," and PyTorch pin different ",[301,759,760],{},"nvidia-cudnn-cu12","\nversions — in one shared venv only one of the two has working GPU at a time. Use two\nseparate environments.",[73,763,765],{"id":764},"built-with-the-gray-scott-school-techniques","Built with the Gray Scott School techniques",[78,767,768,769,772],{},"Every engineering brick of SenLand reuses a technique from the Gray Scott School (CINERI) and\napplies it to deep learning. The through-line — ",[271,770,771],{},"one code, CPU then GPU"," — is the very idea of\nKokkos.",[215,774,775,785],{},[218,776,777],{},[221,778,779,782],{},[224,780,781],{},"Brick",[224,783,784],{},"School day",[234,786,787,801,815,823,831,839,847,855,863],{},[221,788,789,792],{},[239,790,791],{},"One code, two backends",[239,793,794,795,797,798,800],{},"Day 4 · Kokkos → here ",[301,796,303],{}," (PyTorch) and ",[301,799,578],{}," (JAX)",[221,802,803,806],{},[239,804,805],{},"Frameworks & compilers",[239,807,808,809,811,812,648],{},"Days 5 & 7 · Python/JAX — ",[301,810,647],{},", XLA, step fusion (",[301,813,814],{},"fori_loop",[221,816,817,820],{},[239,818,819],{},"Multicore CPU",[239,821,822],{},"Day 1 · parallelism + Day 2 · TBB — shared memory, strong scaling",[221,824,825,828],{},[239,826,827],{},"GPU SIMT / CUDA",[239,829,830],{},"Day 2 · GPU + Day 3 · CUDA — massive parallelism",[221,832,833,836],{},[239,834,835],{},"Benchmark & timing",[239,837,838],{},"Day 2 · fixed workload, img/s throughput",[221,840,841,844],{},[239,842,843],{},"Vectorization / SIMD",[239,845,846],{},"Day 1 + Day 6 · EVE — mixed-precision (AMP) 
analogue",[221,848,849,852],{},[239,850,851],{},"Floating-point precision",[239,853,854],{},"Day 3 · fp32/fp16 — explains the AMP NaN on GTX 1650",[221,856,857,860],{},[239,858,859],{},"I/O & data",[239,861,862],{},"Day 2 · HDF5 → here geo I/O Sentinel-2 / WorldCover",[221,864,865,868],{},[239,866,867],{},"Containers & repro",[239,869,870],{},"pixi / Apptainer — fixed seeds, versioned experiments",[872,873],"hr",{},[78,875,876],{},[271,877,878],{},"SenLand is an open, reproducible project: every number on this page is backed by a figure\ncommitted to the repository. The code runs without a GPU; with one, everything simply goes ~4×\nfaster.",{"title":880,"searchDepth":881,"depth":881,"links":882},"",2,[883,884,885,886,887,888,889,890,891,892,893,894,895],{"id":75,"depth":881,"text":76},{"id":106,"depth":881,"text":107},{"id":117,"depth":881,"text":118},{"id":143,"depth":881,"text":144},{"id":162,"depth":881,"text":163},{"id":205,"depth":881,"text":206},{"id":280,"depth":881,"text":281},{"id":329,"depth":881,"text":330},{"id":380,"depth":881,"text":381},{"id":427,"depth":881,"text":428},{"id":444,"depth":881,"text":445},{"id":592,"depth":881,"text":593},{"id":764,"depth":881,"text":765},"Mapping Senegal's land cover with deep learning — the same code on CPU and GPU, compared honestly. 
A learner project built with the Gray Scott School techniques.","md",[899],{"label":900,"icon":901,"to":902,"target":903},"Source code","i-simple-icons-github","https://github.com/aniasse/senland","_blank",{"icon":905},"lucide:map",true,{"title":57,"description":896},"1FpgBHEvlekNqPxJgfOebvgm9VLxlVU6NTULCKWZykA",[910,912],{"title":51,"path":52,"stem":53,"description":911,"children":-1},"Projects built by Gray Scott School learners, applying the HPC techniques taught at CINERI to real problems.",{"title":61,"path":62,"stem":63,"description":913,"children":-1},"CINERI, the TAOUEY supercomputer and the Gray Scott School — high-performance computing in the service of science, in Senegal.",1783172493392]