
⚠ DRAFT — outline scaffolded from notes, prose to be written.

Zero-Copy from NIC to GPU inside a Kernel-Space Engine

A systems-paper outline: how LplPlugin, running in ring-0 of LplKernel, collapses the network→render latency chain.

Abstract

Draft outline — fill with the real numbers and diagrams.

LplPlugin is a dual-build (client|server) game and graphics engine compiled natively into LplKernel and executed in ring-0. This paper describes the data path that carries a UDP packet from the network interface to the GPU without intermediate copies, and the memory and concurrency architecture that makes it safe.

1. Motivation

  • The classic path (NIC → kernel socket buffer → user copy → engine → GPU upload) crosses the user/kernel boundary twice (once to receive the packet, once to submit the GPU upload) and copies the payload at several of the hops.
  • By placing the engine in kernel space, the boundary disappears; the remaining problem is eliminating the copies.

2. Architecture

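  • Physical memory: a deterministic free-list physical memory manager (PMM) on the client vs. a throughput-oriented allocator on the server.
  • Paging: dynamic mapping of DMA regions shared between the NIC and the GPU.
  • SoA ECS: structure-of-arrays component storage in the entity-component system, contiguous per field, feeding the GPU upload directly.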
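
To make the SoA bullet concrete, here is a minimal sketch of structure-of-arrays component storage. Everything in it is an assumption for illustration: the names TransformSoA, UploadSpan and span_of are hypothetical, the component set (position and velocity) is invented, and std::vector stands in for whatever allocator the engine actually uses in ring 0. The point it shows is only that each field is one contiguous array, so a field can be described to an upload routine as a single pointer and byte count with no gather step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical SoA storage for a position/velocity component pair.
// Each field lives in its own contiguous array.
struct TransformSoA {
    std::vector<float> pos_x, pos_y, pos_z;
    std::vector<float> vel_x, vel_y, vel_z;

    std::size_t size() const { return pos_x.size(); }

    // Append one entity's components and return its dense index.
    std::size_t push(float px, float py, float pz,
                     float vx, float vy, float vz) {
        pos_x.push_back(px); pos_y.push_back(py); pos_z.push_back(pz);
        vel_x.push_back(vx); vel_y.push_back(vy); vel_z.push_back(vz);
        return pos_x.size() - 1;
    }

    // Explicit Euler step: a linear walk over contiguous arrays.
    void integrate(float dt) {
        for (std::size_t i = 0; i < size(); ++i) {
            pos_x[i] += vel_x[i] * dt;
            pos_y[i] += vel_y[i] * dt;
            pos_z[i] += vel_z[i] * dt;
        }
    }
};

// A view of one field, in the shape an upload routine would consume.
struct UploadSpan {
    const void* data;
    std::size_t bytes;
};

// Describe a whole field as one contiguous span (no per-entity gather).
inline UploadSpan span_of(const std::vector<float>& field) {
    return UploadSpan{field.data(), field.size() * sizeof(float)};
}
```

With an array-of-structs layout the same upload would need either a strided copy or a shader that skips unused fields; with SoA, span_of(storage.pos_x) already describes exactly the bytes to transfer.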

TODO: insert the Mermaid pipeline diagram from docs/.

3. Concurrency model

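  • Lock-free single-producer/single-consumer (SPSC) ring between the IRQ producer and the main-loop consumer.
  • Acquire/release barriers, so that data written by the producer is visible to the consumer before the update that publishes it, regardless of compiler and CPU reordering.
  • Generational IDs to make entity handles ABA-safe: a handle to a destroyed entity cannot alias the entity that later reuses its slot.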
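
The first two bullets can be sketched together as a bounded SPSC ring whose indices are published with release stores and observed with acquire loads. This is an illustrative sketch, not LplPlugin's actual ring: SpscRing and PacketDesc are hypothetical names, the descriptor fields are assumed, and the code uses hosted C++17 (std::atomic, std::optional) rather than kernel primitives.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical descriptor for a received packet (fields are assumptions).
struct PacketDesc {
    std::uint64_t dma_offset;  // offset of the payload in a shared DMA region
    std::uint32_t length;      // payload length in bytes
};

template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side (for example the IRQ handler). Returns false when full.
    bool try_push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // ring is full
        slots_[head & (Capacity - 1)] = item;
        // Release: the slot write above is visible before the new head is.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side (for example the main loop). Empty ring gives nullopt.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // ring is empty
        T item = slots_[tail & (Capacity - 1)];
        // Release: the slot read above finishes before the slot can be reused.
        tail_.store(tail + 1, std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity> slots_{};
    // Indices grow monotonically and are masked on access; each is written
    // by exactly one side. Separate cache lines avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // producer-owned
    alignas(64) std::atomic<std::size_t> tail_{0};  // consumer-owned
};
```

The ring is only correct with exactly one producer and one consumer. Each side reads its own index relaxed because no other thread writes it, and reads the opposite index with acquire so that it pairs with the other side's release store.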
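
The generational-ID bullet can be illustrated with a small handle table. This is a sketch under assumptions, not the engine's implementation: EntityHandle and EntityTable are hypothetical names, the 32-bit index and generation widths are chosen arbitrarily, and the table itself is single-threaded. It shows only the ABA property: when a slot is recycled its generation changes, so a stale handle to the old occupant no longer validates.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical entity handle: a slot index plus the generation
// the slot had when the handle was issued.
struct EntityHandle {
    std::uint32_t index;
    std::uint32_t generation;
};

class EntityTable {
public:
    // Reuse a free slot if one exists, otherwise grow the table.
    EntityHandle create() {
        std::uint32_t index;
        if (!free_.empty()) {
            index = free_.back();
            free_.pop_back();
        } else {
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.push_back(Slot{0, false});
        }
        slots_[index].alive = true;
        return EntityHandle{index, slots_[index].generation};
    }

    // Returns false for stale or unknown handles.
    bool destroy(EntityHandle h) {
        if (!is_alive(h)) return false;
        slots_[h.index].alive = false;
        ++slots_[h.index].generation;  // invalidates outstanding handles
        free_.push_back(h.index);
        return true;
    }

    // A handle is valid only if its generation matches the slot's.
    bool is_alive(EntityHandle h) const {
        return h.index < slots_.size()
            && slots_[h.index].alive
            && slots_[h.index].generation == h.generation;
    }

private:
    struct Slot {
        std::uint32_t generation;
        bool alive;
    };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> free_;
};
```

A 32-bit generation wraps after 2^32 reuses of the same slot, so the guarantee is probabilistic in the strict sense; a wider counter trades handle size for a longer wrap period.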

4. Evaluation

  • Physics step for 10,000 entities: 23 µs (CUDA path).
  • Network loop average frame time: 70.15 µs.

TODO: describe the bench harness, hardware, and variance.

5. Lessons

  • What the kernel-space placement bought us, and what it cost.