Shardy Lattice Node: Architecture & Verifiable Compute
The Shardy network represents a paradigm shift in decentralized physical infrastructure. Unlike traditional distributed cloud architectures, Shardy aggregates the untamed, volatile computational power of consumer devices (via regular web browsers) and organizes them into a highly resilient, permissionless, and mathematically verifiable supercomputer.
This document provides a deep dive into how a Shardy node operates under the hood, detailing its execution engine, hardware resource management, transactional lifecycle, and the cryptography that ensures zero-trust deterministic outputs.
1. Node Formation & The Multi-Threaded Runtime
When a user connects to the Shardy network via the node dashboard, the application initializes a sandbox environment within the browser. To bypass the restrictive single-threaded nature of standard web applications, the node spawns multiple parallel execution environments using Web Workers.
1.1 The Threading Model
- UI/Host Thread: Handles the React-based frontend, user interactions, and the primary WebSocket connection to the Orchestrator. It acts as the traffic controller, routing binary task frames to the appropriate compute context.
- Compute Worker (`compute.worker.ts`): A dedicated Web Worker completely decoupled from the DOM. Its sole purpose is to run high-intensity mathematical operations. It interfaces directly with the host's GPU via the `navigator.gpu` WebGPU API.
- WebRTC P2P Thread: In the background, Libp2p manages mesh gossiping, routing tasks peer-to-peer if the primary WebSocket connection degrades.
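The routing role of the host thread can be sketched as a small dispatch table. This is an illustrative sketch only: the opcode values and the `routeTaskFrame` helper are hypothetical, not part of Shardy's actual wire protocol.

```typescript
// Hypothetical sketch: the host thread inspects the first byte of each
// binary task frame and routes it to the appropriate compute context.
type ComputeContext = "compute" | "p2p" | "host";

const OP_TENSOR_TASK = 0x01; // heavy math -> Compute Worker
const OP_GOSSIP = 0x02;      // mesh traffic -> Libp2p thread
const OP_CONTROL = 0x03;     // dashboard/control -> stays on host

function routeTaskFrame(frame: Uint8Array): ComputeContext {
  switch (frame[0]) {
    case OP_TENSOR_TASK:
      return "compute";
    case OP_GOSSIP:
      return "p2p";
    default:
      return "host";
  }
}
```

In the real node, the "compute" branch would forward the frame to the worker via `postMessage` (or a shared buffer, as described below), while the "p2p" branch hands it to the Libp2p mesh.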
1.2 Resource Allocation & OPFS
The node dynamically interrogates the host machine for its hardware limits. Rather than crashing a user’s browser by requesting too much VRAM, Shardy respects strict constraints:
- Chunking strategy: Massive tensor matrices are chunked to fit within the `maxStorageBufferBindingSize` limits of the host GPU.
- Origin Private File System (OPFS): For tasks requiring hundreds of megabytes of scratch space that exceed the WASM linear memory boundaries or GPU VRAM, the compute worker dynamically reserves block storage directly on the user's hard drive using the OPFS API (`navigator.storage.getDirectory()`). It reads and writes binary chunks in streaming mode, bypassing traditional RAM completely.
- SharedArrayBuffer (SAB): To prevent performance bottlenecks caused by copying data between the UI thread and the Compute Worker, large task payloads are written directly into a shared memory segment (`SharedArrayBuffer`), enabling zero-copy execution.
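The chunking strategy above reduces to simple arithmetic: split the tensor byte length into slices no larger than the device limit. A minimal sketch, assuming the limit is passed in as a parameter (at runtime it would be read from the WebGPU device's `limits`):

```typescript
// Split a large tensor buffer into slices that each fit within the GPU's
// maxStorageBufferBindingSize. Returns the byte size of each chunk.
function chunkTensor(totalBytes: number, maxBindingSize: number): number[] {
  const chunks: number[] = [];
  let remaining = totalBytes;
  while (remaining > 0) {
    const size = Math.min(remaining, maxBindingSize);
    chunks.push(size);
    remaining -= size;
  }
  return chunks;
}
```

For example, a 1 GiB tensor against a 128 MiB binding limit yields eight equal chunks, each dispatched as its own compute pass.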
2. Task Execution Pipeline
The heart of the Shardy node is its ability to execute AI tensor operations deterministically across vastly different hardware configurations (from an M3 Max MacBook to an Nvidia RTX 4090 to a low-end Android phone).
2.1 Preprocessing via WebAssembly (Rust)
Before large-scale compute hits the GPU, parameters must be normalized:
- WASM Engine Integration: A hyper-optimized WebAssembly binary compiled from Rust (`wasmNodeEngine`) receives the raw byte sequence.
- Linear Memory Control: The WASM runtime allocates buffers within custom linear-memory boundaries (`alloc_bytes`).
- Deterministic Smoothing: Hardware architectures calculate floating-point numbers differently. To ensure consensus isn't broken by a rounding error on AMD vs. NVIDIA cards, the WASM engine applies an Exponential Moving Average (EMA) smoothing algorithm to the tensor dataset before dispatching it to WebGPU.
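The EMA smoothing step can be sketched as follows. This is an illustrative TypeScript rendering of the pass (the real engine runs it inside the Rust/WASM binary); the smoothing factor `alpha` is a hypothetical parameter, and `Math.fround` pins each result to 32-bit float precision in keeping with the determinism goal.

```typescript
// Illustrative EMA smoothing pass over a tensor buffer:
// out[i] = alpha * data[i] + (1 - alpha) * out[i - 1], rounded to f32.
function emaSmooth(data: Float32Array, alpha: number): Float32Array {
  const out = new Float32Array(data.length);
  if (data.length === 0) return out;
  out[0] = data[0];
  for (let i = 1; i < data.length; i++) {
    out[i] = Math.fround(alpha * data[i] + (1 - alpha) * out[i - 1]);
  }
  return out;
}
```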
2.2 WebGPU Execution Context
Shardy utilizes TypeGPU to bridge the gap between TypeScript strict typing and WGSL (WebGPU Shading Language):
- Adapter Initialization: The worker requests the most powerful discrete GPU available, falling back to integrated graphics if needed.
- Schema Alignment: The task payload is mapped to a memory-safe `TaskPayloadSchema`, ensuring alignment down to the byte.
- Shader Dispatch: The node compiles the requested WGSL compute shader, binds the memory layouts, and dispatches the workgroups (e.g., executing multiply-accumulate operations across thousands of parallel threads).
- Readback: The GPU executes the pass, and a copy encoder safely maps the VRAM matrix back into an accessible `Float32Array` on the CPU layer.
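The dispatch-and-readback sequence above follows the standard WebGPU compute flow. A condensed sketch, with the GPU objects typed as `any` to keep it self-contained (in the worker, `device` would come from `navigator.gpu.requestAdapter()` followed by `adapter.requestDevice()`; the buffer names and workgroup size are illustrative):

```typescript
const WORKGROUP_SIZE = 64; // must match @workgroup_size in the WGSL source

// Workgroups needed to cover n elements when dispatching the pass.
function workgroupCount(n: number): number {
  return Math.ceil(n / WORKGROUP_SIZE);
}

async function runComputePass(
  device: any,
  wgsl: string,
  storageBuf: any, // STORAGE | COPY_SRC buffer holding the tensor chunk
  readBuf: any,    // MAP_READ | COPY_DST staging buffer for readback
  n: number,
): Promise<Float32Array> {
  const module = device.createShaderModule({ code: wgsl });
  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module, entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: storageBuf } }],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(workgroupCount(n));
  pass.end();
  // Copy-encoder step: move VRAM results into the mappable staging buffer.
  encoder.copyBufferToBuffer(storageBuf, 0, readBuf, 0, n * 4);
  device.queue.submit([encoder.finish()]);

  await readBuf.mapAsync(1 /* GPUMapMode.READ */);
  const result = new Float32Array(readBuf.getMappedRange().slice(0));
  readBuf.unmap();
  return result;
}
```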
2.3 Graceful Degradation (Lite Tasks)
If a user is running in a restrictive environment (e.g., Safari on an older-generation iPhone) lacking WebGPU capabilities, the node defaults to CPU-only fallback algorithms (Lite and Super-Lite tasks), executing iterative FNV-1a mathematical shifts to contribute to the network without utilizing the full tensor pipeline.
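The FNV-1a loop referenced here is a standard 32-bit hash built entirely from XORs and multiplies, which is why it suits CPU-only fallback work. A minimal sketch of the core loop (how Shardy parameterizes or iterates it for Lite tasks is not specified in the text):

```typescript
// 32-bit FNV-1a over a byte buffer: XOR each byte into the state, then
// multiply by the FNV prime with 32-bit wraparound (Math.imul).
function fnv1a32(data: Uint8Array): number {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (const byte of data) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // normalize to an unsigned 32-bit integer
}
```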
3. Transactions & the Proof Lifecycle
The fundamental transaction in Shardy is the exchange of completed node computations for network verification, which ultimately leads to decentralized rewards. However, since the network is permissionless, nodes are strictly untrusted. Orchestrators must enforce cryptographic guarantees that the node actually did the hardware work and didn’t simply spoof a result.
3.1 Generating the Result Digest
Upon completing the hardware execution, `compute.worker.ts` produces a raw floating-point array.
The worker then runs a hashing algorithm (`foldOutputToField`) across this resulting buffer. It seeds this operation with a unique mathematical constant (`DOMAIN_TAG`, e.g., `0x44504e32534e4152n`) to compress the megabytes of output into a single, highly collision-resistant scalar integer: the `resultDigest`.
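A fold of this shape can be sketched with BigInt arithmetic. Only the `DOMAIN_TAG` value and the BN128 scalar-field modulus are taken from the source; the Horner-style mixing constant and the exact absorption scheme are illustrative assumptions, not Shardy's actual `foldOutputToField` implementation.

```typescript
// Order of the BN128/BN254 scalar field (a well-known constant).
const BN128_R =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;
const DOMAIN_TAG = 0x44504e32534e4152n;

// Hedged sketch: seed the accumulator with the domain tag, then absorb each
// output byte Horner-style, reducing modulo the field order at every step.
function foldOutputToField(output: Uint8Array): bigint {
  let acc = DOMAIN_TAG % BN128_R;
  for (const byte of output) {
    acc = (acc * 257n + BigInt(byte)) % BN128_R; // 257n is an illustrative radix
  }
  return acc;
}
```

The key properties the real fold must share: the result is deterministic, always lies inside the SNARK field, and changes if any output byte changes.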
3.2 The Zero-Knowledge (ZK) Circuit
Shardy uses the Groth16 SNARK protocol (via snarkjs) to prove computational integrity with zero knowledge.
The verification schema revolves around the circuit component `shardy_task_proof_v2.circom`. The constraint algorithm ensures algebraic equivalence:

`witness + taskId + seed + outputLen === resultDigest;`

- `taskId`: The unique identifier granted by the orchestrator.
- `seed`: Network entropy randomly assigned to the computation.
- `outputLen`: The size constraint of the hardware readback.
- `resultDigest`: The folded mathematical digest of the GPU output.
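Since the constraint is a single linear relation over the BN128 scalar field, the witness is simply the field element that balances the sum. A sketch of that arithmetic (all concrete values are illustrative):

```typescript
// Order of the BN128 scalar field used by the circuit.
const R =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;

// Canonical reduction into [0, R), correct for negative intermediates too.
const mod = (x: bigint): bigint => ((x % R) + R) % R;

// Rearranging the constraint: witness = resultDigest - taskId - seed - outputLen.
function deriveWitness(
  taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint,
): bigint {
  return mod(resultDigest - taskId - seed - outputLen);
}

// The check the circuit enforces in-field.
function constraintHolds(
  witness: bigint, taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint,
): boolean {
  return mod(witness + taskId + seed + outputLen) === mod(resultDigest);
}
```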
3.3 The Proof Generation Sequence
For the orchestrator to accept the computation as true, the node must generate a valid proof of knowledge for the witness:
- Witness Calculation: Using the field arithmetic constrained by a prime modulus (the BN128 scalar field), the worker calculates the correct `witness` based on its uniquely produced `resultDigest`.
- Trusted Setup Binaries: The worker downloads binary keys (`.wasm` circuits and `.zkey` files) from the network's active version `manifest.json`.
- SNARK Proving: The node executes `groth16.fullProve` entirely within the browser's sandboxed environment, yielding a `proof` payload and corresponding `publicSignals`.
- Verification Pre-Check: Before spending bandwidth, the node runs a fast local validation against its own `VerificationKey`. If valid, the task is marked completed.
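The proving step can be sketched as follows. `groth16.fullProve(input, wasmPath, zkeyPath)` is snarkjs's real API; everything else here is an assumption: the input-object field names, the artifact paths, and the use of the page-global `snarkjs` (as when the library is loaded via a script tag). snarkjs expects field elements as decimal strings, which is what the helper produces.

```typescript
// Assumed circuit input shape; field elements are serialized as decimal strings.
function buildCircuitInput(
  witness: bigint, taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint,
) {
  return {
    witness: witness.toString(),
    taskId: taskId.toString(),
    seed: seed.toString(),
    outputLen: outputLen.toString(),
    resultDigest: resultDigest.toString(),
  };
}

// Sketch of the in-browser proving call; paths are hypothetical stand-ins
// for artifacts resolved from the version manifest.json.
async function proveTask(input: ReturnType<typeof buildCircuitInput>) {
  const { groth16 } = (globalThis as any).snarkjs;
  const { proof, publicSignals } = await groth16.fullProve(
    input,
    "/circuits/shardy_task_proof_v2.wasm", // hypothetical path
    "/circuits/shardy_task_proof_v2.zkey", // hypothetical path
  );
  return { proof, publicSignals };
}
```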
3.4 Orchestrator Quorum & Payout
The node posts the `ZkProofPayload` (and potentially the raw binary buffer) back to the Orchestrator.
Because the witness perfectly satisfies the polynomial equation represented by the `.zkey` proving key, the Orchestrator's `groth16.verify` routine yields true instantly, mathematically confirming that the digest was born of the exact inputs assigned to the node.
When a redundancy threshold (BFT quorum) of multiple distinct nodes returns verified proofs holding identical public signals, the network state transitions to Verified, and the orchestrator triggers on-chain telemetry mechanisms indicating the execution was successful and resources were expended.
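The quorum condition above amounts to counting verified proofs whose public signals agree and comparing the largest group against a redundancy threshold. A minimal sketch (the threshold value and signal encoding are illustrative):

```typescript
// BFT-style redundancy check: does any group of identical publicSignals
// from distinct nodes meet the required threshold?
function quorumReached(signalSets: string[][], threshold: number): boolean {
  const counts = new Map<string, number>();
  for (const signals of signalSets) {
    const key = signals.join(","); // identical signals -> identical key
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Math.max(0, ...counts.values()) >= threshold;
}
```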
Technical Summary
By unifying WebGPU for raw parallel tensor slicing, WASM for cross-platform deterministic preprocessing, and Circom / SnarkJS for robust cryptographic execution verification, a Shardy Node transforms a common web request into an elite, trustless high-performance computing edge device.