Shardy Lattice Node: Architecture & Verifiable Compute
The Shardy network represents a paradigm shift in decentralized physical infrastructure. Unlike traditional distributed cloud architectures, Shardy aggregates the volatile, heterogeneous computational power of consumer devices (via regular web browsers) and organizes it into a highly resilient, permissionless, and mathematically verifiable supercomputer.
This document provides a deep dive into how a Shardy node operates under the hood, detailing its execution engine, hardware resource management, transactional lifecycle, and the cryptography that ensures zero-trust deterministic outputs.
1. Node Formation & The Multi-Threaded Runtime
When a user connects to the Shardy network via the node dashboard, the application initializes a sandbox environment within the browser. To bypass the restrictive single-threaded nature of standard web applications, the node spawns multiple parallel execution environments using Web Workers.
1.1 The Threading Model
- UI/Host Thread: Handles the React frontend, user interaction, and the WebSocket connection to the Orchestrator. It routes binary task frames to the compute worker and sends ACK/progress/results back to the server.
- Compute Worker (`compute.worker.ts`): Dedicated Web Worker for high-intensity computation. It uses `navigator.gpu` for WebGPU tasks and can run CPU-only fallbacks.
- P2P + WebRTC Layer: libp2p handles presence gossip and task digest messages. WebRTC is used for optional peer-to-peer data transfer once a signaling channel is negotiated through the orchestrator.
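The host-thread routing described above can be sketched as a small dispatcher. The message shapes (`TaskFrame`, `WorkerEvent`) and the `route()` helper are illustrative assumptions, not Shardy's actual protocol types:

```typescript
// Hypothetical message shapes for frames crossing the UI/host thread.
type TaskFrame = { kind: "task"; taskId: string; payload: Uint8Array };
type WorkerEvent =
  | { kind: "ack"; taskId: string }
  | { kind: "progress"; taskId: string; pct: number }
  | { kind: "result"; taskId: string; digest: string };

// Decide where an incoming message should be forwarded: task frames from
// the orchestrator go down to the compute worker; everything the worker
// emits (ACK/progress/results) goes back up over the WebSocket.
function route(msg: TaskFrame | WorkerEvent): "compute-worker" | "orchestrator" {
  return msg.kind === "task" ? "compute-worker" : "orchestrator";
}
```

In the real node, `route` would sit in the host thread's `onmessage` handlers, keeping the UI thread as a thin relay between the WebSocket and the worker.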
1.2 Resource Allocation & OPFS
The node dynamically interrogates the host machine for its hardware limits. Rather than crashing a user’s browser by requesting too much VRAM, Shardy respects strict constraints:
- Chunking strategy: Payloads are prepared in orchestrator chunker logic; the worker consumes binary payloads sized for GPU and memory limits.
- OPFS checkpoints: The node uses OPFS to persist task payload + metadata for recovery after refresh.
- OPFS scratch (stress tests): Stress tasks can reserve disk scratch as a best-effort path.
- SharedArrayBuffer (optional): The worker supports shared buffers for zero-copy results when provided.
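The chunking strategy above can be sketched as a pure function that splits a binary payload into frames no larger than the device's reported limit. The names `chunkPayload` and `maxChunkBytes` are illustrative, not Shardy's actual chunker API:

```typescript
// Split a payload into zero-copy views sized for the device's GPU/memory limits.
function chunkPayload(payload: Uint8Array, maxChunkBytes: number): Uint8Array[] {
  if (maxChunkBytes <= 0) throw new RangeError("maxChunkBytes must be positive");
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < payload.length; off += maxChunkBytes) {
    // subarray() creates a view into the original buffer, so no bytes are copied.
    chunks.push(payload.subarray(off, Math.min(off + maxChunkBytes, payload.length)));
  }
  return chunks;
}
```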
2. Task Execution Pipeline
The heart of the Shardy node is its ability to execute AI tensor operations deterministically across vastly different hardware configurations (from an M3 Max MacBook to an Nvidia RTX 4090 to a low-end Android phone).
2.1 Preprocessing via WebAssembly (Rust)
Before heavy compute hits the GPU, parameters must be normalized:
- WASM Engine Integration: `preprocess_node_engine.wasm` receives the raw byte sequence.
- Linear Memory Control: The WASM runtime allocates custom boundaries (`alloc_bytes`).
- Deterministic Smoothing: Applies clamping and EMA smoothing to stabilize float variance, with a JS fallback path when WASM is unavailable.
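The smoothing step can be illustrated in TypeScript, in the spirit of the JS fallback path mentioned above. The parameters (`lo`, `hi`, `alpha`) are assumptions; only "clamping + EMA smoothing" comes from the text:

```typescript
// Clamp each value into [lo, hi], then apply an exponential moving average
// (EMA) with smoothing factor alpha. Deterministic: same input, same output.
function clampEma(values: number[], lo: number, hi: number, alpha: number): number[] {
  const out: number[] = [];
  let ema = 0;
  for (let i = 0; i < values.length; i++) {
    // Clamp first so outliers cannot drag the moving average.
    const v = Math.min(hi, Math.max(lo, values[i]));
    ema = i === 0 ? v : alpha * v + (1 - alpha) * ema;
    out.push(ema);
  }
  return out;
}
```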
2.2 WebGPU Execution Context
Shardy utilizes TypeGPU to bridge the gap between TypeScript strict typing and WGSL (WebGPU Shading Language):
- Adapter Initialization: The worker requests the most powerful discrete GPU available, falling back to integrated graphics if needed.
- Schema Alignment: The task payload is mapped to a memory-safe `TaskPayloadSchema`, ensuring alignment down to the byte.
- Shader Dispatch: The node compiles the requested WGSL compute shader, binds the memory layouts, and dispatches the workgroups (e.g., executing multiply-accumulate operations across thousands of parallel threads).
- Readback: The GPU executes the pass, and a copy-encoder safely maps the VRAM matrix back into an accessible `Float32Array` on the CPU layer.
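The dispatch sizing behind the shader step can be shown as a one-line calculation: WebGPU launches `ceil(outputLen / workgroupSize)` workgroups so every output element is covered. The 64-thread workgroup size is an assumption, not Shardy's actual value:

```typescript
// Assumed workgroup size; must match @workgroup_size(64) in the WGSL kernel.
const WORKGROUP_SIZE = 64;

// Number of workgroups to dispatch so that outputLen threads are launched.
function workgroupCount(outputLen: number): number {
  return Math.ceil(outputLen / WORKGROUP_SIZE);
}
```

In the worker this number would feed `pass.dispatchWorkgroups(workgroupCount(n))`, with the kernel guarding against the over-provisioned tail threads (`if (gid.x >= outputLen) { return; }`).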
2.3 Graceful Degradation (Lite Tasks)
If WebGPU is unavailable or the task type is CPU-bound, the node executes `lite_test` or `super_lite_test` workloads on the CPU. These are deterministic loops with lightweight checksums.
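A lite workload can be sketched as follows. Only the "deterministic loop + lightweight checksum" idea comes from the text; the LCG/FNV constants and the function shape are illustrative:

```typescript
// A deterministic CPU-bound loop: an LCG drives the work, and an FNV-1a
// style fold produces a lightweight checksum of every iteration.
function runLiteTask(seed: number, iterations: number): number {
  let h = 2166136261 >>> 0; // FNV-1a offset basis
  let x = seed >>> 0;
  for (let i = 0; i < iterations; i++) {
    x = (x * 1664525 + 1013904223) >>> 0;    // LCG step (Numerical Recipes constants)
    h = ((h ^ (x & 0xff)) * 16777619) >>> 0; // fold one byte into the checksum
  }
  return h; // uint32 checksum: identical on every device for the same inputs
}
```

Because the arithmetic is integer-only, the checksum is bit-identical across hardware, which is what lets the orchestrator cross-check redundant executions.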
3. Transactions & the Proof Lifecycle
The fundamental transaction in Shardy is the exchange of completed node computations for network verification, which ultimately leads to decentralized rewards. However, since the network is permissionless, nodes are strictly untrusted. Orchestrators must enforce cryptographic guarantees that the node actually did the hardware work and didn’t simply spoof a result.
3.1 Generating the Result Digest
Upon completing the hardware execution, `compute.worker.ts` produces a raw floating-point array.
The worker runs `foldOutputToField` across the output buffer. This uses a domain tag and 32-bit word folding to produce a field element `resultDigest`.
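An illustrative reconstruction of this folding: reinterpret the `Float32Array` output as 32-bit words and absorb them, together with a domain tag, into a single element of the BN128 scalar field. The fold constant and tag value are assumptions; only "domain tag + 32-bit word folding into a field element" is from the text:

```typescript
// BN128 (alt_bn128) scalar field order, as used by Groth16/snarkjs.
const BN128_R = 21888242871839275222246405745257275088548364400416034343698204186575808495617n;
const DOMAIN_TAG = 0x53484152n; // hypothetical domain separator ("SHAR")

function foldOutputToField(output: Float32Array): bigint {
  // View the float buffer as raw 32-bit words (same element count, same bytes).
  const words = new Uint32Array(output.buffer, output.byteOffset, output.length);
  let acc = DOMAIN_TAG % BN128_R;
  for (const w of words) {
    // Horner-style fold: scale the accumulator past 2^32, absorb the next word.
    acc = (acc * 4294967311n + BigInt(w)) % BN128_R;
  }
  return acc;
}
```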
3.2 The Zero-Knowledge (ZK) Circuit
Shardy uses the Groth16 SNARK protocol (via snarkjs) to prove computational integrity with zero knowledge.
The verification schema revolves around the circuit component shardy_task_proof_v2.circom. The constraint algorithm ensures algebraic equivalence:
`witness + taskId + seed + outputLen === resultDigest`

- `taskId`: The unique identifier granted by the orchestrator.
- `seed`: Network entropy randomly assigned to the computation.
- `outputLen`: The size constraint of the hardware readback.
- `resultDigest`: The folded mathematical digest of the GPU output.
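The relation the circuit enforces can be sketched in BN128 scalar-field arithmetic. Solving for the witness here is only to illustrate the constraint; in the real protocol the witness stays private inside the Groth16 prover:

```typescript
// BN128 scalar field order (the prime modulus used by Groth16/snarkjs).
const P = 21888242871839275222246405745257275088548364400416034343698204186575808495617n;
const mod = (x: bigint) => ((x % P) + P) % P;

// Rearranged: witness = resultDigest - taskId - seed - outputLen (mod p).
function solveWitness(taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint): bigint {
  return mod(resultDigest - taskId - seed - outputLen);
}

// The circuit's check: witness + taskId + seed + outputLen === resultDigest (mod p).
function constraintHolds(w: bigint, taskId: bigint, seed: bigint, outputLen: bigint, d: bigint): boolean {
  return mod(w + taskId + seed + outputLen) === mod(d);
}
```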
3.3 The Proof Generation Sequence
For the orchestrator to accept the computation as true, the node must generate a valid proof of knowledge for the witness:
- Witness Calculation: Using field arithmetic constrained by a prime modulus (the BN128 scalar field), the worker calculates the correct `witness` based on its uniquely produced `resultDigest`.
- Trusted Setup Binaries: The worker downloads binary keys (`.wasm` circuits and `.zkey` files) from the network's active version `manifest.json`.
- SNARK Proving: The node executes `groth16.fullProve` entirely within the browser's sandboxed environment, yielding a `proof` payload and corresponding `publicSignals`.
- Verification Pre-Check: Before spending bandwidth, the node runs a fast local validation against its own `VerificationKey`. If valid, the task is marked completed.
3.4 Orchestrator Quorum & Payout
The node posts `task_result` with checksum and proof. The orchestrator validates the proof, verifies checksum consistency, and waits for redundancy quorum before emitting `task_verified`.
Because the witness satisfies the polynomial constraints encoded in the `.zkey` proving key, the Orchestrator's `groth16.verify` routine returns true, mathematically confirming that the digest was derived from the exact inputs assigned to the node.
When a redundancy threshold (BFT quorum) of multiple distinct nodes returns verified proofs holding identical public signals, the network state transitions to Verified, and the orchestrator triggers on-chain telemetry mechanisms indicating the execution was successful and resources were expended.
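The quorum rule can be sketched as a small check: the task transitions to Verified once some public-signal tuple is attested by enough distinct nodes. The types and `threshold` parameter are illustrative assumptions:

```typescript
interface VerifiedResult {
  nodeId: string;
  publicSignals: string[]; // field elements as decimal strings, as snarkjs emits them
}

// True once >= threshold distinct nodes returned identical public signals.
function quorumReached(results: VerifiedResult[], threshold: number): boolean {
  const bySignals = new Map<string, Set<string>>();
  for (const r of results) {
    const key = r.publicSignals.join("|"); // group by exact signal tuple
    const nodes = bySignals.get(key) ?? new Set<string>();
    nodes.add(r.nodeId); // each node counts once per tuple
    bySignals.set(key, nodes);
  }
  for (const nodes of bySignals.values()) {
    if (nodes.size >= threshold) return true;
  }
  return false;
}
```

Counting distinct `nodeId`s rather than raw submissions is what prevents a single node from replaying its own proof to fake a quorum.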
Technical Summary
By unifying WebGPU for raw parallel tensor slicing, WASM for cross-platform deterministic preprocessing, and Circom / SnarkJS for robust cryptographic execution verification, a Shardy Node transforms a common web request into an elite, trustless high-performance computing edge device.