Shardy Lattice Node: Architecture & Verifiable Compute
The Shardy network represents a paradigm shift in decentralized physical infrastructure. Unlike traditional distributed cloud architectures, Shardy aggregates the untamed, volatile computational power of consumer devices (via regular web browsers) and organizes them into a highly resilient, permissionless, and mathematically verifiable supercomputer.
This document provides a deep dive into how a Shardy node operates under the hood, detailing its execution engine, hardware resource management, transactional lifecycle, and the cryptography that ensures zero-trust deterministic outputs.
1. Node Formation & The Multi-Threaded Runtime
When a user connects to the Shardy network via the node dashboard, the application initializes a sandbox environment within the browser. To bypass the restrictive single-threaded nature of standard web applications, the node spawns multiple parallel execution environments using Web Workers.
1.1 The Threading Model
- UI/Host Thread: Handles the React-based frontend, user interactions, and the primary WebSocket connection to the Orchestrator. It acts as the traffic controller, routing binary task frames to the appropriate compute context.
- Compute Worker (`compute.worker.ts`): A dedicated Web Worker completely decoupled from the DOM. Its sole purpose is to run high-intensity mathematical operations. It interfaces directly with the host's GPU via the `navigator.gpu` WebGPU API.
- WebRTC P2P Thread: In the background, Libp2p manages mesh gossiping, routing tasks peer-to-peer if the primary WebSocket connection degrades.
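The routing role of the host thread can be sketched as a small dispatch table. This is an illustrative sketch only: the opcode values and the `routeTaskFrame` helper are hypothetical, not part of Shardy's actual wire protocol.

```typescript
// Hypothetical sketch: the host thread inspects the first byte of each
// binary task frame and routes it to the appropriate compute context.
type ComputeContext = "compute" | "p2p" | "host";

const OP_TENSOR_TASK = 0x01; // heavy math -> Compute Worker
const OP_GOSSIP = 0x02;      // mesh traffic -> Libp2p thread
const OP_CONTROL = 0x03;     // dashboard/control -> stays on host

function routeTaskFrame(frame: Uint8Array): ComputeContext {
  switch (frame[0]) {
    case OP_TENSOR_TASK:
      return "compute";
    case OP_GOSSIP:
      return "p2p";
    default:
      return "host";
  }
}
```

In the real node, the "compute" branch would forward the frame to the worker via `postMessage` (or a shared buffer, as described below), while the "p2p" branch hands it to the Libp2p mesh.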
1.2 Resource Allocation & OPFS
The node dynamically interrogates the host machine for its hardware limits. Rather than crashing a user’s browser by requesting too much VRAM, Shardy respects strict constraints:
- Chunking strategy: Massive tensor matrices are chunked to fit within the `maxStorageBufferBindingSize` limits of the host GPU.
- Origin Private File System (OPFS): For tasks requiring hundreds of megabytes of scratch space that exceed the WASM linear memory boundaries or GPU VRAM, the compute worker dynamically reserves block storage directly on the user's hard drive using the OPFS API (`navigator.storage.getDirectory()`). It reads and writes binary chunks in streaming mode, bypassing traditional RAM completely.
- SharedArrayBuffer (SAB): To prevent performance bottlenecks caused by copying data between the UI thread and the Compute Worker, large task payloads are written directly into a shared memory segment (`SharedArrayBuffer`), enabling zero-copy execution.
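The chunking strategy above reduces to simple arithmetic: split the tensor byte length into slices no larger than the device limit. A minimal sketch, assuming the limit is passed in as a parameter (at runtime it would be read from the WebGPU device's `limits`):

```typescript
// Split a large tensor buffer into slices that each fit within the GPU's
// maxStorageBufferBindingSize. Returns the byte size of each chunk.
function chunkTensor(totalBytes: number, maxBindingSize: number): number[] {
  const chunks: number[] = [];
  let remaining = totalBytes;
  while (remaining > 0) {
    const size = Math.min(remaining, maxBindingSize);
    chunks.push(size);
    remaining -= size;
  }
  return chunks;
}
```

For example, a 1 GiB tensor against a 128 MiB binding limit yields eight equal chunks, each dispatched as its own compute pass.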
2. Task Execution Pipeline
The heart of the Shardy node is its ability to execute AI tensor operations deterministically across vastly different hardware configurations (from an M3 Max MacBook to an Nvidia RTX 4090 to a low-end Android phone).
2.1 Preprocessing via WebAssembly (Rust)
Before large-scale compute hits the GPU, parameters must be normalized:
- WASM Engine Integration: A hyper-optimized WebAssembly binary compiled from Rust (`wasmNodeEngine`) receives the raw byte sequence.
- Linear Memory Control: The WASM runtime allocates buffers within custom linear-memory boundaries (`alloc_bytes`).
- Deterministic Smoothing: Hardware architectures calculate floating-point numbers differently. To ensure consensus isn't broken by a rounding error on AMD vs. NVIDIA cards, the WASM engine applies an Exponential Moving Average (EMA) smoothing algorithm to the tensor dataset before dispatching it to WebGPU.
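The EMA smoothing step can be sketched as follows. This is an illustrative TypeScript rendering of the pass (the real engine runs it inside the Rust/WASM binary); the smoothing factor `alpha` is a hypothetical parameter, and `Math.fround` pins each result to 32-bit float precision in keeping with the determinism goal.

```typescript
// Illustrative EMA smoothing pass over a tensor buffer:
// out[i] = alpha * data[i] + (1 - alpha) * out[i - 1], rounded to f32.
function emaSmooth(data: Float32Array, alpha: number): Float32Array {
  const out = new Float32Array(data.length);
  if (data.length === 0) return out;
  out[0] = data[0];
  for (let i = 1; i < data.length; i++) {
    out[i] = Math.fround(alpha * data[i] + (1 - alpha) * out[i - 1]);
  }
  return out;
}
```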
2.2 WebGPU Execution Context
Shardy utilizes TypeGPU to bridge the gap between TypeScript strict typing and WGSL (WebGPU Shading Language):
- Adapter Initialization: The worker requests the most powerful discrete GPU available, falling back to integrated graphics if needed.
- Schema Alignment: The task payload is mapped to a memory-safe `TaskPayloadSchema`, ensuring alignment down to the byte.
- Shader Dispatch: The node compiles the requested WGSL compute shader, binds the memory layouts, and dispatches the workgroups (e.g., executing multiply-accumulate operations across thousands of parallel threads).
- Readback: The GPU executes the pass, and a copy encoder safely maps the VRAM matrix back into an accessible `Float32Array` on the CPU layer.
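The dispatch-and-readback sequence above follows the standard WebGPU compute flow. A condensed sketch, with the GPU objects typed as `any` to keep it self-contained (in the worker, `device` would come from `navigator.gpu.requestAdapter()` followed by `adapter.requestDevice()`; the buffer names and workgroup size are illustrative):

```typescript
const WORKGROUP_SIZE = 64; // must match @workgroup_size in the WGSL source

// Workgroups needed to cover n elements when dispatching the pass.
function workgroupCount(n: number): number {
  return Math.ceil(n / WORKGROUP_SIZE);
}

async function runComputePass(
  device: any,
  wgsl: string,
  storageBuf: any, // STORAGE | COPY_SRC buffer holding the tensor chunk
  readBuf: any,    // MAP_READ | COPY_DST staging buffer for readback
  n: number,
): Promise<Float32Array> {
  const module = device.createShaderModule({ code: wgsl });
  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module, entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: storageBuf } }],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(workgroupCount(n));
  pass.end();
  // Copy-encoder step: move VRAM results into the mappable staging buffer.
  encoder.copyBufferToBuffer(storageBuf, 0, readBuf, 0, n * 4);
  device.queue.submit([encoder.finish()]);

  await readBuf.mapAsync(1 /* GPUMapMode.READ */);
  const result = new Float32Array(readBuf.getMappedRange().slice(0));
  readBuf.unmap();
  return result;
}
```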
2.3 Graceful Degradation (Lite Tasks)
If a user is running in a restrictive environment (e.g., Safari on an older-generation iPhone) lacking WebGPU capabilities, the node defaults to CPU-only fallback algorithms (Lite and Super-Lite tasks), executing iterative FNV-1a mathematical shifts to contribute to the network without utilizing the full tensor pipeline.
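The FNV-1a loop referenced here is a standard 32-bit hash built entirely from XORs and multiplies, which is why it suits CPU-only fallback work. A minimal sketch of the core loop (how Shardy parameterizes or iterates it for Lite tasks is not specified in the text):

```typescript
// 32-bit FNV-1a over a byte buffer: XOR each byte into the state, then
// multiply by the FNV prime with 32-bit wraparound (Math.imul).
function fnv1a32(data: Uint8Array): number {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (const byte of data) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // normalize to an unsigned 32-bit integer
}
```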
3. Transactions & the Proof Lifecycle
The fundamental transaction in Shardy is the exchange of completed node computations for network verification, which ultimately leads to decentralized rewards. However, since the network is permissionless, nodes are strictly untrusted. Orchestrators must enforce cryptographic guarantees that the node actually did the hardware work and didn’t simply spoof a result.
3.1 Generating the Result Digest
Upon completing the hardware execution, `compute.worker.ts` produces a raw floating-point array.
The worker then runs a hashing algorithm (`foldOutputToField`) across this resulting buffer. It seeds this operation with a unique mathematical constant (`DOMAIN_TAG`, e.g., `0x44504e32534e4152n`) to compress the megabytes of output into a single, highly collision-resistant scalar integer: the `resultDigest`.
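A fold of this shape can be sketched with BigInt arithmetic. Only the `DOMAIN_TAG` value and the BN128 scalar-field modulus are taken from the source; the Horner-style mixing constant and the exact absorption scheme are illustrative assumptions, not Shardy's actual `foldOutputToField` implementation.

```typescript
// Order of the BN128/BN254 scalar field (a well-known constant).
const BN128_R =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;
const DOMAIN_TAG = 0x44504e32534e4152n;

// Hedged sketch: seed the accumulator with the domain tag, then absorb each
// output byte Horner-style, reducing modulo the field order at every step.
function foldOutputToField(output: Uint8Array): bigint {
  let acc = DOMAIN_TAG % BN128_R;
  for (const byte of output) {
    acc = (acc * 257n + BigInt(byte)) % BN128_R; // 257n is an illustrative radix
  }
  return acc;
}
```

The key properties the real fold must share: the result is deterministic, always lies inside the SNARK field, and changes if any output byte changes.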
3.2 The Zero-Knowledge (ZK) Circuit
Shardy uses the Groth16 SNARK protocol (via snarkjs) to prove computational integrity with zero knowledge.
The verification schema revolves around the circuit component `shardy_task_proof_v2.circom`. The constraint algorithm ensures algebraic equivalence:

`witness + taskId + seed + outputLen === resultDigest;`

- `taskId`: The unique identifier granted by the orchestrator.
- `seed`: Network entropy randomly assigned to the computation.
- `outputLen`: The size constraint of the hardware readback.
- `resultDigest`: The folded mathematical digest of the GPU output.
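Since the constraint is a single linear relation over the BN128 scalar field, the witness is simply the field element that balances the sum. A sketch of that arithmetic (all concrete values are illustrative):

```typescript
// Order of the BN128 scalar field used by the circuit.
const R =
  21888242871839275222246405745257275088548364400416034343698204186575808495617n;

// Canonical reduction into [0, R), correct for negative intermediates too.
const mod = (x: bigint): bigint => ((x % R) + R) % R;

// Rearranging the constraint: witness = resultDigest - taskId - seed - outputLen.
function deriveWitness(
  taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint,
): bigint {
  return mod(resultDigest - taskId - seed - outputLen);
}

// The check the circuit enforces in-field.
function constraintHolds(
  witness: bigint, taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint,
): boolean {
  return mod(witness + taskId + seed + outputLen) === mod(resultDigest);
}
```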
3.3 The Proof Generation Sequence
For the orchestrator to accept the computation as true, the node must generate a valid proof of knowledge for the witness:
- Witness Calculation: Using the field arithmetic constrained by a prime modulus (the BN128 scalar field), the worker calculates the correct `witness` based on its uniquely produced `resultDigest`.
- Trusted Setup Binaries: The worker downloads binary keys (`.wasm` circuits and `.zkey` files) from the network's active version `manifest.json`.
- SNARK Proving: The node executes `groth16.fullProve` entirely within the browser's sandboxed environment, yielding a `proof` payload and corresponding `publicSignals`.
- Verification Pre-Check: Before spending bandwidth, the node runs a fast local validation against its own `VerificationKey`. If valid, the task is marked completed.
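The proving step can be sketched as follows. `groth16.fullProve(input, wasmPath, zkeyPath)` is snarkjs's real API; everything else here is an assumption: the input-object field names, the artifact paths, and the use of the page-global `snarkjs` (as when the library is loaded via a script tag). snarkjs expects field elements as decimal strings, which is what the helper produces.

```typescript
// Assumed circuit input shape; field elements are serialized as decimal strings.
function buildCircuitInput(
  witness: bigint, taskId: bigint, seed: bigint, outputLen: bigint, resultDigest: bigint,
) {
  return {
    witness: witness.toString(),
    taskId: taskId.toString(),
    seed: seed.toString(),
    outputLen: outputLen.toString(),
    resultDigest: resultDigest.toString(),
  };
}

// Sketch of the in-browser proving call; paths are hypothetical stand-ins
// for artifacts resolved from the version manifest.json.
async function proveTask(input: ReturnType<typeof buildCircuitInput>) {
  const { groth16 } = (globalThis as any).snarkjs;
  const { proof, publicSignals } = await groth16.fullProve(
    input,
    "/circuits/shardy_task_proof_v2.wasm", // hypothetical path
    "/circuits/shardy_task_proof_v2.zkey", // hypothetical path
  );
  return { proof, publicSignals };
}
```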
3.4 Orchestrator Quorum & Payout
The node posts the `ZkProofPayload` (and potentially the raw binary buffer) back to the Orchestrator.
Because the witness perfectly satisfies the polynomial equation represented by the `.zkey` proving key, the Orchestrator's `groth16.verify` routine yields true instantly, mathematically confirming that the digest was born of the exact inputs assigned to the node.
When a redundancy threshold (BFT quorum) of multiple distinct nodes returns verified proofs holding identical public signals, the network state transitions to Verified, and the orchestrator triggers on-chain telemetry mechanisms indicating the execution was successful and resources were expended.
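The quorum condition above amounts to counting verified proofs whose public signals agree and comparing the largest group against a redundancy threshold. A minimal sketch (the threshold value and signal encoding are illustrative):

```typescript
// BFT-style redundancy check: does any group of identical publicSignals
// from distinct nodes meet the required threshold?
function quorumReached(signalSets: string[][], threshold: number): boolean {
  const counts = new Map<string, number>();
  for (const signals of signalSets) {
    const key = signals.join(","); // identical signals -> identical key
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Math.max(0, ...counts.values()) >= threshold;
}
```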
Technical Summary
By unifying WebGPU for raw parallel tensor slicing, WASM for cross-platform deterministic preprocessing, and Circom / SnarkJS for robust cryptographic execution verification, a Shardy Node transforms a common web request into an elite, trustless high-performance computing edge device.