
SHARDY: The Decentralized WebGPU Supercomputer

Layer 1 Whitepaper & Architecture Blueprint (Draft)

Version 1.0 — Investor & Technical Overview


1. Executive Summary: The AI Compute Crisis & The Shardy Revolution

The global demand for artificial intelligence inference and training has created an unprecedented bottleneck in physical hardware. Centralized cloud providers (AWS, Google Cloud, Azure) control the market, resulting in exorbitant costs, severe hardware shortages, and monolithic points of failure.

At the same time, billions of consumer devices—high-end gaming PCs, modern Apple Silicon MacBooks, and premium smartphones—sit completely underutilized. Their combined Graphical Processing Unit (GPU) capacity dwarfs the data centers of the world’s largest tech conglomerates.

Shardy is a decentralized physical infrastructure network (DePIN) and Layer 1 blockchain designed to upend traditional cloud monopolies. By leveraging the untapped computational power of everyday devices through a standard web browser, Shardy transforms the global web into a massively parallel, permissionless supercomputer.

In Shardy, mining is not the useless brute-forcing of cryptographic hashes; it is the execution of real AI and GPU computations. Using an elegant combination of WebGPU, WebAssembly (WASM), and Zero-Knowledge (ZK) cryptography, we turn any computer into a trusted, billable cell of physical infrastructure in one click, with zero software installation required.


2. Core Technical Philosophy & Advantages

Shardy bypasses the friction of traditional distributed computing protocols (which often require complex Docker setups, CLI knowledge, and heavy installations) by relying entirely on the modern browser sandbox.

2.0 End-to-End Pipeline Overview

The Shardy pipeline is a full loop: a task enters the network, is split into deterministic shards, computed on WebGPU, proven by ZK, and settled on-chain. The loop is designed so that no single actor can fake results or extract value without delivering computation.

2.1 The Bio-Mechanical Runtime (Browser Nodes)

When a user connects to the Shardy network via the node dashboard, the application initializes a sandbox environment within the browser using Dedicated Web Workers.

  • Zero-Copy Execution via WebGPU & WASM: Matrices and tensors are processed in parallel on user graphics cards through the browser (via TypeGPU). Shardy employs SharedArrayBuffer to move data between the network layer and the GPU worker with zero redundant memory copies.
  • Hardware Profiling: A worker is automatically benchmarked upon joining the network, measuring its GFLOPS and VRAM capacity. Nodes are then categorized into Tiers (from Tier 1 integrated graphics to Tier 3 high-end multi-GPU workstations). Complex tasks are intelligently routed only to capable devices.
  • Computational Determinism (WASM EMA Smoothing): GPU hardware architectures (Nvidia vs. AMD vs. Apple) calculate floating-point numbers slightly differently. To ensure consensus isn’t broken by a rounding error, Shardy runs a hyper-optimized WebAssembly binary (compiled from Rust) that applies an Exponential Moving Average (EMA) smoothing algorithm to the tensor dataset before dispatching it to WebGPU, ensuring bit-for-bit matching results.
  • OPFS Mass Storage: Massive tensor chunks (gigabytes in size) that exceed RAM limits are streamed and cached directly to the host’s drive via the Origin Private File System (OPFS), avoiding the out-of-memory crashes typical of browser tabs.
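
To make the determinism claim above concrete, here is a minimal, illustrative sketch of an EMA pre-pass over quantized inputs. Python stands in for the Rust/WASM binary, and the SCALE constant and alpha = 1/8 are assumptions, not protocol values. Because inputs are snapped to a fixed-point grid and smoothed with integer arithmetic only, every platform produces bit-identical output.

```python
# Illustrative sketch of the deterministic EMA pre-pass (NOT the production
# WASM binary). Inputs are quantized to a fixed-point grid first, so every
# platform smooths bit-identical integers regardless of GPU vendor.

SCALE = 1 << 16  # hypothetical fixed-point quantization scale

def quantize(values):
    """Snap floats to a fixed-point grid to remove per-GPU rounding drift."""
    return [round(v * SCALE) for v in values]

def ema_smooth(q_values, alpha_num=1, alpha_den=8):
    """Integer-only exponential moving average: ema = a*x + (1-a)*ema."""
    out = []
    ema = q_values[0]
    for x in q_values:
        ema = (alpha_num * x + (alpha_den - alpha_num) * ema) // alpha_den
        out.append(ema)
    return out

tensor = [0.101, 0.1500001, 0.1499999, 0.22]
stable = ema_smooth(quantize(tensor))
```

Note how the two inputs that differ only by floating-point noise (0.1500001 vs. 0.1499999) quantize to the same grid point, so the smoothed output is identical on every device.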

Additional runtime mechanics (deep dive):

  • Task Bootstrapping: The Orchestrator supplies a signed “work package” that includes code hash, shader/compute graph ID, input shard metadata, expected output schema, and resource caps. This package is verified locally before execution to prevent arbitrary code execution or resource abuse.
  • Deterministic Execution Envelope: Shardy constrains non-deterministic sources by fixing shader compilation flags, enforcing consistent math modes, and applying a stable quantization profile on input tensors. The EMA step is a final stabilizer, but determinism is actively enforced end-to-end.
  • Progressive Warmup & Calibration: Each worker performs a lightweight calibration run to measure kernel launch overhead, memory bandwidth, and browser throttling behavior. These values are used to normalize throughput, avoid over-scheduling, and protect the user experience.
  • Adaptive Chunking: Large jobs are chunked into multiple micro-batches sized by VRAM and thermal limits. If a laptop starts throttling or the browser loses focus, chunk sizes auto-shrink to preserve completion rates.
  • Local Fault Handling: The runtime maintains a checkpoint cursor so that a browser refresh or tab crash does not force a full restart. When possible, the node can resume from the last verified shard.
  • Privacy Boundaries: Only task-specific tensors, not user files or system data, are accessible. The browser sandbox and explicit memory fences enforce a strict boundary between Shardy data and the host device.
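
The adaptive chunking policy above can be sketched as follows; the safety factor, minimum chunk size, and linear thermal scaling are illustrative assumptions, not protocol constants.

```python
# Illustrative adaptive chunk sizing: micro-batches shrink when VRAM
# headroom or thermal headroom drops, preserving completion rates.

def chunk_elements(vram_free_bytes, bytes_per_element, thermal_headroom,
                   min_chunk=1024, safety=0.5):
    """Return how many tensor elements fit in one micro-batch."""
    budget = int(vram_free_bytes * safety * max(0.0, min(1.0, thermal_headroom)))
    return max(min_chunk, budget // bytes_per_element)

# Healthy laptop: plenty of free VRAM, no throttling.
full = chunk_elements(4 * 1024**3, 4, thermal_headroom=1.0)
# Same device throttling at 25% headroom: chunks auto-shrink.
throttled = chunk_elements(4 * 1024**3, 4, thermal_headroom=0.25)
```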

Runtime performance metrics:

  • Effective Throughput (ET): ET = (FLOPs_completed / wall_time) * utilization_factor
  • Utilization Factor: utilization_factor = min(1, active_gpu_time / wall_time) and is tracked per node to penalize idle throttling or background tab suspension.
  • Shard Success Rate (SSR): SSR = successful_shards / assigned_shards
  • Normalized Throughput (NT): NT = ET / tier_baseline, where tier_baseline is a reference GFLOPS for each hardware tier.
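
A small worked computation of these metrics, using hypothetical node numbers (the 1.5 TFLOPS tier baseline is an assumed reference, not a published constant):

```python
# The runtime metrics above, computed for a hypothetical Tier 2 node.

def effective_throughput(flops_completed, wall_time, active_gpu_time):
    """ET = (FLOPs_completed / wall_time) * utilization_factor."""
    utilization = min(1.0, active_gpu_time / wall_time)
    return (flops_completed / wall_time) * utilization, utilization

def shard_success_rate(successful, assigned):
    """SSR = successful_shards / assigned_shards."""
    return successful / assigned

# Sample: 8 TFLOPs of work in 4 s; the GPU was active for 3 of those 4 s.
et, uf = effective_throughput(8e12, 4.0, 3.0)
ssr = shard_success_rate(49, 50)
nt = et / 1.5e12  # tier_baseline: assumed 1.5 TFLOPS reference for Tier 2
```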

2.2 Mathematical Trustlessness (Zero-Knowledge Proofs)

In a permissionless network, participants are anonymous and cannot be trusted. How do we guarantee the computations are real and not faked for rewards?

Shardy integrates ZK-SNARKs (Groth16) via Circom and SnarkJS to mathematically enforce computational integrity.

  1. Once WebGPU finishes computing, the worker folds the megabytes of output into a single collision-resistant scalar: the resultDigest.
  2. The node runs a ZK circuit acting as an algebraic constraint map, proving that it knows the intermediate variables (witnesses) that satisfy the output.
  3. However heavy and long the GPU computation, it yields only a tiny .json proof. The Shardy Network verifies this proof in under 10 milliseconds, eliminating the need to recompute terabytes of data and achieving reliable fault detection with near-zero network overhead.
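
The digest-folding step can be illustrated as follows. This is a readability sketch only: the production circuit uses a ZK-friendly hash inside Circom, not SHA-256, and the encoding scheme here is an assumption.

```python
import hashlib
import struct

# Illustrative resultDigest folding: output tensors plus execution metadata
# collapse into one scalar, binding the proof to a precise execution context.

def result_digest(output_values, kernel_version, shard_index):
    """Fold an output tensor plus execution metadata into one scalar."""
    h = hashlib.sha256()
    h.update(kernel_version.encode())           # ties digest to kernel build
    h.update(struct.pack("<I", shard_index))    # ties digest to this shard
    for v in output_values:
        h.update(struct.pack("<d", v))          # fixed little-endian encoding
    return int.from_bytes(h.digest(), "big")

d1 = result_digest([0.5, 1.25, -3.0], "kernel-v1", 7)
d2 = result_digest([0.5, 1.25, -3.0], "kernel-v1", 7)   # identical context
d3 = result_digest([0.5, 1.25, -3.0], "kernel-v1", 8)   # different shard
```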

ZK proof lifecycle (expanded):

  • Witness Construction: The worker replays a minimal, deterministic trace of the computation over the same input shard, extracting the witness values required by the circuit. This is smaller and more structured than the raw output.
  • Digest Commitment: The resultDigest is derived from both the output tensors and execution metadata (kernel version, quantization profile, and shard index). This ties the proof to a precise execution context.
  • Proof Compression: Proofs are compacted and gzip-encoded before submission, minimizing network overhead for mobile devices and constrained connections.
  • Verifier Parallelism: Validators can batch-verify multiple proofs in a single block, reducing total verification costs and enabling high throughput.
  • Fraud Containment: If the digest does not match cross-node redundancy results, the proof is rejected and the node is marked for slashing review.

Proof cost and verification budget:

  • Proof Overhead Ratio (POR): POR = proof_time / compute_time and is bounded by protocol targets to keep proof generation economically feasible.
  • Verifier Load (VL): VL = proofs_per_block * verify_cost_ms
  • Digest Collision Budget: P(collision) <= 2^(-k) where k is digest bit-length; protocol targets a collision probability far below validator failure rates.
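
These budgets evaluate directly; the sample values below (1 s of proving on a 4 s shard, 500 proofs per block at 8 ms each, a roughly 254-bit digest) are illustrative assumptions:

```python
# The proof-cost metrics above, on assumed illustrative numbers.

def proof_overhead_ratio(proof_time, compute_time):
    """POR = proof_time / compute_time."""
    return proof_time / compute_time

def verifier_load(proofs_per_block, verify_cost_ms):
    """VL = proofs_per_block * verify_cost_ms."""
    return proofs_per_block * verify_cost_ms

def collision_bound(digest_bits):
    """Upper bound on accidental digest collision probability: 2^(-k)."""
    return 2.0 ** (-digest_bits)

por = proof_overhead_ratio(1.0, 4.0)    # 1 s proving on a 4 s shard
vl = verifier_load(500, 8)              # 500 proofs, 8 ms each
p_collision = collision_bound(254)      # ~254-bit field digest (assumed)
```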

3. The Orchestrator Mesh (L1 P2P Control Plane)

The Shardy Orchestrators represent the control plane of the decentralized ecosystem. They govern how thousands of anonymous clients function collectively, utilizing an event-driven Bun architecture.

3.1 Network Telemetry & GossipSub

To decentralize connectivity and scale infinitely without single-point bottlenecks, Shardy Orchestrators embed Libp2p.

  • Transaction Gossiping: Meaningful state changes, ZK-proof certificates, and task assignments propagate through encrypted P2P channels using GossipSub v1.1.
  • Shared Mempool: Instead of a master-slave HTTP architecture, Orchestrators push incoming tasks into a global Mempool. Orchestrators discover each other through Kademlia DHT.

3.2 Byzantine Fault Tolerance (BFT) & Dispatch

Tasks are not mapped 1-to-1. To enforce BFT:

  • Redundancy Matrix: A single AI payload is securely duplicated and routed to multiple distinct nodes (e.g., REDUNDANCY_FACTOR = 2).
  • Watchdogs & DLQ: If a user closes their browser or their mobile device suspends the app, the Orchestrator’s Watchdog triggers an ack_timeout or execution_timeout, and the task is immediately reassigned to the next high-availability node. If a payload fails repeatedly, it is cordoned into a Dead Letter Queue (DLQ) to protect network latency.

Dispatch internals (expanded):

  • Bidirectional Scheduling: Nodes advertise a live capability vector (compute tier, latency band, thermal headroom). Orchestrators match tasks to nodes using a weighted scoring function that balances cost, speed, and reliability.
  • Admission Control: To prevent overload, each node has a per-device concurrency cap derived from its calibration profile and battery status.
  • Proof-Aware Routing: Tasks that require stronger trust or higher value are routed to multiple, higher-tier nodes to reduce dispute probability.
  • Result Reconciliation: Redundant outputs are compared against a canonical digest; mismatches trigger a re-run on a third node to break ties.
  • Network Backpressure: Mempool size and DLQ growth signal congestion and automatically reduce task intake, preventing cascading failures.
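
The weighted scoring function mentioned under Bidirectional Scheduling might look like the following sketch; the weights and normalization bounds are assumptions for illustration, not protocol parameters.

```python
# Illustrative weighted node scoring: balances cost, speed, and reliability.
# Weights, max_cost, and max_latency are assumed constants, not protocol values.

def node_score(cost, latency_ms, reliability,
               w_cost=0.4, w_speed=0.3, w_rel=0.3,
               max_cost=10.0, max_latency=1000.0):
    """Higher is better; each term is normalized to [0, 1]."""
    cost_term = 1.0 - min(cost, max_cost) / max_cost
    speed_term = 1.0 - min(latency_ms, max_latency) / max_latency
    return w_cost * cost_term + w_speed * speed_term + w_rel * reliability

cheap_far = node_score(cost=2.0, latency_ms=800, reliability=0.90)
pricey_near = node_score(cost=6.0, latency_ms=100, reliability=0.99)
best = max([("cheap_far", cheap_far), ("pricey_near", pricey_near)],
           key=lambda kv: kv[1])[0]
```

With these weights, the low-latency, high-reliability node wins despite its higher cost, matching the proof-aware routing bias toward higher-tier nodes.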

3.3 Task → Proof → Settlement (Detailed Sequence)

  1. A client submits a task; Orchestrators push it into the shared Mempool.
  2. The task is split into deterministic shards and dispatched redundantly to capable worker tiers.
  3. Each worker computes its shard on WebGPU, derives the resultDigest, and generates a ZK proof.
  4. Orchestrators verify proofs, reconcile redundant digests, and gossip the certificates across the mesh.
  5. A slot leader packages verified results into a candidate block; quorum attestation finalizes the state root.
  6. The state machine pays workers, credits validator fees, and slashes any node with a verifiable mismatch.

4. Consensus & Replicated State Machine (L1 Settlement)

To transition from a “Proof of Concept” centralized server into a sovereign Layer 1 Blockchain, Shardy relies on a decentralized Replicated State Machine (RSM). Orchestrators act as Validators.

4.1 Block Formulation and State Root

  • Leader Slots: The blockchain time is divided into slots. Validating Orchestrators take turns collecting verified ZK-results from the Mempool and formulating “Candidate Blocks”.
  • Global State Root (Merkle Tree): Each block commits a stateRoot, a cryptographic hash of the entire network’s database. If a rogue orchestrator applies a mutated database, its state root will differ; consensus rejects the block and drops the orchestrator from the chain, enforcing network truth.
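
A minimal Merkle state-root sketch shows why a single mutated entry is immediately detectable; the leaf encoding here is hypothetical.

```python
import hashlib

# Minimal Merkle state-root sketch: any single-entry mutation changes the root.

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Pairwise-hash leaves up to a single 32-byte root."""
    level = [h(leaf) for leaf in leaves]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:                # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

state = [b"alice:100", b"bob:50", b"carol:75"]
root_honest = merkle_root(state)
# A rogue orchestrator mutates one balance; the root diverges instantly.
root_mutated = merkle_root([b"alice:100", b"bob:5000", b"carol:75"])
```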

4.2 Economic Finalization

When nodes evaluate identical math, produce identical proofs, and multiple verifying orchestrators confirm it, the block is sealed. The Shardy State Machine executes:

  1. Reward Distribution: Verified tasks command a programmatic payout in the native tSHRD token directly to the connected Web3 wallet of the participating Worker node.
  2. Consensus Mismatch (Slashing): If a malicious user alters the client code to forge witness outputs and returns a manipulated result, the Orchestrator instantly detects the discrepancy against the redundant sister computations. applySlashing() is invoked: the malicious node’s balance is slashed, its staked collateral is seized, and its identity is blocked from the mesh.

Consensus mechanics (expanded):

  • State Transitions as Tasks: Each task completion results in a deterministic state transition that updates balances, reputation scores, and validator fees.
  • Validator Quorum: Finality requires a quorum of validators to attest to the same state root; if quorum cannot be reached in a slot, the block is skipped without reorg chaos.
  • Fork Resolution: In the event of competing candidate blocks, the chain selects the block with the most validator attestations and the highest cumulative stake weight.
  • Economic Security Envelope: Validator rewards are proportional to uptime, proof verification throughput, and honest participation; misbehavior costs both stake and future earning potential.

Settlement metrics:

  • Finality Latency (FL): FL = slot_time * slots_to_finality
  • Settlement Throughput (ST): ST = verified_results_per_block / block_time
  • State Divergence Risk (SDR): SDR = 1 - quorum_attestation_rate (targeting near-zero)
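
Evaluated with an assumed 2.2-second slot time and single-slot finality (matching the latency example in Section 7.2), these metrics give:

```python
# The settlement metrics above, on assumed illustrative inputs.

slot_time = 2.2                      # seconds per slot (assumption)
slots_to_finality = 1                # single-slot finality (assumption)
verified_results_per_block = 400     # illustrative block capacity
block_time = 2.2
quorum_attestation_rate = 0.999

fl = slot_time * slots_to_finality               # Finality Latency
st = verified_results_per_block / block_time     # Settlement Throughput
sdr = 1 - quorum_attestation_rate                # State Divergence Risk
```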

5. Tokenomics & Security Incentives (Overview)

Shardy thrives by aligning the self-interest of network operators with mathematically enforced honesty.

  • Workers (Compute Providers): Required to stake a micro-minimum of $SHRD to deter Sybil attacks. They earn yields matching the market value of inference delivery, turning idle graphics cards into cash flow.
  • Validators (Orchestrators): Required to stake significantly higher capital. They earn transaction fees for packaging blocks, managing Mempools, and serving the Libp2p networking mesh.
  • Slashing Mechanics: Scammers are mathematically caught via mismatch protocols, meaning an attempt to manipulate calculations or spoof WebGPU loads carries a direct, guaranteed financial loss.

Economic design details (expanded):

  • Dynamic Pricing: Compute pricing adapts to demand and supply on the network. Peak demand pays a premium, while idle periods discount tasks to stimulate usage.
  • Reputation Multiplier: Nodes with consistent proof acceptance receive a modest throughput bonus, incentivizing long-term honest participation without centralization.
  • Uptime Insurance Pool: A portion of validator fees flows into a pool that compensates users for failed or delayed jobs, creating a quality-of-service backstop.
  • Sybil Dampening: Staking requirements scale with node count per wallet, making large-scale identity attacks prohibitively expensive.
  • Fair-Use Guardrails: Minimum payouts and anti-dust rules prevent exploitative micro-task spam that could waste network bandwidth.

5.1 Pricing & Cost Model (Expanded)

Shardy pricing is computed per shard and aggregated to job-level cost.

  • Base Compute Cost:
    Cost_compute = (FLOPs / 10^12) * price_per_TFLOP
  • Bandwidth Cost:
    Cost_bw = (bytes_in + bytes_out) * price_per_byte
  • Reliability Premium:
    Cost_rel = Cost_compute * redundancy_factor * reliability_multiplier
  • Total Job Cost:
    Cost_total = sum(Cost_compute + Cost_bw + Cost_rel) + network_fee

Suggested coefficients (protocol targets):

  • redundancy_factor default 2
  • reliability_multiplier from 0.05 to 0.30 depending on SLA
  • network_fee dynamically sized to keep validator participation profitable even at low demand

Worked example (single job, anchored to market references):

  • Assumptions (USD, March 11, 2026): Shardy prices are pegged to prevailing cloud GPU list rates. For reference, AWS EC2 Capacity Blocks list a single H100 (p5.4xlarge) at $3.933/hour (US East, Ohio), and Google Cloud lists A100 80GB accelerators starting at $4.713696/hour in Serverless Spark pricing.
  • Derived target rate: price_per_TFLOP = 0.035 (maps to the ~$3.5–$4.7/hour GPU class after overheads)
  • Job size: FLOPs = 25e12 (25 TFLOPs total), bytes_in = 600e6, bytes_out = 150e6
  • Pricing: price_per_TFLOP = 0.035, price_per_byte = 2e-10, redundancy_factor = 2, reliability_multiplier = 0.10, network_fee = 0.25
  • Compute cost: Cost_compute = (25e12 / 1e12) * 0.035 = 0.875
  • Bandwidth cost: Cost_bw = (750e6) * 2e-10 = 0.15
  • Reliability cost: Cost_rel = 0.875 * 2 * 0.10 = 0.175
  • Total cost: Cost_total = (0.875 + 0.15 + 0.175) + 0.25 = 1.45
  • Interpretation: The job costs 1.45 in the network unit of account (e.g., $SHRD), aligned to real cloud GPU list prices.
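
The worked example above can be reproduced end to end:

```python
# The Section 5.1 worked pricing example, reproduced with the stated inputs.

def job_cost(flops, bytes_in, bytes_out, price_per_tflop, price_per_byte,
             redundancy_factor, reliability_multiplier, network_fee):
    """Return (compute, bandwidth, reliability, total) costs for one job."""
    cost_compute = (flops / 1e12) * price_per_tflop
    cost_bw = (bytes_in + bytes_out) * price_per_byte
    cost_rel = cost_compute * redundancy_factor * reliability_multiplier
    total = cost_compute + cost_bw + cost_rel + network_fee
    return cost_compute, cost_bw, cost_rel, total

compute, bw, rel, total = job_cost(
    flops=25e12, bytes_in=600e6, bytes_out=150e6,
    price_per_tflop=0.035, price_per_byte=2e-10,
    redundancy_factor=2, reliability_multiplier=0.10, network_fee=0.25)
```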

5.2 Rewards & Payout Formula

Worker payout is proportional to verified work and adjusted for honesty and reliability.

  • Base Payout:
    P_base = Cost_compute * payout_ratio
  • Quality Multiplier:
    Q = 0.8 + 0.2 * SSR
  • Reputation Multiplier:
    R = clamp(1 + reputation_score / 1000, 0.9, 1.2)
  • Final Worker Payout:
    P_worker = P_base * Q * R - penalties

Validator payout is tied to settlement throughput and proof verification.

  • Validator Reward:
    P_validator = network_fee * (uptime_weight * verify_weight)

Worked example (worker payout, anchored to market references):

  • Inputs: Cost_compute = 0.875, payout_ratio = 0.80, SSR = 0.98, reputation_score = 120, penalties = 0.02
  • Base payout: P_base = 0.875 * 0.80 = 0.70
  • Quality multiplier: Q = 0.8 + 0.2 * 0.98 = 0.996
  • Reputation multiplier: R = clamp(1 + 120/1000, 0.9, 1.2) = 1.12
  • Final payout: P_worker = 0.70 * 0.996 * 1.12 - 0.02 = 0.76
  • Interpretation: The worker receives 0.76 net for this shard group, with penalties already deducted.
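
The payout example reproduces as follows:

```python
# The Section 5.2 worker-payout example, reproduced with the stated inputs.

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def worker_payout(cost_compute, payout_ratio, ssr, reputation_score, penalties):
    """P_worker = P_base * Q * R - penalties."""
    p_base = cost_compute * payout_ratio
    q = 0.8 + 0.2 * ssr                              # quality multiplier
    r = clamp(1 + reputation_score / 1000, 0.9, 1.2)  # reputation multiplier
    return p_base * q * r - penalties

p = worker_payout(cost_compute=0.875, payout_ratio=0.80, ssr=0.98,
                  reputation_score=120, penalties=0.02)
```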

5.3 Slashing, Insurance, and Dispute Resolution

  • Slashing Trigger: If proof_invalid || digest_mismatch, then slash = stake * slash_rate.
  • Dispute Re-Run: If two valid proofs disagree, a third node re-runs the shard; the majority result wins.
  • Insurance Payout: Users receive credits for shards delayed beyond SLA thresholds.
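
A minimal sketch of the trigger and tie-break logic above; the 10% slash rate is an illustrative assumption.

```python
# Illustrative slashing trigger and dispute tie-break (rates are assumptions).

def slash_amount(stake, proof_invalid, digest_mismatch, slash_rate=0.10):
    """Slash only on a verifiable fault: slash = stake * slash_rate."""
    return stake * slash_rate if (proof_invalid or digest_mismatch) else 0.0

def resolve_dispute(digests):
    """Majority digest wins after a third-node re-run."""
    return max(set(digests), key=digests.count)

penalty = slash_amount(stake=1000.0, proof_invalid=False, digest_mismatch=True)
winner = resolve_dispute(["0xaaa", "0xbbb", "0xaaa"])  # third node broke the tie
```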

6. The Roadmap & To-Market Strategy

Shardy has already solved the hardest technical challenge: unfalsifiable distributed compute directly inside a consumer web browser.

To reach full capitalization and consumer readiness, the Immediate Engineering Roadmap includes:

  1. L1 Mainnet Consensus: Transitioning the P2P libp2p layer to support decentralized Validating Nodes (Multi-Orchestrator scaling with automated Merkle root validation).
  2. Visual Block Explorer: A globally facing blockchain explorer proving the sheer transaction volume, tracking TFLOPS in real-time, and publicizing live SNARK block generations.
  3. The “Killer-Demo”: Launching a complex applied AI inference (e.g. LLM text generation or image inference) entirely partitioned, processed, and validated across a swarm of anonymous mobile devices and laptops, proving the commercial viability of browser-based DePIN to enterprise clients.

Go-to-market depth (expanded):

  • Developer Onboarding: Provide SDKs for browser-compatible inference workloads, starter templates, and pre-built pipelines for common ML tasks.
  • Enterprise Pilots: Target customers with elastic inference needs, such as media rendering, recommendation engines, or large-scale image processing.
  • Ecosystem Incentives: Grant programs for teams that build workloads optimized for WebGPU and verifiable computation.
  • Regulatory Readiness: Formalize compliance and privacy policies for regions with strict data residency requirements.

7. Formal Metrics & System Targets

This section defines the core metrics the protocol optimizes, and how they interact.

  • Network Throughput (NTW): NTW = sum(ET_i) for all active workers
  • Task Completion Time (TCT): TCT = queue_time + compute_time + proof_time + settlement_time
  • Cost Efficiency (CE): CE = Cost_total / effective_TFLOPs
  • Proof Acceptance Rate (PAR): PAR = accepted_proofs / total_proofs
  • Redundancy Overhead (RO): RO = (redundancy_factor - 1) * 100%
  • Energy Efficiency Proxy (EEP): EEP = ET / estimated_watts where watts are approximated by tier and device class.
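
A compact evaluation of these metrics on assumed inputs (the latency terms reuse the Section 7.2 breakdown, and the cost figures reuse the Section 5.1 worked example):

```python
# Network-level metrics above, evaluated on assumed illustrative inputs.

worker_et = [1.5e12, 0.8e12, 2.2e12]   # per-worker effective throughput (ET_i)
ntw = sum(worker_et)                    # Network Throughput

tct = 0.8 + 4.0 + 1.0 + 2.2             # queue + compute + proof + settlement
ce = 1.45 / 25.0                        # Cost Efficiency: cost per TFLOP
par = 1990 / 2000                       # Proof Acceptance Rate
ro = (2 - 1) * 100                      # Redundancy Overhead, percent
```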

Target ranges (initial protocol goals):

  • PAR >= 0.995
  • TCT <= 2x compute_time for latency-sensitive jobs
  • RO <= 100% at default redundancy
  • CE competitive with low-end cloud inference for burst workloads

7.1 Comparative Metrics vs. Traditional Cloud

The following table uses illustrative values to compare Shardy to a typical cloud inference pipeline for burst workloads. The numbers are placeholders meant to show relative direction, not final pricing.

| Metric             | Shardy (Browser DePIN)      | Traditional Cloud       |
|--------------------|-----------------------------|-------------------------|
| Provisioning time  | Seconds (no install)        | Minutes to hours        |
| Elastic scale      | User devices, high variance | Datacenter, predictable |
| Unit cost (burst)  | Lower during idle supply    | Higher during peak      |
| Trust model        | ZK proof + redundancy       | Vendor trust            |
| Fault tolerance    | Cross-node re-run           | Region failover         |
| Data residency     | Edge-localizable            | Region-bound            |
| Verification cost  | Milliseconds per proof      | N/A                     |
| SLA profile        | Evolving, pool-backed       | Contractual             |

Cloud price anchors (USD list rates, for comparison):

  • AWS EC2 Capacity Blocks (H100): p5.4xlarge at $3.933/hour in US East (Ohio).
  • Google Cloud Serverless Spark Accelerators: A100 80GB starting at $4.713696/hour, A100 40GB at $3.52069/hour, L4 at $0.672048/hour.

7.2 Numerical Example: End-to-End Latency

  • Queue time: 0.8s
  • Compute time: 4.0s
  • Proof time: 1.0s
  • Settlement time: 2.2s (1 slot finality)
  • TCT: 0.8 + 4.0 + 1.0 + 2.2 = 8.0s
  • Interpretation: A short inference job completes in about 8 seconds with verifiable settlement.

8. Technical Appendix (Auditor-Focused)

8.0 Formal Guarantees (Invariants)

The protocol maintains the following invariants under the assumed cryptographic hardness of Groth16 and the honesty threshold of validators.

  • I1 — Deterministic Shard Mapping: For any task T and shard index i, Shard(T,i) is uniquely defined by (task_id, shard_index, circuit_version) and cannot be altered without changing the shard hash.
  • I2 — Proof Binding: A valid proof π is bound to exactly one (digest, circuit_version, shard_index) tuple; any change invalidates verification.
  • I3 — Settlement Atomicity: A shard is either fully settled (balance updates + reputation updates + validator fees) or not settled at all; partial application is forbidden by state root checks.
  • I4 — Slashing Safety: A worker cannot be slashed without at least one verifiable mismatch (invalid proof or digest conflict against redundant outputs).
  • I5 — Non-Double-Pay: For a given shard, rewards are paid once and only once; replays are prevented by shard nonce and block height window.
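
Invariant I5 can be sketched as a settlement ledger keyed by (task_id, shard_index, nonce); the class and field names here are hypothetical.

```python
# Sketch of invariant I5 (non-double-pay): a settlement ledger keyed by
# (task_id, shard_index, nonce) rejects replayed reward claims.

class SettlementLedger:
    def __init__(self):
        self._paid = set()

    def settle(self, task_id, shard_index, nonce, amount, balances, worker):
        """Pay a shard reward once and only once."""
        key = (task_id, shard_index, nonce)
        if key in self._paid:
            return False                  # replay: reward already paid (I5)
        # Apply all updates together, mirroring settlement atomicity (I3).
        balances[worker] = balances.get(worker, 0) + amount
        self._paid.add(key)
        return True

ledger = SettlementLedger()
balances = {}
first = ledger.settle("task-1", 0, 42, 10, balances, "worker-a")
replay = ledger.settle("task-1", 0, 42, 10, balances, "worker-a")
```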

8.1 Threat Model

Assumptions and attacker capabilities:

  • Adversary can control multiple worker nodes (Sybil), submit malicious tasks, and attempt to forge outputs.
  • Adversary can delay or drop network packets (partial DoS), but cannot break standard cryptography or forge ZK proofs.
  • Adversary can compromise a minority of validators, but not a stake majority required for finality.

Threats and mitigations:

  • Forged computation: Mitigated by ZK proofs + redundancy; invalid proofs are rejected and slashable.
  • Result withholding: Mitigated by timeouts and DLQ re-assignments; non-responsive nodes lose reputation.
  • Sybil swarm: Mitigated by stake requirements and scaling stake with node count per wallet.
  • Consensus manipulation: Mitigated by quorum attestation and stake-weighted fork resolution.
  • Replay attacks: Mitigated by shard nonces and block height windows in proofs.

8.2 Determinism and Numeric Stability

  • Deterministic Input Sharding: Each shard is indexed and hashed; the same shard always produces the same digest given identical inputs and kernel versions.
  • Quantization Profile: Inputs are normalized to a fixed precision profile prior to GPU dispatch, reducing floating-point drift across hardware.
  • EMA Stabilization: EMA smoothing is applied to sensitive tensor segments; if EMA output deviates beyond tolerance, the shard is automatically retried.

8.3 Proof System Integrity

  • Circuit Separation: Proof circuits are versioned; a proof must reference the exact circuit hash accepted by validators.
  • Witness Minimality: Witness data includes only required intermediate values, reducing leakage and proof size.
  • Replay Protection: Proofs include a shard nonce and block height window to prevent replay attacks.

8.4 Security and Slashing

  • Dishonest Node Detection: Triggered by digest_mismatch, proof_invalid, or repeated execution timeouts.
  • Slashing Gradient: Penalties increase with repeated infractions; first offenses may be partial, chronic offenders are fully slashed.
  • Appeal Window: A short on-chain dispute window allows re-run validation for conflicting proofs.

8.5 Data Handling and Privacy

  • Browser Isolation: Tasks run in isolated workers; no access to local file system except OPFS scope assigned to Shardy.
  • Data Retention: Shards are deleted after settlement unless retention is explicitly requested by the submitter.
  • Audit Logs: Orchestrators maintain verifiable logs (hash-chained) for task assignment, proof receipt, and settlement outcomes.

8.6 Compliance Readiness

  • Geofenced Dispatch: Orchestrators can constrain shard routing to specific regions.
  • PII Redaction: Optional preprocessing step to strip or tokenize sensitive fields before compute.

8.7 Audit Checklist

Use this checklist to verify protocol safety, determinism, and economic correctness.

  • Determinism & Sharding
      ◦ Confirm shard hashing is stable across platforms and identical inputs.
      ◦ Confirm quantization profile is fixed and versioned.
      ◦ Validate EMA smoothing bounds and retry thresholds.
  • Proof System
      ◦ Verify circuit versioning and hash pinning.
      ◦ Validate witness minimality and absence of leaked data.
      ◦ Confirm proof includes nonce and block height window.
  • Consensus & Finality
      ◦ Verify state root is computed deterministically for each block.
      ◦ Confirm quorum threshold and stake-weighting rules.
      ◦ Validate fork-choice rules under competing blocks.
  • Economic Integrity
      ◦ Confirm payout calculations match on-chain formulas.
      ◦ Validate slashing triggers and penalty gradient.
      ◦ Check insurance pool accounting and payout caps.
  • Security & Abuse
      ◦ Simulate Sybil attempts and ensure stake escalation.
      ◦ Validate timeout, DLQ, and reassignment behaviors.
      ◦ Check replay protection for shard proofs.
  • Privacy & Compliance
      ◦ Confirm OPFS data retention and deletion policies.
      ◦ Verify geofenced dispatch is enforced.
      ◦ Audit logs are hash-chained and tamper-evident.

9. Conclusion

Cloud computing is a $500 billion industry reliant on physical real estate, server cooling, and monopoly pricing. Shardy sidesteps that capital expenditure entirely, unlocking the hardware sitting idle in millions of homes worldwide.

By unifying WebGPU for raw parallel tensor slicing, WASM for cross-platform deterministic smoothing, and Circom / SnarkJS for bulletproof cryptographic execution verification, a Shardy Node transforms a standard web request into an elite, trustless high-performance computing cluster.

Shardy isn’t just a blockchain. It is the distributed brain of the open internet.
