
SHARDY: The Decentralized WebGPU Supercomputer

Layer 1 Whitepaper & Architecture Blueprint (Draft)

Version 1.0 — Investor & Technical Overview


1. Executive Summary: The AI Compute Crisis & The Shardy Revolution

The global demand for artificial intelligence inference and training has created an unprecedented bottleneck in physical hardware. Centralized cloud providers (AWS, Google Cloud, Azure) control the market, resulting in exorbitant costs, severe hardware shortages, and monolithic points of failure.

At the same time, billions of consumer devices—high-end gaming PCs, modern Apple Silicon MacBooks, and premium smartphones—sit completely underutilized. Their combined Graphical Processing Unit (GPU) capacity dwarfs the data centers of the world’s largest tech conglomerates.

Shardy is a decentralized physical infrastructure network (DePIN) and Layer 1 blockchain designed to upend traditional cloud monopolies. By leveraging the untapped computational power of everyday devices through a standard web browser, Shardy transforms the global web into a massively parallel, permissionless supercomputer.

In Shardy, mining is not the useless brute-forcing of cryptographic hashes; it is the execution of real AI and GPU computations. Using an elegant combination of WebGPU, WebAssembly (WASM), and Zero-Knowledge (ZK) cryptography, we turn any computer into a trusted, billable cell of physical infrastructure in one click, with zero software installation required.


2. Core Technical Philosophy & Advantages

Shardy bypasses the friction of traditional distributed computing protocols (which often require complex Docker setups, CLI knowledge, and heavy installations) by relying entirely on the modern browser sandbox.

2.0 End-to-End Pipeline Overview

The Shardy pipeline is a full loop: a task enters the network, is split into deterministic shards, computed on WebGPU, proven by ZK, and settled on-chain. The loop is designed so that no single actor can fake results or extract value without delivering computation.

2.1 The Bio-Mechanical Runtime (Browser Nodes)

When a user connects to the Shardy network via the node dashboard, the application initializes a sandbox environment within the browser using Dedicated Web Workers.

  • Zero-Copy Execution via WebGPU & WASM: Matrices and tensors are processed in parallel on user graphics cards through the browser (via TypeGPU). Shardy employs SharedArrayBuffer to move data between the network layer and the GPU worker with zero redundant memory copies.
  • Hardware Profiling: A worker is automatically benchmarked upon joining the network, measuring its GFLOPS and VRAM capacity. Nodes are then categorized into Tiers (from Tier 1 integrated graphics to Tier 3 high-end multi-GPU workstations). Complex tasks are intelligently routed only to capable devices.
  • Computational Determinism (WASM EMA Smoothing): GPU hardware architectures (Nvidia vs. AMD vs. Apple) calculate floating-point numbers slightly differently. To ensure consensus isn’t broken by a rounding error, Shardy runs a hyper-optimized WebAssembly binary (compiled from Rust) that applies an Exponential Moving Average (EMA) smoothing algorithm to the tensor dataset before dispatching it to WebGPU, ensuring bit-for-bit matching results.
  • OPFS Mass Storage: Massive tensor chunks (gigabytes in size) that exceed RAM limits are streamed and cached directly to the host’s drive via the Origin Private File System (OPFS), avoiding the out-of-memory crashes typical of browser tabs.
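
To make the determinism claim above concrete, here is a minimal, illustrative sketch of an EMA pre-pass over quantized inputs. Python stands in for the Rust/WASM binary, and the SCALE constant and alpha = 1/8 are assumptions, not protocol values. Because inputs are snapped to a fixed-point grid and smoothed with integer arithmetic only, every platform produces bit-identical output.

```python
# Illustrative sketch of the deterministic EMA pre-pass (NOT the production
# WASM binary). Inputs are quantized to a fixed-point grid first, so every
# platform smooths bit-identical integers regardless of GPU vendor.

SCALE = 1 << 16  # hypothetical fixed-point quantization scale

def quantize(values):
    """Snap floats to a fixed-point grid to remove per-GPU rounding drift."""
    return [round(v * SCALE) for v in values]

def ema_smooth(q_values, alpha_num=1, alpha_den=8):
    """Integer-only exponential moving average: ema = a*x + (1-a)*ema."""
    out = []
    ema = q_values[0]
    for x in q_values:
        ema = (alpha_num * x + (alpha_den - alpha_num) * ema) // alpha_den
        out.append(ema)
    return out

tensor = [0.101, 0.1500001, 0.1499999, 0.22]
stable = ema_smooth(quantize(tensor))
```

Note how the two inputs that differ only by floating-point noise (0.1500001 vs. 0.1499999) quantize to the same grid point, so the smoothed output is identical on every device.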

Additional runtime mechanics (deep dive):

  • Task Bootstrapping: The Orchestrator supplies a signed “work package” that includes code hash, shader/compute graph ID, input shard metadata, expected output schema, and resource caps. This package is verified locally before execution to prevent arbitrary code execution or resource abuse.
  • Deterministic Execution Envelope: Shardy constrains non-deterministic sources by fixing shader compilation flags, enforcing consistent math modes, and applying a stable quantization profile on input tensors. The EMA step is a final stabilizer, but determinism is actively enforced end-to-end.
  • Progressive Warmup & Calibration: Each worker performs a lightweight calibration run to measure kernel launch overhead, memory bandwidth, and browser throttling behavior. These values are used to normalize throughput, avoid over-scheduling, and protect the user experience.
  • Adaptive Chunking: Large jobs are chunked into multiple micro-batches sized by VRAM and thermal limits. If a laptop starts throttling or the browser loses focus, chunk sizes auto-shrink to preserve completion rates.
  • Local Fault Handling: The runtime maintains a checkpoint cursor so that a browser refresh or tab crash does not force a full restart. When possible, the node can resume from the last verified shard.
  • Privacy Boundaries: Only task-specific tensors, not user files or system data, are accessible. The browser sandbox and explicit memory fences enforce a strict boundary between Shardy data and the host device.
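
The adaptive chunking policy above can be sketched as follows; the safety factor, minimum chunk size, and linear thermal scaling are illustrative assumptions, not protocol constants.

```python
# Illustrative adaptive chunk sizing: micro-batches shrink when VRAM
# headroom or thermal headroom drops, preserving completion rates.

def chunk_elements(vram_free_bytes, bytes_per_element, thermal_headroom,
                   min_chunk=1024, safety=0.5):
    """Return how many tensor elements fit in one micro-batch."""
    budget = int(vram_free_bytes * safety * max(0.0, min(1.0, thermal_headroom)))
    return max(min_chunk, budget // bytes_per_element)

# Healthy laptop: plenty of free VRAM, no throttling.
full = chunk_elements(4 * 1024**3, 4, thermal_headroom=1.0)
# Same device throttling at 25% headroom: chunks auto-shrink.
throttled = chunk_elements(4 * 1024**3, 4, thermal_headroom=0.25)
```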

Runtime performance metrics:

  • Effective Throughput (ET): ET = (FLOPs_completed / wall_time) * utilization_factor
  • Utilization Factor: utilization_factor = min(1, active_gpu_time / wall_time) and is tracked per node to penalize idle throttling or background tab suspension.
  • Shard Success Rate (SSR): SSR = successful_shards / assigned_shards
  • Normalized Throughput (NT): NT = ET / tier_baseline, where tier_baseline is a reference GFLOPS for each hardware tier.
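
A small worked computation of these metrics, using hypothetical node numbers (the 1.5 TFLOPS tier baseline is an assumed reference, not a published constant):

```python
# The runtime metrics above, computed for a hypothetical Tier 2 node.

def effective_throughput(flops_completed, wall_time, active_gpu_time):
    """ET = (FLOPs_completed / wall_time) * utilization_factor."""
    utilization = min(1.0, active_gpu_time / wall_time)
    return (flops_completed / wall_time) * utilization, utilization

def shard_success_rate(successful, assigned):
    """SSR = successful_shards / assigned_shards."""
    return successful / assigned

# Sample: 8 TFLOPs of work in 4 s; the GPU was active for 3 of those 4 s.
et, uf = effective_throughput(8e12, 4.0, 3.0)
ssr = shard_success_rate(49, 50)
nt = et / 1.5e12  # tier_baseline: assumed 1.5 TFLOPS reference for Tier 2
```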

2.2 Mathematical Trustlessness (Zero-Knowledge Proofs)

In a permissionless network, participants are anonymous and cannot be trusted. How do we guarantee the computations are real and not faked for rewards?

Shardy integrates ZK-SNARKs (Groth16) via Circom and SnarkJS to mathematically enforce computational integrity.

  1. Once WebGPU finishes computing, the worker folds the megabytes of output into a single collision-resistant scalar: the resultDigest.
  2. The node runs a ZK circuit acting as an algebraic constraint map, proving that it knows the intermediate variables (witnesses) that satisfy the output.
  3. However heavy and long the GPU computation, it yields only a tiny .json proof. The Shardy Network verifies this proof in under 10 milliseconds, eliminating the need to recompute terabytes of data and achieving reliable fault detection with near-zero network overhead.
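
The digest-folding step can be illustrated as follows. This is a readability sketch only: the production circuit uses a ZK-friendly hash inside Circom, not SHA-256, and the encoding scheme here is an assumption.

```python
import hashlib
import struct

# Illustrative resultDigest folding: output tensors plus execution metadata
# collapse into one scalar, binding the proof to a precise execution context.

def result_digest(output_values, kernel_version, shard_index):
    """Fold an output tensor plus execution metadata into one scalar."""
    h = hashlib.sha256()
    h.update(kernel_version.encode())           # ties digest to kernel build
    h.update(struct.pack("<I", shard_index))    # ties digest to this shard
    for v in output_values:
        h.update(struct.pack("<d", v))          # fixed little-endian encoding
    return int.from_bytes(h.digest(), "big")

d1 = result_digest([0.5, 1.25, -3.0], "kernel-v1", 7)
d2 = result_digest([0.5, 1.25, -3.0], "kernel-v1", 7)   # identical context
d3 = result_digest([0.5, 1.25, -3.0], "kernel-v1", 8)   # different shard
```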

ZK proof lifecycle (expanded):

  • Witness Construction: The worker replays a minimal, deterministic trace of the computation over the same input shard, extracting the witness values required by the circuit. This is smaller and more structured than the raw output.
  • Digest Commitment: The resultDigest is derived from both the output tensors and execution metadata (kernel version, quantization profile, and shard index). This ties the proof to a precise execution context.
  • Proof Compression: Proofs are compacted and gzip-encoded before submission, minimizing network overhead for mobile devices and constrained connections.
  • Verifier Parallelism: Validators can batch-verify multiple proofs in a single block, reducing total verification costs and enabling high throughput.
  • Fraud Containment: If the digest does not match cross-node redundancy results, the proof is rejected and the node is marked for slashing review.

Proof cost and verification budget:

  • Proof Overhead Ratio (POR): POR = proof_time / compute_time and is bounded by protocol targets to keep proof generation economically feasible.
  • Verifier Load (VL): VL = proofs_per_block * verify_cost_ms
  • Digest Collision Budget: P(collision) <= 2^(-k) where k is digest bit-length; protocol targets a collision probability far below validator failure rates.
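
These budgets evaluate directly; the sample values below (1 s of proving on a 4 s shard, 500 proofs per block at 8 ms each, a roughly 254-bit digest) are illustrative assumptions:

```python
# The proof-cost metrics above, on assumed illustrative numbers.

def proof_overhead_ratio(proof_time, compute_time):
    """POR = proof_time / compute_time."""
    return proof_time / compute_time

def verifier_load(proofs_per_block, verify_cost_ms):
    """VL = proofs_per_block * verify_cost_ms."""
    return proofs_per_block * verify_cost_ms

def collision_bound(digest_bits):
    """Upper bound on accidental digest collision probability: 2^(-k)."""
    return 2.0 ** (-digest_bits)

por = proof_overhead_ratio(1.0, 4.0)    # 1 s proving on a 4 s shard
vl = verifier_load(500, 8)              # 500 proofs, 8 ms each
p_collision = collision_bound(254)      # ~254-bit field digest (assumed)
```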

3. The Orchestrator Mesh (L1 P2P Control Plane)

The Shardy Orchestrators represent the control plane of the decentralized ecosystem. They govern how thousands of anonymous clients function collectively, utilizing an event-driven Bun architecture.

3.1 Network Telemetry & GossipSub

To decentralize connectivity and scale infinitely without single-point bottlenecks, Shardy Orchestrators embed Libp2p.

  • Transaction Gossiping: Meaningful state changes, ZK-proof certificates, and task assignments propagate through encrypted P2P channels using GossipSub v1.1.
  • Shared Mempool: Instead of a master-slave HTTP architecture, Orchestrators push incoming tasks into a global Mempool. Orchestrators discover each other through Kademlia DHT.

3.2 Byzantine Fault Tolerance (BFT) & Dispatch

Tasks are not mapped 1-to-1. To enforce BFT:

  • Redundancy Matrix: A single AI payload is securely duplicated and routed to multiple distinct nodes (e.g., REDUNDANCY_FACTOR = 2).
  • Watchdogs & DLQ: If a user closes their browser or their mobile device suspends the app, the Orchestrator’s Watchdog triggers an ack_timeout or execution_timeout, and the task is immediately reassigned to the next high-availability node. If a payload fails repeatedly, it is cordoned into a Dead Letter Queue (DLQ) to protect network latency.

Dispatch internals (expanded):

  • Bidirectional Scheduling: Nodes advertise a live capability vector (compute tier, latency band, thermal headroom). Orchestrators match tasks to nodes using a weighted scoring function that balances cost, speed, and reliability.
  • Admission Control: To prevent overload, each node has a per-device concurrency cap derived from its calibration profile and battery status.
  • Proof-Aware Routing: Tasks that require stronger trust or higher value are routed to multiple, higher-tier nodes to reduce dispute probability.
  • Result Reconciliation: Redundant outputs are compared against a canonical digest; mismatches trigger a re-run on a third node to break ties.
  • Network Backpressure: Mempool size and DLQ growth signal congestion and automatically reduce task intake, preventing cascading failures.
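
The weighted scoring function mentioned under Bidirectional Scheduling might look like the following sketch; the weights and normalization bounds are assumptions for illustration, not protocol parameters.

```python
# Illustrative weighted node scoring: balances cost, speed, and reliability.
# Weights, max_cost, and max_latency are assumed constants, not protocol values.

def node_score(cost, latency_ms, reliability,
               w_cost=0.4, w_speed=0.3, w_rel=0.3,
               max_cost=10.0, max_latency=1000.0):
    """Higher is better; each term is normalized to [0, 1]."""
    cost_term = 1.0 - min(cost, max_cost) / max_cost
    speed_term = 1.0 - min(latency_ms, max_latency) / max_latency
    return w_cost * cost_term + w_speed * speed_term + w_rel * reliability

cheap_far = node_score(cost=2.0, latency_ms=800, reliability=0.90)
pricey_near = node_score(cost=6.0, latency_ms=100, reliability=0.99)
best = max([("cheap_far", cheap_far), ("pricey_near", pricey_near)],
           key=lambda kv: kv[1])[0]
```

With these weights, the low-latency, high-reliability node wins despite its higher cost, matching the proof-aware routing bias toward higher-tier nodes.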

3.3 Task → Proof → Settlement (Detailed Sequence)

  1. A client submits a task; Orchestrators push it into the shared Mempool.
  2. The task is split into deterministic shards and dispatched redundantly to capable worker tiers.
  3. Each worker computes its shard on WebGPU, derives the resultDigest, and generates a ZK proof.
  4. Orchestrators verify proofs, reconcile redundant digests, and gossip the certificates across the mesh.
  5. A slot leader packages verified results into a candidate block; quorum attestation finalizes the state root.
  6. The state machine pays workers, credits validator fees, and slashes any node with a verifiable mismatch.

4. Consensus & Replicated State Machine (L1 Settlement)

To transition from a “Proof of Concept” centralized server into a sovereign Layer 1 Blockchain, Shardy relies on a decentralized Replicated State Machine (RSM). Orchestrators act as Validators.

4.1 Block Formulation and State Root

  • Leader Slots: The blockchain time is divided into slots. Validating Orchestrators take turns collecting verified ZK-results from the Mempool and formulating “Candidate Blocks”.
  • Global State Root (Merkle Tree): Each block commits a stateRoot, a cryptographic hash of the entire network’s database. If a rogue orchestrator applies a mutated database, its state root will differ; consensus rejects the block and drops the orchestrator from the chain, enforcing network truth.
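
A minimal Merkle state-root sketch shows why a single mutated entry is immediately detectable; the leaf encoding here is hypothetical.

```python
import hashlib

# Minimal Merkle state-root sketch: any single-entry mutation changes the root.

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Pairwise-hash leaves up to a single 32-byte root."""
    level = [h(leaf) for leaf in leaves]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:                # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

state = [b"alice:100", b"bob:50", b"carol:75"]
root_honest = merkle_root(state)
# A rogue orchestrator mutates one balance; the root diverges instantly.
root_mutated = merkle_root([b"alice:100", b"bob:5000", b"carol:75"])
```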

4.2 Economic Finalization

When nodes evaluate identical math, produce identical proofs, and multiple verifying orchestrators confirm it, the block is sealed. The Shardy State Machine executes:

  1. Reward Distribution: Verified tasks command a programmatic payout in the native tSHRD token directly to the connected Web3 wallet of the participating Worker node.
  2. Consensus Mismatch (Slashing): If a malicious user alters the client code to forge witness outputs and returns a manipulated result, the Orchestrator instantly detects the discrepancy against the redundant sister computations. applySlashing() is invoked: the malicious node’s balance is slashed, its staked collateral is seized, and its identity is blocked from the mesh.

Consensus mechanics (expanded):

  • State Transitions as Tasks: Each task completion results in a deterministic state transition that updates balances, reputation scores, and validator fees.
  • Validator Quorum: Finality requires a quorum of validators to attest to the same state root; if quorum cannot be reached in a slot, the block is skipped without reorg chaos.
  • Fork Resolution: In the event of competing candidate blocks, the chain selects the block with the most validator attestations and the highest cumulative stake weight.
  • Economic Security Envelope: Validator rewards are proportional to uptime, proof verification throughput, and honest participation; misbehavior costs both stake and future earning potential.

Settlement metrics:

  • Finality Latency (FL): FL = slot_time * slots_to_finality
  • Settlement Throughput (ST): ST = verified_results_per_block / block_time
  • State Divergence Risk (SDR): SDR = 1 - quorum_attestation_rate (targeting near-zero)
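
Evaluated with an assumed 2.2-second slot time and single-slot finality (matching the latency example in Section 7.2), these metrics give:

```python
# The settlement metrics above, on assumed illustrative inputs.

slot_time = 2.2                      # seconds per slot (assumption)
slots_to_finality = 1                # single-slot finality (assumption)
verified_results_per_block = 400     # illustrative block capacity
block_time = 2.2
quorum_attestation_rate = 0.999

fl = slot_time * slots_to_finality               # Finality Latency
st = verified_results_per_block / block_time     # Settlement Throughput
sdr = 1 - quorum_attestation_rate                # State Divergence Risk
```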

5. Tokenomics & Security Incentives (Overview)

Shardy thrives by aligning the self-interest of network operators with mathematically enforced honesty.

  • Workers (Compute Providers): Required to stake a micro-minimum of $SHRD to deter Sybil attacks. They earn yields matching the market value of inference delivery, turning idle graphics cards into cash flow.
  • Validators (Orchestrators): Required to stake significantly higher capital. They earn transaction fees for packaging blocks, managing Mempools, and serving the Libp2p networking mesh.
  • Slashing Mechanics: Scammers are mathematically caught via mismatch protocols, meaning an attempt to manipulate calculations or spoof WebGPU loads carries a direct, guaranteed financial loss.

Economic design details (expanded):

  • Dynamic Pricing: Compute pricing adapts to demand and supply on the network. Peak demand pays a premium, while idle periods discount tasks to stimulate usage.
  • Reputation Multiplier: Nodes with consistent proof acceptance receive a modest throughput bonus, incentivizing long-term honest participation without centralization.
  • Uptime Insurance Pool: A portion of validator fees flows into a pool that compensates users for failed or delayed jobs, creating a quality-of-service backstop.
  • Sybil Dampening: Staking requirements scale with node count per wallet, making large-scale identity attacks prohibitively expensive.
  • Fair-Use Guardrails: Minimum payouts and anti-dust rules prevent exploitative micro-task spam that could waste network bandwidth.

5.1 Pricing & Cost Model (Expanded)

Shardy pricing is computed per shard and aggregated to job-level cost.

  • Base Compute Cost:
    Cost_compute = (FLOPs / 10^12) * price_per_TFLOP
  • Bandwidth Cost:
    Cost_bw = (bytes_in + bytes_out) * price_per_byte
  • Reliability Premium:
    Cost_rel = Cost_compute * redundancy_factor * reliability_multiplier
  • Total Job Cost:
    Cost_total = sum(Cost_compute + Cost_bw + Cost_rel) + network_fee

Suggested coefficients (protocol targets):

  • redundancy_factor default 2
  • reliability_multiplier from 0.05 to 0.30 depending on SLA
  • network_fee dynamically sized to keep validator participation profitable even at low demand

Worked example (single job, anchored to market references):

  • Assumptions (USD, March 11, 2026): Shardy prices are pegged to prevailing cloud GPU list rates. For reference, AWS EC2 Capacity Blocks list a single H100 (p5.4xlarge) at $3.933/hour (US East, Ohio), and Google Cloud lists A100 80GB accelerators starting at $4.713696/hour in Serverless Spark pricing.
  • Derived target rate: price_per_TFLOP = 0.035 (maps to the ~$3.5–$4.7/hour GPU class after overheads)
  • Job size: FLOPs = 25e12 (25 TFLOPs total), bytes_in = 600e6, bytes_out = 150e6
  • Pricing: price_per_TFLOP = 0.035, price_per_byte = 2e-10, redundancy_factor = 2, reliability_multiplier = 0.10, network_fee = 0.25
  • Compute cost: Cost_compute = (25e12 / 1e12) * 0.035 = 0.875
  • Bandwidth cost: Cost_bw = (750e6) * 2e-10 = 0.15
  • Reliability cost: Cost_rel = 0.875 * 2 * 0.10 = 0.175
  • Total cost: Cost_total = (0.875 + 0.15 + 0.175) + 0.25 = 1.45
  • Interpretation: The job costs 1.45 in the network unit of account (e.g., $SHRD), aligned to real cloud GPU list prices.
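
The worked example above can be reproduced end to end:

```python
# The Section 5.1 worked pricing example, reproduced with the stated inputs.

def job_cost(flops, bytes_in, bytes_out, price_per_tflop, price_per_byte,
             redundancy_factor, reliability_multiplier, network_fee):
    """Return (compute, bandwidth, reliability, total) costs for one job."""
    cost_compute = (flops / 1e12) * price_per_tflop
    cost_bw = (bytes_in + bytes_out) * price_per_byte
    cost_rel = cost_compute * redundancy_factor * reliability_multiplier
    total = cost_compute + cost_bw + cost_rel + network_fee
    return cost_compute, cost_bw, cost_rel, total

compute, bw, rel, total = job_cost(
    flops=25e12, bytes_in=600e6, bytes_out=150e6,
    price_per_tflop=0.035, price_per_byte=2e-10,
    redundancy_factor=2, reliability_multiplier=0.10, network_fee=0.25)
```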

5.2 Rewards & Payout Formula

Worker payout is proportional to verified work and adjusted for honesty and reliability.

  • Base Payout:
    P_base = Cost_compute * payout_ratio
  • Quality Multiplier:
    Q = 0.8 + 0.2 * SSR
  • Reputation Multiplier:
    R = clamp(1 + reputation_score / 1000, 0.9, 1.2)
  • Final Worker Payout:
    P_worker = P_base * Q * R - penalties

Validator payout is tied to settlement throughput and proof verification.

  • Validator Reward:
    P_validator = network_fee * (uptime_weight * verify_weight)

Worked example (worker payout, anchored to market references):

  • Inputs: Cost_compute = 0.875, payout_ratio = 0.80, SSR = 0.98, reputation_score = 120, penalties = 0.02
  • Base payout: P_base = 0.875 * 0.80 = 0.70
  • Quality multiplier: Q = 0.8 + 0.2 * 0.98 = 0.996
  • Reputation multiplier: R = clamp(1 + 120/1000, 0.9, 1.2) = 1.12
  • Final payout: P_worker = 0.70 * 0.996 * 1.12 - 0.02 = 0.76
  • Interpretation: The worker receives 0.76 net for this shard group, with penalties already deducted.
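
The payout example reproduces as follows:

```python
# The Section 5.2 worker-payout example, reproduced with the stated inputs.

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def worker_payout(cost_compute, payout_ratio, ssr, reputation_score, penalties):
    """P_worker = P_base * Q * R - penalties."""
    p_base = cost_compute * payout_ratio
    q = 0.8 + 0.2 * ssr                              # quality multiplier
    r = clamp(1 + reputation_score / 1000, 0.9, 1.2)  # reputation multiplier
    return p_base * q * r - penalties

p = worker_payout(cost_compute=0.875, payout_ratio=0.80, ssr=0.98,
                  reputation_score=120, penalties=0.02)
```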

5.3 Slashing, Insurance, and Dispute Resolution

  • Slashing Trigger: If proof_invalid || digest_mismatch, then slash = stake * slash_rate.
  • Dispute Re-Run: If two valid proofs disagree, a third node re-runs the shard; the majority result wins.
  • Insurance Payout: Users receive credits for shards delayed beyond SLA thresholds.
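
A minimal sketch of the trigger and tie-break logic above; the 10% slash rate is an illustrative assumption.

```python
# Illustrative slashing trigger and dispute tie-break (rates are assumptions).

def slash_amount(stake, proof_invalid, digest_mismatch, slash_rate=0.10):
    """Slash only on a verifiable fault: slash = stake * slash_rate."""
    return stake * slash_rate if (proof_invalid or digest_mismatch) else 0.0

def resolve_dispute(digests):
    """Majority digest wins after a third-node re-run."""
    return max(set(digests), key=digests.count)

penalty = slash_amount(stake=1000.0, proof_invalid=False, digest_mismatch=True)
winner = resolve_dispute(["0xaaa", "0xbbb", "0xaaa"])  # third node broke the tie
```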

6. The Roadmap & To-Market Strategy

Shardy has already solved the hardest technical challenge: unfalsifiable distributed compute directly inside a consumer web browser.

To reach full capitalization and consumer readiness, the Immediate Engineering Roadmap includes:

  1. L1 Mainnet Consensus: Transitioning the P2P libp2p layer to support decentralized Validating Nodes (Multi-Orchestrator scaling with automated Merkle root validation).
  2. Visual Block Explorer: A globally facing blockchain explorer proving the sheer transaction volume, tracking TFLOPS in real-time, and publicizing live SNARK block generations.
  3. The “Killer-Demo”: Launching a complex applied AI inference (e.g. LLM text generation or image inference) entirely partitioned, processed, and validated across a swarm of anonymous mobile devices and laptops, proving the commercial viability of browser-based DePIN to enterprise clients.

Go-to-market depth (expanded):

  • Developer Onboarding: Provide SDKs for browser-compatible inference workloads, starter templates, and pre-built pipelines for common ML tasks.
  • Enterprise Pilots: Target customers with elastic inference needs, such as media rendering, recommendation engines, or large-scale image processing.
  • Ecosystem Incentives: Grant programs for teams that build workloads optimized for WebGPU and verifiable computation.
  • Regulatory Readiness: Formalize compliance and privacy policies for regions with strict data residency requirements.

7. Formal Metrics & System Targets

This section defines the core metrics the protocol optimizes, and how they interact.

  • Network Throughput (NTW): NTW = sum(ET_i) for all active workers
  • Task Completion Time (TCT): TCT = queue_time + compute_time + proof_time + settlement_time
  • Cost Efficiency (CE): CE = Cost_total / effective_TFLOPs
  • Proof Acceptance Rate (PAR): PAR = accepted_proofs / total_proofs
  • Redundancy Overhead (RO): RO = (redundancy_factor - 1) * 100%
  • Energy Efficiency Proxy (EEP): EEP = ET / estimated_watts where watts are approximated by tier and device class.
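
A compact evaluation of these metrics on assumed inputs (the latency terms reuse the Section 7.2 breakdown, and the cost figures reuse the Section 5.1 worked example):

```python
# Network-level metrics above, evaluated on assumed illustrative inputs.

worker_et = [1.5e12, 0.8e12, 2.2e12]   # per-worker effective throughput (ET_i)
ntw = sum(worker_et)                    # Network Throughput

tct = 0.8 + 4.0 + 1.0 + 2.2             # queue + compute + proof + settlement
ce = 1.45 / 25.0                        # Cost Efficiency: cost per TFLOP
par = 1990 / 2000                       # Proof Acceptance Rate
ro = (2 - 1) * 100                      # Redundancy Overhead, percent
```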

Target ranges (initial protocol goals):

  • PAR >= 0.995
  • TCT <= 2x compute_time for latency-sensitive jobs
  • RO <= 100% at default redundancy
  • CE competitive with low-end cloud inference for burst workloads

7.1 Comparative Metrics vs. Traditional Cloud

The following table uses illustrative values to compare Shardy to a typical cloud inference pipeline for burst workloads. The numbers are placeholders meant to show relative direction, not final pricing.

| Metric             | Shardy (Browser DePIN)      | Traditional Cloud       |
|--------------------|-----------------------------|-------------------------|
| Provisioning time  | Seconds (no install)        | Minutes to hours        |
| Elastic scale      | User devices, high variance | Datacenter, predictable |
| Unit cost (burst)  | Lower during idle supply    | Higher during peak      |
| Trust model        | ZK proof + redundancy       | Vendor trust            |
| Fault tolerance    | Cross-node re-run           | Region failover         |
| Data residency     | Edge-localizable            | Region-bound            |
| Verification cost  | Milliseconds per proof      | N/A                     |
| SLA profile        | Evolving, pool-backed       | Contractual             |

Cloud price anchors (USD list rates, for comparison):

  • AWS EC2 Capacity Blocks (H100): p5.4xlarge at $3.933/hour in US East (Ohio).
  • Google Cloud Serverless Spark Accelerators: A100 80GB starting at $4.713696/hour, A100 40GB at $3.52069/hour, L4 at $0.672048/hour.

7.2 Numerical Example: End-to-End Latency

  • Queue time: 0.8s
  • Compute time: 4.0s
  • Proof time: 1.0s
  • Settlement time: 2.2s (1 slot finality)
  • TCT: 0.8 + 4.0 + 1.0 + 2.2 = 8.0s
  • Interpretation: A short inference job completes in about 8 seconds with verifiable settlement.

8. Technical Appendix (Auditor-Focused)

8.0 Formal Guarantees (Invariants)

The protocol maintains the following invariants under the assumed cryptographic hardness of Groth16 and the honesty threshold of validators.

  • I1 — Deterministic Shard Mapping: For any task T and shard index i, Shard(T,i) is uniquely defined by (task_id, shard_index, circuit_version) and cannot be altered without changing the shard hash.
  • I2 — Proof Binding: A valid proof π is bound to exactly one (digest, circuit_version, shard_index) tuple; any change invalidates verification.
  • I3 — Settlement Atomicity: A shard is either fully settled (balance updates + reputation updates + validator fees) or not settled at all; partial application is forbidden by state root checks.
  • I4 — Slashing Safety: A worker cannot be slashed without at least one verifiable mismatch (invalid proof or digest conflict against redundant outputs).
  • I5 — Non-Double-Pay: For a given shard, rewards are paid once and only once; replays are prevented by shard nonce and block height window.
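
Invariant I5 can be sketched as a settlement ledger keyed by (task_id, shard_index, nonce); the class and field names here are hypothetical.

```python
# Sketch of invariant I5 (non-double-pay): a settlement ledger keyed by
# (task_id, shard_index, nonce) rejects replayed reward claims.

class SettlementLedger:
    def __init__(self):
        self._paid = set()

    def settle(self, task_id, shard_index, nonce, amount, balances, worker):
        """Pay a shard reward once and only once."""
        key = (task_id, shard_index, nonce)
        if key in self._paid:
            return False                  # replay: reward already paid (I5)
        # Apply all updates together, mirroring settlement atomicity (I3).
        balances[worker] = balances.get(worker, 0) + amount
        self._paid.add(key)
        return True

ledger = SettlementLedger()
balances = {}
first = ledger.settle("task-1", 0, 42, 10, balances, "worker-a")
replay = ledger.settle("task-1", 0, 42, 10, balances, "worker-a")
```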

8.1 Threat Model

Assumptions and attacker capabilities:

  • Adversary can control multiple worker nodes (Sybil), submit malicious tasks, and attempt to forge outputs.
  • Adversary can delay or drop network packets (partial DoS), but cannot break standard cryptography or forge ZK proofs.
  • Adversary can compromise a minority of validators, but not a stake majority required for finality.

Threats and mitigations:

  • Forged computation: Mitigated by ZK proofs + redundancy; invalid proofs are rejected and slashable.
  • Result withholding: Mitigated by timeouts and DLQ re-assignments; non-responsive nodes lose reputation.
  • Sybil swarm: Mitigated by stake requirements and scaling stake with node count per wallet.
  • Consensus manipulation: Mitigated by quorum attestation and stake-weighted fork resolution.
  • Replay attacks: Mitigated by shard nonces and block height windows in proofs.

8.2 Determinism and Numeric Stability

  • Deterministic Input Sharding: Each shard is indexed and hashed; the same shard always produces the same digest given identical inputs and kernel versions.
  • Quantization Profile: Inputs are normalized to a fixed precision profile prior to GPU dispatch, reducing floating-point drift across hardware.
  • EMA Stabilization: EMA smoothing is applied to sensitive tensor segments; if EMA output deviates beyond tolerance, the shard is automatically retried.

8.3 Proof System Integrity

  • Circuit Separation: Proof circuits are versioned; a proof must reference the exact circuit hash accepted by validators.
  • Witness Minimality: Witness data includes only required intermediate values, reducing leakage and proof size.
  • Replay Protection: Proofs include a shard nonce and block height window to prevent replay attacks.

8.4 Security and Slashing

  • Dishonest Node Detection: Triggered by digest_mismatch, proof_invalid, or repeated execution timeouts.
  • Slashing Gradient: Penalties increase with repeated infractions; first offenses may be partial, chronic offenders are fully slashed.
  • Appeal Window: A short on-chain dispute window allows re-run validation for conflicting proofs.

8.5 Data Handling and Privacy

  • Browser Isolation: Tasks run in isolated workers; no access to local file system except OPFS scope assigned to Shardy.
  • Data Retention: Shards are deleted after settlement unless retention is explicitly requested by the submitter.
  • Audit Logs: Orchestrators maintain verifiable logs (hash-chained) for task assignment, proof receipt, and settlement outcomes.

8.6 Compliance Readiness

  • Geofenced Dispatch: Orchestrators can constrain shard routing to specific regions.
  • PII Redaction: Optional preprocessing step to strip or tokenize sensitive fields before compute.

8.7 Audit Checklist

Use this checklist to verify protocol safety, determinism, and economic correctness.

  • Determinism & Sharding
      ◦ Confirm shard hashing is stable across platforms and identical inputs.
      ◦ Confirm quantization profile is fixed and versioned.
      ◦ Validate EMA smoothing bounds and retry thresholds.
  • Proof System
      ◦ Verify circuit versioning and hash pinning.
      ◦ Validate witness minimality and absence of leaked data.
      ◦ Confirm proof includes nonce and block height window.
  • Consensus & Finality
      ◦ Verify state root is computed deterministically for each block.
      ◦ Confirm quorum threshold and stake-weighting rules.
      ◦ Validate fork-choice rules under competing blocks.
  • Economic Integrity
      ◦ Confirm payout calculations match on-chain formulas.
      ◦ Validate slashing triggers and penalty gradient.
      ◦ Check insurance pool accounting and payout caps.
  • Security & Abuse
      ◦ Simulate Sybil attempts and ensure stake escalation.
      ◦ Validate timeout, DLQ, and reassignment behaviors.
      ◦ Check replay protection for shard proofs.
  • Privacy & Compliance
      ◦ Confirm OPFS data retention and deletion policies.
      ◦ Verify geofenced dispatch is enforced.
      ◦ Audit logs are hash-chained and tamper-evident.

9. Conclusion

Cloud computing is a $500 billion industry reliant on physical real estate, server cooling, and monopoly pricing. Shardy sidesteps that capital expenditure entirely, unlocking the hardware sitting idle in millions of homes worldwide.

By unifying WebGPU for raw parallel tensor slicing, WASM for cross-platform deterministic smoothing, and Circom / SnarkJS for bulletproof cryptographic execution verification, a Shardy Node transforms a standard web request into an elite, trustless high-performance computing cluster.

Shardy isn’t just a blockchain. It is the distributed brain of the open internet.
