Network Telemetry & Mesh Architecture
The Shardy Orchestrator is the control plane of the decentralized ecosystem. While individual Shardy nodes execute atomic operations across isolated hardware environments, the Orchestrator governs how thousands of anonymous clients function collectively as a single logical supercomputer.
This document is the technical blueprint for the architectural decisions underpinning Shardy's network routing, job matchmaking, peer-to-peer telemetry, and consensus settlement layers.
1. The Orchestrator Concept
The Orchestrator is written in TypeScript on Bun and coordinates API, WebSocket workers, dispatch, consensus, and persistence.
Its primary function is Trustless Job Management. It must act under the assumption that 100% of its connected node workers could disconnect at any moment, spoof hardware metrics, or return mathematically invalid arrays.
1.1 Multi-Tier Admission Protocol
Before a node can process neural network pipelines, it must be profiled.
- Hardware Profiling (`profile_v2`): When a worker joins the WebSocket stream, it executes a benchmark and reports GFLOPS, max stable allocation, and chunk size.
- Tier Categorization (`scheduler.ts`): Based on GFLOPS and stable allocation, the orchestrator assigns a `WorkerTier`:
  - Tier 1: Low-end integrated CPUs/GPUs (assigned `lite_test`, simple validation)
  - Tier 2: Standard mid-range graphics hardware
  - Tier 3: High-End Desktop (HEDT) GPUs such as RTX 3090/4090s or Apple M3 Max chips
- Admission: `passesAdmission()` requires minimum GFLOPS and memory stability; otherwise the node is rejected.
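The admission flow above can be sketched in TypeScript. The thresholds, field names, and the exact `assignTier` boundaries are illustrative assumptions; only `passesAdmission()`, `WorkerTier`, and the `profile_v2` metrics come from the description above.

```typescript
// Sketch of tier categorization and admission. Threshold values are
// assumptions for illustration; the real scheduler.ts constants may differ.

interface HardwareProfile {
  gflops: number;        // reported by the profile_v2 benchmark
  stableAllocMB: number; // max stable memory allocation
  chunkSizeKB: number;   // preferred chunk size
}

type WorkerTier = 1 | 2 | 3;

// Illustrative admission floors (assumed, not the production values).
const MIN_GFLOPS = 10;
const MIN_STABLE_ALLOC_MB = 128;

function passesAdmission(p: HardwareProfile): boolean {
  return p.gflops >= MIN_GFLOPS && p.stableAllocMB >= MIN_STABLE_ALLOC_MB;
}

function assignTier(p: HardwareProfile): WorkerTier {
  if (p.gflops >= 5000 && p.stableAllocMB >= 8192) return 3; // HEDT GPUs
  if (p.gflops >= 500 && p.stableAllocMB >= 2048) return 2;  // mid-range hardware
  return 1; // integrated CPUs/GPUs, assigned lite_test work
}
```

A node that clears both floors is admitted and then binned once; re-profiling would simply re-run `assignTier` on a fresh `profile_v2` report.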
2. Dispatch Mechanics & Reliability (BFT)
Jobs introduced into the Shardy network are not mapped 1-to-1 to a single compute node. To enforce Byzantine Fault Tolerance (BFT), identical copies of each job are dispatched to independent nodes and their results cross-validated.
2.1 The Redundancy Matrix
- `REDUNDANCY_FACTOR`: Standard tasks require redundant execution (default 2); stress and lite tasks require a single assignment.
- Dispatcher Engine: The `Dispatcher` class identifies matching `idle` nodes, filtered explicitly by their designated `WorkerTier` requirements.
- Binary Delivery Protocol: Two frames are sent: a protobuf meta frame followed by the raw binary payload.
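A minimal sketch of the redundancy selection step, assuming a simplified node shape; `pickAssignees` and the `Node` interface are illustrative names, not the real `Dispatcher` API.

```typescript
// Pick REDUNDANCY_FACTOR idle nodes for one job, filtered by WorkerTier.
// Assumption: a node of a higher tier may serve a lower-tier requirement.

interface Node {
  id: string;
  tier: 1 | 2 | 3;
  status: "idle" | "busy" | "offline";
}

const REDUNDANCY_FACTOR = 2; // standard tasks; stress/lite tasks use 1

function pickAssignees(
  nodes: Node[],
  requiredTier: 1 | 2 | 3,
  redundancy: number = REDUNDANCY_FACTOR,
): Node[] {
  const candidates = nodes.filter(
    (n) => n.status === "idle" && n.tier >= requiredTier,
  );
  if (candidates.length < redundancy) {
    throw new Error("not enough idle nodes to satisfy redundancy");
  }
  return candidates.slice(0, redundancy);
}
```

Each selected node then receives the two-frame delivery (protobuf meta frame, then the raw binary payload) described above.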
2.2 Watchdog Auto-Recovery & Dead-Letter Queues (DLQ)
Network turbulence is actively managed by a periodic supervision loop (the "Watchdog"):
- Ack Timeout: If a socket disconnects mid-delivery or a mobile device enters a suspended state, the worker fails to provide a `task_ack`.
- Execution Timeout: Tasks flagged as running that exceed their complexity-based computational caps.
- Migrating Offline Loads (`tryReassignOfflineAssignment`): The protocol strips dead assignments from dropped connections and routes the chunk to the next available node; the end user issuing the task sees no interruption.
- DLQ Mechanics: Deliveries that still fail after `MAX_DELIVERY_ATTEMPTS` are recorded as dead letters in state storage.
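The watchdog pass over these cases can be sketched as a single sweep. The `Assignment` shape, timeout constants, and `sweep` name are assumptions; only the ack/execution timeout distinction and `MAX_DELIVERY_ATTEMPTS` come from the text.

```typescript
// One watchdog sweep: find ack/execution timeouts, then either reassign the
// chunk or dead-letter it once MAX_DELIVERY_ATTEMPTS is exhausted.

interface Assignment {
  taskId: string;
  state: "delivered" | "acked" | "running" | "dead";
  sentAt: number;       // ms timestamp of delivery
  startedAt?: number;   // set once the task starts running
  execCapMs: number;    // complexity-based execution cap
  deliveryAttempts: number;
}

const ACK_TIMEOUT_MS = 10_000;   // assumed value
const MAX_DELIVERY_ATTEMPTS = 3; // assumed value

function sweep(
  assignments: Assignment[],
  now: number,
): { reassign: string[]; dlq: string[] } {
  const reassign: string[] = [];
  const dlq: string[] = [];
  for (const a of assignments) {
    const ackTimedOut =
      a.state === "delivered" && now - a.sentAt > ACK_TIMEOUT_MS;
    const execTimedOut =
      a.state === "running" &&
      a.startedAt !== undefined &&
      now - a.startedAt > a.execCapMs;
    if (ackTimedOut || execTimedOut) {
      // Exhausted deliveries become dead letters; otherwise migrate the chunk.
      if (a.deliveryAttempts >= MAX_DELIVERY_ATTEMPTS) dlq.push(a.taskId);
      else reassign.push(a.taskId);
    }
  }
  return { reassign, dlq };
}
```

Running this on an interval gives the "seamless" recovery the section describes: the task issuer only ever observes a slightly later completion, never the drop itself.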
3. P2P Mesh Implementation & Libp2p
To decentralize connectivity and scale without single-point bottlenecks, the Orchestrator embeds Libp2p.
3.1 Network Topology (libp2pHost.ts)
- GossipSub Pub/Sub: Presence, transaction broadcast, and block announcements.
- Kademlia DHT: Node discovery and handshakes are handled peer-to-peer via the DHT.
- Encrypted Tunnels: Connections are encrypted with the Noise protocol and multiplexed via Yamux streams.
- Transaction Gossiping: All consensus transactions flow through `shardy.blockchain.transactions.v1`.
- Block Announce: New blocks are announced via `shardy.block.announce.v1`, enabling fast P2P sync.
- Sync Streams: `/shardy/blocks/sync/1.0.0` and `/shardy/state/snapshot/1.0.0` provide block and snapshot sync.
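The topics and protocol ids above can be gathered as constants with a small topic router. The `GossipRouter` class is an illustrative sketch of how inbound GossipSub messages might be fanned out to handlers; it is not the actual `libp2pHost.ts` API.

```typescript
// Topic and protocol identifiers taken from the section above.
const TOPIC_TRANSACTIONS = "shardy.blockchain.transactions.v1";
const TOPIC_BLOCK_ANNOUNCE = "shardy.block.announce.v1";
const PROTO_BLOCK_SYNC = "/shardy/blocks/sync/1.0.0";
const PROTO_STATE_SNAPSHOT = "/shardy/state/snapshot/1.0.0";

type Handler = (payload: Uint8Array) => void;

// Hypothetical fan-out of decoded pub/sub messages to per-topic handlers.
class GossipRouter {
  private handlers = new Map<string, Handler>();

  on(topic: string, handler: Handler): this {
    this.handlers.set(topic, handler);
    return this;
  }

  // Returns true if a handler consumed the message.
  dispatch(topic: string, payload: Uint8Array): boolean {
    const h = this.handlers.get(topic);
    if (!h) return false; // unknown topic: ignored, not an error
    h(payload);
    return true;
  }
}
```

Keeping the versioned ids (`…v1`, `…/1.0.0`) as single constants makes a future `v2` rollout a dual-subscribe change rather than a string hunt.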
4. Consensus & Settlement Protocols
The most critical mechanism in Shardy is verifying untrusted computation through algorithmic consensus in `state_machine.ts` and `consensus.ts`.
4.1 State Machine Verification
- Worker results arrive as `task_result` messages carrying a checksum and an optional proof.
- For standard tasks, proof validity and checksum consistency are enforced.
- When the required number of completions match, the state machine emits `task_verified`.
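The `task_result` → `task_verified` transition can be sketched as a small accumulator. The `TaskResult` fields and the `VerificationMachine` class are assumed shapes based on the bullets above, not the real `state_machine.ts` types.

```typescript
// Collect task_result messages per task and emit task_verified once the
// required number of matching checksums has been observed.

interface TaskResult {
  taskId: string;
  nodeId: string;
  checksum: string;
  proof?: string; // optional proof, validated upstream for standard tasks
}

class VerificationMachine {
  private results = new Map<string, TaskResult[]>();

  constructor(private requiredCompletions: number) {}

  // Returns "task_verified" once enough consistent results arrive, else null.
  onTaskResult(r: TaskResult): "task_verified" | null {
    const list = this.results.get(r.taskId) ?? [];
    list.push(r);
    this.results.set(r.taskId, list);
    const matching = list.filter((x) => x.checksum === r.checksum).length;
    return matching >= this.requiredCompletions ? "task_verified" : null;
  }
}
```

With `requiredCompletions` tied to `REDUNDANCY_FACTOR`, a standard task verifies only after two independent nodes report the same checksum.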
4.2 Congruency and Slashing
- Match Result: Matching checksums across all required assignments trigger `task_verified`.
- Reward Distribution: Rewards are distributed with stake-weighted multipliers.
- Consensus Mismatch: Mismatched checksums trigger `consensus_mismatch` and slashing.
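The settlement decision above can be sketched as a single function. The `settle` name, the proportional stake-weighted reward formula, and the "slash the minority checksum" policy are simplifying assumptions; only the two outcome events and the existence of stake-weighted rewards and slashing come from the text.

```typescript
// Congruency check: all checksums match -> task_verified with stake-weighted
// rewards; any mismatch -> consensus_mismatch with the minority nodes slashed.
// Assumes at least one completion is present.

interface Completion {
  nodeId: string;
  checksum: string;
  stake: number;
}

type Outcome =
  | { event: "task_verified"; rewards: Map<string, number> }
  | { event: "consensus_mismatch"; slashed: string[] };

function settle(completions: Completion[], rewardPool: number): Outcome {
  const first = completions[0].checksum;
  if (completions.every((c) => c.checksum === first)) {
    // Split the pool proportionally to stake (illustrative multiplier).
    const totalStake = completions.reduce((s, c) => s + c.stake, 0);
    const rewards = new Map(
      completions.map((c) => [c.nodeId, rewardPool * (c.stake / totalStake)]),
    );
    return { event: "task_verified", rewards };
  }
  // Simplified slashing policy: every node disagreeing with the majority
  // checksum is slashed.
  const counts = new Map<string, number>();
  for (const c of completions) counts.set(c.checksum, (counts.get(c.checksum) ?? 0) + 1);
  const majority = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
  return {
    event: "consensus_mismatch",
    slashed: completions.filter((c) => c.checksum !== majority).map((c) => c.nodeId),
  };
}
```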
5. Consensus Diagram (Actual)
6. Sync Diagram (Block Announce + Snapshot)
Technical Summary
The orchestrator layer combines task dispatch, watchdog recovery, libp2p gossip, and a lightweight consensus engine to keep network state consistent and verifiable.