Network Telemetry & Mesh Architecture

The Shardy Orchestrator is the control plane of the decentralized ecosystem. While individual Shardy nodes execute atomic operations in isolated hardware environments, the Orchestrator governs how thousands of anonymous clients function collectively as a single supercomputer.

This document is the technical blueprint for the architectural decisions underpinning Shardy's network routing, job matchmaking, peer-to-peer telemetry, and consensus settlement layers.


1. The Orchestrator Concept

The Orchestrator is written in TypeScript on Bun and coordinates the API layer, WebSocket worker connections, job dispatch, consensus, and persistence.

Its primary function is Trustless Job Management: it must operate under the assumption that any or all of its connected workers could disconnect at any moment, spoof hardware metrics, or return mathematically invalid arrays.
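
As a rough illustration of the ingress path, a minimal Bun WebSocket server might look like the sketch below; the handler behavior, message shapes, and JSON encoding are illustrative assumptions rather than the actual Shardy worker protocol.

```typescript
// Minimal sketch of the worker ingress path using Bun's built-in WebSocket server.
// Message shapes and the JSON encoding are illustrative, not the real Shardy protocol.
type WorkerMessage =
  | { type: "profile_v2"; gflops: number; maxStableAllocMb: number; chunkSize: number }
  | { type: "task_ack"; taskId: string }
  | { type: "task_result"; taskId: string; checksum: string; proof?: string };

Bun.serve({
  port: 8080,
  fetch(req, server) {
    // Upgrade incoming HTTP requests to worker WebSocket connections.
    if (server.upgrade(req)) return;
    return new Response("Shardy Orchestrator", { status: 200 });
  },
  websocket: {
    open(ws) {
      // Connected but not yet admitted: the worker must send profile_v2 first.
    },
    message(ws, raw) {
      const msg = JSON.parse(String(raw)) as WorkerMessage;
      switch (msg.type) {
        case "profile_v2":
          // Hand off to tier categorization and admission (section 1.1).
          break;
        case "task_ack":
        case "task_result":
          // Forward to the dispatcher and state machine respectively.
          break;
      }
    },
    close(ws) {
      // The Watchdog reassigns any in-flight chunks owned by this socket.
    },
  },
});
```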

1.1 Multi-Tier Admission Protocol

Before a node can process neural network pipelines, it must be profiled.

  1. Hardware Profiling (profile_v2): When a worker joins the WebSocket stream, it executes a benchmark and reports GFLOPS, max stable allocation, and chunk size.
  2. Tier Categorization (scheduler.ts): Based on GFLOPS and stable allocation, the orchestrator assigns a WorkerTier:
    • Tier 1: Low-end integrated CPUs/GPUs (assigned lite_test and simple validation tasks)
    • Tier 2: Standard mid-range GPUs
    • Tier 3: High-end desktop (HEDT) GPUs such as the RTX 3090/4090 or Apple M3 Max
  3. Admission: passesAdmission() enforces minimum GFLOPS and memory-stability thresholds; nodes that fail either check are rejected (see the sketch after this list).
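
A minimal sketch of how tier categorization and admission could be expressed is shown below; the threshold values and field names are placeholder assumptions, not the constants used in scheduler.ts.

```typescript
// Illustrative tier/admission logic; thresholds are placeholders, not the real
// scheduler.ts constants.
type WorkerTier = 1 | 2 | 3;

interface HardwareProfile {
  gflops: number;            // reported by the profile_v2 benchmark
  maxStableAllocMb: number;  // largest allocation that stayed stable
  chunkSize: number;         // preferred chunk size
}

function assignTier(p: HardwareProfile): WorkerTier {
  if (p.gflops >= 5_000 && p.maxStableAllocMb >= 8_192) return 3; // HEDT-class GPUs
  if (p.gflops >= 1_000 && p.maxStableAllocMb >= 2_048) return 2; // mid-range GPUs
  return 1;                                                       // integrated CPUs/GPUs
}

function passesAdmission(p: HardwareProfile): boolean {
  // Reject nodes that fail the minimum compute or memory-stability bar.
  const MIN_GFLOPS = 50;
  const MIN_STABLE_ALLOC_MB = 256;
  return p.gflops >= MIN_GFLOPS && p.maxStableAllocMb >= MIN_STABLE_ALLOC_MB;
}
```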

2. Dispatch Mechanics & Reliability (BFT)

Jobs introduced into the Shardy network are not mapped 1-to-1 to a single compute node. To provide Byzantine Fault Tolerance (BFT), identical copies of each job are dispatched to independent nodes and their results are cross-validated.

2.1 The Redundancy Matrix

  • REDUNDANCY_FACTOR: Standard tasks are assigned redundantly (default 2); stress and lite tasks receive a single assignment.
  • Dispatcher Engine: The Dispatcher class selects matching idle nodes, filtered by the task's required WorkerTier.
  • Binary Delivery Protocol: Each task is delivered as two frames: a protobuf meta frame followed by the raw binary payload (sketched below).
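
The sketch below illustrates the redundancy fan-out and two-frame delivery described above; the Task and IdleWorker shapes and the encodeTaskMeta helper are hypothetical stand-ins for the real protobuf schema and socket wrapper.

```typescript
// Illustrative dispatch fan-out: each standard task is duplicated REDUNDANCY_FACTOR
// times across idle nodes of the required tier, then delivered as two frames.
type WorkerTier = 1 | 2 | 3;

interface IdleWorker {
  id: string;
  tier: WorkerTier;
  send: (frame: Uint8Array) => void; // wraps the worker's WebSocket
}

interface Task {
  id: string;
  kind: "standard" | "stress" | "lite_test";
  requiredTier: WorkerTier;
  payload: Uint8Array;
}

// Hypothetical stand-in for the real protobuf meta-frame encoder.
declare function encodeTaskMeta(meta: { taskId: string; bytes: number }): Uint8Array;

const REDUNDANCY_FACTOR = 2; // standard tasks; stress/lite tasks use 1

function dispatch(task: Task, idleWorkers: IdleWorker[]): void {
  const copies = task.kind === "standard" ? REDUNDANCY_FACTOR : 1;
  const eligible = idleWorkers.filter(w => w.tier >= task.requiredTier);

  for (const worker of eligible.slice(0, copies)) {
    // Frame 1: protobuf meta frame describing the task.
    worker.send(encodeTaskMeta({ taskId: task.id, bytes: task.payload.length }));
    // Frame 2: raw binary payload.
    worker.send(task.payload);
  }
}
```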

2.2 Watchdog Auto-Recovery & Dead-Letter Queues (DLQ)

Network turbulence is actively managed by a localized recovery loop (the “Watchdog”):

  • Ack Timeout: If a socket disconnects mid-delivery or a mobile device enters a suspended state, the worker never sends a task_ack and the assignment expires.
  • Execution Timeout: Tasks flagged as running are expired once they exceed their complexity-based execution cap.
  • Migrating Offline Loads (tryReassignOfflineAssignment): Dead assignments are stripped from dropped connections and the chunk is routed to the next available node, so the end user issuing the task never observes the drop.
  • DLQ Mechanics: Deliveries that fail more than MAX_DELIVERY_ATTEMPTS times are recorded as dead letters in state storage (see the sketch below).
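
A simplified watchdog tick, under the assumption that assignments are tracked in memory with the fields shown, might look like this; the timeout constants and the tryReassignOfflineAssignment signature are illustrative.

```typescript
// Simplified watchdog tick: expire un-acked and overrun assignments, then
// reassign or dead-letter them. Constants and field names are illustrative.
const ACK_TIMEOUT_MS = 15_000;
const MAX_DELIVERY_ATTEMPTS = 3;

interface Assignment {
  taskId: string;
  workerId: string;
  state: "delivered" | "acked" | "running";
  deliveredAt: number;
  executionDeadline: number; // complexity-based cap
  attempts: number;
}

// Stand-in for the real reassignment routine referenced above.
declare function tryReassignOfflineAssignment(a: Assignment): void;

function watchdogTick(assignments: Assignment[], now: number, deadLetters: Assignment[]): void {
  for (const a of assignments) {
    const ackExpired = a.state === "delivered" && now - a.deliveredAt > ACK_TIMEOUT_MS;
    const execExpired = a.state === "running" && now > a.executionDeadline;
    if (!ackExpired && !execExpired) continue;

    a.attempts += 1;
    if (a.attempts > MAX_DELIVERY_ATTEMPTS) {
      deadLetters.push(a);             // DLQ: recorded in state storage
    } else {
      tryReassignOfflineAssignment(a); // route the chunk to the next available node
    }
  }
}
```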

3. P2P Mesh Implementation & Libp2p

To decentralize connectivity and scale horizontally without single-point bottlenecks, the Orchestrator embeds libp2p.

3.1 Network Topology (libp2pHost.ts)

  • GossipSub Pub/Sub: Carries presence, transaction broadcasts, and block announcements.
  • Kademlia DHT: Node discovery and handshakes are handled peer-to-peer via a Kademlia DHT.
  • Encrypted Tunnels: Connections are secured with the Noise protocol and multiplexed over Yamux streams.
  • Transaction Gossiping: All consensus transactions flow through shardy.blockchain.transactions.v1.
  • Block Announce: New blocks are announced via shardy.block.announce.v1, enabling fast P2P sync.
  • Sync Streams: /shardy/blocks/sync/1.0.0 and /shardy/state/snapshot/1.0.0 provide block and snapshot sync (a host configuration sketch follows).
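
A host configuration along these lines, using the js-libp2p API, might resemble the sketch below; exact option names differ between libp2p releases, and the transport and listen addresses are assumptions.

```typescript
// Sketch of a libp2p host wiring GossipSub, Kademlia DHT, Noise, and Yamux.
// Option names follow recent js-libp2p releases (e.g. connectionEncryption is
// named connectionEncrypters in libp2p 2.x) and may differ between versions.
import { createLibp2p } from "libp2p";
import { tcp } from "@libp2p/tcp";
import { noise } from "@chainsafe/libp2p-noise";
import { yamux } from "@chainsafe/libp2p-yamux";
import { gossipsub } from "@chainsafe/libp2p-gossipsub";
import { kadDHT } from "@libp2p/kad-dht";

const node = await createLibp2p({
  addresses: { listen: ["/ip4/0.0.0.0/tcp/0"] },
  transports: [tcp()],
  connectionEncryption: [noise()],   // encrypted tunnels
  streamMuxers: [yamux()],           // multiplexed streams
  services: {
    pubsub: gossipsub(),             // presence, transactions, block announcements
    dht: kadDHT(),                   // peer discovery
  },
});

// Gossip topics and sync protocols named in this document.
node.services.pubsub.subscribe("shardy.blockchain.transactions.v1");
node.services.pubsub.subscribe("shardy.block.announce.v1");
await node.handle("/shardy/blocks/sync/1.0.0", ({ stream }) => {
  // Serve block ranges to peers catching up (handler body omitted).
});
```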

4. Consensus & Settlement Protocols

The most critical mechanism in Shardy is verifying untrusted computation through algorithmic consensus, implemented in state_machine.ts and consensus.ts.

4.1 State Machine Verification

  1. Worker results arrive as task_result with checksum and optional proof.
  2. For standard tasks, proof validity and checksum consistency are enforced.
  3. When the required number of completions arrive with matching checksums, the state machine emits task_verified (sketched below).
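
A simplified sketch of this verification step is shown below; the in-memory shapes, the verifyProof stand-in, and the emit callback are assumptions rather than the actual state_machine.ts internals.

```typescript
// Illustrative verification: collect task_result submissions, enforce
// proof/checksum validity, and emit task_verified or consensus_mismatch
// once the required completions arrive. Shapes are illustrative.
interface TaskResult {
  taskId: string;
  workerId: string;
  checksum: string;
  proof?: string;
}

interface PendingTask {
  kind: "standard" | "stress" | "lite_test";
  requiredCompletions: number; // equals the task's redundancy factor
  results: TaskResult[];
}

// Stand-in for the real proof check.
declare function verifyProof(r: TaskResult): boolean;

function onTaskResult(
  task: PendingTask,
  result: TaskResult,
  emit: (event: "task_verified" | "consensus_mismatch", payload: unknown) => void,
): void {
  // Standard tasks must carry a valid proof; invalid submissions are discarded.
  if (task.kind === "standard" && (!result.proof || !verifyProof(result))) return;

  task.results.push(result);
  if (task.results.length < task.requiredCompletions) return;

  const checksums = new Set(task.results.map(r => r.checksum));
  emit(checksums.size === 1 ? "task_verified" : "consensus_mismatch", {
    taskId: result.taskId,
    workers: task.results.map(r => r.workerId),
  });
}
```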

4.2 Congruency and Slashing

  • Match Result: Matching checksums across required assignments trigger task_verified.
  • Reward Distribution: Rewards are distributed with stake-weighted multipliers.
  • Consensus Mismatch: Mismatched checksums trigger consensus_mismatch and slashing (a settlement sketch follows).
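
Settlement could then be sketched as follows; the stake-weighting formula, base reward, and slashing fraction are placeholder assumptions, not the values used in consensus.ts.

```typescript
// Illustrative settlement: pay congruent workers a stake-weighted reward, slash
// workers whose checksum diverged from the verified result. Numbers are placeholders.
interface Participant {
  workerId: string;
  stake: number;      // tokens the worker has bonded
  checksum: string;   // the checksum this worker reported
}

const BASE_REWARD = 10;
const SLASH_FRACTION = 0.1;

function settle(participants: Participant[], verifiedChecksum: string | null) {
  const payouts = new Map<string, number>();
  const slashes = new Map<string, number>();

  for (const p of participants) {
    if (verifiedChecksum !== null && p.checksum === verifiedChecksum) {
      // Stake-weighted multiplier: higher bonded stake earns a larger share.
      const multiplier = 1 + Math.log1p(p.stake);
      payouts.set(p.workerId, BASE_REWARD * multiplier);
    } else {
      // Mismatched (or unresolved) results are slashed proportionally to stake.
      slashes.set(p.workerId, p.stake * SLASH_FRACTION);
    }
  }
  return { payouts, slashes };
}
```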

5. Consensus Diagram (Actual)

[Diagram: consensus flow from task_result submission through checksum matching to task_verified or consensus_mismatch.]

6. Sync Diagram (Block Announce + Snapshot)

[Diagram: block announcements over shardy.block.announce.v1, followed by block and snapshot sync over /shardy/blocks/sync/1.0.0 and /shardy/state/snapshot/1.0.0.]

Technical Summary

The orchestrator layer combines task dispatch, watchdog recovery, libp2p gossip, and a lightweight consensus engine to keep network state consistent and verifiable.
