Sheaft

Resilience intelligence for networked and agent systems

Sheaft turns telemetry, execution traces, topology and domain constraints into a live reliability model, stress-tests disruption scenarios, and shows which critical operations or agent workflows are fragile before failures propagate.

Built for graph-structured systems: software infrastructure, agent harnesses, smart mobility, and scientific reaction or interaction networks.

One resilience model for graph-structured and agent systems

Sheaft represents a system as an attributed graph: nodes, edges, constraints, telemetry, execution traces, stressors and critical operations. The domain changes; the resilience question stays the same: what breaks, how far it propagates, and which operations or workflows are affected.

Nodes

Services, agent steps, tools, road segments, sensors, proteins, metabolites, reactants.

Edges

Calls, tool invocations, state reads/writes, flows, roads, reactions, interactions.

Attributes

Latency, capacity, confidence, permission state, retrieval age, energy, rate, state.

Constraints

Domain contracts, policy checks, evaluator gates, signal rules, conservation laws, biological priors.

Stressors

Crashes, tool timeouts, schema drift, stale retrieval, missing observations, perturbations.

Verdict

Fragile paths, blast radius, affected operations or workflows, green/yellow/red posture.
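As an illustration only, the attributed graph described above could be held in a structure like the following. This is a minimal sketch under stated assumptions: the class names (Node, Edge, SystemGraph), field names and example values are hypothetical and are not taken from the Bering or Sheaft schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    kind: str                                   # e.g. "service", "tool", "sensor"
    attrs: dict = field(default_factory=dict)   # e.g. latency, capacity


@dataclass
class Edge:
    src: str
    dst: str
    kind: str                                   # e.g. "call", "flow", "reaction"
    attrs: dict = field(default_factory=dict)


@dataclass
class SystemGraph:
    nodes: dict = field(default_factory=dict)         # node id -> Node
    edges: list = field(default_factory=list)
    constraints: list = field(default_factory=list)   # free-form domain rules
    critical_ops: dict = field(default_factory=dict)  # operation -> required node ids

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = Node(node_id, kind, attrs)

    def add_edge(self, src, dst, kind, **attrs):
        # Reject edges that reference nodes the graph does not know about.
        for end in (src, dst):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.edges.append(Edge(src, dst, kind, attrs))


# Hypothetical example: one service calling one database.
g = SystemGraph()
g.add_node("checkout", "service", latency_ms=120)
g.add_node("payments-db", "database", capacity=500)
g.add_edge("checkout", "payments-db", "call")
g.critical_ops["place_order"] = ["checkout", "payments-db"]
```

The same shape would carry road segments, agent steps or metabolites by changing only the kind and attrs values, which is the point of a domain-neutral model.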

How Sheaft works

Bering creates explicit graph artifacts from telemetry, topology and execution traces. Sheaft evaluates disruption scenarios and produces resilience verdicts for critical operations and agent workflows.

Discover the system graph

Bering ingests telemetry, topology files, execution traces, event streams or explicit graph descriptions and builds a typed system model.

Attach domain contracts

Critical operations, constraints and success predicates are attached to the graph: service journeys, agent workflows, corridor operations, reaction routes or biological pathways.

Stress-test disruption scenarios

Sheaft simulates failures, missing observations, degraded components, tool timeouts, schema drift, stale retrieval, demand spikes or perturbations without breaking the real system.

Produce a resilience verdict

The result is a graph-level report: fragile components, affected operations, blast radius, posture trend and recommended validation targets.
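The four steps above can be sketched end to end on a toy dependency graph. This is an illustrative sketch, not Sheaft's algorithm: it assumes every dependency is a hard dependency (a failure always propagates to dependents), and the function names, the example graph and the green/yellow/red thresholds are all hypothetical.

```python
from collections import deque

# Step 1 (discover): depends_on[x] lists the components that x needs.
depends_on = {
    "frontend": ["cart", "catalog"],
    "cart": ["redis"],
    "catalog": ["postgres"],
    "redis": [],
    "postgres": [],
}

# Step 2 (attach contracts): each critical operation names its entry component.
critical_ops = {"browse": "catalog", "checkout": "frontend"}


def blast_radius(depends_on, failed):
    """Step 3 (stress-test): every component that transitively depends on `failed`."""
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    hit, queue = {failed}, deque([failed])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in hit:
                hit.add(parent)
                queue.append(parent)
    return hit


def verdict(depends_on, critical_ops, failed):
    """Step 4 (verdict): summarize one scenario as a small report."""
    hit = blast_radius(depends_on, failed)
    affected = sorted(op for op, entry in critical_ops.items() if entry in hit)
    if not affected:
        posture = "green"
    elif len(affected) < len(critical_ops):
        posture = "yellow"
    else:
        posture = "red"
    return {"failed": failed, "blast_radius": sorted(hit),
            "affected": affected, "posture": posture}


redis_report = verdict(depends_on, critical_ops, "redis")
postgres_report = verdict(depends_on, critical_ops, "postgres")
```

In this toy graph, losing redis affects only the checkout operation, while losing postgres reaches both operations through catalog and frontend.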

Four high-value domains. One resilience engine.

Most mature: public MVP + experiments

SRE and digital infrastructure

Sheaft evaluates resilience of software systems before release and between releases. It builds service graphs from trace data or topology artifacts, simulates dependency failures, and returns a gate or posture verdict for critical user journeys.

Digital infrastructure is itself graph-structured: services, APIs, queues, databases, replicas, release policies and user operations form one dependency graph.

Validation evidence

Inputs

Traces/OTLP
Service topology
Endpoint contracts
CI/CD artifacts
Incident history

Outputs

Release-risk verdict
Fragile dependencies
Affected endpoints
Chaos-test priorities
Posture history
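One way to picture how a service graph can be derived from trace data is to join each span to its parent and keep the cross-service hops. The sketch below assumes a simplified span record with span_id, parent_id and service keys; that record shape is hypothetical and is not the OTLP wire format or Bering's input schema.

```python
from collections import Counter


def service_edges(spans):
    """Count caller -> callee service edges implied by parent/child spans."""
    by_id = {s["span_id"]: s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        # Keep only hops that cross a service boundary.
        if parent is not None and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges


# Hypothetical spans from one request.
spans = [
    {"span_id": "1", "parent_id": None, "service": "gateway"},
    {"span_id": "2", "parent_id": "1", "service": "orders"},
    {"span_id": "3", "parent_id": "2", "service": "orders"},
    {"span_id": "4", "parent_id": "3", "service": "postgres"},
    {"span_id": "5", "parent_id": "1", "service": "orders"},
]

edges = service_edges(spans)
```

The call counts give a natural edge attribute; latency or error rate could be aggregated the same way.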

Engineering research line

Agent reliability and AI harnesses

Sheaft builds a reliability model from agent harness execution traces: LLM calls, tools, retrieval, memory, policy checks, evaluators, permissions, retries, fallback paths and human gates.

Single-run task success and average pass rates miss boundary failures. Real incidents appear at the boundaries between the model, tools, retrieval, state, guardrails and release gates.

References: OpenTelemetry GenAI conventions, Agent spans

Inputs

Execution traces / GenAI spans
Tool calls
Retrieval and memory reads/writes
Policy/evaluator/human-gate events
Retry/fallback history

Outputs

Reliability graph
Virtual chaos scenarios
Fragile boundaries
Green/yellow/red release verdict
Test priorities
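A virtual chaos scenario for an agent harness can be pictured as disabling some tools and checking whether each workflow step still has a working option. The sketch below is an assumption-laden illustration: the workflow layout, the tool names and the rule that any fallback use yields yellow are hypothetical, not Sheaft's verdict policy.

```python
# Each step lists its options in priority order: primary first, then fallbacks.
workflow = [
    ("retrieve", ["vector_search", "keyword_search"]),
    ("generate", ["llm_primary"]),
    ("policy_check", ["evaluator"]),
]


def run_scenario(workflow, disabled):
    """Return (verdict, first_broken_step) for one set of disabled tools."""
    verdict = "green"
    for step, options in workflow:
        alive = [tool for tool in options if tool not in disabled]
        if not alive:
            return "red", step            # no option left: the workflow breaks here
        if alive[0] != options[0]:
            verdict = "yellow"            # the step survives only through a fallback
    return verdict, None


baseline = run_scenario(workflow, set())
retrieval_down = run_scenario(workflow, {"vector_search"})
model_down = run_scenario(workflow, {"llm_primary"})
```

Steps with a single option, such as generate here, are the fragile boundaries in this toy model: one tool timeout turns the verdict red.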

Pilot-ready

Smart mobility and traffic corridors

Sheaft evaluates resilience of smart mobility infrastructure represented as a live attributed graph: road segments, intersections, toll gates, sensors, signal controllers, payment systems and operations centers.

Smart mobility is cyber-physical infrastructure: road networks, tolling, sensors and digital services form one distributed operational graph.

References: NIST CPS context, SUMO road graph

Inputs

Road topology
Traffic/toll events
Sensor observations
Signal timings
Incident logs

Outputs

Fragile corridors
Missing observations
Incident propagation
Affected mobility journeys
Operator resilience verdict
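As a rough illustration of two of these outputs, the sketch below treats a corridor as an undirected road graph, flags segments whose closure disconnects the network, and lists segments with no sensor. It assumes a brute-force connectivity check and reads "missing observations" as segments without sensor coverage; both choices, and all names and data, are hypothetical.

```python
from collections import deque


def connected(nodes, edges):
    """True if every node is reachable from the first one (undirected graph)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(adj)


def fragile_segments(nodes, edges):
    """Segments whose removal splits the network (bridges), found by brute force."""
    return [e for e in edges
            if not connected(nodes, [x for x in edges if x != e])]


# Hypothetical corridor: a triangle A-B-C with a single spur to toll gate D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
sensors = {("A", "B"), ("C", "D")}     # segments that report observations

fragile = fragile_segments(nodes, edges)
unobserved = [e for e in edges if e not in sensors]
```

The brute-force check is quadratic in the number of segments, which is fine for a single corridor; a linear-time bridge-finding algorithm would be the usual choice for a city-scale graph.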

Research and partner pilots

Chemistry and biology networks

Sheaft analyzes chemical reaction networks, biological interaction networks, metabolic pathways and bioprocess graphs to find fragile routes, critical intermediates, observation gaps and perturbation-sensitive modules.

These systems are routinely modeled as graphs: reactions connect reactants, intermediates and products; biological networks connect proteins, genes, metabolites and signaling interactions.

References: EMBL-EBI biological networks, RSC reaction networks

Inputs

Reaction networks
Pathway models
Biological interaction graphs
Experimental observations

Outputs

Critical intermediates
Alternative routes
Fragile modules
Perturbation blast radius
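A simple way to picture fragile routes and alternative routes is a knockout analysis: remove one reaction at a time and check whether the target is still producible from the starting species. The sketch below assumes a purely qualitative model with no rates or stoichiometry; the network, species names and function names are hypothetical.

```python
def producible(reactions, seeds):
    """Species reachable from `seeds` by repeatedly firing enabled reactions."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction fires once all of its reactants are available.
            if reactants <= have and not products <= have:
                have |= products
                changed = True
    return have


def essential_reactions(reactions, seeds, target):
    """Reactions whose knockout makes `target` unreachable."""
    essential = []
    for name in reactions:
        remaining = {k: v for k, v in reactions.items() if k != name}
        if target not in producible(remaining, seeds):
            essential.append(name)
    return essential


# Hypothetical network: two parallel routes from A to D, then one step to P.
reactions = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"D"}),
    "r3": ({"A"}, {"C"}),
    "r4": ({"C"}, {"D"}),
    "r5": ({"D"}, {"P"}),
}

essential = essential_reactions(reactions, {"A"}, "P")
```

Here the routes through B and C back each other up, so only the final step r5 is essential and D is the critical intermediate.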

Two layers: model discovery and resilience assessment

Bering: system-graph discovery and artifact publishing

Bering builds typed graph artifacts from telemetry, topology inputs, execution traces, event streams or explicit domain models. It publishes stable model and snapshot artifacts for downstream resilience analysis.

Discovers typed graph structure from traces, OTLP, GenAI spans, event streams or explicit topology.
Normalizes nodes, edges, attributes, state reads/writes and domain contracts.
Publishes stable model/snapshot artifacts.
Keeps provenance so teams can inspect where each part of the model came from.
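To make "stable snapshot artifacts" and provenance concrete, one possible approach is to serialize the model canonically and hash it, so that the same graph always yields the same digest. The sketch below is an assumption: the artifact layout, the source field and the snapshot function are hypothetical and do not describe Bering's actual artifact format.

```python
import hashlib
import json


def snapshot(nodes, edges):
    """Build a snapshot whose digest does not depend on input ordering."""
    model = {
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(edges, key=lambda e: (e["src"], e["dst"])),
    }
    # Canonical JSON: sorted keys and fixed separators give a stable byte string.
    canonical = json.dumps(model, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"digest": digest, "model": model}


# Hypothetical inputs; "source" records where each element was discovered.
nodes = [
    {"id": "orders", "kind": "service", "source": "otlp-traces"},
    {"id": "postgres", "kind": "database", "source": "topology-file"},
]
edges = [
    {"src": "orders", "dst": "postgres", "kind": "call", "source": "otlp-traces"},
]

first = snapshot(nodes, edges)
second = snapshot(list(reversed(nodes)), edges)
```

Because the digest is order-independent, two discovery runs that see the same system produce the same identifier, and any change in structure or provenance produces a new one.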

Bering on GitHub Bering docs

Sheaft: resilience simulation, verdicts and posture history

Sheaft consumes graph artifacts, stress-tests disruption scenarios, evaluates domain policies and tracks resilience posture or release verdicts over time.

Runs simulation-based resilience analysis on graph artifacts.
Evaluates policy rules for critical operations and agent workflows.
Reports fragile components, affected operations and blast radius.
Supports batch checks and continuous posture tracking.
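Continuous posture tracking can be pictured as comparing the most recent verdicts with the ones just before them. The sketch below is illustrative only: the numeric scores, the window size and the posture_trend function are hypothetical choices, not Sheaft's posture model.

```python
SCORE = {"green": 0, "yellow": 1, "red": 2}


def posture_trend(history, window=3):
    """Compare the latest `window` verdicts with the `window` before them."""
    if len(history) < 2 * window:
        return "insufficient-data"
    recent = sum(SCORE[v] for v in history[-window:])
    previous = sum(SCORE[v] for v in history[-2 * window:-window])
    if recent > previous:
        return "degrading"
    if recent < previous:
        return "improving"
    return "stable"


# Hypothetical verdict history, oldest first.
history = ["green", "green", "green", "yellow", "yellow", "red"]
trend = posture_trend(history)
```

A batch check would look only at the last verdict; the trend adds the direction of travel across releases or time windows.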

Sheaft on GitHub Sheaft docs

Validated on distributed systems, extending to agent harnesses and broader infrastructure

ICSE 2026 Distinguished Paper Award

Model Discovery and Graph Simulation was recognized in ICSE 2026 NIER with a Distinguished Paper Award.

Official ICSE page

DeathStarBench experiments

Graph-discovered resilience models were evaluated against live fault-injection outcomes on a distributed benchmark.

OpenTelemetry GenAI/agent semantics

OpenTelemetry is defining semantic conventions for GenAI, agent and tool spans, which makes trace-derived reliability models a portable research abstraction rather than a local log format.

GenAI conventions Agent spans

Public MVP

Bering and Sheaft are available as public tools for graph discovery, simulation, verdicts and posture monitoring.

GitHub

What this proves

Sheaft can turn passive telemetry and execution traces into an explicit model and produce useful resilience signals without running broad live experiments every time.

Demo report: digital infrastructure use case

The current public report shows the most mature software-infrastructure workflow. The same report pattern extends to agent harnesses, mobility and scientific-network pilots.

Smart mobility report template

Corridor graph, missing observations, affected mobility journeys, disruption propagation.

Chemistry/biology report template

Reaction/pathway graph, critical intermediates, perturbation-sensitive modules, alternative routes.

Agent harness report template

Trace-derived call graph, stale retrieval, tool timeout, schema drift, permission denial, release verdict.

Start with one graph, one critical operation, one historical period

SRE pilot

Start from trace data, topology and incident history for one service domain. Sheaft builds a model, runs failure scenarios, and compares the result with known incidents or release risks.

Agent reliability research pilot

Start from one agent workflow and one set of historical execution traces. Sheaft builds the harness reliability graph, simulates virtual chaos scenarios, and returns a green/yellow/red release verdict.

Smart mobility pilot

Use topology and telemetry from one corridor, tolling domain, parking/payment flow or sensorized mobility area. Sheaft identifies fragile components, missing observations and affected mobility journeys.

Chemistry/biology research pilot

Use a reaction network, biological interaction graph or pathway model with perturbation scenarios. Sheaft identifies critical intermediates, fragile modules and robustness hypotheses.

Research base: graph models, resilience and emergence

Sheaft is grounded in model discovery, graph simulation, sheaf-theoretic consistency, causal emergence and resilience monitoring. The product packages these ideas into practical workflows for networked and agent systems.

Core engine

Emergence and self-governance

AI foundations and world models

Research extensions

Chemistry/biology network resilience
Traffic corridor resilience
Agent harness reliability and virtual chaos simulation

This project is implemented with grant support from the Foundation for Assistance to Small Innovative Enterprises.
