Knowledge Graph

OpenSRE's knowledge graph is a Neo4j-powered map of your entire service topology. It stores services, their dependencies, ownership, recent changes, and infrastructure components. During incident investigation, AI agents query this graph to understand the blast radius of a failure, trace dependency chains, and identify what recently changed near the affected services.

What the Knowledge Graph Stores

Services

Every service in your platform is a node in the graph:

  • Name, team, language, criticality
  • API endpoints and communication protocols
  • Upstream and downstream dependencies
  • SLOs and error budgets (if configured)

Dependencies

Edges in the graph represent relationships:

  • DEPENDS_ON — service A calls service B
  • OWNS — team X owns service Y
  • DEPLOYED_ON — service runs on this infrastructure component
  • USES — service uses this database or external API
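For illustration, the four relationship types can be modeled as directed triples. The `Edge` type, the `outgoing`/`incoming` helpers, and the sample node names other than payments-service and checkout-service are hypothetical. Edge directions follow the descriptions above: caller to callee, team to service, and service to infrastructure.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: str
    rel: str  # one of DEPENDS_ON, OWNS, DEPLOYED_ON, USES
    dst: str


# Hypothetical sample edges.
EDGES = [
    Edge("checkout-service", "DEPENDS_ON", "payments-service"),
    Edge("Payments", "OWNS", "payments-service"),
    Edge("Platform", "OWNS", "checkout-service"),
    Edge("payments-service", "DEPLOYED_ON", "k8s-prod/payments"),
    Edge("payments-service", "USES", "postgres-payments"),
]


def outgoing(edges, node, rel):
    """Targets reached from `node` over one edge of type `rel`."""
    return [e.dst for e in edges if e.src == node and e.rel == rel]


def incoming(edges, node, rel):
    """Sources that point at `node` over one edge of type `rel`."""
    return [e.src for e in edges if e.dst == node and e.rel == rel]
```

With this data, `incoming(EDGES, "payments-service", "OWNS")` yields the owning team and `incoming(EDGES, "payments-service", "DEPENDS_ON")` yields its direct callers.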

Infrastructure

Infrastructure components are stored as nodes too:

  • Kubernetes clusters, namespaces, deployments
  • Databases (PostgreSQL, Redis, MySQL)
  • Message queues (Kafka, RabbitMQ, SQS)
  • Cloud services (RDS, S3, Lambda)
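Infrastructure nodes make two questions answerable: which components a service touches, and which other services share a component (for example, two services backed by one database). The sketch below assumes infrastructure links are stored as (service, relationship, component) triples using the DEPLOYED_ON and USES edges. The component names and helper functions are made up for illustration.

```python
# Hypothetical (service, relationship, component) triples.
TRIPLES = [
    ("payments-service", "DEPLOYED_ON", "k8s-prod/payments"),
    ("payments-service", "USES", "postgres-main"),
    ("orders-service", "USES", "postgres-main"),
    ("orders-service", "USES", "kafka-orders"),
]

# Only these relationship types link a service to infrastructure.
INFRA_RELS = {"DEPLOYED_ON", "USES"}


def components_of(triples, service):
    """Infrastructure components a service runs on or uses, sorted by name."""
    return sorted({c for s, r, c in triples if s == service and r in INFRA_RELS})


def services_sharing(triples, component):
    """Services that run on or use the given component, sorted by name."""
    return sorted({s for s, r, c in triples if c == component and r in INFRA_RELS})
```

Here `services_sharing(TRIPLES, "postgres-main")` returns both services, so a database incident would implicate each of them.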

How Investigations Use the Knowledge Graph

Blast Radius Analysis

Given a failing service, the knowledge graph answers: "what else is affected?"

When payments-service starts returning errors, OpenSRE queries the graph for every service with a DEPENDS_ON path to payments-service, whether direct or transitive. This blast radius list is passed to investigation subagents so they can proactively check the health of those downstream services.
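Blast radius is a reverse transitive closure: start at the failing service and repeatedly follow DEPENDS_ON edges backwards to their callers. Below is a minimal in-memory sketch using breadth-first search. It assumes edges are (caller, callee) pairs. The function name and the sample services other than payments-service and checkout-service are hypothetical. The sketch also records hop distance, which could be used to check direct dependents first.

```python
from collections import deque

# Hypothetical DEPENDS_ON edges: (caller, callee) means caller DEPENDS_ON callee.
DEPENDS_ON = [
    ("checkout-service", "payments-service"),
    ("web-frontend", "checkout-service"),
    ("refund-worker", "payments-service"),
    ("search-service", "catalog-service"),
]


def blast_radius(edges, failing):
    """Map every service with a direct or transitive DEPENDS_ON path to `failing`
    to its hop distance from `failing`."""
    # Index edges in reverse: callee -> list of callers.
    dependents = {}
    for caller, callee in edges:
        dependents.setdefault(callee, []).append(caller)

    distance = {}
    seen = {failing}
    queue = deque([(failing, 0)])
    while queue:
        node, hops = queue.popleft()
        for caller in dependents.get(node, []):
            if caller not in seen:  # also keeps the search finite if the graph has cycles
                seen.add(caller)
                distance[caller] = hops + 1
                queue.append((caller, hops + 1))
    return distance
```

`blast_radius(DEPENDS_ON, "payments-service")` returns `{"checkout-service": 1, "refund-worker": 1, "web-frontend": 2}`. search-service is absent because no path links it to payments-service.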

Dependency Traversal

When investigating a performance degradation, OpenSRE traverses the dependency graph upstream: what does this service depend on? Has anything in that dependency chain changed recently?
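Upstream traversal runs the same kind of search in the forward direction and then filters the chain by recent changes. This sketch assumes each service carries a single last-changed timestamp. The data, timestamps, and function names are illustrative only.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

# Hypothetical DEPENDS_ON edges (caller, callee) and last-change timestamps.
DEPENDS_ON = [
    ("checkout-service", "payments-service"),
    ("payments-service", "fraud-service"),
    ("payments-service", "ledger-service"),
]
LAST_CHANGED = {
    "payments-service": datetime(2024, 5, 1, 11, 30, tzinfo=timezone.utc),
    "fraud-service": datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
    "ledger-service": datetime(2024, 4, 20, 8, 0, tzinfo=timezone.utc),
}


def upstream_chain(edges, service):
    """Everything `service` depends on, directly or transitively, in BFS order."""
    deps = {}
    for caller, callee in edges:
        deps.setdefault(caller, []).append(callee)
    order, seen, queue = [], {service}, deque([service])
    while queue:
        for callee in deps.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


def recently_changed_upstream(edges, last_changed, service, now, window):
    """Upstream services whose last change falls within `window` before `now`."""
    return [
        s for s in upstream_chain(edges, service)
        if s in last_changed and timedelta(0) <= now - last_changed[s] <= window
    ]
```

With `now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)` and a two-hour window, `recently_changed_upstream(DEPENDS_ON, LAST_CHANGED, "checkout-service", now, timedelta(hours=2))` returns `["payments-service"]`. Widening the window to four hours also picks up fraud-service.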

Ownership Lookup

The graph knows which team owns which service, so OpenSRE can include the right team context in its report, for example: "This incident affects checkout-service (owned by the Platform team), which depends on payments-service (owned by the Payments team)."
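Ownership lookup means following the OWNS edge backwards from a service to its team. Below is a minimal sketch that assembles a report line of this kind. It assumes the OWNS edges have already been loaded into a service-to-team mapping. `owned_by` and `ownership_summary` are hypothetical helpers, not OpenSRE functions.

```python
# Hypothetical OWNS edges, loaded as service -> owning team.
OWNER = {
    "checkout-service": "Platform",
    "payments-service": "Payments",
}


def owned_by(owners, service):
    """Ownership phrase for a service, falling back when no OWNS edge exists."""
    team = owners.get(service)
    return f"owned by the {team} team" if team else "owner unknown"


def ownership_summary(owners, affected, dependency):
    """One report sentence naming both services and their owning teams."""
    return (f"This incident affects {affected} ({owned_by(owners, affected)}), "
            f"which depends on {dependency} ({owned_by(owners, dependency)}).")
```

The fallback matters in practice: a service discovered from traffic may not have an OWNS edge yet, and the report should say so rather than fail.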

Recent Change Detection

Changes recorded in the graph (new deployments, config changes, infrastructure modifications) are timestamped, so investigation subagents can ask: "What changed in the vicinity of this service in the last 2 hours?"
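One way to make "in the vicinity" concrete is a hop limit. First collect every node within N edges of the service, in either direction and across any relationship type. Then keep the changes on those nodes that fall inside the time window. The two-hop default, the adjacency format, and the change-log tuples below are assumptions for illustration.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

# Hypothetical undirected adjacency over all edge types, plus a change log.
ADJ = {
    "checkout-service": ["payments-service", "k8s-prod"],
    "payments-service": ["checkout-service", "postgres-main"],
    "postgres-main": ["payments-service"],
    "k8s-prod": ["checkout-service"],
}
CHANGES = [  # (node, kind, timestamp)
    ("postgres-main", "config", datetime(2024, 5, 1, 11, 15, tzinfo=timezone.utc)),
    ("k8s-prod", "infra", datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)),
    ("payments-service", "deployment", datetime(2024, 5, 1, 10, 45, tzinfo=timezone.utc)),
]


def vicinity(adj, start, max_hops):
    """All nodes within `max_hops` edges of `start`, including `start`."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, []):
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    return set(hops)


def recent_changes_near(adj, changes, service, now, window=timedelta(hours=2), max_hops=2):
    """Changes on nearby nodes within `window` before `now`, newest first."""
    nearby = vicinity(adj, service, max_hops)
    cutoff = now - window
    return sorted(
        (c for c in changes if c[0] in nearby and cutoff <= c[2] <= now),
        key=lambda c: c[2],
        reverse=True,
    )
```

At 12:00 UTC, the two-hour query for checkout-service returns the postgres-main config change and then the payments-service deployment. The k8s-prod change is nearby but too old. Dropping `max_hops` to 1 leaves only the deployment.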

Querying the Graph

The knowledge graph is accessible via Cypher queries through OpenSRE's Neo4j integration. Example queries:

```cypher
// Find all services that depend on payments-service, directly or transitively.
// DISTINCT collapses services that reach payments-service through several paths.
MATCH (s:Service)-[:DEPENDS_ON*]->(target:Service {name: "payments-service"})
RETURN DISTINCT s.name, s.team

// Find services deployed in the last hour
MATCH (d:Deployment)-[:DEPLOYED_TO]->(s:Service)
WHERE d.deployed_at > datetime() - duration('PT1H')
RETURN s.name, d.version, d.deployed_at
```

Building Your Service Topology

Automatic Discovery

OpenSRE can auto-discover service topology from:

  • Kubernetes service mesh (Istio, Linkerd) telemetry
  • Distributed tracing data (Jaeger, Datadog APM)
  • API gateway call graphs
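For tracing data, a common way to infer a call edge is to find a span whose parent span belongs to a different service: the parent's service called the child's service. The sketch below assumes spans have already been normalized to (span_id, parent_id, service) tuples, which simplifies what Jaeger or Datadog APM actually export. The `min_calls` threshold for filtering out rare edges is also a hypothetical knob, not an OpenSRE setting.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    service: str


def edges_from_spans(spans, min_calls=1):
    """Infer (caller, callee) DEPENDS_ON edges with call counts from parent/child spans."""
    by_id = {s.span_id: s for s in spans}
    counts = Counter()
    for span in spans:
        parent = by_id.get(span.parent_id)  # None for root spans or missing parents
        # A parent in a different service means a cross-service call.
        if parent is not None and parent.service != span.service:
            counts[(parent.service, span.service)] += 1
    return {edge: n for edge, n in counts.items() if n >= min_calls}
```

Spans within one service (for example, an internal function span under checkout-service) produce no edge, so only cross-service calls become DEPENDS_ON candidates.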

Manual Registration

For services not auto-discoverable, register them via the config-service API or the web console's Knowledge Graph editor.

Viewing the Knowledge Graph

The web console includes a knowledge graph visualizer at http://localhost:3002/knowledge-graph. Explore service dependencies, run blast radius queries, and view recent topology changes.