mohammed firdous

K8s Health AI

·source

A Kubernetes operator that detects pod failures, collects diagnostics, and stores fix guidance in a custom resource.

K8s Health AI is an operator-style project for Kubernetes troubleshooting. It watches failing workloads, gathers runtime context, calls a configurable AI backend, and writes diagnosis output into a ClusterDiagnosis custom resource.

Instead of manually collecting pod events, specs, and logs under pressure, the system does that work in one place and returns concise guidance for each failure type.

What it is

A Kubernetes diagnostics assistant with:

  • Failure Detection: Watches for common pod failure states and classifies root patterns.
  • Context Collection: Pulls events, pod spec details, and logs automatically.
  • Diagnosis CRD: Writes findings into ClusterDiagnosis objects for team visibility.
  • Provider Flexibility: Supports mock and multiple cloud/local AI providers.

How It's Built

  • Operator Runtime: Go + controller-runtime for reconciliation and event handling.
  • Custom Resource: ClusterDiagnosis API to persist diagnosis and status.
  • Collection Layer: Internal modules for logs, events, and workload metadata.
  • LLM Layer: Pluggable adapters (OpenAI, Bedrock, Vertex, Azure OpenAI, Ollama, mock).
  • Operations: Prometheus metric diagnoses_total and diagctl CLI for quick inspection.