
Understanding SPIFFE and SPIRE

Part 2 of the Zero-Trust Workload Identity Series



What is SPIFFE?

SPIFFE = Secure Production Identity Framework For Everyone

An open standard for workload identity, hosted by the Cloud Native Computing Foundation (CNCF) and used by companies such as Google, Bloomberg, Netflix, and Uber.

The Core Idea

Give every workload a cryptographic identity that:

  1. Uniquely identifies the workload
  2. Proves identity cryptographically (X.509 certificates)
  3. Automatically rotates (short-lived)
  4. Works everywhere (K8s, Docker, VMs, cloud, on-premise)

SPIFFE ID

Every workload gets a unique identifier:

spiffe://production.example.com/workload/frontend
spiffe://production.example.com/workload/payment-service

Components:

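  • spiffe:// - URI scheme (the same for every SPIFFE ID)
  • production.example.com - Trust domain (the root of trust, typically your org or environment)
  • /workload/frontend - Path (the specific workload within that trust domain)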
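
Because a SPIFFE ID is an ordinary URI, the three components can be pulled apart with a standard URI parser. The following is a minimal sketch, not a full validator: the function name parse_spiffe_id is hypothetical, and it checks only the scheme, the presence of a trust domain, and the absence of a query or fragment.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path).

    Hypothetical helper for illustration; it does not enforce every
    rule of the SPIFFE ID specification.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID (scheme {parts.scheme!r})")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    # A SPIFFE ID carries no query string and no fragment.
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs must not carry a query or fragment")
    return parts.netloc, parts.path


trust_domain, path = parse_spiffe_id(
    "spiffe://production.example.com/workload/frontend"
)
print(trust_domain)  # production.example.com
print(path)          # /workload/frontend
```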

SVID (SPIFFE Verifiable Identity Document)

The proof of identity: a cryptographically signed document proving "you are who you claim to be."

X.509-SVID (most common):

Certificate:
  Subject: C=US, O=SPIRE
  Validity:
    Not Before: Jan 1 00:00:00 2024 GMT
    Not After : Jan 1 01:00:00 2024 GMT (1 hour TTL!)
  Subject Alternative Name:
    URI: spiffe://production.example.com/workload/frontend

Note that the SPIFFE ID lives in the URI Subject Alternative Name, not in the Subject field. The Subject shown here is SPIRE's usual default.

Key properties:

  • Standard X.509 certificate (works with TLS)
  • Short-lived (1 hour typical TTL)
  • Automatic renewal
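
To make the short lifetime concrete, here is a small sketch that reuses the validity window from the example certificate above and checks whether the SVID would be accepted at a given moment. The helper name is_valid_at is hypothetical; a real TLS stack performs this check (and many others) for you.

```python
from datetime import datetime, timedelta, timezone

# Validity window copied from the example certificate above.
not_before = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
not_after = datetime(2024, 1, 1, 1, 0, 0, tzinfo=timezone.utc)


def is_valid_at(now: datetime) -> bool:
    """True while `now` falls inside the certificate's validity window."""
    return not_before <= now <= not_after


ttl = not_after - not_before
print(ttl)  # 1:00:00
print(is_valid_at(not_before + timedelta(minutes=30)))  # True
print(is_valid_at(not_before + timedelta(minutes=61)))  # False
```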

What is SPIRE?

SPIRE = SPIFFE Runtime Environment

The reference implementation of SPIFFE - open source software that issues and manages identities.

Architecture

┌─────────────────────────────────────────┐
│ SPIRE Server                            │
│ - Certificate Authority (CA)            │
│ - Registration API                      │
│ - Attestation Services                  │
└──────────────┬──────────────────────────┘
               │ mTLS
       ┌───────┴────────┐
       │                │
┌──────▼─────┐   ┌──────▼─────┐
│SPIRE Agent │   │SPIRE Agent │
│(Node 1)    │   │(Node 2)    │
│            │   │            │
│Unix Socket │   │Unix Socket │
└─────┬──────┘   └─────┬──────┘
      │                │
┌─────▼──┐       ┌─────▼──┐
│Workload│       │Workload│
└────────┘       └────────┘

SPIRE Server

Role: Central authority that issues SVIDs

Key functions:

  • Certificate Authority - root of trust
  • Registration - stores workload-to-SPIFFE-ID mappings
  • SVID issuance - issues certificates to authenticated workloads

SPIRE Agent

Role: Runs on every node, provides SVIDs to workloads

Key functions:

  • Node attestation - proves node identity to server
  • Workload attestation - identifies workloads on node
  • Workload API - serves SVIDs via Unix socket
  • SVID renewal - automatic rotation before expiry

Workload API

Standard gRPC API over Unix socket (/run/spire/sockets/agent.sock):

Methods:
- FetchX509SVID() # Get certificate
- FetchX509Bundles() # Get trust bundles
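
Before a workload can call these methods it has to find the socket. The SPIFFE Workload Endpoint specification defines the SPIFFE_ENDPOINT_SOCKET environment variable for this, with values such as unix:///path/to/socket. The sketch below resolves that variable to a filesystem path. The function name is hypothetical, and the fallback value is simply the path used in this article, not a standard default.

```python
import os
from urllib.parse import urlsplit

# Fallback taken from this article's example path (an assumption, not a standard).
ARTICLE_SOCKET = "unix:///run/spire/sockets/agent.sock"


def workload_api_socket_path(env=os.environ) -> str:
    """Return the Unix socket path of the Workload API endpoint."""
    address = env.get("SPIFFE_ENDPOINT_SOCKET", ARTICLE_SOCKET)
    parts = urlsplit(address)
    # This sketch handles only the unix:// form of the endpoint address.
    if parts.scheme != "unix" or not parts.path:
        raise ValueError(f"unsupported Workload API address: {address!r}")
    return parts.path


print(workload_api_socket_path({}))
print(workload_api_socket_path({"SPIFFE_ENDPOINT_SOCKET": "unix:///tmp/agent.sock"}))
```

A gRPC client would then dial this path and call FetchX509SVID(), which the agent answers with a stream of updates rather than a single response.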

Why Unix socket?

  • Secure (only local processes)
  • No network configuration
  • Automatic process identification (the kernel reports the caller's PID, so the workload presents no credential of its own)
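
The third point is worth seeing in code. On Linux, the listening side of a Unix socket can ask the kernel who is on the other end through the SO_PEERCRED socket option. The sketch below illustrates that mechanism only; it is not SPIRE's code, the helper name peer_credentials is hypothetical, and on platforms without SO_PEERCRED it simply returns None.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the process at the other end of a Unix socket.

    Linux only: relies on SO_PEERCRED. Returns None where it is unavailable.
    """
    if not hasattr(socket, "SO_PEERCRED"):
        return None
    # The kernel fills in a `struct ucred`: three native ints (pid, uid, gid).
    raw = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i")
    )
    return struct.unpack("3i", raw)


# Demo: a connected pair inside one process, so the "peer" is this process.
server_side, client_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
creds = peer_credentials(server_side)
if creds is not None:
    pid, uid, gid = creds
    print("peer pid matches ours:", pid == os.getpid())
server_side.close()
client_side.close()
```

The PID obtained this way is the starting point for workload attestation: the agent uses it to look up further facts about the caller.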

How Attestation Works

Attestation = proving identity before issuing SVID.

Node Attestation (Agent → Server)

Question: How does server know agent is legitimate?

Example: Kubernetes

1. Agent reads its projected service account token
2. Sends token to SPIRE Server
3. Server validates the token with the K8s API (TokenReview)
4. Issues Agent SVID

Example: AWS EC2

1. Agent queries EC2 metadata API
2. Gets signed instance identity document and sends it to SPIRE Server
3. Server verifies the document's signature with AWS public key
4. Issues Agent SVID

Workload Attestation (Workload → Agent)

Question: How does agent know which workload is requesting?

Answer: Selectors - metadata collected from runtime environment

Kubernetes selectors:

{
"k8s:ns": "production",
"k8s:sa": "frontend-sa",
"k8s:pod-label:app": "frontend"
}

Docker selectors:

{
"docker:label:app": "frontend",
"docker:env": "APP=frontend",
"docker:image_id": "frontend:v1.2.3"
}

Unix selectors:

{
"unix:uid": "1000",
"unix:path": "/opt/app/bin/frontend",
"unix:sha256": "abc123..."
}
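
The key/value maps above are a readable shorthand. In SPIRE itself, for example on the spire-server command line, each selector is written as a single type:value string such as k8s:ns:production. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def to_selector_strings(selectors: dict[str, str]) -> list[str]:
    """Flatten {"k8s:ns": "production"} into ["k8s:ns:production"]."""
    return [f"{key}:{value}" for key, value in selectors.items()]


k8s_selectors = {
    "k8s:ns": "production",
    "k8s:sa": "frontend-sa",
    "k8s:pod-label:app": "frontend",
}

for selector in to_selector_strings(k8s_selectors):
    print(selector)
# k8s:ns:production
# k8s:sa:frontend-sa
# k8s:pod-label:app:frontend
```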

Selector Matching

Agent matches collected selectors against registered workload entries:

Registered Entry:
{
"spiffe_id": "spiffe://prod.example.com/workload/frontend",
"selectors": {
"k8s:ns": "production",
"k8s:pod-label:app": "frontend"
}
}

Collected at Runtime:
{
"k8s:ns": "production",
"k8s:sa": "frontend-sa",
"k8s:pod-label:app": "frontend",
"k8s:pod-label:version": "v1"
}

Match? YES! (registered selectors ⊆ collected selectors)
→ Issue SVID with registered spiffe_id
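
The subset rule is short enough to write out. This sketch reuses the entry and the collected selectors from the example above; the function names are hypothetical, and it models selectors as the same key/value maps this article uses rather than SPIRE's internal types.

```python
from typing import Optional


def entry_matches(registered: dict[str, str], collected: dict[str, str]) -> bool:
    """True when every registered selector also appears in the collected set."""
    return registered.items() <= collected.items()


def find_spiffe_id(entries: list[dict], collected: dict[str, str]) -> Optional[str]:
    """Return the SPIFFE ID of the first matching entry, or None."""
    for entry in entries:
        if entry_matches(entry["selectors"], collected):
            return entry["spiffe_id"]
    return None


entries = [
    {
        "spiffe_id": "spiffe://prod.example.com/workload/frontend",
        "selectors": {"k8s:ns": "production", "k8s:pod-label:app": "frontend"},
    }
]

collected = {
    "k8s:ns": "production",
    "k8s:sa": "frontend-sa",
    "k8s:pod-label:app": "frontend",
    "k8s:pod-label:version": "v1",
}

print(find_spiffe_id(entries, collected))
# spiffe://prod.example.com/workload/frontend

# A pod in another namespace fails the subset test and gets no identity.
print(find_spiffe_id(entries, {**collected, "k8s:ns": "staging"}))
# None
```

Extra collected selectors never prevent a match; a missing or different one always does. That is why entries should list the selectors that truly distinguish the workload.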

The SVID Lifecycle

1. Workload Startup (T+0)

Workload → Agent: "I need an SVID!"
Agent: Collects selectors
Agent: Matches workload entry
Agent → Server: Requests SVID
Server → Agent: Issues SVID (1-hour TTL)
Agent → Workload: Streams SVID
Workload: Writes to /tmp/spiffe-certs/

2. Automatic Renewal (T+54 min)

Agent: "SVID expiring soon (90% TTL)"
Agent → Server: Requests renewal
Server → Agent: New SVID (T+114 min expiry)
Agent → Workload: Streams updated SVID
Workload: Updates /tmp/spiffe-certs/

3. Workload Shutdown

Agent: Workload disconnected
Agent: Removes from cache
(No revocation needed - expires in <1 hour)
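
The timestamps in this lifecycle follow from two numbers: the TTL and the point in the TTL at which renewal starts. The sketch below reproduces the arithmetic using the figures from this walkthrough (1-hour TTL, renewal at 90%). Both are parameters because the actual threshold depends on how your agents are configured, and the function name is hypothetical.

```python
from datetime import timedelta


def renewal_schedule(ttl: timedelta, renew_percent: int, rounds: int):
    """Yield (renew_at, new_expiry) as offsets from workload startup."""
    issued_at = timedelta(0)
    for _ in range(rounds):
        # Renewal starts once `renew_percent` of the current SVID's TTL has elapsed.
        renew_at = issued_at + ttl * renew_percent // 100
        new_expiry = renew_at + ttl
        yield renew_at, new_expiry
        issued_at = renew_at


def minutes(delta: timedelta) -> int:
    return int(delta.total_seconds() // 60)


for renew_at, new_expiry in renewal_schedule(timedelta(hours=1), 90, rounds=2):
    print(f"renew at T+{minutes(renew_at)} min, new expiry T+{minutes(new_expiry)} min")
# renew at T+54 min, new expiry T+114 min
# renew at T+108 min, new expiry T+168 min
```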

Security Properties

Why This Is Secure

  1. Short-Lived Credentials

    • Expires in 1 hour
    • Compromised SVID has minimal blast radius
    • Automatic rotation
  2. Cryptographic Attestation

    • Can't fake Kubernetes token (verified with K8s API)
    • Can't fake AWS instance identity
    • Can't fake process UID/GID (read from the kernel by the agent, not supplied by the workload)
  3. Defense in Depth

    • Node attestation (agent proves node identity)
    • Workload attestation (matches selectors)
    • Certificate verification (mTLS)
  4. Zero Shared Secrets

    • No API keys, passwords, or static tokens
    • Everything dynamically issued

Threat Model

Q: What if SVID is stolen?

  • A: Expires in 1 hour. Short attack window.

Q: What if SPIRE Server is compromised?

  • A: Root of trust compromised. Mitigation: HSM, network isolation, audit logs.

Q: What if agent is compromised?

  • A: It can only obtain SVIDs for workloads registered to that node. Mitigation: read-only filesystem, monitoring.

SPIFFE/SPIRE vs Alternatives

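Feature            | SPIFFE/SPIRE       | Service Mesh         | Cloud IAM
-------------------|--------------------|----------------------|-------------------
Standard           | ✅ Open (CNCF)     | ❌ Vendor-specific   | ❌ Cloud-specific
Cross-platform     | ✅ K8s/Docker/VMs  | ⚠️ Mostly K8s        | ❌ Cloud only
Multi-cloud        | ✅ Yes             | ⚠️ Complex           | ❌ No
Resource overhead  | ✅ Low             | ❌ High (sidecars)   | ✅ Low
mTLS               | ✅ Native          | ✅ Native            | ❌ Manual
Managed option     | ⚠️ DIY             | ⚠️ DIY               | ✅ Managed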

Limitations of Open Source SPIRE

Running SPIRE yourself has challenges:

  1. Operational Burden

    • Set up a highly available (HA) SPIRE Server
    • Deploy agents across infrastructure
    • Monitoring, alerting, security patches
  2. Complexity

    • Configuration files (server, agents, plugins)
    • Manual workload entry registration
    • Trust bundle distribution
  3. Multi-Tenancy

    • SPIRE OSS is single-tenant
    • Need separate deployment per tenant

Time to production: 2-4 weeks for experienced teams


What's Next?

Now that we understand SPIFFE and SPIRE, the next post explores how the ICP (Identity Control Plane) Platform, built by the AuthSec team, provides a managed, multi-tenant, production-ready solution.


Questions? support@authsec.dev