Understanding SPIFFE and SPIRE

Part 2 of the Zero-Trust Workload Identity Series

What is SPIFFE?

SPIFFE = Secure Production Identity Framework For Everyone

An open standard for workload identity, hosted by the CNCF and used by companies such as Google, Bloomberg, Netflix, and Uber.

The Core Idea

Give every workload a cryptographic identity that:

Uniquely identifies the workload
Proves identity cryptographically (X.509 certificates)
Automatically rotates (short-lived)
Works everywhere (K8s, Docker, VMs, cloud, on-premise)

SPIFFE ID

Every workload gets a unique identifier:

spiffe://production.example.com/workload/frontend
spiffe://production.example.com/workload/payment-service

Components:

spiffe:// - URI scheme
production.example.com - Trust Domain (your org)
/workload/frontend - Path (specific workload)
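
As a rough illustration of the three components above, the following sketch splits a SPIFFE ID using Python's standard urllib.parse. The function name parse_spiffe_id is hypothetical, and the checks are deliberately simplified; the SPIFFE specification defines stricter validation rules than shown here.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed.

    Hypothetical helper for illustration only, not part of any SPIFFE library.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    # A SPIFFE ID carries no query, fragment, port, or user info.
    if parts.query or parts.fragment or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("unexpected URI component")
    return parts.netloc, parts.path


trust_domain, path = parse_spiffe_id("spiffe://production.example.com/workload/frontend")
print(trust_domain)  # production.example.com
print(path)          # /workload/frontend
```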

SVID (SPIFFE Verifiable Identity Document)

The proof of identity: a cryptographic document that proves a workload is who it claims to be.

X.509-SVID (most common):

Certificate:
    Subject: C=US, O=SPIRE  (the SPIFFE ID is carried in the URI SAN, not here)
    Validity:
        Not Before: Jan 1 00:00:00 2024 GMT
        Not After : Jan 1 01:00:00 2024 GMT  (1 hour TTL!)
    Subject Alternative Name:
        URI: spiffe://production.example.com/workload/frontend

Key properties:

Standard X.509 certificate (works with TLS)
Short-lived (1 hour typical TTL)
Automatic renewal

What is SPIRE?

SPIRE = SPIFFE Runtime Environment

The reference implementation of SPIFFE - open source software that issues and manages identities.

Architecture

┌─────────────────────────────────────────┐
│         SPIRE Server                    │
│  - Certificate Authority (CA)           │
│  - Registration API                     │
│  - Attestation Services                 │
└──────────────┬──────────────────────────┘
               │ mTLS
       ┌───────┴────────┐
       │                │
┌──────▼─────┐   ┌──────▼─────┐
│SPIRE Agent │   │SPIRE Agent │
│(Node 1)    │   │(Node 2)    │
│            │   │            │
│Unix Socket │   │Unix Socket │
└─────┬──────┘   └─────┬──────┘
      │                │
┌─────▼──┐       ┌─────▼──┐
│Workload│       │Workload│
└────────┘       └────────┘

SPIRE Server

Role: Central authority that issues SVIDs

Key functions:

Certificate Authority - root of trust
Registration - stores workload-to-SPIFFE-ID mappings
SVID issuance - issues certificates to authenticated workloads

SPIRE Agent

Role: Runs on every node, provides SVIDs to workloads

Key functions:

Node attestation - proves node identity to server
Workload attestation - identifies workloads on node
Workload API - serves SVIDs via Unix socket
SVID renewal - automatic rotation before expiry

Workload API

A standard gRPC API served over a Unix domain socket (for example, /run/spire/sockets/agent.sock):

Methods:
  - FetchX509SVID()    # Get certificate
  - FetchX509Bundles() # Get trust bundles

Why Unix socket?

Secure (only local processes)
No network configuration
Automatic process identification (PID)
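
To make the "automatic process identification" point concrete, here is a minimal sketch of how a server on a Unix socket can ask the kernel who is on the other end. It assumes Linux (the SO_PEERCRED socket option) and uses a socketpair in place of a real agent socket; the helper name peer_credentials is hypothetical, and this is not SPIRE's actual code.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the peer on a Unix socket. Linux only."""
    # The kernel fills in a struct ucred: pid_t, uid_t, gid_t.
    fmt = "iII"
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# A connected pair stands in for the agent socket and one workload connection.
agent_side, workload_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(agent_side)
print(f"peer pid={pid} uid={uid} gid={gid}")
print("peer is this process:", pid == os.getpid())
agent_side.close()
workload_side.close()
```

The values come from the kernel rather than from anything the caller sends, which is why the workload does not need to present a credential to the agent.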

How Attestation Works

Attestation = proving identity before issuing SVID.

Node Attestation (Agent → Server)

Question: How does the server know the agent is legitimate?

Example: Kubernetes

Agent reads service account token
Sends token to SPIRE Server
Server validates with K8s API
Issues Agent SVID

Example: AWS EC2

Agent queries EC2 metadata API
Gets signed instance identity document
Server verifies with AWS public key
Issues Agent SVID

Workload Attestation (Workload → Agent)

Question: How does the agent know which workload is requesting an SVID?

Answer: Selectors - metadata collected from runtime environment

Kubernetes selectors:

{
  "k8s:ns": "production",
  "k8s:sa": "frontend-sa",
  "k8s:pod-label:app": "frontend"
}

Docker selectors:

{
  "docker:label:app": "frontend",
  "docker:container-name": "frontend",
  "docker:image-name": "frontend:v1.2.3"
}

Unix selectors:

{
  "unix:uid": "1000",
  "unix:path": "/opt/app/bin/frontend",
  "unix:sha256": "abc123..."
}
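
As a sketch of how the Unix selectors above might be gathered, the snippet below builds the same three keys for the current Python interpreter. A real agent inspects the calling process rather than itself; collect_unix_selectors and sha256_of_file are hypothetical names, and the key/value layout simply mirrors the example above.

```python
import hashlib
import os
import sys


def sha256_of_file(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_unix_selectors(uid: int, exe_path: str) -> dict[str, str]:
    """Build Unix-style selectors for a process with the given UID and binary."""
    return {
        "unix:uid": str(uid),
        "unix:path": exe_path,
        "unix:sha256": sha256_of_file(exe_path),
    }


# Stand-in for a workload: describe the interpreter running this script.
selectors = collect_unix_selectors(os.getuid(), os.path.realpath(sys.executable))
for key, value in selectors.items():
    print(f"{key} = {value}")
```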

Selector Matching

Agent matches collected selectors against registered workload entries:

Registered Entry:
{
  "spiffe_id": "spiffe://production.example.com/workload/frontend",
  "selectors": {
    "k8s:ns": "production",
    "k8s:pod-label:app": "frontend"
  }
}

Collected at Runtime:
{
  "k8s:ns": "production",
  "k8s:sa": "frontend-sa",
  "k8s:pod-label:app": "frontend",
  "k8s:pod-label:version": "v1"
}

Match? YES! (registered selectors ⊆ collected selectors)
→ Issue SVID with registered spiffe_id
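
The subset rule above can be sketched in a few lines. This is an illustration of the matching logic only, assuming selectors are held as plain key/value dictionaries as in the example; the function names and the second (payment-service) entry are hypothetical.

```python
def matches(registered: dict[str, str], collected: dict[str, str]) -> bool:
    """True when every registered selector also appears in the collected set."""
    # dict.items() views are set-like, so <= is a subset test on (key, value) pairs.
    return registered.items() <= collected.items()


def find_spiffe_ids(entries: list[dict], collected: dict[str, str]) -> list[str]:
    """Return the SPIFFE IDs of all entries whose selectors match."""
    return [e["spiffe_id"] for e in entries if matches(e["selectors"], collected)]


entries = [
    {
        "spiffe_id": "spiffe://production.example.com/workload/frontend",
        "selectors": {"k8s:ns": "production", "k8s:pod-label:app": "frontend"},
    },
    {
        "spiffe_id": "spiffe://production.example.com/workload/payment-service",
        "selectors": {"k8s:ns": "production", "k8s:pod-label:app": "payment"},
    },
]

collected = {
    "k8s:ns": "production",
    "k8s:sa": "frontend-sa",
    "k8s:pod-label:app": "frontend",
    "k8s:pod-label:version": "v1",
}

print(find_spiffe_ids(entries, collected))
# ['spiffe://production.example.com/workload/frontend']
```

Extra collected selectors (the service account and version label here) do not prevent a match; a missing or different registered selector does.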

The SVID Lifecycle

1. Workload Startup (T+0)

Workload → Agent: "I need an SVID!"
Agent: Collects selectors
Agent: Matches workload entry
Agent → Server: Requests SVID
Server → Agent: Issues SVID (1-hour TTL)
Agent → Workload: Streams SVID
Workload: Writes to /tmp/spiffe-certs/

2. Automatic Renewal (T+54 min)

Agent: "SVID expiring soon (90% of TTL elapsed)"
Agent → Server: Requests renewal
Server → Agent: New SVID (T+114 min expiry)
Agent → Workload: Streams updated SVID
Workload: Updates /tmp/spiffe-certs/
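
The timings in this walkthrough follow from simple arithmetic, sketched below. The 90% threshold and 1-hour TTL are taken from the example above; actual renewal thresholds depend on the SPIRE version and configuration, and renewal_time is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(issued_at: datetime, ttl: timedelta, fraction: float = 0.9) -> datetime:
    """Moment at which the given fraction of the SVID's lifetime has elapsed."""
    return issued_at + ttl * fraction


issued_at = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # T+0
ttl = timedelta(hours=1)

renew_at = renewal_time(issued_at, ttl)
new_expiry = renew_at + ttl  # the renewed SVID gets a fresh 1-hour TTL

print((renew_at - issued_at).total_seconds() / 60)    # 54.0
print((new_expiry - issued_at).total_seconds() / 60)  # 114.0
```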

3. Workload Shutdown

Agent: Workload disconnected
Agent: Removes from cache
(No revocation needed - expires in <1 hour)

Security Properties

Why This Is Secure

Short-Lived Credentials
- SVIDs expire in 1 hour
- A stolen SVID is only usable for a short window
- Rotation is automatic

Cryptographic Attestation
- Kubernetes tokens are validated against the K8s API, so they can't be forged
- AWS instance identity documents are signed by AWS
- Process UID/GID comes from the kernel, not from the workload's own claims

Defense in Depth
- Node attestation (agent proves node identity)
- Workload attestation (matches selectors)
- Certificate verification (mTLS)

Zero Shared Secrets
- No API keys, passwords, or static tokens
- Everything is dynamically issued

Threat Model

Q: What if SVID is stolen?

A: It expires within 1 hour, so the attack window is short.

Q: What if SPIRE Server is compromised?

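A: The root of trust is compromised, and the attacker can mint SVIDs for any workload in the trust domain. Mitigation: keep CA keys in an HSM, isolate the server on the network, and review audit logs.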

Q: What if agent is compromised?

A: The attacker can only obtain SVIDs for workloads registered to that node. Mitigation: read-only filesystem, monitoring.

SPIFFE/SPIRE vs Alternatives

Feature	SPIFFE/SPIRE	Service Mesh	Cloud IAM
Standard	✅ Open (CNCF)	❌ Vendor-specific	❌ Cloud-specific
Cross-platform	✅ K8s/Docker/VMs	⚠️ Mostly K8s	❌ Cloud only
Multi-cloud	✅ Yes	⚠️ Complex	❌ No
Resource overhead	✅ Low	❌ High (sidecars)	✅ Low
mTLS	✅ Native	✅ Native	❌ Manual
Managed option	⚠️ DIY	⚠️ DIY	✅ Managed

Limitations of Open Source SPIRE

Running SPIRE yourself has challenges:

Operational Burden
- Set up a highly available SPIRE Server
- Deploy agents across the infrastructure
- Handle monitoring, alerting, and security patches

Complexity
- Configuration files (server, agents, plugins)
- Manual workload entry registration
- Trust bundle distribution

Multi-Tenancy
- SPIRE OSS is single-tenant
- Each tenant needs a separate deployment

Time to production: 2-4 weeks for experienced teams

What's Next?

Now that we understand SPIFFE and SPIRE, the next post explores how the ICP (Identity Control Plane) Platform, built by the AuthSec team, provides a managed, multi-tenant, production-ready solution.

Questions? support@authsec.dev
