Understanding SPIFFE and SPIRE
Part 2 of the Zero-Trust Workload Identity Series
What is SPIFFE?
SPIFFE = Secure Production Identity Framework For Everyone
An open standard (CNCF) for workload identity, used by Google, Bloomberg, Netflix, Uber.
The Core Idea
Give every workload a cryptographic identity that:
- Uniquely identifies the workload
- Proves identity cryptographically (X.509 certificates)
- Automatically rotates (short-lived)
- Works everywhere (K8s, Docker, VMs, cloud, on-premise)
SPIFFE ID
Every workload gets a unique identifier:
spiffe://production.example.com/workload/frontend
spiffe://production.example.com/workload/payment-service
Components:
- spiffe:// - Protocol
- production.example.com - Trust Domain (your org)
- /workload/frontend - Path (specific workload)
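To make the three components concrete, here is a minimal sketch that splits a SPIFFE ID using Python's standard URL parser. The function name `parse_spiffe_id` is hypothetical, and the checks are deliberately loose; the SPIFFE specification defines stricter rules (for example, on allowed characters) than this illustration enforces.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path).

    Illustrative only: this checks the overall shape, not every rule
    in the SPIFFE specification.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"scheme must be 'spiffe', got {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("trust domain is missing")
    if parts.query or parts.fragment:
        raise ValueError("a SPIFFE ID has no query or fragment")
    return parts.netloc, parts.path


trust_domain, path = parse_spiffe_id(
    "spiffe://production.example.com/workload/frontend"
)
print(trust_domain)  # production.example.com
print(path)          # /workload/frontend
```

Anything that does not start with `spiffe://` is rejected with a `ValueError`, which is a cheap first check before any certificate validation happens.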
SVID (SPIFFE Verifiable Identity Document)
The proof of identity: a cryptographic document proving "you are who you claim to be."
X.509-SVID (most common):
Certificate:
Subject: C=US, O=SPIRE (the SPIFFE ID is carried in the URI SAN, not the Subject)
Validity:
Not Before: Jan 1 00:00:00 2024 GMT
Not After : Jan 1 01:00:00 2024 GMT (1 hour TTL!)
Subject Alternative Name:
URI: spiffe://production.example.com/workload/frontend
Key properties:
- Standard X.509 certificate (works with TLS)
- Short-lived (1 hour typical TTL)
- Automatic renewal
What is SPIRE?
SPIRE = SPIFFE Runtime Environment
The reference implementation of SPIFFE - open source software that issues and manages identities.
Architecture
┌─────────────────────────────────────────┐
│ SPIRE Server │
│ - Certificate Authority (CA) │
│ - Registration API │
│ - Attestation Services │
└──────────────┬──────────────────────────┘
│ mTLS
┌───────┴────────┐
│ │
┌──────▼─────┐ ┌──────▼─────┐
│SPIRE Agent │ │SPIRE Agent │
│(Node 1) │ │(Node 2) │
│ │ │ │
│Unix Socket │ │Unix Socket │
└─────┬──────┘ └─────┬──────┘
│ │
┌─────▼──┐ ┌─────▼──┐
│Workload│ │Workload│
└────────┘ └────────┘
SPIRE Server
Role: Central authority that issues SVIDs
Key functions:
- Certificate Authority - root of trust
- Registration - stores workload-to-SPIFFE-ID mappings
- SVID issuance - issues certificates to authenticated workloads
SPIRE Agent
Role: Runs on every node, provides SVIDs to workloads
Key functions:
- Node attestation - proves node identity to server
- Workload attestation - identifies workloads on node
- Workload API - serves SVIDs via Unix socket
- SVID renewal - automatic rotation before expiry
Workload API
Standard gRPC API over Unix socket (/run/spire/sockets/agent.sock):
Methods:
- FetchX509SVID() # Get certificate
- FetchX509Bundles() # Get trust bundles
Why Unix socket?
- Secure (only local processes)
- No network configuration
- Automatic process identification (PID)
How Attestation Works
Attestation = proving identity before issuing SVID.
Node Attestation (Agent → Server)
Question: How does the server know the agent is legitimate?
Example: Kubernetes
1. Agent reads service account token
2. Sends token to SPIRE Server
3. Server validates with K8s API
4. Issues Agent SVID
Example: AWS EC2
1. Agent queries EC2 metadata API
2. Gets signed instance identity document
3. Server verifies with AWS public key
4. Issues Agent SVID
Workload Attestation (Workload → Agent)
Question: How does the agent know which workload is requesting?
Answer: Selectors - metadata collected from runtime environment
Kubernetes selectors:
{
"k8s:ns": "production",
"k8s:sa": "frontend-sa",
"k8s:pod-label:app": "frontend"
}
Docker selectors:
{
"docker:label:app": "frontend",
"docker:container-name": "frontend",
"docker:image-name": "frontend:v1.2.3"
}
Unix selectors:
{
"unix:uid": "1000",
"unix:path": "/opt/app/bin/frontend",
"unix:sha256": "abc123..."
}
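The key/value maps above are a readable way to present selectors. As far as I know, SPIRE itself models each selector as a type plus a value, usually written as one flat string such as `k8s:ns:production`. The sketch below converts between the two forms; the helper names `flatten_selectors` and `split_selector` are hypothetical and not part of any SPIRE client library.

```python
def flatten_selectors(selectors: dict[str, str]) -> list[str]:
    """Turn {"k8s:ns": "production"} into ["k8s:ns:production"], sorted."""
    return sorted(f"{key}:{value}" for key, value in selectors.items())


def split_selector(selector: str) -> tuple[str, str]:
    """Split a flat selector at its first colon into (type, value)."""
    selector_type, _, value = selector.partition(":")
    return selector_type, value


unix_selectors = {
    "unix:uid": "1000",
    "unix:path": "/opt/app/bin/frontend",
}

for flat in flatten_selectors(unix_selectors):
    print(flat, split_selector(flat))
# unix:path:/opt/app/bin/frontend ('unix', 'path:/opt/app/bin/frontend')
# unix:uid:1000 ('unix', 'uid:1000')
```

The type (`k8s`, `docker`, `unix`) names the attestor plugin that produced the selector; everything after the first colon is that plugin's own value.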
Selector Matching
Agent matches collected selectors against registered workload entries:
Registered Entry:
{
"spiffe_id": "spiffe://prod.example.com/workload/frontend",
"selectors": {
"k8s:ns": "production",
"k8s:pod-label:app": "frontend"
}
}
Collected at Runtime:
{
"k8s:ns": "production",
"k8s:sa": "frontend-sa",
"k8s:pod-label:app": "frontend",
"k8s:pod-label:version": "v1"
}
Match? YES! (registered selectors ⊆ collected selectors)
→ Issue SVID with registered spiffe_id
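The subset rule above can be sketched in a few lines of Python. This is an illustration of the matching logic only, not SPIRE's actual implementation; the names `matches`, `find_spiffe_ids`, `REGISTERED_ENTRIES`, and `COLLECTED` are hypothetical.

```python
def matches(registered: dict[str, str], collected: dict[str, str]) -> bool:
    """True when every registered selector also appears in the collected set."""
    # dict item views behave like sets, so <= is a subset test.
    return registered.items() <= collected.items()


def find_spiffe_ids(entries: list[dict], collected: dict[str, str]) -> list[str]:
    """Return the SPIFFE IDs of all entries whose selectors match."""
    return [e["spiffe_id"] for e in entries if matches(e["selectors"], collected)]


REGISTERED_ENTRIES = [
    {
        "spiffe_id": "spiffe://prod.example.com/workload/frontend",
        "selectors": {"k8s:ns": "production", "k8s:pod-label:app": "frontend"},
    },
]

COLLECTED = {
    "k8s:ns": "production",
    "k8s:sa": "frontend-sa",
    "k8s:pod-label:app": "frontend",
    "k8s:pod-label:version": "v1",
}

print(find_spiffe_ids(REGISTERED_ENTRIES, COLLECTED))
# ['spiffe://prod.example.com/workload/frontend']

# The same pod in a different namespace matches nothing.
print(find_spiffe_ids(REGISTERED_ENTRIES, {**COLLECTED, "k8s:ns": "staging"}))
# []
```

Extra collected selectors (the service account and the version label here) never block a match; only a missing or different registered selector does.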
The SVID Lifecycle
1. Workload Startup (T+0)
Workload → Agent: "I need an SVID!"
Agent: Collects selectors
Agent: Matches workload entry
Agent → Server: Requests SVID
Server → Agent: Issues SVID (1-hour TTL)
Agent → Workload: Streams SVID
Workload: Writes to /tmp/spiffe-certs/
2. Automatic Renewal (T+54 min)
Agent: "SVID expiring soon (90% TTL)"
Agent → Server: Requests renewal
Server → Agent: New SVID (T+114 min expiry)
Agent → Workload: Streams updated SVID
Workload: Updates /tmp/spiffe-certs/
3. Workload Shutdown
Agent: Workload disconnected
Agent: Removes from cache
(No revocation needed - expires in <1 hour)
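The timings in this lifecycle follow from simple arithmetic, sketched below. The 90% renewal threshold and the 1-hour TTL are taken from the timeline above and should be read as this walkthrough's figures, not as guaranteed SPIRE defaults; the function name `renewal_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_percent: int = 90) -> datetime:
    """Return the moment at which renew_percent of the SVID lifetime has elapsed."""
    lifetime = not_after - not_before
    return not_before + lifetime * renew_percent / 100


ttl = timedelta(hours=1)
issued = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # T+0
expires = issued + ttl                                     # T+60 min

renew_at = renewal_time(issued, expires)
next_expiry = renew_at + ttl  # the renewed SVID gets a fresh 1-hour TTL

print((renew_at - issued).total_seconds() / 60)     # 54.0
print((next_expiry - issued).total_seconds() / 60)  # 114.0
```

Because the renewed SVID is issued before the old one expires, the workload always holds at least one valid certificate.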
Security Properties
Why This Is Secure
- Short-Lived Credentials
  - Expires in 1 hour
  - Compromised SVID has minimal blast radius
  - Automatic rotation
- Cryptographic Attestation
  - Can't fake Kubernetes token (verified with K8s API)
  - Can't fake AWS instance identity
  - Can't fake process UID/GID
- Defense in Depth
  - Node attestation (agent proves node identity)
  - Workload attestation (matches selectors)
  - Certificate verification (mTLS)
- Zero Shared Secrets
  - No API keys, passwords, or static tokens
  - Everything dynamically issued
Threat Model
Q: What if SVID is stolen?
- A: Expires in 1 hour. Short attack window.
Q: What if SPIRE Server is compromised?
- A: Root of trust compromised. Mitigation: HSM, network isolation, audit logs.
Q: What if agent is compromised?
- A: Can only issue SVIDs for that node. Mitigation: read-only filesystem, monitoring.
SPIFFE/SPIRE vs Alternatives
| Feature | SPIFFE/SPIRE | Service Mesh | Cloud IAM |
|---|---|---|---|
| Standard | ✅ Open (CNCF) | ❌ Vendor-specific | ❌ Cloud-specific |
| Cross-platform | ✅ K8s/Docker/VMs | ⚠️ Mostly K8s | ❌ Cloud only |
| Multi-cloud | ✅ Yes | ⚠️ Complex | ❌ No |
| Resource overhead | ✅ Low | ❌ High (sidecars) | ✅ Low |
| mTLS | ✅ Native | ✅ Native | ❌ Manual |
| Managed option | ⚠️ DIY | ⚠️ DIY | ✅ Managed |
Limitations of Open Source SPIRE
Running SPIRE yourself has challenges:
- Operational Burden
  - Set up an HA SPIRE Server
  - Deploy agents across infrastructure
  - Monitoring, alerting, security patches
- Complexity
  - Configuration files (server, agents, plugins)
  - Manual workload entry registration
  - Trust bundle distribution
- Multi-Tenancy
  - SPIRE OSS is single-tenant
  - Need a separate deployment per tenant
Time to production: 2-4 weeks for experienced teams
What's Next?
Now that we understand SPIFFE and SPIRE, the next post explores how the ICP (Identity Control Plane) Platform, built by the AuthSec team, provides a managed, multi-tenant, production-ready solution.
- Part 3: How ICP Platform Works
- Part 4: Comparing Solutions
- Part 5: Get Started in 5 Minutes
Questions? support@authsec.dev