Security Best Practices
Follow these best practices to ensure secure M2M authentication for your autonomous agents.
Credential Management
Never Hard-Code Secrets
Bad Practice:
CLIENT_SECRET = "secret_abc123_THIS_IS_WRONG"
Good Practice:
import os
CLIENT_SECRET = os.getenv('AUTHSEC_CLIENT_SECRET')
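Note that `os.getenv` returns `None` when the variable is unset, so a missing secret may go unnoticed until the first token request fails. A minimal sketch of a fail-fast loader follows; the helper name `require_env` is hypothetical and not part of any AuthSec SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Name the variable in the error, but never echo secret values.
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value
```

Calling `require_env('AUTHSEC_CLIENT_SECRET')` once at startup surfaces a misconfigured deployment immediately instead of at the first authentication attempt.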
Use Secret Management Systems
Recommended Solutions:
- HashiCorp Vault - Enterprise secret management
- AWS Secrets Manager - Cloud-native for AWS
- Azure Key Vault - Cloud-native for Azure
- Google Secret Manager - Cloud-native for GCP
- Kubernetes Secrets - Container orchestration
Example with Vault:
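import os
import hvac

# Authenticate with a token read from the environment; production workloads
# often use AppRole or Kubernetes auth instead of a static token
client = hvac.Client(url='https://vault.example.com', token=os.environ['VAULT_TOKEN'])
secret = client.secrets.kv.v2.read_secret_version(path='authsec/client')
CLIENT_ID = secret['data']['data']['client_id']
CLIENT_SECRET = secret['data']['data']['client_secret']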
Token Lifecycle
Use Short-Lived Tokens
Recommended Lifetimes:
- Access Tokens: 1-12 hours
- JWT SVIDs: 1-24 hours
- X.509 Certificates: 1-7 days
Shorter lifetimes reduce risk if credentials are compromised.
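One way to confirm that issued tokens respect these limits is to inspect their `iat` and `exp` claims. The sketch below assumes the access token is a JWT carrying both claims, which is not confirmed for AuthSec; the function names are hypothetical. It decodes the payload without verifying the signature, so it is suitable for diagnostics only, never for authorization decisions.

```python
import base64
import json


def jwt_claims_unverified(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (diagnostics only)."""
    payload_b64 = token.split(".")[1]
    # JWTs strip base64url padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def lifetime_seconds(token: str) -> int:
    """Return the token lifetime (exp - iat) in seconds."""
    claims = jwt_claims_unverified(token)
    return int(claims["exp"]) - int(claims["iat"])


def exceeds_max_lifetime(token: str, max_hours: float = 12) -> bool:
    """True if the token lives longer than max_hours (12 hours is the upper bound listed above)."""
    return lifetime_seconds(token) > max_hours * 3600
```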
Implement Token Caching
Don't request a new token for every API call:
import time
from threading import Lock

class TokenCache:
    def __init__(self):
        self.token = None
        self.expires_at = 0
        self.lock = Lock()

    def get_token(self):
        with self.lock:
            now = time.time()
            # Refresh if token expires in < 5 minutes
            if now >= (self.expires_at - 300):
                self._refresh_token()
            return self.token

    def _refresh_token(self):
        response = request_new_token()
        self.token = response['access_token']
        self.expires_at = time.time() + response['expires_in']
Revoke Unused Tokens
Immediately revoke credentials when:
- Workload is decommissioned
- Security incident detected
- Service no longer needs access
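If the authorization server exposes an OAuth 2.0 token revocation endpoint (RFC 7009), revocation in the cases above can be scripted. The sketch below only builds the request and does not send it. The endpoint URL `https://auth.authsec.ai/revoke` and the use of HTTP Basic client authentication are assumptions for illustration; check the AuthSec documentation for the actual path and authentication method.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical endpoint: confirm the real revocation URL in the AuthSec docs.
REVOCATION_URL = "https://auth.authsec.ai/revoke"


def build_revocation_request(token: str, client_id: str, client_secret: str,
                             url: str = REVOCATION_URL) -> urllib.request.Request:
    """Build an RFC 7009 revocation request authenticated with HTTP Basic."""
    body = urllib.parse.urlencode({
        "token": token,
        "token_type_hint": "access_token",
    }).encode()
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

The request can then be sent with `urllib.request.urlopen(request, timeout=10)`. Under RFC 7009 the server answers with HTTP 200 for unknown tokens as well, so a successful response does not prove the token was ever valid.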
Access Control
Principle of Least Privilege
Grant only necessary permissions:
Bad:
{
"scopes": ["*", "admin:all"]
}
Good:
{
"scopes": ["read:data", "write:logs"]
}
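The contrast above can be checked automatically, for example in CI before a client configuration is deployed. This is a minimal sketch; the function name and the rules for what counts as over-broad (wildcards and `admin:` prefixes) are assumptions to adapt to your own scope naming scheme.

```python
# Scopes always treated as over-broad (taken from the "Bad" example above).
OVERBROAD_EXACT = {"*", "admin:all"}


def find_overbroad_scopes(scopes: list[str]) -> list[str]:
    """Return the scopes that are wildcards or admin-level grants."""
    flagged = []
    for scope in scopes:
        if scope in OVERBROAD_EXACT or "*" in scope or scope.startswith("admin:"):
            flagged.append(scope)
    return flagged
```

An empty result means no scope matched the over-broad patterns; it does not prove the remaining scopes are all necessary, which still needs human review.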
Use Role-Based Access Control (RBAC)
Define roles and assign them to workloads:
roles:
ml-trainer:
scopes:
- read:training-data
- write:model-registry
api-gateway:
scopes:
- read:user-profiles
- write:audit-logs
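A sketch of how such role definitions might be enforced in application code. The dictionary mirrors the role definitions above; the function names are hypothetical, and a real deployment would load the roles from configuration rather than hard-code them.

```python
# Role definitions mirroring the configuration above.
ROLES = {
    "ml-trainer": {"read:training-data", "write:model-registry"},
    "api-gateway": {"read:user-profiles", "write:audit-logs"},
}


def scopes_for(role: str) -> set[str]:
    """Return the scopes granted to a role; unknown roles get no scopes."""
    return set(ROLES.get(role, set()))


def role_allows(role: str, required_scope: str) -> bool:
    """True if the role grants the required scope."""
    return required_scope in scopes_for(role)
```

Returning an empty set for unknown roles keeps the check deny-by-default.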
Implement Resource-Level Authorization
Don't just authenticate—authorize specific actions:
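def can_access_resource(token, resource_id):
    claims = verify_token(token)

    # Check if token has required scope.
    # A space-delimited scope string is split first so that 'read:data'
    # cannot match a longer scope such as 'read:database'.
    scopes = claims['scope']
    if isinstance(scopes, str):
        scopes = scopes.split()
    if 'read:data' not in scopes:
        return False

    # Check if resource belongs to same tenant
    if claims['tenant_id'] != get_resource_tenant(resource_id):
        return False

    return True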
Network Security
Use Mutual TLS (mTLS)
Verify both client and server identities:
SPIRE mTLS:
tlsConfig := tlsconfig.MTLSServerConfig(
    source,
    source,
    tlsconfig.AuthorizeID(spiffeid.RequireFromString(
        "spiffe://authsec.example.com/production/allowed-client",
    )),
)
Encrypt All Communication
Never transmit credentials over HTTP:
- ✗ http://auth.example.com/token
- ✓ https://auth.authsec.ai/token
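A client can enforce this rule itself by refusing to send credentials to any endpoint that is not HTTPS. A minimal sketch; the helper name `require_https` is hypothetical.

```python
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS; raise ValueError otherwise."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Refusing non-HTTPS endpoint: {url}")
    return url
```

Validating the token endpoint once at startup guards against a misconfigured `http://` URL silently sending the client secret in cleartext.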
Restrict Network Access
Use network policies to limit workload communication:
Kubernetes Network Policy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-spire-agent
spec:
  podSelector:
    matchLabels:
      app: ml-agent
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: spire
      ports:
        - protocol: TCP
          port: 8081
Monitoring & Auditing
Log All Authentication Events
Track every token request and usage:
import logging
import time

logger = logging.getLogger('authsec')

def get_access_token():
    logger.info("Requesting access token", extra={
        'client_id': CLIENT_ID,
        'scopes': 'read:data write:logs',
        'timestamp': time.time()
    })
    token = request_token()
    # Log only the token ID and expiry, never the token itself
    logger.info("Access token obtained", extra={
        'token_id': extract_token_id(token),
        'expires_at': extract_expiry(token)
    })
    return token
Set Up Alerts
Monitor for suspicious activity:
Alert Triggers:
- Failed authentication attempts (> 5 in 1 minute)
- Token requests from unexpected IPs
- Access to unauthorized resources
- Certificate rotation failures
- Expired credentials still being used
Example with Prometheus:
groups:
  - name: authsec_alerts
    rules:
      - alert: HighAuthFailureRate
        expr: rate(authsec_auth_failures[5m]) > 0.1
        annotations:
          summary: "High authentication failure rate detected"
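Where the check lives in application code rather than in Prometheus, the first trigger in the list above (more than 5 failed attempts in 1 minute) can be sketched as a sliding-window counter. The class name and defaults are illustrative assumptions, and this version keeps a single global window rather than one per client.

```python
import time
from collections import deque


class FailureWindow:
    """Sliding-window counter for failed authentication attempts."""

    def __init__(self, threshold=5, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()

    def record_failure(self, now=None):
        """Record one failure; return True if the alert threshold is exceeded."""
        now = time.time() if now is None else now
        self._events.append(now)
        # Drop failures that have aged out of the window.
        while self._events and self._events[0] <= now - self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.threshold
```

In practice you would likely keep one `FailureWindow` per client ID or source IP so that one noisy client does not mask or trigger alerts for another.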
Review Access Logs Regularly
Audit who accessed what resources:
-- Query AuthSec audit logs
SELECT
    timestamp,
    client_id,
    resource,
    action,
    result
FROM audit_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
  AND result = 'denied'
ORDER BY timestamp DESC;
Certificate Management (SPIRE)
Automatic Rotation
Ensure SVIDs rotate before expiration:
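SPIRE agents renew SVIDs automatically, by default once roughly half of the SVID lifetime has elapsed, so the main setting to tune is the SVID lifetime on the server. Option names vary between SPIRE releases; verify them against the configuration reference for your version.
SPIRE Server Config:
server {
    # Lifetime of issued X.509 SVIDs; agents renew well before expiry
    default_x509_svid_ttl = "1h"
}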
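For credentials managed outside SPIRE, the same half-lifetime rule can be applied explicitly. A minimal sketch, assuming you can read the credential's validity window (for example `notBefore` and `notAfter` of a certificate); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def should_rotate(not_before: datetime, not_after: datetime,
                  now: datetime, threshold: float = 0.5) -> bool:
    """True once `threshold` (a fraction) of the credential lifetime has elapsed."""
    lifetime = not_after - not_before
    if lifetime <= timedelta(0):
        raise ValueError("not_after must be later than not_before")
    return (now - not_before) >= lifetime * threshold
```

Rotating at half the lifetime leaves the other half as a buffer for retries if the issuing service is temporarily unavailable.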
Validate Certificate Chains
Always verify the full trust chain:
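import (
    "crypto/x509"
    "time"

    "github.com/spiffe/go-spiffe/v2/bundle/x509bundle"
)

func verifyCertificate(cert *x509.Certificate, bundle *x509bundle.Bundle) error {
    // VerifyOptions.Roots expects a *x509.CertPool, so build one from the bundle's authorities
    roots := x509.NewCertPool()
    for _, authority := range bundle.X509Authorities() {
        roots.AddCert(authority)
    }
    _, err := cert.Verify(x509.VerifyOptions{
        Roots:       roots,
        CurrentTime: time.Now(),
        KeyUsages:   []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
    })
    return err
}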
Revoke Compromised Certificates
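If a workload is compromised, delete its registration entry immediately so that SPIRE stops issuing it new SVIDs. SVIDs that were already issued remain valid until they expire, which is one more reason to keep lifetimes short: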
kubectl exec -n spire spire-server-0 -- \
/opt/spire/bin/spire-server entry delete \
-entryID <ENTRY_ID>
Incident Response
Have a Revocation Plan
Steps if credentials are compromised:
- Immediate Actions:
  - Revoke compromised credentials in AuthSec
  - Delete SPIRE registration entries
  - Block workload network access
- Investigation:
  - Review audit logs for unauthorized access
  - Identify affected resources
  - Determine breach timeline
- Remediation:
  - Rotate all potentially affected credentials
  - Apply security patches
  - Update access policies
- Prevention:
  - Implement additional monitoring
  - Review and strengthen security practices
  - Conduct post-mortem analysis
Emergency Contact
Keep these contacts readily available:
- Security Team: security@yourcompany.com
- AuthSec Support: support@authsec.ai
- On-Call Engineer: +1-XXX-XXX-XXXX
Compliance & Standards
Follow Industry Standards
NIST Guidelines:
- Use strong cryptography (RSA 2048+, ECDSA P-256+)
- Implement multi-factor authentication where possible
- Maintain audit trails for 1+ years
OWASP Recommendations:
- Validate all inputs
- Use secure random number generation
- Implement rate limiting
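As an illustration of the secure random number generation item, Python's standard `secrets` module draws from the operating system's CSPRNG, unlike the `random` module. The function names below are hypothetical, and whether you generate client secrets yourself or receive them from AuthSec depends on your setup.

```python
import secrets


def generate_client_secret(num_bytes: int = 32) -> str:
    """Generate a URL-safe secret from the OS CSPRNG (32 bytes = 256 bits of randomness)."""
    return secrets.token_urlsafe(num_bytes)


def secrets_match(provided: str, expected: str) -> bool:
    """Compare two secrets in constant time to avoid timing side channels."""
    return secrets.compare_digest(provided.encode(), expected.encode())
```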
Data Residency
Ensure compliance with the data protection regulations and audit frameworks that apply to your deployment:
- GDPR (Europe)
- CCPA (California)
- HIPAA (Healthcare)
- SOC 2 (Enterprise)
Regular Security Audits
Quarterly Reviews:
- Active client credentials
- Granted scopes and permissions
- Certificate expiration dates
- Network access policies
Annual Penetration Testing:
- Engage external security firms
- Test M2M authentication flows
- Validate incident response procedures
Common Vulnerabilities
Avoid These Mistakes
Credential Leakage:
- ✗ Logging full tokens
- ✗ Committing secrets to Git
- ✗ Storing in plaintext files
Insufficient Validation:
- ✗ Not verifying token signatures
- ✗ Accepting expired tokens
- ✗ Skipping certificate validation
Overprivileged Access:
- ✗ Granting admin scopes by default
- ✗ Reusing credentials across environments
- ✗ No scope limitations
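To avoid the first leakage mistake above while still being able to correlate log entries, one option is to log a short hash of the token instead of the token itself. A minimal sketch; the function name and the 12-character truncation are arbitrary choices.

```python
import hashlib


def token_fingerprint(token: str) -> str:
    """Return a short, non-reversible identifier that is safe to log."""
    digest = hashlib.sha256(token.encode()).hexdigest()
    return f"sha256:{digest[:12]}"
```

The same token always yields the same fingerprint, so requests can be traced across services without any log line containing a usable credential.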
Security Checklist
Use this checklist before deploying to production:
- Credentials stored in secret manager (not code)
- Short-lived tokens (< 24 hours)
- Token caching implemented
- Minimal scopes granted (least privilege)
- mTLS enabled for service communication
- All traffic encrypted (HTTPS/TLS)
- Network policies restrict access
- Authentication/authorization logs enabled
- Alerts configured for failures
- Certificate rotation automated
- Revocation procedure documented
- Security audit completed
- Compliance requirements met
Additional Resources
AuthSec Support:
- Email: security@authsec.ai
- Documentation: docs.authsec.ai
Following these best practices helps keep communication between your autonomous agents secure under a zero-trust authentication model. 🔒