Users report sporadic 401s right after login. The token was minted three seconds ago and the verifier swears it expired. The cause is almost never the token — it is clock drift between the auth server that signed iat/exp and the resource server that checked exp. Same story for nbf (“not before”) rejections immediately after issuance. Fix by running NTP everywhere, adding a small leeway in the verifier, and shortening token TTL once clocks are reliable.
Common causes
Ordered by hit rate.
1. Resource server clock drifted forward
The verifying server is 8 seconds ahead. Tokens that have not yet expired in the issuer’s clock are rejected.
How to spot it: chronyc tracking shows large System time offset, or timedatectl status reports System clock synchronized: no.
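To make the failure concrete, here is a minimal sketch of a zero-leeway exp check evaluated against two different clocks. The timestamps and the helper name `is_expired` are illustrative assumptions, not taken from any particular library.

```python
def is_expired(exp: int, now: float, leeway: float = 0.0) -> bool:
    """True when a verifier whose clock reads `now` treats `exp` as already passed."""
    return now >= exp + leeway


ISSUER_NOW = 1_700_000_000     # hypothetical issuer clock reading (epoch seconds)
TOKEN_EXP = ISSUER_NOW + 5     # token still has 5 s left on the issuer's clock
VERIFIER_NOW = ISSUER_NOW + 8  # verifier clock runs 8 s ahead

print(is_expired(TOKEN_EXP, ISSUER_NOW))    # False: still valid for the issuer
print(is_expired(TOKEN_EXP, VERIFIER_NOW))  # True: rejected by the fast verifier
```

The token is identical in both calls; only the clock that evaluates it differs.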
2. Container with no NTP
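Containers have no clock of their own: they read the host kernel's clock and, by default, are not allowed to set it. That is fine on a well-synced host, but virtualized hosts and Kubernetes nodes without a time daemon can drift, and stopped/resumed VMs can jump.
How to spot it: date -u run in pods on two different nodes differs by seconds. Pods on the same node share one kernel clock, so comparing them tells you nothing.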
3. JWT library has no leeway by default
jsonwebtoken (Node), python-jose, and golang-jwt all default to zero leeway. With no tolerance, even 100 ms of skew between issuer and verifier causes occasional false rejections at the exp boundary.
How to spot it: Errors cluster around the very last second of the token lifetime.
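One way to check for that clustering is to measure how far past exp each rejection happened. The sketch below assumes a hypothetical log shape of (rejected_at, exp) pairs in epoch seconds; the 2 s window and 80 % share are arbitrary starting points, not recommendations from any library.

```python
from collections import Counter


def overshoot_histogram(rejections: list[tuple[float, int]]) -> Counter:
    """Bucket rejections by whole seconds elapsed between exp and the rejection."""
    return Counter(int(rejected_at - exp) for rejected_at, exp in rejections)


def looks_like_boundary_skew(rejections: list[tuple[float, int]],
                             window_s: float = 2.0, share: float = 0.8) -> bool:
    """True when at least `share` of rejections land within `window_s` after exp."""
    if not rejections:
        return False
    near = sum(1 for rejected_at, exp in rejections if 0 <= rejected_at - exp < window_s)
    return near / len(rejections) >= share


# Hand-written sample: four rejections just past exp, one genuinely stale token.
sample = [(1000.1, 1000), (2000.4, 2000), (3000.9, 3000), (4000.2, 4000), (5300.0, 5000)]
print(overshoot_histogram(sample))
print(looks_like_boundary_skew(sample))
```

If most rejections sit in the first bucket, suspect skew or missing leeway; a spread over minutes or hours means clients are simply presenting old tokens.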
4. nbf set to issuance time on a slightly fast issuer
Issuer’s clock is ahead, so nbf is “in the future” from the verifier’s view for a few seconds.
How to spot it: Error message says “token not yet valid” or NotBeforeError.
5. Cached time on serverless cold start
Some FaaS runtimes lazily sync time on cold start. The first request after a long idle gap can see skew until the OS catches up.
How to spot it: Error spike correlates with cold starts.
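For any of the causes above, it helps to read the rejected token's own time claims and compare them with the verifier's clock. This is a diagnostic sketch only: it skips signature verification on purpose, so never use it to authorize a request. The function names and the sample token are hypothetical.

```python
import base64
import json


def decode_claims_unverified(token: str) -> dict:
    """Decode the JWT payload WITHOUT checking the signature. Diagnostics only."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def time_claim_report(claims: dict, now: float) -> dict:
    """Seconds between `now` and each time claim that is present."""
    report = {}
    if "iat" in claims:
        report["age_s"] = now - claims["iat"]          # negative: issued "in the future"
    if "nbf" in claims:
        report["until_valid_s"] = claims["nbf"] - now  # positive: not yet valid
    if "exp" in claims:
        report["remaining_s"] = claims["exp"] - now    # zero or negative: expired
    return report


def _b64url(obj: dict) -> str:
    """Demo helper: JSON-encode, then base64url-encode without padding."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Hand-built, unsigned sample; a real token would come from a failing request.
sample = ".".join([_b64url({"alg": "none"}),
                   _b64url({"iat": 1000, "nbf": 1000, "exp": 1600}),
                   "sig"])
print(time_claim_report(decode_claims_unverified(sample), now=997))
```

A negative age or a positive until_valid value on a freshly issued token points at the issuer running ahead of the verifier.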
Shortest path to fix
Step 1: Confirm and quantify the skew
# Compare wall clock to a known good source
curl -s --head https://www.google.com | grep -i ^date
date -u
# On the host
timedatectl status
chronyc tracking
If System time offset is anything over 100 ms, fix the clock first.
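The curl comparison can also be scripted. The sketch below parses an HTTP Date header value and subtracts it from a local timestamp; the function name is an assumption, and the header string is passed in rather than fetched so the example stays offline. HTTP dates have one-second resolution, so this only catches skew of a second or more; use chronyc tracking for the 100 ms threshold.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def offset_from_http_date(date_header: str, local_now: datetime) -> float:
    """Seconds by which the local clock leads (+) or trails (-) the remote Date header."""
    remote = parsedate_to_datetime(date_header)
    return (local_now - remote).total_seconds()


# Illustrative values: the local clock reads 8 s later than the remote header.
local = datetime(2024, 5, 1, 12, 0, 8, tzinfo=timezone.utc)
print(offset_from_http_date("Wed, 01 May 2024 12:00:00 GMT", local))  # 8.0
```

In practice, pass datetime.now(timezone.utc) captured right after the response arrives, and expect up to a second of noise from network latency and header rounding.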
Step 2: Run NTP everywhere
For systemd hosts, prefer systemd-timesyncd (simple) or chrony (richer).
# Ubuntu/Debian with chrony
sudo apt install -y chrony
sudo systemctl enable --now chrony   # Debian/Ubuntu name the unit chrony; RHEL/Fedora use chronyd
chronyc sources -v
chronyc tracking
# /etc/chrony/chrony.conf
pool time.google.com iburst
pool time.cloudflare.com iburst
makestep 1.0 3
rtcsync
For Kubernetes nodes, run chrony or systemd-timesyncd on the node itself; pods inherit the node's clock. For VMs after a snapshot/restore, force a step:
sudo chronyc makestep
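Why the manual step is needed follows from how `makestep 1.0 3` behaves: chronyd steps the clock only when the offset exceeds the threshold and only during its first few updates after start. The function below is a simplified model of that rule as described in the chrony documentation, not chrony's actual code, and its name is made up for illustration.

```python
def should_step(offset_s: float, updates_since_start: int,
                threshold_s: float = 1.0, limit: int = 3) -> bool:
    """Simplified model of `makestep threshold limit`.

    Step only when the offset is larger than the threshold AND chronyd has made
    no more than `limit` clock updates since it started. A negative limit
    removes the update restriction.
    """
    within_limit = limit < 0 or updates_since_start <= limit
    return within_limit and abs(offset_s) > threshold_s


print(should_step(8.0, updates_since_start=1))   # True: large offset right after start
print(should_step(8.0, updates_since_start=50))  # False: long-running daemon slews instead
```

A VM resumed from a snapshot usually has a long-running chronyd, so it falls into the second case and would slew slowly back into sync unless you step it by hand.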
Step 3: Add leeway in the JWT verifier
Node (jsonwebtoken):
import jwt from 'jsonwebtoken';
jwt.verify(token, publicKey, {
  algorithms: ['RS256'],
  clockTolerance: 5, // seconds of leeway applied to the exp and nbf checks
});
Node (jose, modern):
import { jwtVerify } from 'jose';
const { payload } = await jwtVerify(token, key, {
  clockTolerance: '5s',
});
Python (python-jose):
from jose import jwt
jwt.decode(token, key, algorithms=['RS256'], options={'leeway': 5})
Go (golang-jwt/jwt/v5):
parser := jwt.NewParser(jwt.WithLeeway(5 * time.Second))
token, err := parser.Parse(tokenStr, keyFunc)
Five seconds is a safe default. Avoid more than 30 — it weakens the time bound.
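What the leeway setting does inside a verifier can be sketched in a few lines. This is an illustration of the general rule (exp is pushed later, nbf is pulled earlier), not the code of any of the libraries above; the function name and return strings are assumptions.

```python
MAX_LEEWAY_S = 30  # upper bound suggested in the text


def check_time_claims(claims: dict, now: float, leeway: float = 5.0) -> str:
    """Return "ok", "expired", or "not_yet_valid" for the exp and nbf claims."""
    if not 0 <= leeway <= MAX_LEEWAY_S:
        raise ValueError(f"leeway must be between 0 and {MAX_LEEWAY_S} seconds")
    if "exp" in claims and now >= claims["exp"] + leeway:
        return "expired"
    if "nbf" in claims and now < claims["nbf"] - leeway:
        return "not_yet_valid"
    return "ok"


claims = {"nbf": 1000, "exp": 1600}  # illustrative epoch seconds
print(check_time_claims(claims, now=997, leeway=0))   # not_yet_valid
print(check_time_claims(claims, now=997))             # ok: 3 s early, inside the 5 s leeway
print(check_time_claims(claims, now=1603, leeway=0))  # expired
print(check_time_claims(claims, now=1603))            # ok: 3 s late, inside the 5 s leeway
```

The same 3 s of skew flips from rejection to acceptance once leeway covers it, which is the whole point of the setting.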
Step 4: Shorten token TTL once clocks are trustworthy
With NTP in place and 5 s leeway, you can keep short access tokens (5-15 min) and use refresh tokens for the long tail. That limits the damage if a token leaks.
// Issue a 10 min access token and a 30 day refresh token (key is the issuer's RS256 private key)
const access = jwt.sign({ sub }, key, { algorithm: 'RS256', expiresIn: '10m' });
const refresh = jwt.sign({ sub, typ: 'refresh' }, key, { algorithm: 'RS256', expiresIn: '30d' });
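A quick calculation shows why a short TTL and a small leeway go together. The sketch below assumes the issuer's clock is correct and treats any verifier lag as extra acceptance time; both helper names are hypothetical.

```python
def build_time_claims(now: int, ttl_s: int) -> dict:
    """iat/exp pair for a token issued at `now` that lives `ttl_s` seconds."""
    return {"iat": now, "exp": now + ttl_s}


def effective_lifetime_s(ttl_s: int, leeway_s: float, verifier_lag_s: float = 0.0) -> float:
    """Real-time span during which a verifier still accepts the token.

    The verifier accepts until its own clock reads exp + leeway; if that clock
    lags real time by `verifier_lag_s`, acceptance stretches by the same amount.
    """
    return ttl_s + leeway_s + verifier_lag_s


ACCESS_TTL_S = 10 * 60          # 10 min, matching the access token above
REFRESH_TTL_S = 30 * 24 * 3600  # 30 days

print(build_time_claims(1_700_000_000, ACCESS_TTL_S))
print(effective_lifetime_s(ACCESS_TTL_S, leeway_s=5))                      # 605
print(effective_lifetime_s(ACCESS_TTL_S, leeway_s=30, verifier_lag_s=8))   # 638
```

Five seconds of leeway adds under 1 % to a 10 minute token, so the security cost is small as long as clocks stay close.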
Step 5: Monitor for skew
Emit a metric of now() - iat at the verifier. If the distribution shifts, you have drift somewhere new.
const ageSeconds = Math.floor(Date.now() / 1000) - payload.iat;
metrics.histogram('jwt_age_at_verify_seconds').record(ageSeconds);
Negative ages sit at the low end of the distribution, so alert on the minimum or p1 rather than p99: if it drops below -2 s, the verifier clock is behind the issuer by more than 2 s.
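A minimal sketch of such an alert rule follows. A token verified "before" it was issued shows up as a negative age, and those values live at the bottom of the distribution, so the check looks at a low percentile. The nearest-rank percentile and both function names are assumptions; a real metrics backend would compute the percentile for you.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; adequate for a sketch, requires a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def skew_alert(ages_s: list[float], threshold_s: float = 2.0) -> bool:
    """True when the low end of the age distribution is more than `threshold_s` below zero."""
    return percentile(ages_s, 1) < -threshold_s


print(skew_alert([3, 10, 45, 120, 300]))  # False: every token is older than its iat
print(skew_alert([-4, 3, 10]))            # True: one token verified 4 s "before" issuance
```

On small samples p1 equals the minimum, which makes the rule sensitive to a single outlier; widen the window or use a higher percentile if that proves noisy.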
Prevention
- NTP (chrony) on every host, with at least two pools listed.
- Every JWT verifier sets an explicit clockTolerance of 5 s.
- Bake a chronyc makestep into VM resume hooks and large maintenance windows.
- Treat clock skew as an SLO: alert when nodes drift more than 100 ms.
- For multi-region deployments, point every node in a region at the same NTP sources so nodes do not flap between upstreams that disagree.
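The 100 ms drift check can be automated by parsing the System time line of chronyc tracking. The sketch below takes the command output as a string so it stays testable offline; the function names are hypothetical, and the sample line is hand-written in the documented format rather than captured from a real host.

```python
import re

# Matches e.g. "System time     : 0.000006523 seconds slow of NTP time"
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time", re.MULTILINE
)


def system_offset_s(tracking_output: str) -> float:
    """Signed offset from `chronyc tracking`: positive when the system clock is fast."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found")
    magnitude = float(match.group(1))
    return magnitude if match.group(2) == "fast" else -magnitude


def breaches_slo(tracking_output: str, limit_s: float = 0.100) -> bool:
    """True when the absolute offset exceeds the drift budget (100 ms by default)."""
    return abs(system_offset_s(tracking_output)) > limit_s


sample = "Stratum         : 3\nSystem time     : 0.152000000 seconds fast of NTP time\n"
print(system_offset_s(sample))
print(breaches_slo(sample))
```

Feeding it the output of subprocess.run(["chronyc", "tracking"], capture_output=True, text=True).stdout on each node gives a per-node gauge to alert on.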
Tags: #Backend #Troubleshooting #jwt