SSL Cert Auto-Renewal Failed Silently, Site Now Untrusted

Certbot renewal stopped months ago and you only noticed when the browser flashed NET::ERR_CERT_DATE_INVALID. Diagnose the real cause and restore HTTPS in under an hour.

Published: May 24, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

The browser flashes red: NET::ERR_CERT_DATE_INVALID. Your Let’s Encrypt cert expired two days ago, but you “set up auto-renewal months ago and never touched it again.” That is exactly how this fails. Let’s Encrypt’s default certificate still lasts 90 days as of June 2026. At some point the auto-renewal job silently broke, no one was watching the logs, and now you’re scrambling at 11pm.

Fastest path: SSH into the host and run sudo certbot renew --force-renewal --dry-run. That single command reproduces the failure against Let’s Encrypt’s staging server (so it costs no rate-limit quota) and prints the exact reason renewal is failing: blocked challenge, stale DNS token, full disk, or a hook that never reloaded your web server. Read that output, fix the one error it names, then re-run without --dry-run. The rest of this page is the per-cause fix and how to stop it from silently happening again.

The renewal job has three independent failure surfaces (the scheduler, the challenge mechanism, the deploy hook) and any one breaking is enough to leave you with a stale cert quietly counting down to expiry.

Which bucket are you in?

Run the diagnostics in the next section, then match the signal to a cause. Most outages are one of these six.

Signal you see	Most likely cause	Jump to
`systemctl list-timers` shows no `certbot.timer`; `journalctl` is empty for months	Scheduler never ran	Cause 1 / Step 4
`curl` to `.well-known/acme-challenge/` returns 301/401/403, not 404	HTTP-01 challenge blocked by redirect, WAF, or auth	Cause 2 / Step 2
`curl` times out or connection refused on port 80	Firewall / security group dropped port 80	Cause 2 / Step 2
Log shows `403 Forbidden` or `Authentication error` from a DNS API	DNS-01 token expired or rotated	Cause 3 / Step 3
Log shows `No space left on device` or filesystem is `ro`	Disk full / read-only `/etc`	Cause 4
Cert file on disk is fresh but `s_client` still shows the old date	Deploy hook never reloaded the server	Cause 5 / Step 5
Log shows `rateLimited` or `too many failed authorizations`	Rate limit hit during a retry loop	Cause 6 / FAQ

Common causes

Ordered by what we see most often in incident postmortems.

1. The cron / systemd timer never actually ran

Certbot’s renewal is typically scheduled via cron, systemd timer, or a package-provided unit. If the timer was disabled during a server reboot, OS upgrade, or systemctl mask, it has silently done nothing for months.

How to spot it: systemctl list-timers | grep certbot shows no active timer, or journalctl -u certbot.service --since '90 days ago' is empty (renewal runs log under the service unit, not the timer unit). crontab -l shows no certbot entry, and /etc/cron.d/certbot is missing or commented out.
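The timer check above can be wrapped in a small function. This is a minimal sketch assuming a systemd host; the name find_certbot_timer is illustrative and not part of Certbot. It reads systemctl list-timers output on stdin, so it also works against saved output from an incident.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `systemctl list-timers` output on stdin and
# reports whether any certbot-related timer unit is listed.
# The pattern is meant to cover common unit names such as certbot.timer,
# certbot-renew.timer, and snap.certbot.renew.timer.
find_certbot_timer() {
  if grep -Eq 'certbot[^[:space:]]*\.timer'; then
    echo "timer-present"
  else
    echo "timer-missing"
    return 1
  fi
}
```

Usage: systemctl list-timers | find_certbot_timer. A timer-missing result points at Cause 1; a timer-present result means the scheduler is at least loaded and the failure is further down the chain.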

2. HTTP-01 challenge can no longer reach `.well-known/acme-challenge/`

You added a CDN, a WAF, an auth_basic block, or a “force HTTPS” rewrite that intercepts /.well-known/acme-challenge/* and returns 401/403/redirect. A common silent version: a return 301 https://... redirect on the :80 server block now bounces the challenge before it reaches the token file. Let’s Encrypt always starts HTTP-01 validation over plain HTTP on port 80 and will follow a redirect, but only to port 80 or 443. The redirect breaks renewal when its target does not serve the token: a different host, a non-standard port, or an HTTPS vhost with its own auth or document root. A second common version: a cloud firewall, security group, or ufw rule closed inbound port 80 entirely, so the validation request never connects. Either way Let’s Encrypt cannot fetch the token and renewal fails.

How to spot it: curl -I http://yourdomain.com/.well-known/acme-challenge/test returns anything but 404. A 401/403/301 means a proxy or redirect is intercepting; a connection timeout or “connection refused” means port 80 is firewalled.
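The status-to-cause mapping can be written down as a lookup. This is a sketch, not an exhaustive classifier; the function name and its output labels are assumptions made for illustration. The 000 case relies on curl printing 000 for %{http_code} when no HTTP response arrived.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps the HTTP status of a probe against
# /.well-known/acme-challenge/ to a rough diagnosis label.
classify_challenge_probe() {
  case "$1" in
    200|404)             echo "reachable" ;;   # request got through to the web server
    301|302|303|307|308) echo "redirected" ;;  # check that the redirect target serves the token
    401|403)             echo "blocked" ;;     # auth, WAF, or proxy is intercepting
    000)                 echo "no-response" ;; # timeout or refused: suspect port 80 firewalling
    *)                   echo "unexpected" ;;  # inspect the proxy chain by hand
  esac
}

# Usage (yourdomain.com is a placeholder):
#   code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' \
#          http://yourdomain.com/.well-known/acme-challenge/test)
#   classify_challenge_probe "$code"
```

Anything other than reachable is worth fixing before the next renewal attempt.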

3. DNS-01 challenge API credentials expired or rotated

You used a DNS plugin (Cloudflare, Route 53, Google Cloud DNS) with an API token. The token had an expiry, or you rotated keys, or the IAM policy changed — but the certbot config still has the old credentials.

How to spot it: /var/log/letsencrypt/letsencrypt.log shows 403 Forbidden or Authentication error from the DNS API during the renewal attempt.

4. Disk full or read-only filesystem blocks the renewal

Certbot writes to /etc/letsencrypt/live/, /etc/letsencrypt/archive/, and stages temporary files. If the partition is full, or if the system is in degraded mode and /etc is read-only, renewal aborts.

How to spot it: df -h /etc shows 100% used or mount | grep ' / ' shows (ro,...). Certbot log shows OSError: [Errno 28] No space left on device.
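Both filesystem checks can be scripted. The sketch below assumes a Linux host with POSIX df and mktemp; the function names are illustrative. The write probe catches a read-only mount as well as a full disk, since either one prevents creating a file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two filesystem checks.

# Print the use percentage (digits only) of the filesystem that holds path $1.
fs_use_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed only if a small file can be created inside directory $1.
# Fails on a read-only mount, a full filesystem, or a missing directory.
dir_is_writable() {
  local probe
  probe=$(mktemp "$1/.write-probe.XXXXXX" 2>/dev/null) || return 1
  rm -f "$probe"
}
```

Usage, run as root: fs_use_pct /etc prints the percentage, and dir_is_writable /etc/letsencrypt || echo "cannot write" flags the condition that makes renewal abort.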

5. Cert renewed but the deploy hook never reloaded nginx / haproxy

The cert on disk is new — openssl x509 -in fullchain.pem -noout -dates shows a fresh expiry — but the running web server is still holding the old cert in memory because the post-renewal hook (--deploy-hook 'systemctl reload nginx') was never set or silently failed.

How to spot it: File mtime on fullchain.pem is recent, but openssl s_client -connect yourdomain.com:443 -servername yourdomain.com shows the expired cert.
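Comparing fingerprints is a stricter version of the same check than eyeballing dates. This is a sketch under the assumption that the live cert sits at the usual /etc/letsencrypt/live/ path; the three function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the cert on disk with the cert being served.

# SHA-256 fingerprint of the first (leaf) cert in PEM file $1.
disk_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2
}

# SHA-256 fingerprint of the cert presented by host $1 on port 443.
served_fingerprint() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Print in-sync, stale, or unknown for a pair of fingerprints.
compare_fingerprints() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "unknown"; return 2
  elif [ "$1" = "$2" ]; then
    echo "in-sync"
  else
    echo "stale"; return 1
  fi
}

# Usage (yourdomain.com is a placeholder):
#   compare_fingerprints \
#     "$(disk_fingerprint /etc/letsencrypt/live/yourdomain.com/fullchain.pem)" \
#     "$(served_fingerprint yourdomain.com)"
```

A stale result means the renewal itself worked and only the reload is missing, which is the cheapest of the six causes to fix.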

6. Rate limit hit during a failed loop

If a broken cron retried over and over, Let’s Encrypt’s rate limits kick in. As of June 2026 the two that bite here are 5 authorization failures per identifier per hour (refilling 1 every 12 minutes) and 50 certificates per registered domain per 7 days (refilling 1 every ~202 minutes). Once tripped, even correct renewal attempts get blocked until the window clears.

Important nuance: a genuine renewal of an existing cert via ARI (ACME Renewal Information) is exempt from all rate limits as of 2026, so a properly configured Certbot rarely hits these. You only trip them when a misconfigured script keeps requesting brand-new orders instead of renewing, or hammers a broken challenge.

How to spot it: Log shows urn:ietf:params:acme:error:rateLimited or too many failed authorizations recently.
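The refill figures quoted above follow from dividing each window by its limit. A small worked sketch, assuming slots refill evenly across the window; refill_interval is an illustrative name.

```shell
#!/usr/bin/env bash
# refill_interval LIMIT WINDOW_MINUTES
# Prints the minutes needed for one slot to come back when LIMIT slots
# refill evenly over WINDOW_MINUTES.
refill_interval() {
  LC_ALL=C awk -v limit="$1" -v window="$2" \
    'BEGIN { printf "%.1f\n", window / limit }'
}

refill_interval 5 60                  # 5 failures per hour   -> 12.0
refill_interval 50 $(( 7 * 24 * 60 )) # 50 certs per 7 days   -> 201.6
```

So one fixed failure costs about 12 minutes of waiting, while an exhausted certificate quota gives back one slot roughly every 3 hours and 22 minutes.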

Before you start

Note exactly how many hours/days the cert has been expired — that determines user impact and triage urgency.
Identify the cert tool in use: certbot, acme.sh, caddy-built-in, cert-manager (k8s), Vercel/Netlify/Cloudflare managed.
Have shell / sudo access to the host that runs the renewal.
Have a fallback HTTPS option in your back pocket: Cloudflare proxy in front (provides edge cert), or a manually issued cert as a one-off.

Information to collect

Output of openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null | openssl x509 -noout -dates -subject.
Output of ls -la /etc/letsencrypt/live/yourdomain.com/.
Last 200 lines of /var/log/letsencrypt/letsencrypt.log.
systemctl status certbot.timer and systemctl list-timers --all | grep cert.
crontab -l and cat /etc/cron.d/certbot (or equivalent).
Whether anything sits in front of the origin: CDN, WAF, load balancer.

Step-by-step fix

Ordered to get HTTPS back fastest, then fix automation properly.

Step 1: Run the renewal manually and capture the real error

sudo certbot renew --force-renewal --dry-run

--dry-run hits Let’s Encrypt’s staging environment so it does not eat your rate limit. The output tells you exactly what is failing — challenge fetch, DNS API, hook, filesystem. Read it line by line.

If dry-run succeeds:

sudo certbot renew --force-renewal

If dry-run fails, fix that error first before consuming production cert quota.
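The dry-run-then-production order can be enforced in a wrapper so nobody skips the staging check at 11pm. This is a sketch only: renew_if_dry_run_passes is a hypothetical name, and the CERTBOT variable is an assumed override that exists purely so the function can be tested with a stub.

```shell
#!/usr/bin/env bash
# Sketch: run the staging dry-run first; touch production only if it passes.
# CERTBOT is an assumed override for testing and defaults to the real binary.
CERTBOT="${CERTBOT:-certbot}"

renew_if_dry_run_passes() {
  local log
  log=$(mktemp) || return 1
  if "$CERTBOT" renew --force-renewal --dry-run >"$log" 2>&1; then
    rm -f "$log"
    # Staging succeeded, so the production attempt should not waste quota.
    "$CERTBOT" renew --force-renewal
  else
    echo "dry-run failed; production not attempted. Output saved to $log" >&2
    return 1
  fi
}
```

Run it as root with renew_if_dry_run_passes. On failure the saved log holds the exact error to match against the causes above.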

Step 2: If HTTP-01 challenge is being blocked

Test the challenge path directly:

sudo mkdir -p /var/www/letsencrypt/.well-known/acme-challenge
echo "test123" | sudo tee /var/www/letsencrypt/.well-known/acme-challenge/test
curl -I http://yourdomain.com/.well-known/acme-challenge/test

You want HTTP/1.1 200 OK over plain HTTP. If you get redirect-to-HTTPS or 401, find the offending nginx location block / WAF rule and exempt /.well-known/acme-challenge/ from auth and redirect:

location /.well-known/acme-challenge/ {
    root /var/www/letsencrypt;
    allow all;
    auth_basic off;
}

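Put this block inside the :80 server block. If that server block has a bare return 301 https://... at server level, move the redirect into location / { ... }. A server-level return runs before nginx matches any location, so the challenge block would never be reached no matter where it sits in the file.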

If curl instead times out or says “connection refused”, the problem is port 80 itself, not nginx. Check that inbound TCP 80 is open at every layer: host firewall (sudo ufw status, look for 80/tcp ALLOW), the cloud security group / network ACL, and any upstream load balancer. Let’s Encrypt’s HTTP-01 validator connects to port 80 from the public internet; if a security-group change closed it, renewal fails even though your HTTPS site on 443 still works.

Step 3: If DNS-01 challenge credentials are stale

Regenerate the API token (Cloudflare, Route 53, etc.). Update the credentials file:

sudo nano /etc/letsencrypt/cloudflare.ini
# dns_cloudflare_api_token = NEW_TOKEN_HERE
sudo chmod 600 /etc/letsencrypt/cloudflare.ini

Re-run renewal. If you also use the same API token elsewhere (Terraform, monitoring), update those references in the same change so you don’t break a different consumer.
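After editing the credentials file it is worth confirming the mode actually took. A minimal sketch, assuming GNU stat (the -c flag differs on BSD and macOS); creds_perms_ok is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if credentials file $1 exists and is
# accessible by its owner alone (mode 600 or 400).
creds_perms_ok() {
  local mode
  mode=$(stat -c '%a' "$1" 2>/dev/null) || { echo "missing: $1" >&2; return 1; }
  case "$mode" in
    600|400) return 0 ;;
    *) echo "mode $mode on $1 is too open; run: chmod 600 $1" >&2; return 1 ;;
  esac
}
```

Usage: creds_perms_ok /etc/letsencrypt/cloudflare.ini && sudo certbot renew --dry-run. The dry-run then confirms the new token authenticates against the DNS API.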

Step 4: Restore the renewal timer / cron

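Most systems use the systemd timer that the Certbot package installs by default. The unit name depends on how Certbot was installed: certbot.timer on Debian/Ubuntu packages, typically certbot-renew.timer on Fedora/RHEL, and snap.certbot.renew.timer for snap installs, so substitute the name your host uses. Re-enable it: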

sudo systemctl enable --now certbot.timer
systemctl list-timers | grep certbot

You should see a future Next time within ~12h. Certbot’s timer runs twice a day; each run renews any cert within 30 days of expiry. On Certbot 4.1.0 or newer it also consults the ACME Renewal Information (ARI) endpoint, so it can renew earlier if Let’s Encrypt advises it (for example during an incident or a profile change). Check your version with certbot --version; if it is older than 4.1, upgrade so you get ARI and rate-limit-exempt renewals.
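The version check can be automated. A sketch assuming GNU sort with -V support and that certbot --version prints the version number as its last field; version_ge is an illustrative name.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Relies on GNU `sort -V` for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Usage:
#   installed=$(certbot --version 2>&1 | awk '{print $NF}')
#   if version_ge "$installed" 4.1.0; then echo "ARI-capable"; else echo "upgrade Certbot"; fi
```

A plain string comparison would get this wrong for versions like 4.10.0, which is why the sketch leans on version-aware sorting.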

The cleanest way to attach a reload is a deploy hook file, not an inline flag. The timer runs a fixed certbot renew command, so there is nowhere to pass --deploy-hook without overriding the unit. Put a script in the hooks directory instead:

sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh >/dev/null <<'EOF'
#!/bin/bash
nginx -t && systemctl reload nginx
EOF
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh

Scripts in /etc/letsencrypt/renewal-hooks/deploy/ run automatically after any successful renewal, whether triggered by the timer or by hand. The nginx -t guard means a bad config aborts the reload instead of taking the site down.
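A forgotten chmod +x is an easy way for a deploy hook to fail silently, since to our knowledge Certbot only runs hook files that are executable. The sketch below lists hook files missing the owner execute bit; non_executable_hooks is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print regular files directly inside hook directory $1
# that lack the owner execute bit. No output means every hook is runnable.
non_executable_hooks() {
  find "$1" -maxdepth 1 -type f ! -perm -100
}
```

Usage: non_executable_hooks /etc/letsencrypt/renewal-hooks/deploy. Any path it prints needs sudo chmod +x before the next renewal.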

If you genuinely run renewals from cron rather than the timer, the inline form does work there:

sudo crontab -e -u root

Add:

0 3,15 * * * certbot renew --quiet --deploy-hook "nginx -t && systemctl reload nginx"

The --deploy-hook is critical: it reloads nginx ONLY when a cert actually renewed, so it does not flap unnecessarily.

Step 5: Reload the web server with the new cert

sudo systemctl reload nginx   # or haproxy, apache2, caddy
openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null | openssl x509 -noout -dates

The notAfter= date should now be roughly 90 days out on the default classic profile (about 45 days if you opted into the newer tlsserver profile, or ~6 days on the shortlived profile). If the on-disk cert is new but s_client still shows the old date, the web server is still serving the copy it loaded earlier: a reload (not a restart) should make it re-read the file, and if it does not, do a full restart.

Step 6: Add proactive monitoring so this never silently breaks again

sudo crontab -e -u root

Add a check that emails / pings if the cert is within 25 days of expiry:

0 9 * * * /usr/local/bin/cert-expiry-check.sh

Where cert-expiry-check.sh is:

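#!/bin/bash
# Alert when the certificate served on :443 is close to expiry.
# Requires GNU date (-d) and a working `mail` command.
DOMAIN="yourdomain.com"
THRESHOLD_DAYS=25
# Fetch the served cert and extract its notAfter date.
END=$(openssl s_client -connect "${DOMAIN}:443" -servername "${DOMAIN}" </dev/null 2>/dev/null \
      | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -z "${END}" ]; then
  # No cert came back (host down, TLS broken): alert instead of staying silent.
  echo "Could not read cert for ${DOMAIN}" | mail -s "CERT CHECK FAILED" you@example.com
  exit 1
fi
END_EPOCH=$(date -d "${END}" +%s)
NOW_EPOCH=$(date +%s)
DAYS=$(( (END_EPOCH - NOW_EPOCH) / 86400 ))
if [ "${DAYS}" -lt "${THRESHOLD_DAYS}" ]; then
  echo "Cert for ${DOMAIN} expires in ${DAYS} days" | mail -s "CERT WARN" you@example.com
fi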

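A 25-day threshold fires only after Certbot’s 30-day renewal window has been open for about five days without a successful renewal, and it still leaves 25 days before the cert actually expires: plenty of time to fix automation calmly. If you use a shorter-lived profile, lower the threshold to sit below the point where renewal normally happens, or the alert will fire constantly.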

How to confirm it’s fixed

Browser loads https://yourdomain.com with no warning (hard-refresh; Chrome caches the bad cert state for a few minutes).
openssl s_client -connect yourdomain.com:443 -servername yourdomain.com returns a cert whose notAfter is a fresh date (~90 days out on the default profile).
sudo certbot certificates lists the cert as VALID with an expiry date well in the future, and shows no INVALID/EXPIRED rows.
systemctl list-timers | grep certbot shows an active timer with a Next time in the next ~12h.
sudo certbot renew --dry-run exits clean and prints “Congratulations, all simulated renewals succeeded”. A dry run simulates renewal even for certs that are not yet due, so this is the message to look for.
Your monitoring (the Step 6 script, or external like UptimeRobot / Better Stack) confirms the new expiry date is being tracked.

Long-term prevention

Always pair renewal with a deploy hook (a script in /etc/letsencrypt/renewal-hooks/deploy/ for the timer, or --deploy-hook for cron). Without it, renewed certs sit on disk while the running process serves the old one.
Keep Certbot at 4.1.0 or newer so you get ARI-driven renewals, which are exempt from rate limits and renew early when Let’s Encrypt advises it. This matters more as lifetimes shrink: the default 90-day cert is set to drop to 64 days in early 2027 and 45 days in 2028, which doubles renewal frequency.
Add external cert-expiry monitoring (UptimeRobot, Better Stack, Datadog, Pingdom) out-of-band, not running on the same box that issues the cert. Note that Let’s Encrypt stopped sending expiration-warning emails in June 2025, so you cannot rely on their reminders anymore. Monitoring is now mandatory, not optional.
After any OS upgrade, immediately re-test that certbot.timer is enabled. Upgrades sometimes disable third-party timers.
Document the renewal flow in your runbook so the next on-call engineer can fix it in 5 minutes, not 5 hours.

Common pitfalls

Running certbot renew over and over to “force it” while the underlying cause (blocked challenge) is unfixed. Each failure burns rate-limit quota, and once a limit trips even a correct attempt is blocked until it refills.
Assuming the platform auto-renews when you actually disabled the managed cert and switched to your own. Some platforms (Vercel, Netlify) only auto-renew certs they issued.
Renewing the cert but forgetting systemctl reload nginx — file is new, served cert is old.
Using --standalone in a cron when nginx is already bound to :80, which makes certbot fail to bind the port.
Setting up renewal monitoring on the same server that does the renewing — if the box dies, monitoring dies with it. Always external.

FAQ

Q: My cert just expired. Will users lose data or sessions?

Sessions are not lost server-side, but every user gets a browser warning. Mobile apps with strict cert pinning may stop working entirely. Restore HTTPS before worrying about sessions.

Q: Can I issue a new cert from a different CA as a fast fallback?

Yes — ZeroSSL, Buypass, and Sectigo all support ACME. Add a second issuer in your config. You can also park behind Cloudflare’s proxy for an edge cert in minutes (DNS change) while you fix Let’s Encrypt.

Q: I hit the rate limit. How long until I can issue again?

As of June 2026, the 50-certs-per-registered-domain limit is a rolling 7-day window that refills one slot every ~202 minutes. The 5-authorization-failures-per-identifier limit refills one slot every 12 minutes, so once you have fixed the cause, the first retry slot is back within about 12 minutes. Use staging (certbot renew --test-cert or --dry-run) to confirm your fix while a production limit clears. And remember: an ARI-coordinated renewal of an existing cert is exempt from these limits entirely, so once Certbot 4.1+ is renewing properly you should not see them again.

Q: Does certbot renew early enough? When does it actually fire?

The timer checks twice a day and renews any cert inside 30 days of expiry. With ARI (Certbot 4.1.0+) it can renew earlier when Let’s Encrypt advertises a renewal window. So if you fix automation today, the cert you just issued will be renewed well before it expires. You do not need to force-renew on a schedule.

Q: Should I move from 90-day certs to 1-year certs to avoid this?

You can’t. Public CAs no longer issue certs longer than 200 days (per CA/Browser Forum rules tightening through 2026-2029), and Let’s Encrypt is going the other way: its default is 90 days now, dropping to 64 days in early 2027 and 45 days in 2028, with an opt-in 6-day shortlived profile already generally available. The fix is reliable automation plus monitoring, not longer certs.

Tags: #Troubleshooting #SSL #letsencrypt #automation #certbot