Webhook Retry Policy
Fanfare implements a robust retry mechanism to ensure reliable webhook delivery. When a webhook delivery fails, we automatically retry with exponential backoff.
Delivery Expectations
Successful Delivery
A webhook is considered successfully delivered when both of the following hold:
- Your endpoint returns an HTTP status code in the 2xx range (200-299)
- The response is received within the timeout period (30 seconds)
Failed Delivery
A webhook delivery is considered failed when:
| Condition | Description |
|---|---|
| Connection error | Cannot establish TCP connection |
| Timeout | No response within 30 seconds |
| DNS resolution failure | Cannot resolve hostname |
| TLS/SSL error | Certificate validation failed |
| HTTP 4xx response (except 410) | Client error (will still retry) |
| HTTP 5xx response | Server error |
| HTTP 410 Gone | Endpoint disabled (stops retries) |
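
The success and failure rules above can be condensed into a small classifier. This is only a sketch of the rules as stated in this section, not Fanfare's implementation: the function name `classify_delivery` and the outcome labels are hypothetical, and treating 1xx/3xx responses as failures is an assumption, since the section does not cover them.

```python
from typing import Optional

TIMEOUT_SECONDS = 30  # timeout stated in this section


def classify_delivery(status: Optional[int], elapsed_seconds: float) -> str:
    """Classify one delivery attempt as 'success', 'retry', or 'stop'.

    `status` is None when no HTTP response arrived (connection, DNS,
    or TLS error). The labels are illustrative, not part of any API.
    """
    if status is None or elapsed_seconds > TIMEOUT_SECONDS:
        return "retry"  # transport failure or timeout
    if 200 <= status <= 299:
        return "success"
    if status == 410:
        return "stop"  # 410 Gone stops further retries
    # Other 4xx and all 5xx responses are retried. Assumption: any
    # remaining status (1xx, 3xx) is also treated as a failure.
    return "retry"


print(classify_delivery(200, 0.4))    # success
print(classify_delivery(404, 0.4))    # retry
print(classify_delivery(410, 0.4))    # stop
print(classify_delivery(None, 30.0))  # retry
```
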
Retry Schedule
When delivery fails, Fanfare retries with exponential backoff:
| Attempt | Delay After Previous | Cumulative Time |
|---|---|---|
| 1 | Immediate | 0 |
| 2 | 1 minute | 1 minute |
| 3 | 5 minutes | 6 minutes |
| 4 | 30 minutes | 36 minutes |
| 5 | 2 hours | ~2.6 hours |
| 6 | 6 hours | ~8.6 hours |
| 7 | 12 hours | ~20.6 hours |
| 8 | 24 hours | ~44.6 hours |
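
The cumulative column is the running sum of the delays. As a quick check of the arithmetic, the sketch below recomputes it from the delays in the table above; the variable names are illustrative only.

```python
from itertools import accumulate

# Delay before each attempt, in minutes, copied from the table above.
DELAYS_MINUTES = [0, 1, 5, 30, 120, 360, 720, 1440]

# Running total: time elapsed since the first attempt.
cumulative = list(accumulate(DELAYS_MINUTES))

for attempt, (delay, total) in enumerate(zip(DELAYS_MINUTES, cumulative), start=1):
    print(f"attempt {attempt}: +{delay} min, {total} min total ({total / 60:.1f} h)")
```

The final attempt falls 2,676 minutes (44 hours 36 minutes) after the first one, so a handler should expect the same event to arrive up to roughly two days after it was first sent.
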
Retry Headers
Retry attempts include additional headers:
| Header | Description |
|---|---|
| X-Fanfare-Retry-Count | Current retry attempt (0-7) |
| X-Fanfare-Original-Time | Timestamp of original event |
| X-Fanfare-Delivery-Id | Unique ID for this delivery |
| X-Fanfare-Webhook-Id | Webhook endpoint ID |
Handling Retries
Idempotency
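
One common way to make a handler idempotent is to remember which event IDs have already been handled and skip repeats. The following is a minimal in-memory sketch under stated assumptions: the `id` field in the payload and the `IdempotentHandler` class are hypothetical, and a real deployment would keep the seen IDs in a shared store rather than in process memory.

```python
import threading
from typing import Any, Callable, Dict, Set


class IdempotentHandler:
    """Runs `process` at most once per event ID (in-memory sketch)."""

    def __init__(self, process: Callable[[Dict[str, Any]], None]) -> None:
        self._process = process
        self._seen: Set[str] = set()
        self._lock = threading.Lock()

    def handle(self, event: Dict[str, Any]) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["id"]  # assumed field name for the event ID
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
        try:
            self._process(event)
        except Exception:
            # Forget the ID so that a later retry can process the event.
            with self._lock:
                self._seen.discard(event_id)
            raise
        return True


processed = []
handler = IdempotentHandler(processed.append)
handler.handle({"id": "evt_1", "type": "example"})
handler.handle({"id": "evt_1", "type": "example"})  # redelivery is skipped
print(len(processed))  # 1
```
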
Because webhooks may be delivered multiple times (due to retries or network issues), your handler must be idempotent: processing the same event more than once must have the same effect as processing it once.
Database-Based Idempotency
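
A uniqueness constraint lets the database itself reject duplicate deliveries. The sketch below uses SQLite from the Python standard library purely for illustration; the table name `processed_events`, the column `event_id`, and the helper `record_event` are assumptions, not part of Fanfare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key doubles as the uniqueness constraint on the event ID.
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")


def record_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Insert the event ID; return False if it was already recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary-key violation: duplicate delivery


print(record_event(conn, "evt_1"))  # True
print(record_event(conn, "evt_1"))  # False
```

Performing the business write in the same transaction as this insert keeps the two atomic, so a crash between them cannot leave an event marked as processed without its effects applied.
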
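For simpler setups, use database constraints, such as a unique constraint on the event ID, so that duplicate deliveries are rejected at insert time.
Responding Appropriately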
Quick Acknowledgment
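
Acknowledging a delivery before doing the heavy work keeps the response well inside the timeout. The sketch below shows the idea with an in-process queue and a background worker; `receive` stands in for whatever your HTTP framework calls on each request, and all names here are hypothetical.

```python
import queue
import threading
from typing import List

events: queue.Queue = queue.Queue()
handled: List[bytes] = []


def worker() -> None:
    """Drain the queue in the background; slow work belongs here."""
    while True:
        body = events.get()
        try:
            handled.append(body)  # placeholder for real processing
        finally:
            events.task_done()


threading.Thread(target=worker, daemon=True).start()


def receive(body: bytes) -> int:
    """Called by the HTTP layer: enqueue the payload and acknowledge."""
    events.put(body)
    return 200  # acknowledge immediately; processing happens later


status = receive(b'{"id": "evt_1"}')
events.join()  # only for the demo: wait until the worker has caught up
print(status, len(handled))  # 200 1
```

An in-process queue loses its contents if the process crashes after acknowledging, so a durable queue or a database table is the safer choice when events must not be dropped.
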
Always respond quickly (< 5 seconds) and process asynchronously.
When to Return Errors
| Scenario | Response | Effect |
|---|---|---|
| Signature invalid | 401 | Will retry (check config) |
| Event already processed | 200 | No retry (success) |
| Temporary processing error | 500 | Will retry |
| Event type not supported | 200 | No retry (acknowledge) |
| Endpoint permanently gone | 410 | Stops all retries |
| Payload validation error | 400 | Will retry (review schema) |
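
Keeping the table above in one lookup inside the handler makes the response policy easy to audit. A minimal sketch, in which the `Outcome` enum and `status_for` helper are hypothetical names and only the status codes come from the table:

```python
from enum import Enum


class Outcome(Enum):
    INVALID_SIGNATURE = "invalid_signature"
    ALREADY_PROCESSED = "already_processed"
    TEMPORARY_ERROR = "temporary_error"
    UNSUPPORTED_TYPE = "unsupported_type"
    ENDPOINT_GONE = "endpoint_gone"
    INVALID_PAYLOAD = "invalid_payload"


# Status codes taken from the table above.
STATUS_FOR_OUTCOME = {
    Outcome.INVALID_SIGNATURE: 401,  # will be retried; check configuration
    Outcome.ALREADY_PROCESSED: 200,  # duplicate: acknowledge as success
    Outcome.TEMPORARY_ERROR: 500,    # will be retried
    Outcome.UNSUPPORTED_TYPE: 200,   # acknowledge so it is not retried
    Outcome.ENDPOINT_GONE: 410,      # stops all retries
    Outcome.INVALID_PAYLOAD: 400,    # will be retried; review schema
}


def status_for(outcome: Outcome) -> int:
    """Return the HTTP status the handler should send for an outcome."""
    return STATUS_FOR_OUTCOME[outcome]


print(status_for(Outcome.ALREADY_PROCESSED))  # 200
```
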
Monitoring Webhook Health
Dashboard Monitoring
Monitor webhook delivery in your Fanfare dashboard:
- Go to Settings > Webhooks
- Select your endpoint
- View delivery history and success rates
Webhook Events
You can also receive webhooks that report webhook delivery status.
Disabling an Endpoint
Automatic Disabling
Endpoints are automatically disabled after sustained consecutive failures:
- Threshold: 100 consecutive failures over 7 days
- Recovery: the endpoint must be re-enabled manually in the dashboard
Manual Disabling
To stop receiving webhooks temporarily:
- Dashboard: Settings > Webhooks > Disable
- API: Update endpoint status
Returning 410 Gone
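
A retired endpoint can be left in place as a small "tombstone" that answers every delivery with 410. The sketch below uses only the Python standard library; the class name `GoneHandler`, the `serve` helper, and port 8080 are illustrative assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class GoneHandler(BaseHTTPRequestHandler):
    """Answers every webhook POST with 410 Gone."""

    def do_POST(self) -> None:
        # Read and discard the body so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(410)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the tombstone endpoint until interrupted (port is an example)."""
    HTTPServer(("", port), GoneHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Because 410 stops retries, reserve it for endpoints that are not coming back.
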
If your endpoint is permanently removed, return 410 to stop retries.
Recovering Missed Events
Event Replay
Request a replay of events for a time window.
Event Listing
List recent events for manual processing.
Best Practices
1. Implement Circuit Breakers
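
A circuit breaker stops calling a struggling downstream dependency after repeated errors and lets a trial call through once a cool-down has passed. The class below is a generic sketch of that pattern, not something Fanfare provides; the thresholds and names are assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then rejects calls
    until `reset_after` seconds have passed."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self._opened_at is None:
            return True  # closed: normal operation
        # Half-open once the cool-down has elapsed: let a trial call through.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        """Close the breaker and reset the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) the breaker at the threshold."""
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
breaker.record_failure()
breaker.record_failure()
print(breaker.allow())  # False: the breaker is open
```

While the breaker is open, a handler can answer with a 5xx status such as 503; since 5xx responses are retried, the retry schedule then acts as back-pressure until the system recovers.
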
Prevent cascading failures when your system is overloaded.
2. Log Delivery Metadata
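
Recording the delivery headers next to the status you returned makes individual retries traceable. A minimal sketch using the standard `logging` module; the header names come from the Retry Headers table, while the logger name and the helpers `delivery_metadata` and `log_delivery` are hypothetical.

```python
import logging
from typing import Dict, Mapping, Optional

logger = logging.getLogger("webhooks")

# Header names as listed in the Retry Headers table.
DELIVERY_HEADERS = (
    "X-Fanfare-Retry-Count",
    "X-Fanfare-Original-Time",
    "X-Fanfare-Delivery-Id",
    "X-Fanfare-Webhook-Id",
)


def delivery_metadata(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Extract the delivery headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name.lower()) for name in DELIVERY_HEADERS}


def log_delivery(headers: Mapping[str, str], status: int) -> None:
    """Log the delivery metadata together with the status we returned."""
    meta = delivery_metadata(headers)
    logger.info("webhook delivery status=%s meta=%s", status, meta)


log_delivery({"x-fanfare-retry-count": "3", "X-Fanfare-Delivery-Id": "d_1"}, 200)
```
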
Log retry information for debugging.
3. Set Up Alerts
Configure alerts for webhook failures.
4. Test Retry Handling
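
In development, a retry sequence can be replayed against a handler without waiting for the real backoff delays. The harness below is a hypothetical sketch: `simulate_delivery` is not a Fanfare tool, it only mimics the documented behaviour (up to 8 attempts, stop on 2xx or 410) and sets the `X-Fanfare-Retry-Count` header on each call.

```python
from typing import Callable, Dict, List

MAX_ATTEMPTS = 8  # attempts listed in the retry schedule


def simulate_delivery(endpoint: Callable[[Dict[str, str]], int]) -> List[int]:
    """Call `endpoint` the way a retrying sender would, without waiting.

    Stops on a 2xx response or on 410; returns the status of each attempt.
    """
    statuses: List[int] = []
    for retry_count in range(MAX_ATTEMPTS):
        headers = {"X-Fanfare-Retry-Count": str(retry_count)}
        status = endpoint(headers)
        statuses.append(status)
        if 200 <= status <= 299 or status == 410:
            break
    return statuses


def flaky_endpoint(headers: Dict[str, str]) -> int:
    """Fails the first two attempts, then succeeds."""
    return 500 if int(headers["X-Fanfare-Retry-Count"]) < 2 else 200


print(simulate_delivery(flaky_endpoint))  # [500, 500, 200]
```

Pointing the harness at your real handler function, with a payload added to the call, is a quick way to confirm that redelivered events are not processed twice.
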
Verify your retry handling in development.
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| All retries failing | Endpoint unreachable | Check firewall, DNS, SSL certificates |
| Intermittent failures | Timeout exceeded | Optimize handler, use async processing |
| Duplicate processing | No idempotency check | Implement deduplication using event ID |
| Events arriving late | Previous retries queued | Check X-Fanfare-Original-Time header |
| Endpoint auto-disabled | Too many consecutive failures | Fix issues, re-enable in dashboard |
Debug Checklist
- Verify connectivity: Can you reach your endpoint from external networks?
- Check certificates: Is your SSL certificate valid and properly configured?
- Review logs: What status codes are you returning?
- Test manually: Can you process a test event successfully?
- Check timing: Are you responding within 30 seconds?