Webhook Retry Policy

Fanfare implements a robust retry mechanism to ensure reliable webhook delivery. When a webhook delivery fails, we automatically retry with exponential backoff.

Delivery Expectations

Successful Delivery

A webhook is considered successfully delivered when both of the following are true:
  • Your endpoint returns an HTTP status code in the 2xx range (200-299)
  • The response arrives within the timeout period (30 seconds)
// Success responses
app.post("/webhooks/fanfare", (req, res) => {
  // Process the webhook...

  // Any 2xx status is acceptable. Send exactly one response per request.
  res.status(200).send("OK");
  // Alternatives (use one of these instead, not in addition):
  // res.status(202).json({ received: true });
  // res.status(204).send();
});

Failed Delivery

A webhook delivery is considered failed when any of the following occurs:

| Condition | Description |
| --- | --- |
| Connection error | Cannot establish a TCP connection |
| Timeout | No response within 30 seconds |
| DNS resolution failure | Cannot resolve the hostname |
| TLS/SSL error | Certificate validation failed |
| HTTP 4xx response (except 410) | Client error (still retried) |
| HTTP 5xx response | Server error (retried) |
| HTTP 410 Gone | Endpoint disabled (stops retries) |
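The table above boils down to three outcomes: success, retry, or stop. The following is a hypothetical sketch of those rules as a small decision function, useful when reasoning about which responses from your endpoint will trigger another attempt. The name classifyAttempt and the shape of its argument are illustrative assumptions, not Fanfare's actual implementation.

```javascript
// Hypothetical sketch of the failed-delivery rules above, seen from the sender's side.
// `result` is either { status: number } for a response received within 30 seconds,
// or { error: string } for a connection, DNS, TLS, or timeout failure.
function classifyAttempt(result) {
  // No usable response at all: always retried
  if (result.error) return "retry";

  // Any 2xx status is a successful delivery
  if (result.status >= 200 && result.status <= 299) return "success";

  // 410 Gone stops all further retries
  if (result.status === 410) return "stop";

  // 4xx and 5xx responses are retried; other non-2xx codes (e.g. 3xx)
  // are not covered by the table and are assumed here to be retried too
  return "retry";
}

console.log(classifyAttempt({ status: 204 })); // success
console.log(classifyAttempt({ status: 410 })); // stop
console.log(classifyAttempt({ error: "ECONNREFUSED" })); // retry
```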

Retry Schedule

When delivery fails, Fanfare retries with exponential backoff:
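| Attempt | Delay after previous attempt | Cumulative time |
| --- | --- | --- |
| 1 | Immediate | 0 |
| 2 | 1 minute | 1 minute |
| 3 | 5 minutes | 6 minutes |
| 4 | 30 minutes | 36 minutes |
| 5 | 2 hours | 2 h 36 min |
| 6 | 6 hours | 8 h 36 min |
| 7 | 12 hours | 20 h 36 min |
| 8 | 24 hours | 44 h 36 min |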
After 8 failed attempts (the initial delivery plus 7 retries) spanning about 44.5 hours, the webhook delivery is marked as failed and no further retries are attempted.
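The cumulative times are the running sum of the per-attempt delays. This short sketch recomputes them, which can help when estimating how long an event may keep arriving after an outage. The delay values are transcribed from the table; they are not read from any Fanfare API.

```javascript
// Delay before each attempt, in minutes, transcribed from the retry schedule table.
const DELAYS_MINUTES = [0, 1, 5, 30, 120, 360, 720, 1440];

// Returns, for each attempt, the minutes elapsed since the first attempt.
function cumulativeSchedule(delays = DELAYS_MINUTES) {
  let total = 0;
  return delays.map((delay, index) => {
    total += delay;
    return { attempt: index + 1, delayMinutes: delay, cumulativeMinutes: total };
  });
}

// Formats minutes as "H h M min" (or "M min" under an hour).
function formatMinutes(minutes) {
  const h = Math.floor(minutes / 60);
  const m = minutes % 60;
  return h > 0 ? `${h} h ${m} min` : `${m} min`;
}

for (const row of cumulativeSchedule()) {
  console.log(`attempt ${row.attempt}: ${formatMinutes(row.cumulativeMinutes)}`);
}
// The last line printed is "attempt 8: 44 h 36 min"
```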

Retry Headers

Retry attempts include these additional headers:

| Header | Description |
| --- | --- |
| X-Fanfare-Retry-Count | Current retry attempt (0-7) |
| X-Fanfare-Original-Time | Timestamp of the original event |
| X-Fanfare-Delivery-Id | Unique ID for this delivery |
| X-Fanfare-Webhook-Id | Webhook endpoint ID |
app.post("/webhooks/fanfare", (req, res) => {
  const retryCount = parseInt(req.headers["x-fanfare-retry-count"] || "0", 10);
  const originalTime = req.headers["x-fanfare-original-time"];

  if (retryCount > 0) {
    console.log(`Retry attempt ${retryCount}, original event from ${originalTime}`);
  }

  // Process webhook...
  res.status(200).send("OK");
});
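If several handlers need this metadata, a small helper can normalize the headers in one place and also compute how old the event is. This is a hypothetical sketch: parseRetryHeaders is an illustrative name, and it assumes X-Fanfare-Original-Time is an ISO 8601 string, which the table above does not specify. Node lowercases incoming header names, so the keys below are lowercase.

```javascript
// Hypothetical helper that normalizes the retry headers listed above.
// Assumes X-Fanfare-Original-Time is an ISO 8601 timestamp.
function parseRetryHeaders(headers, now = new Date()) {
  const parsedCount = Number.parseInt(headers["x-fanfare-retry-count"] ?? "0", 10);
  const retryCount = Number.isNaN(parsedCount) ? 0 : parsedCount;

  const rawTime = headers["x-fanfare-original-time"];
  const originalTime = rawTime ? new Date(rawTime) : null;
  const validTime = originalTime !== null && !Number.isNaN(originalTime.getTime());

  return {
    retryCount,
    isRetry: retryCount > 0,
    deliveryId: headers["x-fanfare-delivery-id"] ?? null,
    webhookId: headers["x-fanfare-webhook-id"] ?? null,
    // Milliseconds since the event originally fired; null if the header is missing or unparseable
    ageMs: validTime ? now.getTime() - originalTime.getTime() : null,
  };
}
```

In a handler you would call parseRetryHeaders(req.headers) and, for example, log ageMs to spot events that arrive late because earlier attempts failed.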

Handling Retries

Idempotency

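Because the same event may be delivered more than once (through retries or network issues), your handler must be idempotent: processing an event twice must have the same effect as processing it once. Deduplicate on the event's id field: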
import { Redis } from "ioredis";

const redis = new Redis();
const PROCESSING_TTL = 5 * 60; // 5 minutes: a crashed worker's claim expires so a retry can proceed
const PROCESSED_TTL = 48 * 60 * 60; // 48 hours: longer than the ~44.5-hour retry window

async function processWebhook(event) {
  // Deduplicate on the event ID
  const key = `webhook:${event.id}`;

  // Atomically claim the event. SET ... NX only succeeds when the key does not
  // exist, so two concurrent deliveries of the same event cannot both proceed
  // (a separate GET followed by SET would leave that race open).
  const claimed = await redis.set(key, "processing", "EX", PROCESSING_TTL, "NX");
  if (claimed === null) {
    console.log(`Webhook ${event.id} already processed or in progress, skipping`);
    return { duplicate: true };
  }

  try {
    // Process the event
    await handleEvent(event);

    // Mark as completed (with the longer TTL)
    await redis.set(key, "completed", "EX", PROCESSED_TTL);

    return { success: true };
  } catch (error) {
    // Remove the claim so a retry can process the event
    await redis.del(key);
    throw error;
  }
}
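For unit tests or a single-process development server, an in-memory store can mirror the claim, complete, and release steps of the Redis example without a Redis dependency. This is a minimal sketch under that assumption; InMemoryIdempotencyStore is a hypothetical name, and because its state is neither shared across processes nor persisted, it is not a substitute for Redis or a database in production.

```javascript
// In-memory stand-in for the Redis claim/complete/release pattern.
// Not shared across processes and lost on restart: for tests and local dev only.
class InMemoryIdempotencyStore {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, so tests can simulate TTL expiry
    this.entries = new Map(); // key -> { state, expiresAt }
  }

  // Like SET key "processing" EX ttl NX: succeeds only if no unexpired entry exists
  claim(key, ttlMs) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) return false;
    this.entries.set(key, { state: "processing", expiresAt: this.now() + ttlMs });
    return true;
  }

  // Marks the event as done for a longer period
  complete(key, ttlMs) {
    this.entries.set(key, { state: "completed", expiresAt: this.now() + ttlMs });
  }

  // Drops the claim after a processing error so a retry can try again
  release(key) {
    this.entries.delete(key);
  }
}
```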

Database-Based Idempotency

For simpler setups, rely on a unique constraint on the event ID in your database:
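import { eq } from "drizzle-orm";
// Assumes Drizzle ORM with PostgreSQL; db and processedWebhooks are defined elsewhere

async function processWebhook(event) {
  try {
    // Claim the event ID; a unique constraint on id rejects a second insert
    await db.insert(processedWebhooks).values({
      id: event.id,
      eventType: event.type,
      processedAt: new Date(),
    });
  } catch (error) {
    // 23505 is PostgreSQL's unique_violation code (some ORM versions wrap it in error.cause)
    if ((error.code ?? error.cause?.code) === "23505") {
      console.log(`Webhook ${event.id} already processed`);
      return { duplicate: true };
    }
    throw error;
  }

  try {
    // Process the event
    await handleEvent(event);
  } catch (error) {
    // Release the claim so a retry can process the event again
    await db.delete(processedWebhooks).where(eq(processedWebhooks.id, event.id));
    throw error;
  }

  return { success: true };
}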

Responding Appropriately

Quick Acknowledgment

Respond quickly (within 5 seconds, well under the 30-second timeout) and do the actual processing asynchronously:
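import { Queue } from "bullmq";

// Assumes a local Redis instance; point this at your own Redis in production
const webhookQueue = new Queue("webhooks", {
  connection: { host: "localhost", port: 6379 },
});

app.post("/webhooks/fanfare", async (req, res) => {
  // Verify the signature first (req.body must be the raw Buffer, e.g. via express.raw())
  if (!verifySignature(req)) {
    return res.status(401).send("Invalid signature");
  }

  const event = JSON.parse(req.body.toString());

  try {
    // Queue for background processing
    await webhookQueue.add(event.type, event, {
      jobId: event.id, // Duplicate adds are ignored while a job with this ID still exists
      removeOnComplete: 1000, // Keep only the last 1000 completed jobs
      attempts: 3,
    });
  } catch (error) {
    // Could not enqueue (e.g. Redis is down): return 500 so Fanfare retries
    console.error("Failed to enqueue webhook:", error);
    return res.status(500).send("Queue unavailable");
  }

  // Respond immediately
  res.status(202).json({ received: true });
});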
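When you only need to try the acknowledge-then-process flow locally, a tiny in-process queue shows the same shape without Redis. This is a hypothetical sketch for development only: jobs live in memory and are lost if the process exits, failed jobs are logged but never retried (unlike BullMQ's attempts option), and the set of seen IDs grows without bound.

```javascript
// Minimal in-process queue for local development: respond first, process later.
class InProcessQueue {
  constructor(handler) {
    this.handler = handler;
    this.seenIds = new Set(); // mirrors BullMQ's jobId de-duplication (never pruned here)
    this.pending = [];
    this.draining = null; // promise for the current drain, if one is running
  }

  // Returns false if a job with this ID was already queued
  add(jobId, data) {
    if (this.seenIds.has(jobId)) return false;
    this.seenIds.add(jobId);
    this.pending.push(data);
    // Start draining on a later tick so the HTTP handler can respond first
    if (!this.draining) {
      this.draining = new Promise((resolve) => setImmediate(resolve)).then(() => this.drain());
    }
    return true;
  }

  async drain() {
    while (this.pending.length > 0) {
      const data = this.pending.shift();
      try {
        await this.handler(data);
      } catch (error) {
        // No retries in this sketch: the job is logged and dropped
        console.error("Background job failed:", error);
      }
    }
    this.draining = null;
  }
}
```

In a route you would call queue.add(event.id, event) and then immediately respond with 202.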

When to Return Errors

| Scenario | Response | Effect |
| --- | --- | --- |
| Signature invalid | 401 | Will retry (check your configuration) |
| Event already processed | 200 | No retry (success) |
| Temporary processing error | 500 | Will retry |
| Event type not supported | 200 | No retry (acknowledge) |
| Endpoint permanently gone | 410 | Stops all retries |
| Payload validation error | 400 | Will retry (review your schema) |
app.post("/webhooks/fanfare", async (req, res) => {
  // Signature errors should return 401
  if (!verifySignature(req)) {
    return res.status(401).send("Invalid signature");
  }

  const event = JSON.parse(req.body.toString());

  // Check for duplicates - return success
  if (await isDuplicate(event.id)) {
    return res.status(200).send("Already processed");
  }

  // Unknown event types - acknowledge but don't process
  if (!SUPPORTED_EVENTS.includes(event.type)) {
    console.log(`Ignoring unsupported event type: ${event.type}`);
    return res.status(200).send("Event type not handled");
  }

  try {
    await processEvent(event);
    return res.status(200).send("OK");
  } catch (error) {
    // Temporary errors - allow retry
    console.error("Processing error:", error);
    return res.status(500).send("Processing failed");
  }
});

Monitoring Webhook Health

Dashboard Monitoring

Monitor webhook delivery in your Fanfare dashboard:
  1. Go to Settings > Webhooks
  2. Select your endpoint
  3. View delivery history and success rates

Webhook Events

You can also receive webhooks about webhook delivery status:
{
  "id": "whk_01HXYZ123456789",
  "type": "webhook.delivery.failed",
  "timestamp": "2024-12-01T12:00:00Z",
  "organizationId": "org_01HXYZ123456789",
  "data": {
    "endpointId": "whe_01HXYZ123456789",
    "endpointUrl": "https://your-server.com/webhooks/fanfare",
    "eventId": "evt_01HXYZ123456789",
    "eventType": "queue.consumer.admitted",
    "retryCount": 8,
    "lastError": "Connection timeout",
    "willRetry": false
  }
}
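A natural use of this event is to alert when Fanfare has given up on an event, so you can recover it with a replay. The following is a hypothetical sketch that reads only the fields shown in the example payload; describeDeliveryFailure is an illustrative name, and how you deliver the alert is up to you.

```javascript
// Hypothetical helper for the webhook.delivery.failed payload shown above.
// Returns an alert message when no further retries will happen, otherwise null.
function describeDeliveryFailure(event) {
  if (event.type !== "webhook.delivery.failed") return null;

  const { eventId, eventType, endpointUrl, retryCount, lastError, willRetry } = event.data;

  // While willRetry is true, a later attempt may still succeed
  if (willRetry) return null;

  return (
    `Gave up delivering ${eventType} (${eventId}) to ${endpointUrl} ` +
    `(retryCount=${retryCount}): ${lastError}`
  );
}
```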

Disabling an Endpoint

Automatic Disabling

Endpoints are automatically disabled after repeated consecutive failures:
  • Trigger: 100 consecutive failures over 7 days
  • Recovery: an auto-disabled endpoint must be re-enabled manually in the dashboard

Manual Disabling

To stop receiving webhooks temporarily, use either method:
  • Dashboard: Settings > Webhooks > Disable
  • API: update the endpoint status with a PATCH request
curl -X PATCH https://admin.fanfare.io/api/v1/webhooks/whe_01HXYZ123456789 \
  -H "Authorization: Bearer sk_live_xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
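The same call can be made from Node 18+ with the built-in fetch. This sketch mirrors the curl request exactly; the helper names and the FANFARE_API_KEY environment variable are hypothetical, and the response body is not parsed because its format is not documented here.

```javascript
// Builds the same PATCH request as the curl example above.
function buildEndpointUpdate(endpointId, apiKey, enabled) {
  return {
    url: `https://admin.fanfare.io/api/v1/webhooks/${encodeURIComponent(endpointId)}`,
    init: {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ enabled }),
    },
  };
}

// Sends the request; FANFARE_API_KEY is an environment variable you define yourself.
async function setEndpointEnabled(endpointId, enabled) {
  const { url, init } = buildEndpointUpdate(endpointId, process.env.FANFARE_API_KEY, enabled);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Failed to update endpoint: HTTP ${response.status}`);
  }
  return response;
}
```

For example, await setEndpointEnabled("whe_01HXYZ123456789", false) disables the endpoint from a maintenance script.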

Returning 410 Gone

If your endpoint is permanently removed, return 410 to stop retries:
app.post("/webhooks/fanfare", (req, res) => {
  // Endpoint is being decommissioned
  return res.status(410).send("Endpoint removed");
});

Recovering Missed Events

Event Replay

Request replay of events for a time window:
curl -X POST https://admin.fanfare.io/api/v1/webhooks/whe_01HXYZ123456789/replay \
  -H "Authorization: Bearer sk_live_xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "startTime": "2024-12-01T00:00:00Z",
    "endTime": "2024-12-01T12:00:00Z",
    "eventTypes": ["queue.consumer.admitted", "order.created"]
  }'
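When scripting a replay, it helps to validate the window before calling the API. This is a hypothetical sketch that builds the request body shown above, normalizes timestamps to the same second-precision UTC format as the curl example, and rejects reversed windows. Whether the API has other limits (maximum window length, allowed event types) is not documented here, so the sketch does not check them.

```javascript
// Normalizes a timestamp to the "YYYY-MM-DDTHH:MM:SSZ" form used in the curl example.
function toApiTimestamp(value) {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) throw new Error(`Invalid timestamp: ${value}`);
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Builds the replay request body and rejects empty or reversed windows.
function buildReplayBody(startTime, endTime, eventTypes) {
  const start = toApiTimestamp(startTime);
  const end = toApiTimestamp(endTime);
  if (new Date(start) >= new Date(end)) throw new Error("startTime must be before endTime");
  return { startTime: start, endTime: end, eventTypes };
}

// Sends the replay request with Node 18+'s built-in fetch.
async function requestReplay(endpointId, apiKey, body) {
  const response = await fetch(
    `https://admin.fanfare.io/api/v1/webhooks/${encodeURIComponent(endpointId)}/replay`,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  );
  if (!response.ok) throw new Error(`Replay request failed: HTTP ${response.status}`);
  return response;
}
```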

Event Listing

List recent events for manual processing:
curl -X GET "https://admin.fanfare.io/api/v1/webhooks/events?startTime=2024-12-01T00:00:00Z&limit=100" \
  -H "Authorization: Bearer sk_live_xxxxxxxxxxxx"

Best Practices

1. Implement Circuit Breakers

Use a circuit breaker to prevent cascading failures when your system is overloaded:
import CircuitBreaker from "opossum";

const breaker = new CircuitBreaker(processEvent, {
  timeout: 10000,
  errorThresholdPercentage: 50,
  resetTimeout: 30000,
});

app.post("/webhooks/fanfare", async (req, res) => {
  if (!verifySignature(req)) {
    return res.status(401).send("Invalid signature");
  }

  const event = JSON.parse(req.body.toString());

  try {
    await breaker.fire(event);
    res.status(200).send("OK");
  } catch (error) {
    if (breaker.opened) {
      // Circuit is open - return 503 to trigger retry
      return res.status(503).send("Service temporarily unavailable");
    }
    res.status(500).send("Processing failed");
  }
});

2. Log Delivery Metadata

Log retry information for debugging:
app.post("/webhooks/fanfare", (req, res) => {
  const deliveryId = req.headers["x-fanfare-delivery-id"];
  const retryCount = req.headers["x-fanfare-retry-count"] || "0";
  const eventType = req.headers["x-fanfare-event-type"];

  console.log(
    JSON.stringify({
      type: "webhook_received",
      deliveryId,
      retryCount: parseInt(retryCount, 10),
      eventType,
      timestamp: new Date().toISOString(),
    })
  );

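  // Process...

  // Always send a response, or the request will hit the 30-second timeout
  res.status(200).send("OK");
});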

3. Set Up Alerts

Configure alerts for webhook failures:
async function monitorWebhookHealth() {
  const recentFailures = await getRecentFailures(24 * 60 * 60); // Last 24 hours
  if (recentFailures.total === 0) return; // No deliveries in the window; nothing to rate
  const failureRate = recentFailures.failed / recentFailures.total;

  if (failureRate > 0.1) {
    // More than 10% failure rate
    await sendAlert({
      type: "webhook_health",
      message: `Webhook failure rate is ${(failureRate * 100).toFixed(1)}%`,
      failures: recentFailures.failed,
      total: recentFailures.total,
    });
  }
}
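The getRecentFailures helper above is left to you. One possible in-process approach is a sliding window of outcomes recorded by your own handler; this is a hypothetical sketch (FailureRateWindow is an illustrative name). Note that it only sees requests that reached your server: deliveries that failed on connection, DNS, or TLS errors never arrive, so the dashboard or webhook.delivery.failed events remain the source of truth for those.

```javascript
// Sliding window of delivery outcomes recorded by your own webhook handler.
class FailureRateWindow {
  constructor(windowMs = 24 * 60 * 60 * 1000, now = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now; // injectable clock for tests
    this.samples = []; // { at, failed }, oldest first
  }

  // Call once per request, with failed = true when you returned a non-2xx status
  record(failed) {
    this.samples.push({ at: this.now(), failed });
    this.prune();
  }

  // Drops samples older than the window
  prune() {
    const cutoff = this.now() - this.windowMs;
    while (this.samples.length > 0 && this.samples[0].at < cutoff) this.samples.shift();
  }

  // Returns { failed, total } in the same shape monitorWebhookHealth expects
  stats() {
    this.prune();
    const failed = this.samples.filter((s) => s.failed).length;
    return { failed, total: this.samples.length };
  }
}
```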

4. Test Retry Handling

Verify your retry handling in development:
// Simulate retry scenario
let requestCount = 0;

app.post("/webhooks/test", (req, res) => {
  requestCount++;

  if (requestCount < 3) {
    // Fail first two attempts
    console.log(`Attempt ${requestCount}: Simulating failure`);
    return res.status(500).send("Simulated failure");
  }

  // Succeed on third attempt
  console.log(`Attempt ${requestCount}: Success`);
  return res.status(200).send("OK");
});
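You can also exercise retry and idempotency logic without a running server by replaying the same event against a plain handler function. This hypothetical harness (simulateDeliveries is an illustrative name) follows the documented stopping rules (stop on 2xx or 410, at most 8 attempts) but does not wait between attempts, and it only sets the retry-count header.

```javascript
// Hypothetical harness: delivers one event to `handler` until the documented
// policy would stop. `handler(event, headers)` returns an HTTP status code.
async function simulateDeliveries(event, handler, maxAttempts = 8) {
  const statuses = [];
  for (let retryCount = 0; retryCount < maxAttempts; retryCount++) {
    const status = await handler(event, { "x-fanfare-retry-count": String(retryCount) });
    statuses.push(status);
    // Stop on success (2xx) or on 410 Gone
    if ((status >= 200 && status <= 299) || status === 410) break;
  }
  return statuses;
}
```

A useful check is that a handler which fails twice and then succeeds produces its side effect exactly once, even if the same event is later delivered again.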

Troubleshooting

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| All retries failing | Endpoint unreachable | Check firewall, DNS, SSL certificates |
| Intermittent failures | Timeout exceeded | Optimize the handler, use async processing |
| Duplicate processing | No idempotency check | Implement deduplication using the event ID |
| Events arriving late | Previous retries queued | Check the X-Fanfare-Original-Time header |
| Endpoint auto-disabled | Too many consecutive failures | Fix the issues, then re-enable in the dashboard |

Debug Checklist

  1. Verify connectivity: Can you reach your endpoint from external networks?
  2. Check certificates: Is your SSL certificate valid and properly configured?
  3. Review logs: What status codes are you returning?
  4. Test manually: Can you process a test event successfully?
  5. Check timing: Are you responding within 30 seconds?