Retry Strategies

When a job fails, Zeridion Flare automatically retries it with exponential backoff and jitter. You control how many times a job is retried and what happens when all attempts are exhausted.

How retries work

A worker picks up a job and calls your ExecuteAsync method
If ExecuteAsync throws an unhandled exception, the worker reports the failure back to Flare
The server checks whether AttemptNumber < MaxAttempts
If retries remain: the job returns to Pending with a RunAt delay (exponential backoff + jitter)
If retries are exhausted: the job moves to DeadLetter
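
The server-side check in the steps above can be sketched as a single comparison. This is a minimal illustration, not Flare's actual implementation; the `JobState` enum and `RetryDecisionSketch` class are hypothetical names introduced here.

```csharp
// Hypothetical subset of job states, limited to the two outcomes of a failure.
public enum JobState { Pending, DeadLetter }

public static class RetryDecisionSketch
{
    // After a failed attempt: requeue while attempts remain, otherwise dead-letter.
    public static JobState NextStateAfterFailure(int attemptNumber, int maxAttempts)
        => attemptNumber < maxAttempts ? JobState.Pending : JobState.DeadLetter;
}
```

With MaxAttempts = 3, failures on attempts 1 and 2 requeue the job, and a failure on attempt 3 dead-letters it.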

Exponential backoff with jitter

The retry delay doubles with each attempt, starting at 60 seconds. A uniform random jitter in the range 0–3000 ms (0–3 seconds, millisecond resolution) is added to prevent a thundering herd when many jobs fail simultaneously. Both the exponent and the resulting delay are clamped on the server, so a retry is never scheduled in the past. The maximum effective delay is 6 hours, regardless of attempt number.

Formula: delay = 60s × 2^(attempt - 1) + random_uniform(0–3000 ms)

Attempt	Base delay	Actual range
1	60s (1 min)	60–63s
2	120s (2 min)	120–123s
3	240s (4 min)	240–243s
4	480s (8 min)	480–483s
5	960s (16 min)	960–963s
6	1920s (32 min)	1920–1923s
7	3840s (64 min)	3840–3843s
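
The formula and table can be reproduced with a short calculation. The sketch below is illustrative only: the class name, the exponent bound of 30, and the choice to add jitter after applying the 6-hour cap are assumptions, since the document does not specify how the server orders these steps.

```csharp
using System;

public static class BackoffSketch
{
    private static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(60);
    private static readonly TimeSpan MaxDelay = TimeSpan.FromHours(6);
    private const int MaxJitterMs = 3000;

    // Deterministic part: 60s × 2^(attempt - 1), capped at 6 hours.
    public static TimeSpan BaseDelayFor(int attempt)
    {
        // Clamp the exponent so the shift cannot overflow (the bound of 30 is an assumption).
        int exponent = Math.Clamp(attempt - 1, 0, 30);
        double seconds = BaseDelay.TotalSeconds * (1L << exponent);
        return TimeSpan.FromSeconds(Math.Min(seconds, MaxDelay.TotalSeconds));
    }

    // Full delay: base delay plus uniform jitter in 0..3000 ms.
    public static TimeSpan DelayFor(int attempt, Random rng)
    {
        // Random.Next treats the upper bound as exclusive, so +1 makes 3000 ms reachable.
        int jitterMs = rng.Next(0, MaxJitterMs + 1);
        return BaseDelayFor(attempt) + TimeSpan.FromMilliseconds(jitterMs);
    }
}
```

Under these assumptions, attempt 10 is the first whose uncapped base delay (30720s) exceeds the 6-hour limit (21600s), so every attempt from 10 onward waits about 6 hours.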

With the default MaxAttempts = 3, a job gets three tries with roughly 3 minutes of total backoff (60s after the first failure plus 120s after the second, not counting execution time) before it is dead-lettered.

Configuring MaxAttempts

You can set the maximum number of attempts at two levels: per class and per call. More specific settings override less specific ones, and a hardcoded default applies when neither is set.

Per-class default

Apply [JobConfig] to set a default for all enqueues of this job type:

[JobConfig(MaxAttempts = 5)]
public class SendWelcomeEmail : IJob<NewUserPayload>
{
    public async Task ExecuteAsync(NewUserPayload payload, JobContext ctx)
    {
        // Up to 5 attempts before dead letter
    }
}

Per-call override

Pass JobOptions when enqueuing to override the class default for a specific enqueue:

await jobs.EnqueueAsync<SendWelcomeEmail>(payload, new JobOptions
{
    MaxAttempts = 10
});

Precedence

Level	How to set	Default
Per-call	`new JobOptions { MaxAttempts = N }`	—
Per-class	`[JobConfig(MaxAttempts = N)]`	3
Server-side clamp	—	1–100

Resolution order: JobOptions (per-call) > [JobConfig] (per-class) > 3 (hardcoded default).

The server clamps the final value to the range 1–100: values below 1 are raised to 1, and values above 100 are lowered to 100.
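
The resolution order and clamp can be sketched as follows. This is a hypothetical helper, not part of the SDK; it assumes unset values are represented as null and that out-of-range values are pinned to the nearest bound.

```csharp
using System;

public static class MaxAttemptsSketch
{
    private const int HardcodedDefault = 3;

    // perCall: value from JobOptions, if any; perClass: value from [JobConfig], if any.
    public static int Resolve(int? perCall, int? perClass)
    {
        int requested = perCall ?? perClass ?? HardcodedDefault;
        // Server-side clamp to the documented 1–100 range.
        return Math.Clamp(requested, 1, 100);
    }
}
```

For example, a class configured with MaxAttempts = 5 and enqueued with MaxAttempts = 10 resolves to 10, while a per-call value of 500 is clamped to 100.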

AttemptNumber tracking

AttemptNumber starts at 0 on the job entity and is incremented to 1 when a worker first claims the job. Inside your ExecuteAsync, ctx.AttemptNumber gives the current attempt number.

Use it for conditional logic:

public async Task ExecuteAsync(PaymentPayload payload, JobContext ctx)
{
    if (ctx.AttemptNumber == ctx.MaxAttempts)
    {
        ctx.Logger.LogWarning("Final attempt for job {JobId}, alerting ops", ctx.JobId);
        await _alertService.NotifyAsync($"Job {ctx.JobId} on final attempt");
    }

    await ProcessPayment(payload, ctx.CancellationToken);
}

Dead letter

When AttemptNumber >= MaxAttempts after a failure, the job moves to DeadLetter:

State is set to DeadLetter
CompletedAt is set to the current time
Error details (ErrorType, ErrorMessage, ErrorStackTrace) are preserved from the last failure
Any child continuation jobs in Scheduled state are cancelled

Dead-lettered jobs remain in the database for inspection. They are not deleted or cleaned up automatically.

Querying dead letter jobs

GET /flare/v1/jobs?state=dead_letter&limit=50
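
From C#, the same request could be issued with a plain HttpClient. This is a sketch under assumptions: the client is expected to have its BaseAddress and any authentication already configured, the helper names are hypothetical, and the response is returned as a raw string because the response schema is not described here.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class DeadLetterQuerySketch
{
    // Builds the relative path shown above for a given page size.
    public static string BuildPath(int limit)
        => $"/flare/v1/jobs?state=dead_letter&limit={limit}";

    // Assumes http.BaseAddress points at your Flare API host.
    public static async Task<string> FetchRawAsync(HttpClient http, int limit = 50)
    {
        using var response = await http.GetAsync(BuildPath(limit));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```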

Manual retry from dead letter

You can requeue a dead-lettered job via the API, SDK, or dashboard:

API

POST /flare/v1/jobs/{id}/retry

This resets the job to Pending, clears error/worker/timing fields, and bumps MaxAttempts if the current AttemptNumber has already reached it.

SDK

var retried = await jobs.RetryAsync(jobId);
// returns true if requeued, false if job is not in a retryable state

Tip: RetryAsync returns false (instead of throwing) when the job is in a state that cannot be retried (e.g., Processing or Succeeded). No try/catch needed.
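
Because RetryAsync signals non-retryable jobs with a boolean, requeuing a batch can be a plain loop that counts outcomes. The helper below is a hypothetical sketch: it takes the retry call as a delegate rather than assuming the SDK client's type, and it assumes job IDs are strings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BulkRetrySketch
{
    // retryAsync stands in for a call such as id => jobs.RetryAsync(id).
    public static async Task<(int Requeued, int Skipped)> RetryAllAsync(
        IEnumerable<string> jobIds, Func<string, Task<bool>> retryAsync)
    {
        int requeued = 0, skipped = 0;
        foreach (var id in jobIds)
        {
            // false means the job was not in a retryable state; nothing is thrown.
            if (await retryAsync(id)) requeued++;
            else skipped++;
        }
        return (requeued, skipped);
    }
}
```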

Dashboard

Click the Retry button on the job detail page to requeue a dead-lettered job.

HTTP client retries (SDK to API)

The job-level retries described above are separate from the SDK's HTTP transport retries. The SDK registers its HTTP client with AddStandardResilienceHandler() from Microsoft.Extensions.Http.Resilience, which provides:

Retry — automatic retry with exponential backoff for transient HTTP failures (5xx, timeouts)
Circuit breaker — stops sending requests when the API is consistently failing
Timeout — per-request and total timeout enforcement

These transport-level retries protect against network blips and temporary API outages. They happen transparently before your code sees the response.
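
For reference, a registration of this kind typically looks like the sketch below. The client name and base address are placeholders, and the SDK's own registration may differ; the snippet only illustrates how AddStandardResilienceHandler() attaches to an HttpClient registration and requires the Microsoft.Extensions.Http.Resilience package.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

public static class ResilienceRegistrationSketch
{
    public static IServiceCollection AddFlareLikeClient(IServiceCollection services)
    {
        services
            .AddHttpClient("flare-api", client =>
            {
                // Placeholder address, not a real Flare endpoint.
                client.BaseAddress = new Uri("https://flare.example.com");
            })
            // Adds the standard retry, circuit breaker, and timeout pipeline.
            .AddStandardResilienceHandler();
        return services;
    }
}
```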

Best practices

Keep jobs idempotent — since jobs may execute more than once, design ExecuteAsync so that re-running with the same payload produces the same result. Use database upserts, check-before-write, or idempotency keys on downstream calls.

Use ctx.AttemptNumber for logging — always include the attempt number in your log messages so you can trace the retry history:

ctx.Logger.LogInformation(
    "Attempt {Attempt}/{Max} for job {JobId}",
    ctx.AttemptNumber, ctx.MaxAttempts, ctx.JobId);

Set reasonable timeouts — jobs without timeouts can run indefinitely and block the worker. Use [JobConfig(TimeoutSeconds = 300)] to cap execution time. The worker reports progress periodically; if it stops reporting, Flare reclaims the job for retry.
Don't catch and swallow all exceptions — let unexpected exceptions bubble up so the retry engine can do its job. Only catch exceptions when you need to prevent retries (e.g., invalid input data that will never succeed).
Monitor dead letter counts — use GET /flare/v1/metrics/summary to track dead letter accumulation. A rising dead letter count signals a systemic issue.
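
The advice about not swallowing exceptions can be sketched as a wrapper that catches only a known permanent failure and lets everything else propagate. The InvalidPayloadException type and the wrapper itself are hypothetical; in a real job the same try/catch would sit inside ExecuteAsync.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical exception for input that can never succeed, however often it is retried.
public sealed class InvalidPayloadException : Exception
{
    public InvalidPayloadException(string message) : base(message) { }
}

public static class SelectiveCatchSketch
{
    // Returns normally for permanent failures (so no retry is triggered);
    // any other exception bubbles up to the retry engine.
    public static async Task RunAsync(Func<Task> work, Action<string> log)
    {
        try
        {
            await work();
        }
        catch (InvalidPayloadException ex)
        {
            log($"Skipping retry, payload can never succeed: {ex.Message}");
        }
    }
}
```

Note that returning normally means the attempt is not reported as a failure, so log enough detail to find these jobs later.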
