Queues and Concurrency
Queues let you isolate different types of work so they don't compete for the same worker slots. Concurrency controls limit how many jobs each worker processes in parallel. Together, they give you fine-grained control over throughput and resource usage.
Named queues
Every job is assigned to a queue. The default queue is "default". Use named queues to separate workloads — fast email sends should not be blocked by slow report generation:
[JobConfig(Queue = "email")]
public class SendWelcomeEmail : IJob<NewUserPayload> { ... }
[JobConfig(Queue = "reports")]
public class GenerateMonthlyReport : IJob<ReportPayload> { ... }
[JobConfig(Queue = "critical")]
public class ProcessPayment : IJob<PaymentPayload> { ... }
Assigning a queue
You can set the queue at two levels. More specific settings override less specific ones.
Per-class (attribute)
[JobConfig(Queue = "email")]
public class SendWelcomeEmail : IJob<NewUserPayload>
{
    public async Task ExecuteAsync(NewUserPayload payload, JobContext ctx)
    {
        // Enqueued to the "email" queue unless a per-call JobOptions.Queue overrides it
    }
}
Per-call (options)
await jobs.EnqueueAsync<SendWelcomeEmail>(payload, new JobOptions
{
    Queue = "critical" // Overrides the class-level "email" queue
});
Resolution order
| Level | How to set | Default |
|---|---|---|
| Per-call | new JobOptions { Queue = "..." } | — |
| Per-class | [JobConfig(Queue = "...")] | "default" |
Resolution: JobOptions (per-call) > [JobConfig] (per-class) > "default".
Queue names are trimmed and normalized by the server. Max length is 100 characters.
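The precedence described above can be sketched as a small pure function. This is an illustration only: QueueResolution and Resolve are hypothetical names, not part of the Flare SDK, and treating a blank or whitespace-only value as "not set" is an assumption.

```csharp
using System;

// Hypothetical helper, not part of the Flare SDK. It models the documented
// precedence: per-call JobOptions > per-class [JobConfig] > "default".
public static class QueueResolution
{
    public const string FallbackQueue = "default";

    public static string Resolve(string? perCallQueue, string? perClassQueue)
    {
        // A queue set on the individual enqueue call wins.
        if (!string.IsNullOrWhiteSpace(perCallQueue)) return perCallQueue;

        // Otherwise use the queue declared on the job class, if any.
        if (!string.IsNullOrWhiteSpace(perClassQueue)) return perClassQueue;

        // Neither level set a queue (assumption: blank counts as unset).
        return FallbackQueue;
    }
}
```

For example, Resolve("critical", "email") yields "critical", Resolve(null, "email") yields "email", and Resolve(null, null) yields "default".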
Worker queue binding
The SDK worker automatically polls all queues used by its registered job types. When the worker starts, it discovers which queues to listen on from the SDK's job-type catalog.
If your application registers three job types with queues email, reports, and critical, the worker subscribes to all three queues when it asks Flare for work.
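As a rough illustration of that discovery step, the sketch below collects the distinct queue names declared on a set of job types. The SDK's actual catalog is not shown in this document, so everything here is a stand-in: SketchJobConfigAttribute, the *Sketch job classes, and QueueDiscovery are hypothetical names, and falling back to "default" for an untagged class is an assumption taken from the resolution order above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in for the SDK's [JobConfig] attribute (hypothetical, for illustration).
[AttributeUsage(AttributeTargets.Class)]
public sealed class SketchJobConfigAttribute : Attribute
{
    public string Queue { get; set; } = "default";
}

[SketchJobConfig(Queue = "email")] public class SendWelcomeEmailSketch { }
[SketchJobConfig(Queue = "reports")] public class GenerateMonthlyReportSketch { }
[SketchJobConfig(Queue = "critical")] public class ProcessPaymentSketch { }
public class UntaggedJobSketch { } // no attribute: assumed to use "default"

public static class QueueDiscovery
{
    // Returns the distinct, sorted set of queues the given job types use.
    public static SortedSet<string> QueuesFor(IEnumerable<Type> jobTypes) =>
        new SortedSet<string>(
            jobTypes.Select(t =>
                t.GetCustomAttribute<SketchJobConfigAttribute>()?.Queue ?? "default"),
            StringComparer.Ordinal);
}
```

Passing the three tagged types produces the set critical, email, reports, which matches the three queues the worker in the example subscribes to.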
Queue isolation with separate workers
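For strict workload isolation, deploy separate worker instances that only register specific job types. The example below selects job types by assembly, so it assumes the email and report job classes live in separate assemblies: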
// Worker A: handles email jobs only
builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.JobAssemblies = [typeof(SendWelcomeEmail).Assembly];
});
// Worker B: handles report jobs only
builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.JobAssemblies = [typeof(GenerateMonthlyReport).Assembly];
});
Worker A only polls the email queue; Worker B only polls the reports queue. Slow reports cannot starve email delivery.
How job claiming works
The SDK worker asks Flare for jobs from the queues it's subscribed to.
Key behaviors:
- Atomic single-claim semantics via FOR UPDATE SKIP LOCKED: Flare's queue claim uses Postgres SELECT … FOR UPDATE SKIP LOCKED inside an UPDATE … WHERE Id IN (…) statement. Each polling worker takes row-level locks on its candidate rows. A row that is already locked by another worker's in-flight claim is skipped rather than waited on, so two workers polling simultaneously end up holding distinct rows. The result is a non-blocking single-claim guarantee: two workers never receive the same job, and a slow claim by one worker never stalls another worker's poll.
- Queue scoping: only jobs in the worker's registered queues are considered.
- Capacity-bounded: the worker only requests as many jobs as it has available concurrency slots.
- Efficient idle waits: if no jobs are available, Flare holds the request open briefly so the worker doesn't have to busy-poll.
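The claiming behaviors listed above can be modelled in memory. The sketch below is an analogy only: Flare's real claim runs as SQL inside Postgres, and JobRow, ClaimSketch, and the Claimed flag (which stands in for a row lock) are hypothetical names invented for this example. It shows queue scoping, oldest-first ordering, the capacity bound, and the skip-instead-of-wait behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// In-memory analogy only: Flare's real claim runs as SQL inside Postgres.
public sealed class JobRow
{
    public int Id;
    public string Queue = "default";
    public DateTime CreatedAt;
    public int Claimed; // 0 = unclaimed, 1 = claimed; stands in for a row lock
}

public static class ClaimSketch
{
    // Returns up to freeSlots jobs from the given queues, oldest first.
    public static List<JobRow> Claim(
        IReadOnlyList<JobRow> table, ISet<string> queues, int freeSlots)
    {
        var mine = new List<JobRow>();
        var candidates = table
            .Where(r => queues.Contains(r.Queue))   // queue scoping
            .OrderBy(r => r.CreatedAt);             // oldest candidates first

        foreach (var row in candidates)
        {
            if (mine.Count >= freeSlots) break;     // capacity-bounded

            // Atomically flip 0 -> 1. If another worker already took the row,
            // skip it and move on instead of waiting (the SKIP LOCKED idea).
            if (Interlocked.CompareExchange(ref row.Claimed, 1, 0) == 0)
                mine.Add(row);
        }
        return mine;
    }
}
```

When several threads call Claim on the same table at once, each row ends up in at most one result list, which is the single-claim property the bullet describes.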
Fairness and ordering guarantees
- FIFO per queue: the claim query orders candidates by CreatedAt ASC, so jobs are dequeued in approximate insertion order within a single queue. With SKIP LOCKED, two workers polling at the same instant may receive jobs in slightly different timestamp order if one of them happens to skip a row another is locking, but each individual queue still drains oldest-first.
- No server-side priority or weighting: there is no priority column, no weighted queue selection, and no preemption. A long-running job at the head of a queue does not block other queues, but it also does not yield to a "more important" job behind it. If you need priority isolation, use separate queues (e.g. critical vs default) and deploy dedicated worker instances per queue so latency-sensitive work is never stuck behind a backlog of bulk work.
- Cross-queue ordering is undefined: when a worker polls multiple queues at once, the order in which it receives jobs across queues depends on which rows the Postgres planner happens to lock first. Don't assume "queue A always wins ties against queue B".
Concurrency control
ConcurrencyLimit controls how many jobs a single worker instance processes in parallel. It defaults to 10:
builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.ConcurrencyLimit = 5;
});
Under the hood, the worker uses an internal semaphore bounded by ConcurrencyLimit. Before starting each job, the worker acquires a slot. When the job completes, the slot is released. The worker only asks Flare for as many jobs as it has free slots, so it never claims more work than it can handle.
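The slot mechanism described above follows a common .NET pattern. The sketch below shows that pattern with SemaphoreSlim. It is not the SDK's internal code: BoundedRunner, FreeSlots, and PeakParallelism are hypothetical names, and the peak counter exists only to make the bound observable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration of semaphore-bounded job execution.
public sealed class BoundedRunner
{
    private readonly SemaphoreSlim _slots;
    private int _running;
    private int _peak;

    public BoundedRunner(int concurrencyLimit)
    {
        _slots = new SemaphoreSlim(concurrencyLimit, concurrencyLimit);
    }

    // Number of jobs the worker could still accept right now.
    public int FreeSlots => _slots.CurrentCount;

    // Highest number of jobs observed running at the same time.
    public int PeakParallelism => Volatile.Read(ref _peak);

    public async Task RunAsync(Func<Task> job)
    {
        await _slots.WaitAsync();               // acquire a slot before starting
        try
        {
            int now = Interlocked.Increment(ref _running);
            int seen;
            // Raise the recorded peak if this job pushed parallelism higher.
            while (now > (seen = Volatile.Read(ref _peak)) &&
                   Interlocked.CompareExchange(ref _peak, now, seen) != seen) { }
            await job();
        }
        finally
        {
            Interlocked.Decrement(ref _running);
            _slots.Release();                   // free the slot when the job ends
        }
    }
}
```

A worker built this way would ask for at most FreeSlots jobs per request, which is how it avoids claiming more work than it can run.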
Choosing the right limit
| Job type | Recommended ConcurrencyLimit | Rationale |
|---|---|---|
| I/O-bound (HTTP calls, email) | 10–20 | Jobs spend most time waiting; higher parallelism is safe |
| CPU-bound (image processing) | 2–4 | Jobs consume CPU; too many in parallel causes contention |
| Memory-intensive (large reports) | 2–5 | Each job uses significant memory; limit prevents OOM |
| Mixed workload | 10 (default) | Good general-purpose starting point |
Scaling workers
Scale horizontally by deploying multiple worker instances. Each instance requests work independently and Flare's atomic claim semantics ensure no double-claiming:
| Workers | ConcurrencyLimit | Max parallel jobs |
|---|---|---|
| 1 | 10 | 10 |
| 2 | 10 | 20 |
| 3 | 10 | 30 |
| 5 | 20 | 100 |
Scaling strategies
Uniform scaling — all workers process all job types. Simple to deploy, good for balanced workloads:
Worker Instance 1: all queues, ConcurrencyLimit = 10
Worker Instance 2: all queues, ConcurrencyLimit = 10
Worker Instance 3: all queues, ConcurrencyLimit = 10
Queue-isolated scaling — dedicated workers per queue. Scale each workload independently:
Email Workers (3 instances): queue = "email", ConcurrencyLimit = 20
Report Workers (1 instance): queue = "reports", ConcurrencyLimit = 2
Payment Workers (2 instances): queue = "critical", ConcurrencyLimit = 5
Azure Container Apps or Kubernetes can auto-scale worker replicas based on queue depth metrics; the options currently available are described under Autoscaling with KEDA.
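The capacity arithmetic behind the table and the two strategies above is instances multiplied by ConcurrencyLimit, summed per queue. A minimal sketch, with WorkerPool and Capacity as hypothetical names used only for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a group of identical worker instances.
public record WorkerPool(string Queue, int Instances, int ConcurrencyLimit)
{
    // Upper bound on jobs this pool can run at the same time.
    public int MaxParallelJobs => Instances * ConcurrencyLimit;
}

public static class Capacity
{
    // Sums the upper bound per queue across all pools serving that queue.
    public static Dictionary<string, int> PerQueue(IEnumerable<WorkerPool> pools) =>
        pools.GroupBy(p => p.Queue)
             .ToDictionary(g => g.Key, g => g.Sum(p => p.MaxParallelJobs));
}
```

Applied to the queue-isolated layout above, this gives 60 parallel email jobs (3 × 20), 2 report jobs (1 × 2), and 10 payment jobs (2 × 5).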
Queue depth monitoring
GET /flare/v1/metrics/queues returns the current depth of each queue:
{
  "queues": [
    {
      "name": "default",
      "pending": 45,
      "processing": 10,
      "scheduled": 3
    },
    {
      "name": "email",
      "pending": 120,
      "processing": 15,
      "scheduled": 0
    }
  ]
}
| Field | Description |
|---|---|
| pending | Jobs waiting to be claimed by a worker |
| processing | Jobs currently being executed by a worker |
| scheduled | Jobs with a future RunAt or waiting for a parent to complete |
Backlog detection
A growing pending count means jobs are arriving faster than workers can process them. Possible responses:
- Increase ConcurrencyLimit: if workers have idle CPU/memory
- Add worker instances: horizontal scaling
- Investigate slow jobs: a single slow job type may be consuming all worker slots
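One simple way to turn "a growing pending count" into a check is to look at the last few samples of pending for a queue and flag the queue when every sample is higher than the one before. The heuristic, the default window of four samples, and the BacklogDetector name are assumptions for illustration; Flare does not prescribe a detection rule.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical heuristic: a backlog is "growing" when the most recent
// samples of the pending count rose at every step.
public static class BacklogDetector
{
    public static bool IsGrowing(IReadOnlyList<int> pendingSamples, int window = 4)
    {
        if (window < 2) throw new ArgumentOutOfRangeException(nameof(window));
        if (pendingSamples.Count < window) return false; // not enough data yet

        int start = pendingSamples.Count - window;
        for (int i = start + 1; i < pendingSamples.Count; i++)
        {
            // Any flat or falling step means workers kept up at that point.
            if (pendingSamples[i] <= pendingSamples[i - 1]) return false;
        }
        return true;
    }
}
```

A stricter or smoother rule (for example, comparing averages over longer windows) may suit spiky workloads better; this version favors being easy to reason about.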
Autoscaling with KEDA
:::warning Planned integration — not yet shipped
A first-class KEDA scaler that subscribes to GET /flare/v1/metrics/queues is on the roadmap but not currently available. There is no Flare-published KEDA ScaledObject and no out-of-the-box queue-depth scaler today. Until it ships, use one of these workarounds for Azure Container Apps / Kubernetes autoscaling:
- CPU/memory-based scaling: set a ScaledObject (or Container Apps replicas rule) on CPU utilisation. Effective when worker CPU correlates with backlog (most CPU-bound workloads), insensitive when workers are blocked on the network and CPU stays flat.
- Custom-metric scraper: write a small sidecar (or use a generic Prometheus exporter) that polls GET /flare/v1/metrics/queues from your own infrastructure on a schedule, exports pending_jobs{queue="..."} as a metric, and feeds that into KEDA's prometheus scaler or HPA's external-metrics API. This gives you queue-depth-driven scaling today without waiting for a built-in scaler.
- Manual / scheduled scaling: for predictable workloads (e.g. nightly report jobs), use a CronTrigger or scheduled replica rule rather than reactive autoscaling.

Watch the changelog for the first-party KEDA scaler announcement; when it ships, the polling sidecar will be removable.
:::
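For the custom-metric scraper workaround, the core of the sidecar is a translation from the metrics response into Prometheus text exposition format. The sketch below covers only that translation, assuming the response shape shown under Queue depth monitoring. QueueMetricsExporter is a hypothetical name, and fetching the JSON (authentication, scheduling, serving the result over HTTP) is deliberately left out.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.Json;

// Hypothetical sidecar helper: converts the queue metrics JSON into
// Prometheus text format, one pending_jobs sample per queue.
public static class QueueMetricsExporter
{
    public static string ToPrometheusText(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var sb = new StringBuilder();
        sb.Append("# TYPE pending_jobs gauge\n");

        foreach (var q in doc.RootElement.GetProperty("queues").EnumerateArray())
        {
            string name = q.GetProperty("name").GetString() ?? "";
            int pending = q.GetProperty("pending").GetInt32();

            sb.Append("pending_jobs{queue=\"")
              .Append(EscapeLabel(name))
              .Append("\"} ")
              .Append(pending.ToString(CultureInfo.InvariantCulture))
              .Append('\n');
        }
        return sb.ToString();
    }

    // Prometheus label values escape backslash, double quote, and newline.
    private static string EscapeLabel(string value) =>
        value.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
}
```

The output can be served from any endpoint that Prometheus scrapes, after which KEDA's prometheus scaler can query pending_jobs per queue.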
Poll interval
PollInterval controls how long the worker waits between poll cycles when the previous poll returned no jobs:
builder.Services.AddZeridionFlare(o =>
{
    o.PollInterval = TimeSpan.FromSeconds(5); // Default: 2s
});
Lower values increase responsiveness (faster job pickup) but increase API call volume. The default is a good balance for most workloads.
The poll interval only applies when an idle request returns no work. When jobs are available, the worker requests more work immediately after processing the claimed batch.
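The rule above (wait only after an empty poll) can be sketched as a loop. This is a simplified model, not the SDK's worker loop: PollLoop and the pollOnce delegate are hypothetical, and pollOnce is assumed to request work, process whatever was claimed, and return how many jobs it handled.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the worker's polling rhythm.
public static class PollLoop
{
    // Runs until cancelled and returns how many idle waits occurred.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task<int>> pollOnce,
        TimeSpan pollInterval,
        CancellationToken ct)
    {
        int idleWaits = 0;
        while (!ct.IsCancellationRequested)
        {
            int processed = await pollOnce(ct);

            // Work was available: ask again immediately, no delay.
            if (processed > 0) continue;

            // Empty poll: back off for the configured interval.
            idleWaits++;
            try { await Task.Delay(pollInterval, ct); }
            catch (OperationCanceledException) { break; }
        }
        return idleWaits;
    }
}
```

With a busy queue the delay is never reached, so PollInterval mainly affects how quickly an idle worker notices the first job of a new burst.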
Graceful shutdown
When the host shuts down (e.g., SIGTERM from a container orchestrator), the worker:
- Stops asking for new work
- Waits for all in-flight jobs to complete
- Reports each completed job (success or failure) back to Flare
- Exits cleanly
This prevents jobs from being orphaned mid-execution. Flare also reclaims jobs from workers that stop reporting progress, providing a safety net for cases where the worker crashes without completing the shutdown sequence.
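The shutdown sequence above amounts to a drain: refuse new work, wait for in-flight jobs, and record each outcome. The sketch below models that sequence in isolation. DrainingWorker, TryStart, ShutdownAsync, and ReportsSnapshot are hypothetical names, and the in-memory report list stands in for the calls that would report results back to Flare.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical model of a worker that drains in-flight jobs on shutdown.
public sealed class DrainingWorker
{
    private readonly object _gate = new();
    private readonly List<Task> _inFlight = new();
    private readonly List<string> _reports = new(); // stand-in for reporting to Flare
    private bool _stopping;

    // Starts a job unless shutdown has begun; returns false if refused.
    public bool TryStart(string jobId, Func<Task> job)
    {
        lock (_gate)
        {
            if (_stopping) return false;
            _inFlight.Add(RunAndReportAsync(jobId, job));
            return true;
        }
    }

    private async Task RunAndReportAsync(string jobId, Func<Task> job)
    {
        string outcome;
        try { await job(); outcome = "succeeded"; }
        catch (Exception) { outcome = "failed"; }

        // Every finished job is reported, whether it succeeded or failed.
        lock (_reports) _reports.Add($"{jobId}:{outcome}");
    }

    // Stops accepting work, then waits for everything already running.
    public async Task ShutdownAsync()
    {
        Task[] pending;
        lock (_gate)
        {
            _stopping = true;
            pending = _inFlight.ToArray();
        }
        await Task.WhenAll(pending);
    }

    public List<string> ReportsSnapshot()
    {
        lock (_reports) return new List<string>(_reports);
    }
}
```

In a hosted service, ShutdownAsync would typically be called from the host's stop callback, within whatever shutdown timeout the orchestrator allows.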
Best practices
- Use descriptive queue names: email, reports, billing, imports are immediately meaningful. Avoid generic names like queue1.
- Isolate long-running jobs: put slow jobs (report generation, data imports) in their own queue so they don't block fast jobs (email sends, webhook deliveries).
- Match concurrency to resource requirements: CPU-bound jobs need lower concurrency than I/O-bound jobs. Start with the default (10) and adjust based on monitoring.
- Monitor queue depth: track pending counts via GET /flare/v1/metrics/queues. Rising backlogs mean you need more workers or faster jobs.
- Scale horizontally, not just vertically: adding worker instances is generally more effective than increasing ConcurrencyLimit beyond 20, because each instance gets its own process memory and CPU scheduling.
See also
- ZeridionFlareOptions: ConcurrencyLimit, PollInterval, DefaultQueue
- JobOptions: per-call Queue override
- JobConfigAttribute: per-class Queue default
- Monitoring: metrics API and health endpoints