Queues and Concurrency

Queues let you isolate different types of work so they don't compete for the same worker slots. Concurrency controls limit how many jobs each worker processes in parallel. Together, they give you fine-grained control over throughput and resource usage.

Named queues

Every job is assigned to a queue. The default queue is "default". Use named queues to separate workloads — fast email sends should not be blocked by slow report generation:

[JobConfig(Queue = "email")]
public class SendWelcomeEmail : IJob<NewUserPayload> { ... }

[JobConfig(Queue = "reports")]
public class GenerateMonthlyReport : IJob<ReportPayload> { ... }

[JobConfig(Queue = "critical")]
public class ProcessPayment : IJob<PaymentPayload> { ... }

Assigning a queue

You can set the queue at two levels. More specific settings override less specific ones.

Per-class (attribute)

[JobConfig(Queue = "email")]
public class SendWelcomeEmail : IJob<NewUserPayload>
{
    public async Task ExecuteAsync(NewUserPayload payload, JobContext ctx)
    {
        // Enqueued to the "email" queue unless a per-call option overrides it
    }
}

Per-call (options)

await jobs.EnqueueAsync<SendWelcomeEmail>(payload, new JobOptions
{
    Queue = "critical" // Overrides the class-level "email" queue
});

Resolution order

| Level | How to set | Default |
|---|---|---|
| Per-call | new JobOptions { Queue = "..." } | |
| Per-class | [JobConfig(Queue = "...")] | "default" |

Resolution: JobOptions (per-call) > [JobConfig] (per-class) > "default".

Queue names are trimmed and normalized by the server. Max length is 100 characters.
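
The precedence and limits above can be sketched as a small helper. This is an illustrative sketch, not SDK code: `ResolveQueue` is a hypothetical name, treating a blank value as "not set" is an assumption, and only the documented trimming and 100-character limit are mirrored (whatever other normalization the server applies is not reproduced).

```csharp
using System;

// Hypothetical helper (not part of the Flare SDK): mirrors the documented
// precedence per-call > per-class > "default".
static string ResolveQueue(string? perCallQueue, string? perClassQueue)
{
    const int MaxQueueNameLength = 100; // documented maximum length

    // The most specific non-blank value wins (blank == unset is an assumption).
    string chosen = !string.IsNullOrWhiteSpace(perCallQueue) ? perCallQueue!
                  : !string.IsNullOrWhiteSpace(perClassQueue) ? perClassQueue!
                  : "default";

    string name = chosen.Trim(); // the server trims too; other normalization is not replicated
    if (name.Length > MaxQueueNameLength)
        throw new ArgumentException($"Queue name exceeds {MaxQueueNameLength} characters.");
    return name;
}

Console.WriteLine(ResolveQueue("critical", "email")); // critical
Console.WriteLine(ResolveQueue(null, " email "));     // email
Console.WriteLine(ResolveQueue(null, null));          // default
```

Validating on the client like this only fails fast; the server remains the authority on the final queue name.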

Worker queue binding

The SDK worker automatically polls all queues used by its registered job types. When the worker starts, it discovers which queues to listen on from the SDK's job-type catalog.

For example, if your application registers three job types with queues email, reports, and critical, the worker subscribes to all three queues when it asks Flare for work.
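
To make the discovery step concrete, here is a minimal sketch of how a set of queue names could be derived from job types by reflection. It is not the SDK's actual catalog code: `JobConfigAttribute` below is a local stand-in declared only so the example compiles on its own, and `DiscoverQueues` and the `Cleanup` job are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Collect the distinct queue names declared by a set of job types.
// Types without the attribute fall back to "default".
static IReadOnlyCollection<string> DiscoverQueues(IEnumerable<Type> jobTypes) =>
    jobTypes
        .Select(t => t.GetCustomAttribute<JobConfigAttribute>()?.Queue ?? "default")
        .Distinct()
        .OrderBy(q => q, StringComparer.Ordinal)
        .ToArray();

var queues = DiscoverQueues(new[]
{
    typeof(SendWelcomeEmail), typeof(GenerateMonthlyReport),
    typeof(ProcessPayment), typeof(Cleanup),
});
Console.WriteLine(string.Join(", ", queues)); // critical, default, email, reports

// Stand-in for the SDK attribute (assumption: the real one exposes a Queue property).
[AttributeUsage(AttributeTargets.Class)]
sealed class JobConfigAttribute : Attribute
{
    public string Queue { get; set; } = "default";
}

[JobConfig(Queue = "email")]    sealed class SendWelcomeEmail { }
[JobConfig(Queue = "reports")]  sealed class GenerateMonthlyReport { }
[JobConfig(Queue = "critical")] sealed class ProcessPayment { }
sealed class Cleanup { } // no attribute: lands on "default"
```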

Queue isolation with separate workers

For strict workload isolation, deploy separate worker instances that only register specific job types:

// Worker A: handles email jobs only
builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.JobAssemblies = [typeof(SendWelcomeEmail).Assembly];
});

// Worker B (a separate deployment): handles report jobs only
builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.JobAssemblies = [typeof(GenerateMonthlyReport).Assembly];
});

Provided the email and report job types live in different assemblies, Worker A polls only the email queue and Worker B polls only the reports queue, so slow reports cannot starve email delivery.

How job claiming works

The SDK worker asks Flare for jobs from the queues it's subscribed to. Key behaviors:

  • Atomic single-claim semantics via FOR UPDATE SKIP LOCKED: Flare's queue claim uses Postgres SELECT … FOR UPDATE SKIP LOCKED inside an UPDATE … WHERE Id IN (…) statement. When two workers poll simultaneously, each takes row-level locks on distinct candidate rows: the first worker to lock a row owns that job, and the other worker skips the locked row instead of blocking on it. The result is a non-blocking single-claim guarantee: two workers never receive the same job, and a slow claim by one worker never stalls another worker's poll.
  • Queue scoping: only jobs in the worker's registered queues are considered.
  • Capacity-bounded: the worker only requests as many jobs as it has available concurrency slots.
  • Efficient idle waits: if no jobs are available, Flare holds the request open briefly so the worker doesn't have to busy-poll.
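
The single-claim and capacity-bounded behaviors can be illustrated with an in-memory analogy. This is only an analogy under stated assumptions: Flare does this with Postgres row locks, whereas the sketch below uses an atomic compare-and-swap on a flag per job, and the `Poll` function and job array are hypothetical. What carries over is the shape of the guarantee: a worker that loses the race for a job moves on to the next candidate instead of waiting.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// One flag per job: 0 = unclaimed, 1 = claimed. Index order stands in for CreatedAt order.
var claimFlags = new int[100];
var claims = new ConcurrentBag<(int JobId, int WorkerId)>();

// Each worker walks the jobs oldest-first and takes at most `capacity` of them.
void Poll(int workerId, int capacity)
{
    int taken = 0;
    for (int jobId = 0; jobId < claimFlags.Length && taken < capacity; jobId++)
    {
        // Atomic test-and-set: exactly one caller observes the old value 0.
        // A worker that loses the race skips the job rather than blocking on it.
        if (Interlocked.CompareExchange(ref claimFlags[jobId], 1, 0) == 0)
        {
            claims.Add((jobId, workerId));
            taken++;
        }
    }
}

// Four workers poll concurrently, each with 25 free slots.
await Task.WhenAll(Enumerable.Range(1, 4).Select(w => Task.Run(() => Poll(w, 25))));

int distinctJobs = claims.Select(c => c.JobId).Distinct().Count();
Console.WriteLine($"{claims.Count} claims, {distinctJobs} distinct jobs"); // 100 claims, 100 distinct jobs
```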

Fairness and ordering guarantees

  • Approximate FIFO per queue: the claim query orders candidates by CreatedAt ASC, so jobs are dequeued in approximate insertion order within a single queue. With SKIP LOCKED, two workers polling at the same instant may receive jobs slightly out of timestamp order if one skips a row the other has locked, but each queue still drains oldest-first overall.
  • No server-side priority or weighting: there is no priority column, no weighted queue selection, and no preemption. A long-running job at the head of a queue does not block other queues, but it also does not yield to a "more important" job behind it. If you need priority isolation, use separate queues (e.g. critical vs default) and deploy dedicated worker instances per queue so latency-sensitive work is never stuck behind a backlog of bulk work.
  • Cross-queue ordering is undefined: when a worker polls multiple queues at once, the order in which it receives jobs across queues depends on which rows Postgres happens to lock first. Don't assume "queue A always wins ties against queue B".

Concurrency control

ConcurrencyLimit controls how many jobs a single worker instance processes in parallel. It defaults to 10:

builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.ConcurrencyLimit = 5;
});

Under the hood, the worker uses an internal semaphore bounded by ConcurrencyLimit. Before starting each job, the worker acquires a slot. When the job completes, the slot is released. The worker only asks Flare for as many jobs as it has free slots, so it never claims more work than it can handle.
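
The slot mechanism described above is the standard bounded-semaphore pattern. A minimal sketch using `SemaphoreSlim` follows; it is not the worker's actual code, and the job body, job count, and peak-tracking counters are illustrative assumptions.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int concurrencyLimit = 5; // plays the role of ConcurrencyLimit
using var slots = new SemaphoreSlim(concurrencyLimit, concurrencyLimit);
int running = 0, peak = 0;

async Task RunJobAsync(int jobId)
{
    await slots.WaitAsync(); // acquire a slot before starting the job
    try
    {
        int now = Interlocked.Increment(ref running);

        // Record the highest concurrency observed so far (lock-free max update).
        int seen;
        while (now > (seen = Volatile.Read(ref peak)) &&
               Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

        await Task.Delay(20); // simulated work
    }
    finally
    {
        Interlocked.Decrement(ref running);
        slots.Release(); // free the slot when the job completes or fails
    }
}

await Task.WhenAll(Enumerable.Range(0, 50).Select(id => RunJobAsync(id)));
Console.WriteLine($"peak concurrency: {peak} (limit {concurrencyLimit})");
```

Releasing the slot in a `finally` block matters: a job that throws must still give its slot back, or the worker slowly loses capacity.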

Choosing the right limit

| Job type | Recommended ConcurrencyLimit | Rationale |
|---|---|---|
| I/O-bound (HTTP calls, email) | 10–20 | Jobs spend most time waiting; higher parallelism is safe |
| CPU-bound (image processing) | 2–4 | Jobs consume CPU; too many in parallel causes contention |
| Memory-intensive (large reports) | 2–5 | Each job uses significant memory; limit prevents OOM |
| Mixed workload | 10 (default) | Good general-purpose starting point |

Scaling workers

Scale horizontally by deploying multiple worker instances. Each instance requests work independently and Flare's atomic claim semantics ensure no double-claiming:

| Workers | ConcurrencyLimit | Max parallel jobs |
|---|---|---|
| 1 | 10 | 10 |
| 2 | 10 | 20 |
| 3 | 10 | 30 |
| 5 | 20 | 100 |

Scaling strategies

Uniform scaling — all workers process all job types. Simple to deploy, good for balanced workloads:

Worker Instance 1: all queues, ConcurrencyLimit = 10
Worker Instance 2: all queues, ConcurrencyLimit = 10
Worker Instance 3: all queues, ConcurrencyLimit = 10

Queue-isolated scaling — dedicated workers per queue. Scale each workload independently:

Email Workers (3 instances): queue = "email", ConcurrencyLimit = 20
Report Workers (1 instance): queue = "reports", ConcurrencyLimit = 2
Payment Workers (2 instances): queue = "critical", ConcurrencyLimit = 5
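
As a quick check on capacity planning, the maximum parallelism of each pool is its instance count multiplied by its ConcurrencyLimit. The sketch below works that out for the queue-isolated layout above; the tuple shape is just a convenient representation, not an SDK type.

```csharp
using System;
using System.Linq;

// Worker pools from the queue-isolated example: (queue, instances, ConcurrencyLimit).
var pools = new[]
{
    (Queue: "email",    Instances: 3, ConcurrencyLimit: 20),
    (Queue: "reports",  Instances: 1, ConcurrencyLimit: 2),
    (Queue: "critical", Instances: 2, ConcurrencyLimit: 5),
};

foreach (var p in pools)
    Console.WriteLine($"{p.Queue}: up to {p.Instances * p.ConcurrencyLimit} parallel jobs");

int total = pools.Sum(p => p.Instances * p.ConcurrencyLimit);
Console.WriteLine($"total: {total}"); // 60 + 2 + 10 = 72
```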

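Use Azure Container Apps or Kubernetes to auto-scale worker replicas based on queue depth metrics. Flare does not yet ship a queue-depth scaler, so see "Autoscaling with KEDA" for the options available today.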

Queue depth monitoring

GET /flare/v1/metrics/queues returns the current depth of each queue:

{
  "queues": [
    {
      "name": "default",
      "pending": 45,
      "processing": 10,
      "scheduled": 3
    },
    {
      "name": "email",
      "pending": 120,
      "processing": 15,
      "scheduled": 0
    }
  ]
}

| Field | Description |
|---|---|
| pending | Jobs waiting to be claimed by a worker |
| processing | Jobs currently being executed by a worker |
| scheduled | Jobs with a future RunAt or waiting for a parent to complete |
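
A response in this shape is straightforward to inspect with System.Text.Json. The sketch below parses a body like the one shown above and lists queues whose pending count exceeds a caller-chosen threshold. `FindBackloggedQueues` and the threshold of 100 are illustrative assumptions; fetching the body (base URL, authentication) is left out because those details are not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Parse a /flare/v1/metrics/queues response body and return the queues
// whose pending count is above the given threshold.
static List<(string Name, int Pending)> FindBackloggedQueues(string json, int pendingThreshold)
{
    var result = new List<(string Name, int Pending)>();
    using JsonDocument doc = JsonDocument.Parse(json);
    foreach (JsonElement queue in doc.RootElement.GetProperty("queues").EnumerateArray())
    {
        string name = queue.GetProperty("name").GetString() ?? "";
        int pending = queue.GetProperty("pending").GetInt32();
        if (pending > pendingThreshold)
            result.Add((name, pending));
    }
    return result;
}

// Sample body matching the documented response shape.
const string sample = """
{
  "queues": [
    { "name": "default", "pending": 45,  "processing": 10, "scheduled": 3 },
    { "name": "email",   "pending": 120, "processing": 15, "scheduled": 0 }
  ]
}
""";

foreach (var (name, pending) in FindBackloggedQueues(sample, pendingThreshold: 100))
    Console.WriteLine($"{name}: {pending} pending"); // email: 120 pending
```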

Backlog detection

A growing pending count means jobs are arriving faster than workers can process them. Possible responses:

  1. Increase ConcurrencyLimit — if workers have idle CPU/memory
  2. Add worker instances — horizontal scaling
  3. Investigate slow jobs — a single slow job type may be consuming all worker slots
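
A single high reading does not mean much on its own; what signals a backlog is a pending count that keeps rising across samples. One simple way to express that, as a sketch with a hypothetical `IsBacklogGrowing` helper and an arbitrary window size:

```csharp
using System;
using System.Collections.Generic;

// Returns true when `pending` rose in every one of the last `window` intervals.
static bool IsBacklogGrowing(IReadOnlyList<int> pendingSamples, int window)
{
    if (window < 1 || pendingSamples.Count < window + 1)
        return false; // not enough history to decide

    for (int i = pendingSamples.Count - window; i < pendingSamples.Count; i++)
    {
        if (pendingSamples[i] <= pendingSamples[i - 1])
            return false; // the backlog shrank or held steady at least once
    }
    return true;
}

Console.WriteLine(IsBacklogGrowing(new[] { 40, 45, 60, 90, 120 }, window: 3)); // True
Console.WriteLine(IsBacklogGrowing(new[] { 40, 45, 60, 55, 120 }, window: 3)); // False
```

Requiring several consecutive increases avoids reacting to a short burst that the existing workers would have absorbed anyway.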

Autoscaling with KEDA

:::warning Planned integration: not yet shipped

A first-class KEDA scaler that subscribes to GET /flare/v1/metrics/queues is on the roadmap but not currently available. There is no Flare-published KEDA ScaledObject and no out-of-the-box queue-depth scaler today. Until it ships, use one of these workarounds for Azure Container Apps / Kubernetes autoscaling:

  • CPU/memory-based scaling: set a ScaledObject (or Container Apps scale rule) on CPU utilisation. This is effective when worker CPU correlates with backlog (most CPU-bound workloads) and ineffective for I/O-bound workloads, where workers are blocked on the network and CPU stays flat.
  • Custom-metric scraper: write a small sidecar (or use a generic Prometheus exporter) that polls GET /flare/v1/metrics/queues from your own infrastructure on a schedule, exports pending_jobs{queue="..."} as a metric, and feeds that into KEDA's prometheus scaler or HPA's external-metrics API. This gives you queue-depth-driven scaling today without waiting for a built-in scaler.
  • Manual / scheduled scaling: for predictable workloads (e.g. nightly report jobs), use KEDA's cron scaler or a scheduled scale rule rather than reactive autoscaling.

Watch the changelog for the first-party KEDA scaler announcement; when it ships, the polling sidecar will be removable.

:::
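
For the custom-metric scraper workaround, the sidecar's core job is to turn queue depths into the Prometheus text exposition format. The sketch below shows only that formatting step under stated assumptions: `ToPrometheusText` is a hypothetical helper, the HELP text is made up, and polling the metrics endpoint and serving the result over HTTP are left to your own infrastructure.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Render per-queue pending counts as a Prometheus gauge named pending_jobs.
static string ToPrometheusText(IEnumerable<(string Queue, int Pending)> depths)
{
    var sb = new StringBuilder();
    sb.Append("# HELP pending_jobs Jobs waiting to be claimed, per queue.\n");
    sb.Append("# TYPE pending_jobs gauge\n");
    foreach (var (queue, pending) in depths)
    {
        // Label values must escape backslashes, double quotes, and newlines.
        string label = queue.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
        sb.Append("pending_jobs{queue=\"").Append(label).Append("\"} ")
          .Append(pending.ToString(CultureInfo.InvariantCulture)).Append('\n');
    }
    return sb.ToString();
}

Console.Write(ToPrometheusText(new[] { ("default", 45), ("email", 120) }));
// # HELP pending_jobs Jobs waiting to be claimed, per queue.
// # TYPE pending_jobs gauge
// pending_jobs{queue="default"} 45
// pending_jobs{queue="email"} 120
```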

Poll interval

PollInterval controls how long the worker waits between poll cycles when the previous poll returned no jobs:

builder.Services.AddZeridionFlare(o =>
{
    o.PollInterval = TimeSpan.FromSeconds(5); // Default: 2s
});

Lower values increase responsiveness (faster job pickup) but increase API call volume. The default is a good balance for most workloads.

Note: The poll interval only applies after a poll that returns no work. When jobs are available, the worker requests more work immediately after processing the claimed batch.
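
The control flow behind that rule can be sketched as a loop that sleeps only after an empty poll. This is a simplified model, not the worker's implementation: `RunPollLoopAsync` is hypothetical, job processing is omitted, and the poll and sleep functions are injected so the example runs without a server or real delays.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Runs a fixed number of poll cycles and returns how many times the loop slept.
// `poll` returns the number of jobs claimed in that cycle.
static async Task<int> RunPollLoopAsync(
    Func<Task<int>> poll, Func<TimeSpan, Task> sleep, TimeSpan pollInterval, int cycles)
{
    int sleeps = 0;
    for (int i = 0; i < cycles; i++)
    {
        int claimed = await poll();
        if (claimed == 0)
        {
            // Empty poll: wait for PollInterval before asking again.
            await sleep(pollInterval);
            sleeps++;
        }
        // Non-empty poll: loop straight back and ask for more work.
    }
    return sleeps;
}

// Simulated poll results: work, work, nothing, work, nothing.
var results = new Queue<int>(new[] { 3, 2, 0, 1, 0 });
int sleepCount = await RunPollLoopAsync(
    poll: () => Task.FromResult(results.Dequeue()),
    sleep: _ => Task.CompletedTask, // no real delay in this sketch
    pollInterval: TimeSpan.FromSeconds(2),
    cycles: 5);
Console.WriteLine(sleepCount); // 2
```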

Graceful shutdown

When the host shuts down (e.g., SIGTERM from a container orchestrator), the worker:

  1. Stops asking for new work
  2. Waits for all in-flight jobs to complete
  3. Reports each completed job (success or failure) back to Flare
  4. Exits cleanly

This prevents jobs from being orphaned mid-execution. Flare also reclaims jobs from workers that stop reporting progress, providing a safety net for cases where the worker crashes without completing the shutdown sequence.
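
The shutdown sequence follows a common drain pattern: cancel the claim loop, then await whatever is still running. A minimal sketch, assuming a simulated job body and using a `CancellationTokenSource` in place of the host's SIGTERM handling (in a real host that token would come from the hosting infrastructure):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

using var shutdown = new CancellationTokenSource();
var inFlight = new List<Task>();
int started = 0, reported = 0;

async Task RunJobAsync()
{
    try { await Task.Delay(50); }                    // simulated job body; deliberately not cancelled
    finally { Interlocked.Increment(ref reported); } // step 3: report the outcome
}

// Claim loop: step 1 is simply leaving this loop once shutdown is requested.
var claimLoop = Task.Run(async () =>
{
    while (!shutdown.IsCancellationRequested)
    {
        started++;
        inFlight.Add(RunJobAsync());
        await Task.Delay(10);
    }
});

await Task.Delay(100);
shutdown.Cancel();            // stands in for SIGTERM
await claimLoop;              // no new work is claimed after this point
await Task.WhenAll(inFlight); // step 2: wait for in-flight jobs to finish
Console.WriteLine($"started={started}, reported={reported}"); // the two counts match
```

The job body ignores the shutdown token on purpose: cancelling it would abandon work mid-execution, which is exactly what the drain is meant to avoid.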

Best practices

  1. Use descriptive queue names: email, reports, billing, and imports are immediately meaningful. Avoid generic names like queue1.

  2. Isolate long-running jobs — put slow jobs (report generation, data imports) in their own queue so they don't block fast jobs (email sends, webhook deliveries).

  3. Match concurrency to resource requirements — CPU-bound jobs need lower concurrency than I/O-bound jobs. Start with the default (10) and adjust based on monitoring.

  4. Monitor queue depth — track pending counts via GET /flare/v1/metrics/queues. Rising backlogs mean you need more workers or faster jobs.

  5. Scale horizontally, not just vertically — adding worker instances is generally more effective than increasing ConcurrencyLimit beyond 20, because each instance gets its own process memory and CPU scheduling.
