# Retries & Backoff

In v3 retries are driven entirely by the **worker** and the **queue backend**, never by a blocking loop inside the runtime. A job runs **one attempt** per delivery; on failure the worker either requeues it with a backoff delay (`nack(delay)`) or hands it to the dead-letter path (`abandon()`).

## The model

`QueueWorker::processOnce()` executes exactly one attempt and then decides:

- **Success** → `ack(lease)`. The message is removed permanently.
- **Failure, retries remaining** → `nack(lease, delay)`. The backend requeues the message; the delay applies real backoff (Redis ZSET / database `available_at` / native release-with-delay).
- **Failure, retries exhausted** → `abandon(lease)`. The backend settles the message via its native mechanism (Beanstalk buries, Database marks `failed`, Redis drops it, Sync is a no-op) and a `critical` log is written so it cannot loop. See [Dead Letter Queue](#dead-letter-queue).

There is **no blocking `sleep()`** in the worker between the attempt and the requeue: the delay lives in the backend, so the worker is free to process other jobs.

## Total runs = `maxRetries + 1`

`maxRetries` counts **retries**, not total executions. The envelope carries a 0-based `attempts` counter (completed runs before the current one):

| `maxRetries` | Total executions |
|--------------|------------------|
| `0` (default) | 1 (no retries) |
| `1` | 2 |
| `2` | 3 |
| `3` | 4 |

The worker requeues while `attempts < maxRetries`; once `attempts == maxRetries` and the attempt fails, it abandons. This single-attempt-per-fetch design removes the legacy double-retry where a coordinator loop and a requeue helper each consumed `maxRetries`.

Set it per job:

```php
use Daycry\Jobs\Jobs;

Jobs::define('command', 'app:sync-stripe')
    ->queue('integrations')
    ->maxRetries(5) // up to 6 total executions
    ->dispatch();
```

## RetryPolicyFixed

A single class implements all strategies.
The worker constructs it from `Config\Jobs` and asks for the delay before the **next** attempt:

```php
use Daycry\Jobs\Execution\RetryPolicyFixed;

$policy = new RetryPolicyFixed(
    base: 5,
    strategy: 'exponential',
    multiplier: 2.0,
    max: 300,
    jitter: true,
);

$delay = $policy->computeDelay($attempt); // seconds; $attempt is 1-based
```

`computeDelay($attempt)` returns `0` when `$attempt <= 1` (no pre-delay before the first run).

## Strategies

Configured via `Config\Jobs::$retryBackoffStrategy`:

| Strategy | Behaviour |
|----------|-----------|
| `none` (default) | Always returns `0`: immediate requeue, no backoff. |
| `fixed` | Constant delay equal to `retryBackoffBase`. |
| `exponential` | `delay = base * multiplier^(attempt-2)`, capped at `retryBackoffMax`, with optional jitter. |

### Parameters

| Setting | Used by | Description |
|---------|---------|-------------|
| `retryBackoffBase` | fixed, exponential | Baseline seconds; equals the first retry delay. |
| `retryBackoffMultiplier` | exponential | Growth factor between attempts. |
| `retryBackoffMax` | exponential | Upper cap on any computed delay. |
| `retryBackoffJitter` | exponential | Add ±15% randomness to reduce thundering herd. |

### Exponential formula

```text
delay = base * multiplier^(attempt - 2)   (clamped to max)
```

The exponent is `attempt - 2`, so the **first retry** (attempt 2) delay equals `base`:

```text
base=5, multiplier=2, max=300:
computeDelay(1) -> 0s   (first run, no pre-delay)
computeDelay(2) -> 5s   (5 * 2^0)
computeDelay(3) -> 10s  (5 * 2^1)
computeDelay(4) -> 20s  (5 * 2^2)
computeDelay(5) -> 40s  (5 * 2^3)
```

With jitter enabled the result varies by ±15%. Delays above `max` are clamped.

## How the delay is applied per backend

The worker passes the computed delay to `QueueBackend::nack($lease, $delay)`. Each backend honours it differently, but the observable effect is the same: the message becomes fetchable again only after the delay elapses, and its `attempts` counter is incremented.

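
The contract just described (delay computed, `attempts` incremented, message hidden until due) can be illustrated with a small in-memory sketch. Everything here is hypothetical and is not the library's API: `sketchExponentialDelay()` only mirrors the documented formula with jitter omitted, `sketchNack()` and `sketchIsDue()` stand in for a backend, and the array keys are illustrative. It also assumes the worker asks for the delay of the upcoming 1-based attempt, so the first failure (`attempts = 0`) uses attempt 2.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper mirroring the documented formula (jitter omitted):
 * delay = base * multiplier^(attempt - 2), clamped to max; 0 for attempt <= 1.
 */
function sketchExponentialDelay(int $attempt, int $base = 5, float $multiplier = 2.0, int $max = 300): int
{
    if ($attempt <= 1) {
        return 0; // first run: no pre-delay
    }

    $delay = $base * ($multiplier ** ($attempt - 2));

    return (int) min($delay, $max);
}

/**
 * Hypothetical stand-in for a backend's nack(): increments `attempts`
 * and records the earliest time the message may be fetched again.
 */
function sketchNack(array $message, int $delay, int $now): array
{
    $message['attempts']     = $message['attempts'] + 1;
    $message['available_at'] = $now + $delay;

    return $message;
}

/** A message is fetchable only once its due time has been reached. */
function sketchIsDue(array $message, int $now): bool
{
    return $message['available_at'] <= $now;
}

$now     = 1000; // fixed clock so the example is reproducible
$message = ['attempts' => 0, 'available_at' => $now];

// The first run failed; the upcoming run is attempt 2 (assumed mapping).
$delay   = sketchExponentialDelay($message['attempts'] + 2);
$message = sketchNack($message, $delay, $now);
```

With the defaults shown, the message ends up with `attempts = 1` and is not due until `$now + 5`. A real backend stores the due time in its own structure rather than in the payload array used here.
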

| Backend | How `nack(delay)` applies backoff |
|---------|-----------------------------------|
| `database` | `requeueInPlace()` updates the **same row** with `attempts + 1` and `available_at = now + delay`; the claim query only fetches rows whose `available_at`/`schedule` is due. No orphan rows. |
| `redis` | Re-serialises the payload with `attempts + 1`. With a delay it is added to the `{q}-delayed` ZSET (score = due timestamp) and promoted to the waiting list when due; with no delay it is `LPUSH`ed straight back to `{q}-waiting`. |
| `beanstalk` | beanstalkd's native `release` cannot mutate the job body, so the backend deletes the reserved job and `put`s a fresh copy with `attempts + 1` and the delay as the put-delay. |
| `serviceBus` | Service Bus has no in-place unlock-with-delay, so the backend enqueues a fresh copy (setting `ScheduledEnqueueTimeUtc` when a delay is requested) and settles the original lock. |
| `sync` | No-op (`nack()` returns `true`). The Sync backend runs jobs inline at `enqueue()` time, so there is nothing to requeue. |

> **Note:** Because `nack()` (re)serialises the wire payload, the **mutable** `attempts`/`schedule`
> fields change on every requeue. The HMAC signature deliberately excludes those fields, so a
> requeued message still verifies. See [Security](security.md).

## Dead Letter Queue

When retries are exhausted the worker calls `backend->abandon(lease)`, which settles the message via the **backend's own native mechanism**; there is no shared dead-letter routing:

| Backend | What `abandon()` does |
|---------|-----------------------|
| `beanstalk` | Buries the job onto beanstalkd's **own** buried list (its native DLQ), not the queue named in `$deadLetterQueue`. |
| `database` | Marks the row `status = 'failed'` so it is retained for audit but never re-fetched. |
| `redis` | Drops the message from the processing set (it is removed). |
| `serviceBus` | Settles (dead-letters) the original lock via the broker. |
| `sync` | No-op (returns `true`). |

In every case a `critical` log entry records the permanent failure with the last error.

> **Important:** The worker does **not** read `Config\Jobs::$deadLetterQueue`. Setting it does *not*
> cause the worker to enqueue failed payloads onto a named queue. That config is consumed only by the
> opt-in `Daycry\Jobs\Libraries\DeadLetterQueue::store()` helper, which you must call yourself if you
> want to forward a failed payload to a dedicated DLQ:

```php
use Daycry\Jobs\Libraries\DeadLetterQueue;

// Opt-in, app-level. Reads Config\Jobs::$deadLetterQueue and re-enqueues the payload there.
(new DeadLetterQueue())->store($payload, $handler, $reason, $attempts);
```

```php
// In Config\Jobs; only the DeadLetterQueue helper above honours this.
public ?string $deadLetterQueue = 'failed_jobs';
```

## At-least-once delivery

Every persistent backend is **at-least-once**: a crashed worker's lease is recovered by `reapExpired()` (see `jobs:queue:reap`) and the message is redelivered. Handlers should therefore be **idempotent**. For exactly-once-ish semantics, opt in to deduplication:

```php
use Daycry\Jobs\Jobs;

Jobs::define('command', 'app:close-month')
    ->queue('reports')
    ->idempotencyKey('close-month-2026-06')
    ->dispatch();
```

`IdempotencyGuard` marks the key in the cache (TTL `Config\Jobs::$idempotencyTtl`). A second delivery of the same key is acked **without** running the handler again. The check-then-set is best-effort and only atomic on backends that support `SET NX` (e.g. Redis).

For the full attempt-counter semantics see [Attempts](ATTEMPTS.md), and for the trust-boundary view of replays and tampering see [Security](security.md).

## Metrics

The worker emits per-cycle counters (when a metrics collector is configured):

| Counter | Meaning |
|---------|---------|
| `jobs_fetched` | A message was leased. |
| `jobs_succeeded` | Attempt succeeded; message acked. |
| `jobs_failed` | Attempt failed (before deciding requeue vs dead-letter). |
| `jobs_requeued` | Failure that resulted in a requeue with backoff. |
| `jobs_failed_permanently` | Retries exhausted; message dead-lettered. |
| `jobs_skipped_idempotent` | Duplicate idempotency key; acked without running. |
| `jobs_rejected_signature` | Message rejected for an invalid/missing HMAC signature. |
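
To show how these counters relate to the requeue-or-abandon decision, here is a minimal sketch of one worker cycle. It is an illustration under assumptions, not the worker's actual code: `SketchCounters` and `sketchRecordCycle()` are hypothetical names, the real metrics collector interface is not shown in this document, and the idempotency and signature counters are left out for brevity.

```php
<?php

declare(strict_types=1);

/** Hypothetical in-memory collector standing in for a real metrics backend. */
final class SketchCounters
{
    /** @var array<string, int> */
    public array $counts = [];

    public function increment(string $name): void
    {
        $this->counts[$name] = ($this->counts[$name] ?? 0) + 1;
    }
}

/**
 * Records the counters for one leased message and returns the settlement.
 * $attempts is the 0-based count of completed runs before this one.
 */
function sketchRecordCycle(SketchCounters $metrics, bool $succeeded, int $attempts, int $maxRetries): string
{
    $metrics->increment('jobs_fetched');

    if ($succeeded) {
        $metrics->increment('jobs_succeeded');

        return 'ack';
    }

    // Counted before deciding between requeue and dead-letter.
    $metrics->increment('jobs_failed');

    if ($attempts < $maxRetries) {
        $metrics->increment('jobs_requeued');

        return 'nack';
    }

    $metrics->increment('jobs_failed_permanently');

    return 'abandon';
}

// A job with maxRetries = 2 that fails on every delivery (attempts 0, 1, 2).
$metrics  = new SketchCounters();
$outcomes = [];

foreach ([0, 1, 2] as $attempts) {
    $outcomes[] = sketchRecordCycle($metrics, false, $attempts, 2);
}
```

For that always-failing job the sketch records three fetches and three failures, of which two are requeues and one is a permanent failure, matching the `maxRetries + 1` total described earlier.
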