Exception Handling
This document explains how exceptions are processed and handled throughout the Jobs system lifecycle.
Overview
The Jobs system implements a resilient exception handling strategy with multiple layers:
Safe Execution Wrapper (JobLifecycleCoordinator)
Retry Logic (JobLifecycleCoordinator)
Logging & Metrics (automatic on exceptions)
Dead Letter Queue (permanent failures)
Notifications (success/failure alerts)
Key Principle: An exception thrown while a job is running never crashes the worker. Every such error is caught, logged, and handled gracefully, with automatic retries when they are configured.
Exception Flow Diagram
        Job Execution Attempt
                 ↓
   ┌───────────────────────────┐
   │ JobLifecycleCoordinator   │
   │ - executeJobInternal(job) │
   └───────────────────────────┘
                 ↓
          Try-Catch Block
                 ↓
            ┌─────────┐
            │Exception│
            │Thrown?  │
            └─────────┘
             ↙       ↘
           YES        NO
            ↓          ↓
          Catch      Return
          Block      Success
            ↓          ↓
   Create             ExecutionResult
   ExecutionResult    (success=true)
   (success=false)
            ↓          ↓
            └─────┬────┘
                  ↓
        JobLifecycleCoordinator
         - Check maxRetries
         - Send notifications
                  ↓
            ┌─────────┐
            │Retries  │
            │Left?    │
            └─────────┘
             ↙       ↘
           YES        NO
            ↓          ↓
        Requeue      Dead Letter Queue
        + Backoff    (if configured)
            ↓          ↓
      Log Metrics    Log Metrics
      jobs_requeued  jobs_failed_permanently
Layer 1: Safe Execution (Internal)
Location: src/Execution/JobLifecycleCoordinator.php
Responsibility: Execute the job handler with exception safety, handling buffer capture and timing.
Code Flow
private function executeJobInternal(Job $job): ExecutionResult
{
    $start = microtime(true);
    try {
        // 1. Resolve handler class
        $class = $mapping[$job->getJob()] ?? null;
        if (!$class || !is_subclass_of($class, Job::class)) {
            throw JobException::forInvalidJob($job->getJob());
        }

        // 2. Execute lifecycle hooks
        $handler = new $class();
        $job = $handler->beforeRun($job);

        // 3. Capture output buffer
        ob_start();
        $returned = $handler->handle($job->getPayload());
        $buffer = ob_get_clean();

        // 4. Post-execution hook
        $job = $handler->afterRun($job);

        // 5. Success result
        return new ExecutionResult(
            success: true,
            output: $this->normalizeOutput($returned),
            error: null,
            startedAt: $start,
            endedAt: microtime(true)
        );
    } catch (Throwable $e) {
        // 6. Clean output buffer on exception
        if (ob_get_level() > 0) {
            ob_end_clean();
        }

        // 7. Failure result with exception message
        return new ExecutionResult(
            success: false,
            output: null,
            error: $e->getMessage(),
            startedAt: $start,
            endedAt: microtime(true)
        );
    }
}
What Gets Caught
| Exception Source | Caught By | Result |
|---|---|---|
| Invalid job handler | | |
| | Main try-catch | Error message captured |
| | Main try-catch | Error message captured |
| | Main try-catch | Error message captured |
| Any | Main try-catch | Error message captured |
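The catch-everything pattern described in this layer can be tried in isolation. The sketch below is not the library's code: `runSafely()` and its array result are hypothetical stand-ins for `executeJobInternal()` and `ExecutionResult`, and it assumes PHP 8.0 or later. It also remembers the output-buffer depth so that only buffers opened by the wrapper are closed on failure.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: runs a callable and converts any Throwable into a result array.
 */
function runSafely(callable $handler, mixed $payload): array
{
    $start = microtime(true);
    $level = ob_get_level(); // buffer depth before this wrapper opens its own buffer

    try {
        ob_start();
        $returned = $handler($payload);
        $buffer   = ob_get_clean(); // whatever the handler echoed

        return [
            'success'  => true,
            'output'   => $returned,
            'buffer'   => $buffer,
            'error'    => null,
            'duration' => microtime(true) - $start,
        ];
    } catch (Throwable $e) {
        // Close only the buffers opened since $level; outer buffers stay intact.
        while (ob_get_level() > $level) {
            ob_end_clean();
        }

        return [
            'success'  => false,
            'output'   => null,
            'buffer'   => null,
            'error'    => $e->getMessage(),
            'duration' => microtime(true) - $start,
        ];
    }
}

$ok = runSafely(static function (string $p): string {
    echo 'working';

    return strtoupper($p);
}, 'done');

// intdiv() throws DivisionByZeroError, which is an Error, not an Exception.
$failed = runSafely(static fn ($p) => intdiv(1, 0), null);
```

Catching `Throwable` rather than `Exception` is what lets the second call return a failure result instead of terminating the script.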
Example Scenarios
Scenario 1: Handler throws exception
// In your custom job handler
public function handle($payload)
{
throw new RuntimeException('Database connection failed');
}
Result:
ExecutionResult(
success: false,
output: null,
error: 'Database connection failed',
startedAt: 1737339240.123,
endedAt: 1737339240.456
)
Scenario 2: Invalid payload causes error
public function handle($payload)
{
    $data = json_decode($payload, true);
    // If the payload is invalid JSON, json_decode() returns null.
    // Reading an offset on null raises a PHP warning, not a TypeError. It reaches
    // the catch block only when the error handler converts warnings to exceptions.
    return $data['user_id'];
}
Result:
ExecutionResult(
    success: false,
    error: 'Trying to access array offset on value of type null',
    ...
)
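A handler does not have to rely on a warning being converted into an exception. As a minimal sketch, assuming the payload is a JSON string, the hypothetical helper `extractUserId()` below uses PHP's `JSON_THROW_ON_ERROR` flag (PHP 7.3+) so that a malformed payload fails with a specific, descriptive exception.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical handler helper: validates the payload before using it.
 *
 * @throws JsonException            when the payload is not valid JSON
 * @throws InvalidArgumentException when user_id is missing
 */
function extractUserId(string $payload): int
{
    // JSON_THROW_ON_ERROR makes json_decode() throw instead of returning null.
    $data = json_decode($payload, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data) || !isset($data['user_id'])) {
        throw new InvalidArgumentException('Payload missing required field: user_id');
    }

    return (int) $data['user_id'];
}

$userId = extractUserId('{"user_id": 42}');
```

Either exception is caught by the main try-catch like any other, but the recorded error message now names the actual problem.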
Layer 2: JobLifecycleCoordinator (Retry Logic)
Location: src/Execution/JobLifecycleCoordinator.php
Responsibility: Orchestrate retries, timeouts, and notifications.
Code Flow
public function run(Job $job, string $source = 'cron'): LifecycleOutcome
{
    // $timeout and $policy come from configuration (setup omitted for brevity)
    $maxRetries = $job->getMaxRetries() ?? 0;
    $attemptNumber = $job->getAttempt();
    $attemptsMeta = [];
    $finalFailure = false;

    while (true) {
        $attemptNumber++;

        // Execute with timeout protection (if configured)
        $result = $this->safeExecuteWithTimeout($job, $timeout);

        // Track attempt metadata
        $attemptsMeta[] = [
            'attempt' => $attemptNumber,
            'success' => $result->success,
            'error' => $result->error,
            'duration' => $result->durationSeconds()
        ];

        // Send notifications
        if ($result->success && $job->shouldNotifyOnSuccess()) {
            $job->notify($result);
        } elseif (!$result->success && $job->shouldNotifyOnFailure()) {
            $job->notify($result);
        }

        // SUCCESS: Exit retry loop
        if ($result->success) {
            break;
        }

        // FAILURE: Check if retries exhausted
        if ($attemptNumber > $maxRetries) {
            $finalFailure = true;
            break;
        }

        // RETRY: Calculate backoff delay
        $delay = $policy->computeDelay($attemptNumber + 1);
        sleep($delay);
    }

    // Return final outcome
    return new LifecycleOutcome(
        finalResult: $result,
        attempts: $attemptNumber,
        finalFailure: $finalFailure,
        attemptsMeta: $attemptsMeta
    );
}
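The loop's exit conditions are easier to check when separated from the framework. The following sketch is an illustration only: `runWithRetries()` is a hypothetical function, and the job, timeout, and notification steps are replaced by plain callables. It keeps the same comparison (`$attemptNumber > $maxRetries`), so a job with `maxRetries = 3` runs at most four times.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the coordinator loop: calls $attempt until it
 * succeeds or the retry budget is spent.
 *
 * @param callable(int): bool $attempt receives the attempt number, returns success
 * @param callable(int): void $wait    receives the attempt number that just failed
 *
 * @return array{attempts: int, finalFailure: bool}
 */
function runWithRetries(callable $attempt, int $maxRetries, callable $wait): array
{
    $attemptNumber = 0;
    $finalFailure  = false;

    while (true) {
        $attemptNumber++;

        if ($attempt($attemptNumber)) {
            break; // success ends the loop
        }

        if ($attemptNumber > $maxRetries) {
            $finalFailure = true; // the initial run and every retry failed
            break;
        }

        $wait($attemptNumber); // backoff hook; a real worker would sleep here
    }

    return ['attempts' => $attemptNumber, 'finalFailure' => $finalFailure];
}

$noWait = static function (int $n): void {};

$alwaysFails   = runWithRetries(static fn (int $n): bool => false, 3, $noWait);
$thirdSucceeds = runWithRetries(static fn (int $n): bool => $n === 3, 3, $noWait);
$noRetries     = runWithRetries(static fn (int $n): bool => false, 0, $noWait);
```

Passing the wait step in as a callable keeps the sketch fast to run; it is a choice made for this example, not something the coordinator is known to do.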
Retry Policy
Configured via Config\Jobs:
public string $retryBackoffStrategy = 'exponential';
public int $retryBackoffBase = 60; // 60 seconds
public float $retryBackoffMultiplier = 2.0;
public int $retryBackoffMax = 3600; // 1 hour
public bool $retryBackoffJitter = true;
Delay Calculation Examples:
| Strategy | Attempt | Formula | Delay |
|---|---|---|---|
| | Any | 0 | 0s (immediate) |
| | Any | base | 60s |
| exponential | 1 | base × multiplier^0 | 60s |
| exponential | 2 | base × multiplier^1 | 120s |
| exponential | 3 | base × multiplier^2 | 240s |
| exponential | 4 | base × multiplier^3 | 480s |
With retryBackoffJitter = true, each delay varies randomly by ±15%, which keeps many failed jobs from retrying at the same instant (the thundering-herd problem).
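The exponential delays can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not the library's policy class: `computeBackoffDelay()` is a hypothetical function, its defaults mirror the config values shown above, and applying the cap both before and after jitter is a choice made here for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical delay calculator for the exponential strategy.
 * $jitterFraction = 0.15 corresponds to a ±15% variation; 0.0 disables jitter.
 */
function computeBackoffDelay(
    int $attempt,
    int $base = 60,
    float $multiplier = 2.0,
    int $max = 3600,
    float $jitterFraction = 0.0
): int {
    // base × multiplier^(attempt - 1), capped at $max
    $delay = min($max, $base * ($multiplier ** max(0, $attempt - 1)));

    if ($jitterFraction > 0.0) {
        // Scale by a random factor in [1 - fraction, 1 + fraction].
        $factor = 1 + $jitterFraction * (mt_rand(-1000, 1000) / 1000);
        $delay  = min($max, $delay * $factor);
    }

    return (int) round($delay);
}

$schedule = array_map(
    static fn (int $n): int => computeBackoffDelay($n),
    [1, 2, 3, 4, 10]
);
// $schedule is [60, 120, 240, 480, 3600]; attempt 10 hits the one-hour cap

$jittered = computeBackoffDelay(1, 60, 2.0, 3600, 0.15); // somewhere in 51..69
```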
Per-Job Retry Configuration
// Set max retries per job
$job = (new Job('command', 'import:users'))
->maxRetries(5); // Try up to 6 times total (1 initial + 5 retries)
// Disable retries for specific job
$job->maxRetries(0); // Fail immediately on error
Layer 3: RequeueHelper (Finalization)
Location: src/Queues/RequeueHelper.php
Responsibility: Increment attempts, log metrics, route to DLQ.
Code Flow
public function finalize(Job $job, JobEnvelope $envelope, callable $removeFn, bool $success): void
{
    // Increment attempt counter (authoritative)
    $job->addAttempt();

    if ($success) {
        $removeFn($job, false); // Remove from queue
        $this->metrics->increment('jobs_succeeded');
        return;
    }

    // FAILURE
    $maxRetries = $job->getMaxRetries();
    $currentAttempt = $job->getAttempt();

    // Determine if should requeue
    $shouldRequeue = ($maxRetries !== null)
        && ($currentAttempt < ($maxRetries + 1));

    if ($shouldRequeue) {
        // REQUEUE for retry
        $removeFn($job, true);
        $this->metrics->increment('jobs_failed');
        $this->metrics->increment('jobs_requeued');
    } else {
        // PERMANENT FAILURE (v1.0.3+ ordering): DLQ first, then origin removal,
        // and emit jobs_dlq_failed when the DLQ could not persist the message.
        $stored = $this->dlq->store($job, 'Max retries exceeded', $currentAttempt);
        $removeFn($job, false);
        $this->metrics->increment('jobs_failed');
        $this->metrics->increment('jobs_failed_permanently');
        if (! $stored) {
            $this->metrics->increment('jobs_dlq_failed');
        }
    }
}
Metrics Emitted
| Metric | When | Labels |
|---|---|---|
| jobs_succeeded | Job completes successfully | |
| jobs_failed | Job attempt fails | |
| jobs_requeued | Failed job retried | |
| jobs_failed_permanently | Retries exhausted | |
| jobs_dlq_failed | DLQ unconfigured or push to DLQ failed (silent-loss alert, v1.0.3+) | |
| | Hit | |
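Which metrics fire for a given outcome follows directly from the branches of `finalize()`. The sketch below restates that routing as a pure function so it can be checked without a queue backend; `metricsForOutcome()` is a hypothetical name, and `$attempt` is assumed to be the attempt count after `addAttempt()` has run.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pure-function version of the routing decision in finalize().
 *
 * @return list<string> metric names to increment, in order
 */
function metricsForOutcome(bool $success, int $attempt, ?int $maxRetries, bool $dlqStored): array
{
    if ($success) {
        return ['jobs_succeeded'];
    }

    // Requeue only while the attempt count is below 1 initial run + $maxRetries.
    $shouldRequeue = $maxRetries !== null && $attempt < $maxRetries + 1;

    if ($shouldRequeue) {
        return ['jobs_failed', 'jobs_requeued'];
    }

    $metrics = ['jobs_failed', 'jobs_failed_permanently'];
    if (!$dlqStored) {
        $metrics[] = 'jobs_dlq_failed'; // the failed job could not be preserved
    }

    return $metrics;
}

$retrying  = metricsForOutcome(false, 2, 3, true);
$exhausted = metricsForOutcome(false, 4, 3, true);
$lost      = metricsForOutcome(false, 4, 3, false);
```

Note that a job whose `maxRetries` is null is never requeued under this condition.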
Layer 4: Dead Letter Queue
Location: src/Libraries/DeadLetterQueue.php
Purpose: Store permanently failed jobs for forensic analysis.
Automatic Storage
When a job exceeds maxRetries, it’s automatically moved to the configured DLQ:
// Config
public ?string $deadLetterQueue = 'failed_jobs';
Metadata Added
$envelope->payload['meta'] = [
'dlq_reason' => 'Max retries exceeded',
'dlq_timestamp' => '2026-01-19T15:30:00Z',
'dlq_attempts' => 5,
'original_queue' => 'high_priority',
'original_error' => 'Database connection timeout'
];
Return value (v1.0.3+)
DeadLetterQueue::store() returns bool:
true: the job was successfully pushed to the DLQ.
false: the DLQ is unconfigured (Config\Jobs::$deadLetterQueue is null) or the underlying push raised an exception. Both cases are logged at critical severity, and RequeueHelper::finalize() emits jobs_dlq_failed so operators can alert on it.
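The "report failure, do not throw" contract can be sketched as follows. This is an assumption-laden illustration rather than the real `DeadLetterQueue::store()`: the function name, the `$push` callable, and the `$logCritical` callable are all hypothetical, and the log messages are invented for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a store() that returns false instead of throwing.
 *
 * @param callable(string, array): void $push        pushes a payload onto a named queue
 * @param callable(string): void        $logCritical records a critical-severity message
 */
function storeInDeadLetterQueue(?string $dlqName, array $payload, callable $push, callable $logCritical): bool
{
    if ($dlqName === null) {
        $logCritical('DLQ not configured; failed job cannot be preserved');

        return false;
    }

    try {
        $push($dlqName, $payload);

        return true;
    } catch (Throwable $e) {
        $logCritical('DLQ push failed: ' . $e->getMessage());

        return false;
    }
}

$logs    = [];
$log     = static function (string $message) use (&$logs): void { $logs[] = $message; };
$pushed  = [];
$okPush  = static function (string $queue, array $data) use (&$pushed): void { $pushed[] = [$queue, $data]; };
$badPush = static function (string $queue, array $data): void { throw new RuntimeException('backend down'); };

$stored       = storeInDeadLetterQueue('failed_jobs', ['id' => 1], $okPush, $log);
$unconfigured = storeInDeadLetterQueue(null, ['id' => 1], $okPush, $log);
$pushFailed   = storeInDeadLetterQueue('failed_jobs', ['id' => 1], $badPush, $log);
```

Returning a boolean lets the caller decide how to surface the loss, which is how `finalize()` is able to emit `jobs_dlq_failed`.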
Querying DLQ
use Daycry\Jobs\Libraries\DeadLetterQueue;
$dlq = new DeadLetterQueue();
$stats = $dlq->getStats();
Layer 5: Logging
Location: src/Loggers/JobLogger.php
Automatic on Every Execution (if logPerformance = true):
Log Structure
{
"execution_id": "abc123",
"job_name": "import_users",
"queue": "default",
"source": "queue",
"attempt": 3,
"payload_hash": "sha256:...",
"output": "Processed 1500 records",
"error": "Database connection timeout",
"output_length": 23,
"started_at": "2026-01-19 15:30:00",
"ended_at": "2026-01-19 15:30:05",
"duration_seconds": 5.23,
"retry_strategy": "exponential"
}
Exception Information Captured
Error Message: $exception->getMessage()
Attempt Number: Which retry attempt failed
Duration: How long the job ran before the exception was thrown
Payload Hash: For correlation with previous attempts
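A payload hash makes it possible to match log entries from different attempts without storing the payload itself. How the library computes it is not shown here, so the sketch below is only one plausible approach: `payloadHash()` is a hypothetical helper that JSON-encodes the payload and prefixes the SHA-256 digest in the same `sha256:` style as the sample log entry. It assumes PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: fingerprints a payload so repeated attempts can be matched
 * in the logs.
 *
 * @throws JsonException when the payload cannot be encoded
 */
function payloadHash(mixed $payload): string
{
    // Encode first so arrays and scalars are hashed through the same path.
    $encoded = json_encode($payload, JSON_THROW_ON_ERROR);

    return 'sha256:' . hash('sha256', $encoded);
}

$first  = payloadHash(['user_id' => 12345, 'action' => 'import']);
$retry  = payloadHash(['user_id' => 12345, 'action' => 'import']);
$other  = payloadHash(['user_id' => 99999, 'action' => 'import']);
```

Identical payloads give identical hashes, so `$first` and `$retry` match while `$other` differs. Key order affects the JSON encoding, so payloads built in a different order would hash differently in this sketch.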
Sensitive Data Protection
Exception messages are automatically sanitized:
// Before sanitization
error: "API key sk_live_51H9X2sK3Zq8... is invalid"
// After sanitization
error: "API key ***API_KEY*** is invalid"
Patterns detected:
JWT tokens → ***JWT_TOKEN***
API keys → ***API_KEY***
Bearer tokens → Bearer ***TOKEN***
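Pattern-based masking of this kind can be approximated with `preg_replace()`. The regular expressions below are illustrative assumptions, not the patterns the library ships: `sanitizeErrorMessage()` is a hypothetical function, and the API-key rule only recognises Stripe-style `sk_`/`pk_` prefixes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sanitizer; the patterns are illustrative, not the library's own.
 */
function sanitizeErrorMessage(string $message): string
{
    $rules = [
        // "Bearer <token>": keep the scheme, mask the credential.
        '/\bBearer\s+[A-Za-z0-9._~+\/=-]+/' => 'Bearer ***TOKEN***',
        // JWTs: three base64url segments, the first starting with "eyJ".
        '/\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/' => '***JWT_TOKEN***',
        // Stripe-style keys such as sk_live_... or pk_test_...
        '/\b(?:sk|pk)_(?:live|test)_[A-Za-z0-9]+/' => '***API_KEY***',
    ];

    // Patterns are applied in order. preg_replace() returns null on a regex engine
    // error; in that case withhold the message rather than risk leaking a secret.
    return preg_replace(array_keys($rules), array_values($rules), $message)
        ?? '[message withheld: sanitization failed]';
}

$apiKey = sanitizeErrorMessage('API key sk_live_51H9X2sK3Zq8 is invalid');
$bearer = sanitizeErrorMessage('Authorization: Bearer abc.def-123 rejected');
$jwt    = sanitizeErrorMessage('token eyJhbGciOi.eyJzdWIi.sig_x expired');
```

The Bearer rule runs first so that a JWT sent as a bearer token is masked once, as `Bearer ***TOKEN***`.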
Layer 6: Notifications
Configuration:
$job->notifyOnSuccess(); // Email on success
$job->notifyOnFailure(); // Email on failure
$job->notifyOnCompletion(); // Email always
Notification on Exception
When an exception occurs and notifyOnFailure() is enabled:
// In JobLifecycleCoordinator
if (!$result->success && $job->shouldNotifyOnFailure()) {
$job->notify($result);
}
Email Content:
Job name
Error message (sanitized)
Attempt number
Duration
Timestamp
Complete Exception Lifecycle Example
Setup
// Job configuration
$job = (new Job('command', 'process:payment'))
->named('payment_processor')
->maxRetries(3)
->timeout(300)
->notifyOnFailure()
->enqueue('high_priority');
Execution Timeline
Attempt 1 (15:30:00):
1. JobLifecycleCoordinator::executeJobInternal() called
2. Payment API throws SocketException: "Connection refused"
3. Exception caught in try-catch
4. Returns ExecutionResult(success=false, error="Connection refused")
5. JobLifecycleCoordinator checks: attempt 1 <= maxRetries 3
6. Calculates delay: 60s (exponential backoff)
7. Sleeps 60 seconds
Attempt 2 (15:31:00):
1. JobLifecycleCoordinator::executeJobInternal() called again
2. Payment API throws TimeoutException: "Request timeout after 30s"
3. Exception caught
4. Returns ExecutionResult(success=false, error="Request timeout after 30s")
5. Coordinator checks: attempt 2 <= maxRetries 3
6. Calculates delay: 120s
7. Sleeps 120 seconds
Attempt 3 (15:33:00):
1. JobLifecycleCoordinator::executeJobInternal() called
2. Payment API throws AuthException: "Invalid API key"
3. Exception caught
4. Returns ExecutionResult(success=false, error="Invalid API key")
5. Coordinator checks: attempt 3 <= maxRetries 3
6. Calculates delay: 240s
7. Sleeps 240 seconds
Attempt 4 (15:37:00):
1. JobLifecycleCoordinator::executeJobInternal() called
2. Payment API throws ServerError: "Internal server error"
3. Exception caught
4. Returns ExecutionResult(success=false, error="Internal server error")
5. Coordinator checks: attempt 4 > maxRetries 3
6. Sets finalFailure = true
7. Sends failure notification email
8. Returns LifecycleOutcome(finalFailure=true)
Finalization:
1. RequeueHelper::finalize() called with success=false
2. Checks: shouldRequeue = false (retries exhausted)
3. Stores in Dead Letter Queue (before removal from the origin queue, v1.0.3+):
   {
     dlq_reason: "Max retries exceeded",
     dlq_attempts: 4,
     original_queue: "high_priority",
     original_error: "Internal server error"
   }
4. Calls $removeFn($job, false) - remove from queue
5. Emits metrics:
   - jobs_failed{queue=high_priority} +1
   - jobs_failed_permanently{queue=high_priority} +1
Logged Records (4 entries):
[
{"attempt": 1, "error": "Connection refused", "duration": 0.5},
{"attempt": 2, "error": "Request timeout after 30s", "duration": 30.2},
{"attempt": 3, "error": "Invalid API key", "duration": 0.3},
{"attempt": 4, "error": "Internal server error", "duration": 1.2}
]
Exception Types & Handling
1. JobException (System Exceptions)
Thrown for configuration/validation errors:
JobException::forInvalidJob('unknown_handler');
JobException::forShellCommandNotAllowed('rm -rf /');
JobException::forJobTimeout('long_task', 300);
JobException::forRateLimitExceeded('high_priority', 100);
Handling: Treated as permanent failures (no retry by default).
2. RuntimeException (Job Logic Errors)
throw new RuntimeException('User not found: ID 12345');
Handling: Retried according to maxRetries configuration.
3. Throwable (All Errors)
Catches everything:
Exception
Error
TypeError
ValueError
DivisionByZeroError
Custom exceptions
// Engine errors that extend Error are caught too
function handle($payload) {
    return 1 / 0; // DivisionByZeroError (PHP 8+)
}
Handling: All caught and converted to ExecutionResult(success=false).
Best Practices
1. Throw Meaningful Exceptions
// ❌ BAD
throw new Exception('Error');
// ✅ GOOD
throw new RuntimeException('Failed to import user ID 12345: Email validation failed');
2. Use Specific Exception Types
// For validation errors
throw new InvalidArgumentException('Payload missing required field: user_id');
// For external service failures
throw new RuntimeException('Stripe API returned 503 Service Unavailable');
// For business logic violations
throw new DomainException('Cannot process payment: insufficient balance');
3. Clean Up Resources
public function handle($payload)
{
    $file = fopen('temp.csv', 'w');
    if ($file === false) {
        throw new RuntimeException('Unable to open temp.csv for writing');
    }
    try {
        // Process data
        fwrite($file, $payload);
        return 'Success';
    } finally {
        // Always close the file, even on exception
        fclose($file);
    }
}
4. Configure Appropriate Retries
// Quick operations: few retries
$emailJob = (new Job('command', 'send:email'))->maxRetries(2);
// External API calls: more retries
$apiJob = (new Job('command', 'sync:stripe'))->maxRetries(5);
// Critical operations: no retries (fail fast)
$paymentJob = (new Job('command', 'charge:card'))->maxRetries(0);
5. Use Callbacks for Error Handling
$mainJob = (new Job('command', 'import:data'))
->maxRetries(3)
->catch(
(new Job('command', 'send:error:alert'))
->enqueue('notifications')
);
Troubleshooting
Exception Not Logged
Symptom: Exception thrown but no log entry.
Causes:
logPerformance = false in config
Log handler misconfigured
Exception thrown before job execution starts (e.g., in the queue worker loop)
Solution: Verify Config\Jobs::$logPerformance = true.
Job Retries Infinitely
Symptom: Job keeps retrying forever.
Causes:
maxRetries not set (defaults to 0, but requeue logic might differ)
Database backend not persisting attempt count
Solution: Explicitly set maxRetries:
$job->maxRetries(3);
Exception Message Truncated
Symptom: Error message cut off in logs.
Cause: maxOutputLength config limit.
Solution: Increase limit or set null:
public ?int $maxOutputLength = null; // Unlimited
Worker Crashes on Exception
Symptom: Queue worker stops processing.
Causes:
Exception thrown outside job execution (worker loop itself)
Out of memory error (not caught)
SIGTERM/SIGKILL signal
Solution: Use process supervisor (systemd, supervisord):
[program:jobs-worker]
command=php spark jobs:queue:run
autorestart=true
stderr_logfile=/var/log/jobs-worker.err.log
Monitoring Exception Rates
Using Metrics
// Get failure rate
$failed = $metrics->getValue('jobs_failed');
$succeeded = $metrics->getValue('jobs_succeeded');
$total = $failed + $succeeded;
// Guard against division by zero when no jobs have run yet
$failureRate = $total > 0 ? ($failed / $total) * 100 : 0.0;
Using Health Check
php spark jobs:health --json | jq '.queues.default.last_24h.failure_rate'
# Output: 7.2
Alerting Rules (Prometheus)
groups:
  - name: jobs
    rules:
      - alert: HighJobFailureRate
        expr: |
          (
            rate(jobs_failed[5m]) /
            (rate(jobs_succeeded[5m]) + rate(jobs_failed[5m]))
          ) > 0.1
        for: 10m
        annotations:
          summary: "Job failure rate above 10% for 10 minutes"
Summary
Exception Handling Guarantees:
✅ No Crashes: All exceptions caught at execution layer
✅ Automatic Retries: Configurable retry policy with backoff
✅ Audit Trail: Every exception logged with full context
✅ Metrics: Failure rates tracked in real-time
✅ Notifications: Email alerts on failures
✅ Forensics: Dead Letter Queue preserves failed jobs
✅ Security: Exception messages sanitized before logging
✅ Observability: Health checks expose failure statistics
For Developers:
Throw exceptions freely in job handlers
System handles retries automatically
Focus on business logic, not error handling boilerplate
For Operators:
Monitor via the jobs:health command
Review DLQ for systematic issues
Adjust retry policies based on failure patterns