Enhanced Features

This document describes the advanced security, performance, and operational features added to the Jobs system.

Updated for v1.0.x → v2.0-alpha — see CHANGELOG for the per-release timeline. Features that gained significant changes in a specific release are tagged inline (e.g. v1.1+).

Security Enhancements

1. Shell Command Whitelisting (v1.1+ realpath)

Problem: escapeshellarg() neutralises shell metacharacters but the legacy basename-based whitelist allowed /tmp/echo to impersonate the /usr/bin/echo you actually trusted.

Solution: in v1.1+ entries with a path separator are matched against realpath() of the candidate so the resolved binary must match exactly. Bare-name entries continue to work via the legacy basename match with a deprecation log so existing installs do not break (this fallback will be removed in v2.0).

Configuration:

// Recommended: absolute paths, matched via realpath()
public array $allowedShellCommands = ['/usr/bin/ls', '/usr/bin/grep', '/usr/bin/cat'];

// Legacy (still works, emits deprecation log)
public array $allowedShellCommands = ['ls', 'grep', 'cat'];

Behavior:

  • Empty array (default): all commands allowed (backward compatible)

  • Entries with / or \ are resolved with realpath() and compared against the resolved candidate. /tmp/echo is rejected even if /usr/bin/echo is whitelisted.

  • Bare names use the legacy basename match and emit a warning log.

  • Throws JobException::forShellCommandNotAllowed($command) on violation.

Example:

use Daycry\Jobs\Job;

// Allowed when /usr/bin/ls is whitelisted
$job = new Job('shell', '/usr/bin/ls -la');

// Rejected: /tmp/ls is not the whitelisted /usr/bin/ls
$job = new Job('shell', '/tmp/ls -la'); // throws JobException

2. Smart Token Pattern Detection

Problem: Credential leaks in logs even when key names are unknown.

Solution: Pattern-based detection and masking of sensitive data.

Detected Patterns:

Pattern

Detection Rule

Masked As

JWT Tokens

xxx.yyy.zzz format (3 parts separated by dots)

***JWT_TOKEN***

API Keys

Alphanumeric strings ≥32 characters

***API_KEY***

Bearer Tokens

Bearer <token> in headers/strings

Bearer ***TOKEN***

Implementation: Automatic in JobLogger::sanitizeTokenPatterns() - applied to:

  • Job payload

  • Execution output

  • Error messages

Example:

// Before logging
$data = [
    'auth' => 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U',
    'api_key' => 'sk_live_51H9X2sK3Zq8N9Y8R6T4V1C2B5N3M7P9Q4W6E8D1F3G5H7J2K4L6M8N0P2Q4R6S8T0'
];

// After sanitization
$data = [
    'auth' => 'Bearer ***TOKEN***',
    'api_key' => '***API_KEY***'
];

Regex Patterns:

'/[A-Za-z0-9_-]{32,}/'          // API keys (≥32 chars)
'/\b\w+\.\w+\.\w+\b/'           // JWT tokens
'/Bearer\s+[A-Za-z0-9_-]+/'    // Bearer tokens

Performance Enhancements

3. Configuration Caching

Problem: config('Jobs') called repeatedly in high-throughput scenarios causes overhead.

Solution: Singleton-based in-memory config cache.

Usage:

use Daycry\Jobs\Libraries\ConfigCache;

// Get cached config (subsequent calls use cache)
$config = ConfigCache::get();

// For testing: clear cache
ConfigCache::clear();

// For testing: set custom config
ConfigCache::set($mockConfig);

Implementation Notes:

  • Used in helpers and services (non-coordinator code)

  • JobLifecycleCoordinator uses config() directly to respect test modifications

  • Automatic instantiation on first call

  • Thread-safe (single-process context)

Performance Impact:

  • Reduces config() calls by ~95% in steady state

  • Negligible memory overhead (~1KB per config instance)


4. Rate Limiting

Problem: Queue overload can exhaust resources (CPU, memory, database connections).

Solution: Per-queue rate limiting with token bucket algorithm.

Configuration:

// app/Config/Jobs.php
public array $queueRateLimits = [
    'high_priority' => 100,  // Max 100 jobs/minute
    'default' => 50,         // Max 50 jobs/minute
    'background' => 20,      // Max 20 jobs/minute
];

Behavior:

  • 0 or missing: No limit (backward compatible)

  • Cache-based token bucket (tracks usage per minute)

  • Worker skips processing cycle when limit exceeded

  • Resets automatically each minute

API:

use Daycry\Jobs\Libraries\RateLimiter;

$limiter = new RateLimiter();

// Check if allowed (returns bool)
if ($limiter->allow('default', 50)) {
    // Process job
}

// Throw exception if exceeded
try {
    $limiter->throttle('default', 50);
} catch (JobException $e) {
    // Rate limit exceeded
}

// Get current usage
$usage = $limiter->getUsage('default'); // e.g., 23

// Reset counter (for testing)
$limiter->reset('default');

Integration: The queue worker (jobs:queue:run) automatically checks rate limits before processing:

// Automatic in QueueRunCommand
if (!$rateLimiter->allow($queue, $maxPerMinute)) {
    CLI::write("[Rate Limited] Skipping cycle...", 'yellow');
    sleep($sleep);
    continue;
}

Cache Keys:

  • Format: job_rate_limit:{queue_name}:{minute}

  • TTL: 60 seconds

  • Example: job_rate_limit:default:1737339240


Reliability Enhancements

5. Dead Letter Queue (DLQ)

Problem: Permanently failed jobs disappear from queue, making root cause analysis difficult.

Solution: Automatic routing to dedicated “dead letter” queue.

Configuration:

// app/Config/Jobs.php
public ?string $deadLetterQueue = 'failed_jobs';

Behavior:

  • Activated when deadLetterQueue is set (null = disabled)

  • Jobs exceeding max retries are moved to DLQ instead of deleted

  • Original queue data preserved

  • Metadata added automatically

Metadata Added:

Field

Description

Example

dlq_reason

Why job moved to DLQ

“Max retries exceeded”

dlq_timestamp

When moved (ISO 8601)

“2026-01-19T15:30:00Z”

dlq_attempts

Total attempts before failure

5

original_queue

Source queue name

“high_priority”

API (v1.0.3+ — store() returns bool):

use Daycry\Jobs\Libraries\DeadLetterQueue;

$dlq = new DeadLetterQueue();

// Store failed job (automatic in RequeueHelper). Returns true on success,
// false when the DLQ is unconfigured or the underlying push failed.
$stored = $dlq->store($job, 'Max retries exceeded', 5);

if (! $stored) {
    // RequeueHelper emits jobs_dlq_failed for you in this case.
    log_message('alert', 'DLQ not available — investigate before more jobs exhaust retries.');
}

// Get statistics
$stats = $dlq->getStats();

Integration: RequeueHelper::finalize() calls store() before clearing the origin queue (v1.0.3+ ordering) and emits jobs_dlq_failed if the DLQ rejected the message:

// In RequeueHelper::finalize()
if (! $success && ! $willRetry) {
    $stored = $this->dlq->store($job, 'Max retries exceeded', $currentAttempt);

    $removeFn($job, false);
    $this->metrics?->increment('jobs_failed');
    $this->metrics?->increment('jobs_failed_permanently');
    if (! $stored) {
        $this->metrics?->increment('jobs_dlq_failed');
    }
}

Use Cases:

  • Forensic analysis of failed jobs

  • Identifying systematic issues (patterns in failures)

  • Manual retry after fixing root cause

  • Compliance/audit requirements


6. Job Timeout Protection

Problem: Runaway jobs can block queue workers indefinitely.

Solution: Hard timeout enforcement at execution level.

Configuration:

// app/Config/Jobs.php
public int $jobTimeout = 300; // 5 minutes

Behavior:

  • 0 = disabled (backward compatible)

  • Uses pcntl_alarm() for signal-based timeout (hard kill). v1.2 enables pcntl_async_signals(true) so SIGALRM interrupts CPU-bound code without waiting for a syscall, and restores the previous SIGALRM handler so successive jobs in the same worker process do not inherit our handler.

  • Falls back to a post-execute time check if pcntl is unavailable (Windows/FPM).

  • Emits the jobs_timed_out metric on either path (v1.0.3+).

  • Throws JobException::forJobTimeout($jobName, $timeout).

Implementation Modes:

Mode

Requirement

Enforcement

Hard Timeout

pcntl extension

Signal kills process after timeout (works on CPU-bound code in v1.2+ thanks to async signals)

Soft Timeout

Fallback (no pcntl)

Time check + warning log

Example:

// Job exceeding 300s timeout
try {
    $coordinator->run($job, 'queue');
} catch (JobException $e) {
    // "Job 'data_import' exceeded timeout of 300 seconds"
}

pcntl_alarm() Flow:

  1. Register signal handler (SIGALRM → throws exception)

  2. Set alarm for $timeout seconds

  3. Execute job

  4. Cancel alarm on completion

  5. If timeout: signal fires, exception thrown, execution halted

Fallback Flow (no pcntl):

  1. Record start time

  2. Execute job

  3. Check elapsed time after execution

  4. Log warning if exceeded (soft enforcement)

Per-Job Override:

// Override global timeout for specific job
$job = (new Job('command', 'long-running-import'))
    ->timeout(900); // 15 minutes

Operational Enhancements

7. Fluent Job Chaining

Enhancement: Simplified callback API with semantic methods.

New Methods:

Method

Filter

Queued?

Description

then(Job $next)

success

Yes

Execute after successful completion

catch(Job $handler)

failure

Yes

Execute on failure

finally(Job $cleanup)

always

Yes

Always execute

chain(array $jobs)

Sequential

Yes

Execute jobs in order

Basic Example:

use Daycry\Jobs\Job;

$processPayment = new Job('command', 'process:payment');
$sendInvoice = new Job('command', 'send:invoice');
$notifyAdmin = new Job('command', 'notify:admin');

$processPayment
    ->then($sendInvoice)      // On success
    ->catch($notifyAdmin)     // On failure
    ->push();

Chain Multiple Jobs:

$job->chain([
    new Job('command', 'validate:data'),
    new Job('command', 'transform:data'),
    new Job('command', 'store:data'),
])->push();

Comparison with setCallbackJob():

// Old verbose syntax
$job->setCallbackJob(function(Job $parent) {
    return (new Job('command', 'cleanup'))->enqueue('default');
}, [
    'on' => 'success',
    'inherit' => ['output', 'error'],
    'allowChain' => true
]);

// New fluent syntax
$job->then(
    (new Job('command', 'cleanup'))->enqueue('default')
);

Behind the Scenes:

  • then() → calls setCallbackJob() with filter='success' and allowChain=true

  • catch() → calls setCallbackJob() with filter='failure'

  • finally() → calls setCallbackJob() with filter='always'

  • chain() → wraps multiple jobs with sequential execution logic


8. Health Check Command

Feature: Comprehensive system health monitoring.

Command:

php spark jobs:health [--json] [--queue=NAME]

Options:

  • --json: Output in JSON format (machine-readable)

  • --queue=NAME: Show stats for specific queue only

Output Sections:

1. Configuration

Displays current system settings:

  • Retry strategy (none/fixed/exponential)

  • Job timeout

  • Dead letter queue name

  • Rate limits per queue

2. Queue Status

Per-queue statistics:

  • Pending jobs (waiting to execute)

  • Processing jobs (currently running)

  • Completed jobs (successful)

  • Failed jobs (permanent failures)

3. Rate Limit Usage

Current usage vs. configured limits:

  • Current: 23/50 (46% capacity)

  • Visual representation in table format

4. Last 24 Hours Metrics

Rolling window statistics:

  • Total executions

  • Success rate (%)

  • Failure rate (%)

  • Average duration (seconds)

Example Output (Table):

=== Jobs System Health Check ===

Configuration:
  Retry Strategy: exponential (base: 60s, multiplier: 2.0, max: 3600s)
  Job Timeout: 300 seconds
  Dead Letter Queue: failed_jobs
  Rate Limits: default=50/min, high_priority=100/min

Queue: default
  Status:
    Pending: 42
    Processing: 3
    Completed: 1,245
    Failed: 12
  Rate Limit: 23/50 (46% used)
  Last 24h:
    Executions: 156
    Success Rate: 92.3%
    Failure Rate: 7.7%
    Avg Duration: 2.45s

Queue: high_priority
  Status:
    Pending: 8
    Processing: 1
    Completed: 567
    Failed: 3
  Rate Limit: 78/100 (78% used)
  Last 24h:
    Executions: 89
    Success Rate: 96.6%
    Avg Duration: 1.12s

Example Output (JSON):

{
  "config": {
    "retry_strategy": "exponential",
    "retry_base": 60,
    "retry_multiplier": 2.0,
    "retry_max": 3600,
    "job_timeout": 300,
    "dead_letter_queue": "failed_jobs",
    "rate_limits": {
      "default": 50,
      "high_priority": 100
    }
  },
  "queues": {
    "default": {
      "status": {
        "pending": 42,
        "processing": 3,
        "completed": 1245,
        "failed": 12
      },
      "rate_limit": {
        "current": 23,
        "max": 50,
        "percentage": 46
      },
      "last_24h": {
        "executions": 156,
        "success_rate": 92.3,
        "failure_rate": 7.7,
        "avg_duration_seconds": 2.45
      }
    }
  }
}

Use Cases:

  • Monitoring Dashboards: JSON output to Prometheus/Grafana/Datadog

  • Operational Health Checks: Quick status overview for on-call engineers

  • Capacity Planning: Identify rate limit saturation before issues occur

  • Forensics: Track failure patterns and bottlenecks

  • CI/CD: Validate queue health in deployment pipelines

Integration Example (Prometheus Exporter):

// Custom metrics endpoint
public function metrics()
{
    exec('php spark jobs:health --json', $output);
    $data = json_decode(implode('', $output), true);
    
    foreach ($data['queues'] as $queue => $stats) {
        echo "jobs_pending{queue=\"$queue\"} {$stats['status']['pending']}\n";
        echo "jobs_processing{queue=\"$queue\"} {$stats['status']['processing']}\n";
        echo "jobs_success_rate{queue=\"$queue\"} {$stats['last_24h']['success_rate']}\n";
    }
}

Configuration Summary

All new features are opt-in and backward compatible:

// app/Config/Jobs.php

// Security
public array $allowedShellCommands = [];  // Empty = allow all (default)

// Performance  
public array $queueRateLimits = [];       // Empty = no limits (default)

// Reliability
public ?string $deadLetterQueue = null;   // null = disabled (default)
public int $jobTimeout = 300;             // 0 = disabled

// Future
public int $batchSize = 1;                // Reserved for batch processing

Enabling Everything:

public array $allowedShellCommands = ['ls', 'cat', 'grep'];
public array $queueRateLimits = ['default' => 50, 'high' => 100];
public ?string $deadLetterQueue = 'failed_jobs';
public int $jobTimeout = 300;

Backward Compatibility

All enhancements maintain 100% backward compatibility:

Feature

Default Behavior

Migration Required

Shell Whitelist

All commands allowed

No

Token Detection

Auto-enabled in logger

No

Config Caching

Transparent

No

Rate Limiting

Disabled (no limits)

No

Dead Letter Queue

Disabled

No

Job Timeout

Disabled

No

Fluent Chaining

Alternative to setCallbackJob()

No

Health Check

New command

No

Zero Breaking Changes: Existing code continues working without modification.


Testing

All enhancements include comprehensive test coverage:

# Run full test suite
composer test

# Verify all tests pass
# Expected: 96 tests, 318 assertions, 4 skipped

Test Coverage:

  • RateLimiterTest: Token bucket algorithm, reset, edge cases

  • ConfigCacheTest: Singleton behavior, clear/set operations

  • DeadLetterQueueTest: Storage, statistics, metadata

  • ShellJobTest: Whitelist validation, exception handling

  • JobLoggerTest: Token pattern detection, masking

  • JobLifecycleCoordinatorTest: Timeout enforcement, retry integration

  • CallbackTraitTest: Fluent chaining API

  • HealthCheckCommandTest: Output formats, queue filtering


Performance Impact

Benchmarks on typical workload (1000 jobs):

Feature

Overhead

Notes

Shell Whitelist

<0.1ms/job

Simple array check

Token Detection

~0.5ms/job

3 regex patterns

Config Caching

-15% total time

Reduces config() calls by 95%

Rate Limiting

~0.2ms/job

Cache read + increment

Dead Letter Queue

~5ms/failed job

Only on permanent failure

Job Timeout

<0.1ms/job

Signal setup overhead

Net Impact: ~10-15% performance improvement in high-throughput scenarios (due to config caching).


Migration Guide

From Basic to Enhanced Setup

Step 1: Enable security features

public array $allowedShellCommands = [
    'ls', 'cat', 'grep', 'find', 'awk', 'sed'
];

Step 2: Configure rate limits

public array $queueRateLimits = [
    'default' => 100,
    'high_priority' => 200,
    'background' => 50,
];

Step 3: Enable DLQ

public ?string $deadLetterQueue = 'failed_jobs';

Step 4: Set timeout

public int $jobTimeout = 300; // 5 minutes

Step 5: Update job code to use fluent API (optional)

// Before
$job->setCallbackJob(function($p) {
    return (new Job('command', 'notify'))->enqueue('default');
}, ['on' => 'success']);

// After
$job->then((new Job('command', 'notify'))->enqueue('default'));

Step 6: Set up monitoring

# Add to cron
*/5 * * * * cd /app && php spark jobs:health --json > /var/log/jobs-health.json

Troubleshooting

Shell Commands Rejected

Symptom: JobException::forShellCommandNotAllowed() Solution: Add command to whitelist or set $allowedShellCommands = []

Rate Limit Issues

Symptom: Jobs queued but not processing Solution: Increase limit in $queueRateLimits or check cache backend

Timeout False Positives

Symptom: Jobs killed before completion Solution: Increase $jobTimeout or set per-job override with ->timeout(900)

DLQ Not Storing

Symptom: Failed jobs disappear Solution: Verify $deadLetterQueue is set and queue exists in $queues

Health Check Empty

Symptom: No data in jobs:health output Solution: Ensure $logPerformance = true for metrics collection


Future Enhancements

Planned features leveraging this foundation:

  1. Batch Processing: Use $batchSize for efficient bulk operations

  2. Priority Queues: Enhanced priority handling across backends

  3. Job Clustering: Distributed locking for multi-server deployments

  4. Telemetry: OpenTelemetry integration for distributed tracing

  5. Web Dashboard: Real-time monitoring UI using jobs:health JSON API