Enhanced Features
This document describes the advanced security, performance, and operational features added to the Jobs system.
Updated for v1.0.x → v2.0-alpha — see CHANGELOG for the per-release timeline. Features that gained significant changes in a specific release are tagged inline (e.g. v1.1+).
Security Enhancements
1. Shell Command Whitelisting (v1.1+ realpath)
Problem: escapeshellarg() neutralises shell metacharacters but the legacy basename-based whitelist allowed /tmp/echo to impersonate the /usr/bin/echo you actually trusted.
Solution: in v1.1+ entries with a path separator are matched against realpath() of the candidate so the resolved binary must match exactly. Bare-name entries continue to work via the legacy basename match with a deprecation log so existing installs do not break (this fallback will be removed in v2.0).
Configuration:
// Recommended: absolute paths, matched via realpath()
public array $allowedShellCommands = ['/usr/bin/ls', '/usr/bin/grep', '/usr/bin/cat'];
// Legacy (still works, emits deprecation log)
public array $allowedShellCommands = ['ls', 'grep', 'cat'];
Behavior:
Empty array (default): all commands allowed (backward compatible)
Entries with / or \ are resolved with realpath() and compared against the resolved candidate. /tmp/echo is rejected even if /usr/bin/echo is whitelisted.
Bare names use the legacy basename match and emit a warning log.
Throws JobException::forShellCommandNotAllowed($command) on violation.
Example:
use Daycry\Jobs\Job;
// Allowed when /usr/bin/ls is whitelisted
$job = new Job('shell', '/usr/bin/ls -la');
// Rejected: /tmp/ls is not the whitelisted /usr/bin/ls
$job = new Job('shell', '/tmp/ls -la'); // throws JobException
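The matching rules can be sketched in plain PHP. This is a minimal illustration under stated assumptions, not the library's implementation: isShellCommandAllowed() is a hypothetical helper, and the deprecation log for bare names is only noted in a comment.

```php
<?php
// Hypothetical helper, not the library's implementation.
function isShellCommandAllowed(string $binary, array $whitelist): bool
{
    if ($whitelist === []) {
        return true; // empty whitelist allows everything
    }

    $resolved = realpath($binary); // false when the file does not exist

    foreach ($whitelist as $entry) {
        if (strpbrk($entry, '/\\') !== false) {
            // Path entry: both sides must resolve to the same file.
            $trusted = realpath($entry);
            if ($trusted !== false && $trusted === $resolved) {
                return true;
            }
        } elseif (basename($binary) === $entry) {
            // Bare name: legacy basename match (the library logs a deprecation here).
            return true;
        }
    }

    return false;
}
```

Because both sides go through realpath(), a symlink that points at the trusted binary still matches, while a different file with the same basename does not.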
2. Smart Token Pattern Detection
Problem: Credentials can leak into logs when they appear under key names the logger does not recognise.
Solution: Pattern-based detection and masking of sensitive data.
Detected Patterns:
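| Pattern | Detection Rule | Masked As |
|---|---|---|
| JWT Tokens | Three dot-separated segments | |
| API Keys | Alphanumeric strings ≥32 characters | ***API_KEY*** |
| Bearer Tokens | Bearer followed by a token string | Bearer ***TOKEN*** |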
Implementation:
Automatic in JobLogger::sanitizeTokenPatterns() - applied to:
Job payload
Execution output
Error messages
Example:
// Before logging
$data = [
'auth' => 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U',
'api_key' => 'sk_live_51H9X2sK3Zq8N9Y8R6T4V1C2B5N3M7P9Q4W6E8D1F3G5H7J2K4L6M8N0P2Q4R6S8T0'
];
// After sanitization
$data = [
'auth' => 'Bearer ***TOKEN***',
'api_key' => '***API_KEY***'
];
Regex Patterns:
'/[A-Za-z0-9_-]{32,}/' // API keys (≥32 chars)
'/\b\w+\.\w+\.\w+\b/' // JWT tokens
'/Bearer\s+[A-Za-z0-9_-]+/' // Bearer tokens
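A rough sketch of pattern-based masking with preg_replace() follows. It is not JobLogger::sanitizeTokenPatterns(): maskTokenPatterns() is a hypothetical function, the rule order (Bearer first, then long keys) is an assumption, and the Bearer character class additionally allows "." so that a JWT following "Bearer" is masked in one piece. The JWT pattern is omitted because its mask is not shown above.

```php
<?php
// Illustrative only: rule order and the '.' in the Bearer class are assumptions.
function maskTokenPatterns(string $text): string
{
    $rules = [
        '/Bearer\s+[A-Za-z0-9_.-]+/' => 'Bearer ***TOKEN***',
        '/[A-Za-z0-9_-]{32,}/'       => '***API_KEY***',
    ];

    foreach ($rules as $pattern => $mask) {
        // preg_replace() returns null on a regex error; keep the input in that case.
        $text = preg_replace($pattern, $mask, $text) ?? $text;
    }

    return $text;
}

echo maskTokenPatterns('auth: Bearer aaa.bbb.ccc'), "\n"; // auth: Bearer ***TOKEN***
```

Running the Bearer rule first matters: a long token after "Bearer" would otherwise be caught by the generic key rule and lose its more specific mask.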
Performance Enhancements
3. Configuration Caching
Problem: config('Jobs') called repeatedly in high-throughput scenarios causes overhead.
Solution: Singleton-based in-memory config cache.
Usage:
use Daycry\Jobs\Libraries\ConfigCache;
// Get cached config (subsequent calls use cache)
$config = ConfigCache::get();
// For testing: clear cache
ConfigCache::clear();
// For testing: set custom config
ConfigCache::set($mockConfig);
Implementation Notes:
Used in helpers and services (non-coordinator code)
JobLifecycleCoordinator uses config() directly to respect test modifications
Automatic instantiation on first call
Thread-safe in the usual single-process PHP context (the cache lives in one process and is not shared between workers)
Performance Impact:
Reduces config() calls by ~95% in steady state
Negligible memory overhead (~1KB per config instance)
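The caching idea can be shown with a static property that is filled on first use. SimpleConfigCache below is a hypothetical stand-in, not the library's ConfigCache; it takes a loader callable so the sketch runs without a framework.

```php
<?php
// Hypothetical stand-in for ConfigCache; class and method shapes are illustrative.
final class SimpleConfigCache
{
    private static ?object $instance = null;

    public static function get(callable $loader): object
    {
        // Load once, then reuse the same object for the rest of the process.
        return self::$instance ??= $loader();
    }

    public static function set(object $config): void
    {
        self::$instance = $config; // e.g. inject a mock in tests
    }

    public static function clear(): void
    {
        self::$instance = null; // next get() calls the loader again
    }
}
```

Since the cached object lives in a static property, each worker process keeps its own copy and a long-running worker only sees config changes after clear() or a restart.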
4. Rate Limiting
Problem: Queue overload can exhaust resources (CPU, memory, database connections).
Solution: Per-queue rate limiting with token bucket algorithm.
Configuration:
// app/Config/Jobs.php
public array $queueRateLimits = [
'high_priority' => 100, // Max 100 jobs/minute
'default' => 50, // Max 50 jobs/minute
'background' => 20, // Max 20 jobs/minute
];
Behavior:
0 or missing: No limit (backward compatible)
Cache-based token bucket (tracks usage per minute)
Worker skips processing cycle when limit exceeded
Resets automatically each minute
API:
use Daycry\Jobs\Libraries\RateLimiter;
$limiter = new RateLimiter();
// Check if allowed (returns bool)
if ($limiter->allow('default', 50)) {
// Process job
}
// Throw exception if exceeded
try {
$limiter->throttle('default', 50);
} catch (JobException $e) {
// Rate limit exceeded
}
// Get current usage
$usage = $limiter->getUsage('default'); // e.g., 23
// Reset counter (for testing)
$limiter->reset('default');
Integration:
The queue worker (jobs:queue:run) automatically checks rate limits before processing:
// Automatic in QueueRunCommand
if (!$rateLimiter->allow($queue, $maxPerMinute)) {
CLI::write("[Rate Limited] Skipping cycle...", 'yellow');
sleep($sleep);
continue;
}
Cache Keys:
Format: job_rate_limit:{queue_name}:{minute}
TTL: 60 seconds
Example: job_rate_limit:default:1737339240
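The per-minute keys above suggest a counter that is scoped to the current minute, which is usually called a fixed-window counter. The sketch below illustrates that technique with an in-memory array standing in for the cache backend. WindowCounter is hypothetical, and reading {minute} as the Unix timestamp rounded down to the minute is an assumption based on the example key.

```php
<?php
// Illustrative fixed-window counter; an array stands in for the cache backend.
final class WindowCounter
{
    /** @var array<string, int> */
    private array $store = [];

    public function allow(string $queue, int $maxPerMinute, ?int $now = null): bool
    {
        if ($maxPerMinute <= 0) {
            return true; // 0 or missing: no limit
        }

        $now    = $now ?? time();
        $minute = $now - ($now % 60); // start of the current minute
        $key    = "job_rate_limit:{$queue}:{$minute}";
        $used   = $this->store[$key] ?? 0;

        if ($used >= $maxPerMinute) {
            return false; // window is full; the caller should skip this cycle
        }

        $this->store[$key] = $used + 1;

        return true;
    }
}
```

A new minute produces a new key, so the count starts from zero without an explicit reset; with a real cache the 60 second TTL would clean up old keys.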
Reliability Enhancements
5. Dead Letter Queue (DLQ)
Problem: Permanently failed jobs disappear from queue, making root cause analysis difficult.
Solution: Automatic routing to dedicated “dead letter” queue.
Configuration:
// app/Config/Jobs.php
public ?string $deadLetterQueue = 'failed_jobs';
Behavior:
Activated when deadLetterQueue is set (null = disabled)
Jobs exceeding max retries are moved to DLQ instead of deleted
Original queue data preserved
Metadata added automatically
Metadata Added:
| Field | Description | Example |
|---|---|---|
| | Why job moved to DLQ | “Max retries exceeded” |
| | When moved (ISO 8601) | “2026-01-19T15:30:00Z” |
| | Total attempts before failure | 5 |
| | Source queue name | “high_priority” |
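As a rough illustration of what such metadata could look like, the sketch below builds the four values described above. buildDlqMetadata() is hypothetical and the array keys are placeholders chosen for this example; they are not the library's field names.

```php
<?php
// Hypothetical builder; the array keys are placeholders, not the library's field names.
function buildDlqMetadata(string $reason, int $attempts, string $sourceQueue, ?int $now = null): array
{
    return [
        'reason'       => $reason,
        // ISO 8601 in UTC, matching the shape of the example value above.
        'moved_at'     => gmdate('Y-m-d\TH:i:s\Z', $now ?? time()),
        'attempts'     => $attempts,
        'source_queue' => $sourceQueue,
    ];
}
```

Using gmdate() rather than date() keeps the timestamp in UTC regardless of the server's configured timezone, which is what the trailing Z claims.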
API (v1.0.3+ — store() returns bool):
use Daycry\Jobs\Libraries\DeadLetterQueue;
$dlq = new DeadLetterQueue();
// Store failed job (automatic in RequeueHelper). Returns true on success,
// false when the DLQ is unconfigured or the underlying push failed.
$stored = $dlq->store($job, 'Max retries exceeded', 5);
if (! $stored) {
// RequeueHelper emits jobs_dlq_failed for you in this case.
log_message('alert', 'DLQ not available — investigate before more jobs exhaust retries.');
}
// Get statistics
$stats = $dlq->getStats();
Integration:
RequeueHelper::finalize() calls store() before clearing the origin queue (v1.0.3+ ordering) and emits jobs_dlq_failed if the DLQ rejected the message:
// In RequeueHelper::finalize()
if (! $success && ! $willRetry) {
$stored = $this->dlq->store($job, 'Max retries exceeded', $currentAttempt);
$removeFn($job, false);
$this->metrics?->increment('jobs_failed');
$this->metrics?->increment('jobs_failed_permanently');
if (! $stored) {
$this->metrics?->increment('jobs_dlq_failed');
}
}
Use Cases:
Forensic analysis of failed jobs
Identifying systematic issues (patterns in failures)
Manual retry after fixing root cause
Compliance/audit requirements
6. Job Timeout Protection
Problem: Runaway jobs can block queue workers indefinitely.
Solution: Hard timeout enforcement at execution level.
Configuration:
// app/Config/Jobs.php
public int $jobTimeout = 300; // 5 minutes
Behavior:
0 = disabled (backward compatible)
Uses pcntl_alarm() for signal-based timeout (hard kill). v1.2 enables pcntl_async_signals(true) so SIGALRM interrupts CPU-bound code without waiting for a syscall, and restores the previous SIGALRM handler so successive jobs in the same worker process do not inherit our handler.
Falls back to a post-execute time check if pcntl is unavailable (Windows/FPM).
Emits the jobs_timed_out metric on either path (v1.0.3+).
Throws JobException::forJobTimeout($jobName, $timeout).
Implementation Modes:
| Mode | Requirement | Enforcement |
|---|---|---|
| Hard Timeout | pcntl extension | Signal kills process after timeout (works on CPU-bound code in v1.2+ thanks to async signals) |
| Soft Timeout | Fallback (no pcntl) | Time check + warning log |
Example:
// Job exceeding 300s timeout
try {
$coordinator->run($job, 'queue');
} catch (JobException $e) {
// "Job 'data_import' exceeded timeout of 300 seconds"
}
pcntl_alarm() Flow:
Register signal handler (SIGALRM → throws exception)
Set alarm for $timeout seconds
Execute job
Cancel alarm on completion
If timeout: signal fires, exception thrown, execution halted
Fallback Flow (no pcntl):
Record start time
Execute job
Check elapsed time after execution
Log warning if exceeded (soft enforcement)
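Both flows can be sketched in one function. This is a simplified illustration, not the coordinator's code: runWithTimeout() is a hypothetical helper, it throws a plain RuntimeException instead of JobException, and it writes the soft-timeout warning with error_log(). The pcntl functions used are standard PHP (7.1+).

```php
<?php
// Sketch only: runWithTimeout() is hypothetical, not the library's coordinator.
function runWithTimeout(callable $job, int $timeout)
{
    if ($timeout <= 0) {
        return $job(); // 0 = disabled
    }

    if (! function_exists('pcntl_alarm')) {
        // Soft path: measure and report after the job has finished.
        $start  = microtime(true);
        $result = $job();
        if (microtime(true) - $start > $timeout) {
            error_log("Job exceeded timeout of {$timeout} seconds");
        }

        return $result;
    }

    pcntl_async_signals(true); // deliver SIGALRM even inside CPU-bound code
    $previous = pcntl_signal_get_handler(SIGALRM);
    pcntl_signal(SIGALRM, static function () use ($timeout): void {
        throw new RuntimeException("Job exceeded timeout of {$timeout} seconds");
    });
    pcntl_alarm($timeout);

    try {
        return $job();
    } finally {
        pcntl_alarm(0);                   // cancel any pending alarm
        pcntl_signal(SIGALRM, $previous); // restore the earlier handler
    }
}
```

The finally block runs on success and on timeout alike, so the next job in the same worker starts without a stale alarm or handler.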
Per-Job Override:
// Override global timeout for specific job
$job = (new Job('command', 'long-running-import'))
->timeout(900); // 15 minutes
Operational Enhancements
7. Fluent Job Chaining
Enhancement: Simplified callback API with semantic methods.
New Methods:
| Method | Filter | Queued? | Description |
|---|---|---|---|
| then() | success | Yes | Execute after successful completion |
| catch() | failure | Yes | Execute on failure |
| finally() | always | Yes | Always execute |
| chain() | Sequential | Yes | Execute jobs in order |
Basic Example:
use Daycry\Jobs\Job;
$processPayment = new Job('command', 'process:payment');
$sendInvoice = new Job('command', 'send:invoice');
$notifyAdmin = new Job('command', 'notify:admin');
$processPayment
->then($sendInvoice) // On success
->catch($notifyAdmin) // On failure
->push();
Chain Multiple Jobs:
$job->chain([
new Job('command', 'validate:data'),
new Job('command', 'transform:data'),
new Job('command', 'store:data'),
])->push();
Comparison with setCallbackJob():
// Old verbose syntax
$job->setCallbackJob(function(Job $parent) {
return (new Job('command', 'cleanup'))->enqueue('default');
}, [
'on' => 'success',
'inherit' => ['output', 'error'],
'allowChain' => true
]);
// New fluent syntax
$job->then(
(new Job('command', 'cleanup'))->enqueue('default')
);
Behind the Scenes:
then() → calls setCallbackJob() with filter='success' and allowChain=true
catch() → calls setCallbackJob() with filter='failure'
finally() → calls setCallbackJob() with filter='always'
chain() → wraps multiple jobs with sequential execution logic
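The mapping from fluent methods to filters can be modelled with a toy class. MiniJob is hypothetical and runs callbacks inline instead of queueing them; it only shows how then(), catch() and finally() reduce to one registration method with a filter value.

```php
<?php
// Toy model of the filter mapping; MiniJob is hypothetical and runs callbacks inline.
final class MiniJob
{
    /** @var array<int, array{filter: string, job: callable}> */
    private array $callbacks = [];

    public function then(callable $next): self    { return $this->on('success', $next); }
    public function catch(callable $next): self   { return $this->on('failure', $next); }
    public function finally(callable $next): self { return $this->on('always', $next); }

    private function on(string $filter, callable $next): self
    {
        $this->callbacks[] = ['filter' => $filter, 'job' => $next];

        return $this;
    }

    /** Runs the callbacks matching the outcome and returns their results in order. */
    public function dispatch(bool $success): array
    {
        $wanted = $success ? 'success' : 'failure';
        $fired  = [];
        foreach ($this->callbacks as $cb) {
            if ($cb['filter'] === 'always' || $cb['filter'] === $wanted) {
                $fired[] = ($cb['job'])();
            }
        }

        return $fired;
    }
}
```

For example, a job with then(), catch() and finally() callbacks fires the first and third on success and the second and third on failure.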
8. Health Check Command
Feature: Comprehensive system health monitoring.
Command:
php spark jobs:health [--json] [--queue=NAME]
Options:
--json: Output in JSON format (machine-readable)
--queue=NAME: Show stats for specific queue only
Output Sections:
1. Configuration
Displays current system settings:
Retry strategy (none/fixed/exponential)
Job timeout
Dead letter queue name
Rate limits per queue
2. Queue Status
Per-queue statistics:
Pending jobs (waiting to execute)
Processing jobs (currently running)
Completed jobs (successful)
Failed jobs (permanent failures)
3. Rate Limit Usage
Current usage vs. configured limits:
Current: 23/50 (46% capacity)
Visual representation in table format
4. Last 24 Hours Metrics
Rolling window statistics:
Total executions
Success rate (%)
Failure rate (%)
Average duration (seconds)
Example Output (Table):
=== Jobs System Health Check ===
Configuration:
Retry Strategy: exponential (base: 60s, multiplier: 2.0, max: 3600s)
Job Timeout: 300 seconds
Dead Letter Queue: failed_jobs
Rate Limits: default=50/min, high_priority=100/min
Queue: default
Status:
Pending: 42
Processing: 3
Completed: 1,245
Failed: 12
Rate Limit: 23/50 (46% used)
Last 24h:
Executions: 156
Success Rate: 92.3%
Failure Rate: 7.7%
Avg Duration: 2.45s
Queue: high_priority
Status:
Pending: 8
Processing: 1
Completed: 567
Failed: 3
Rate Limit: 78/100 (78% used)
Last 24h:
Executions: 89
Success Rate: 96.6%
Avg Duration: 1.12s
Example Output (JSON):
{
"config": {
"retry_strategy": "exponential",
"retry_base": 60,
"retry_multiplier": 2.0,
"retry_max": 3600,
"job_timeout": 300,
"dead_letter_queue": "failed_jobs",
"rate_limits": {
"default": 50,
"high_priority": 100
}
},
"queues": {
"default": {
"status": {
"pending": 42,
"processing": 3,
"completed": 1245,
"failed": 12
},
"rate_limit": {
"current": 23,
"max": 50,
"percentage": 46
},
"last_24h": {
"executions": 156,
"success_rate": 92.3,
"failure_rate": 7.7,
"avg_duration_seconds": 2.45
}
}
}
}
Use Cases:
Monitoring Dashboards: JSON output to Prometheus/Grafana/Datadog
Operational Health Checks: Quick status overview for on-call engineers
Capacity Planning: Identify rate limit saturation before issues occur
Forensics: Track failure patterns and bottlenecks
CI/CD: Validate queue health in deployment pipelines
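For the capacity-planning case, a small consumer of the JSON output might look like the sketch below. saturatedQueues() is a hypothetical helper and the 80% default threshold is an arbitrary choice; the keys it reads (queues, rate_limit, percentage) are the ones shown in the example JSON.

```php
<?php
// Hypothetical helper: reads the JSON shape shown in the example output.
function saturatedQueues(string $healthJson, float $thresholdPercent = 80.0): array
{
    $data = json_decode($healthJson, true);
    if (! is_array($data)) {
        throw new InvalidArgumentException('Health output is not valid JSON');
    }

    $hot = [];
    foreach ($data['queues'] ?? [] as $name => $stats) {
        $pct = $stats['rate_limit']['percentage'] ?? null;
        if ($pct !== null && $pct >= $thresholdPercent) {
            $hot[$name] = $pct; // queue is at or above the threshold
        }
    }

    return $hot;
}
```

A deployment pipeline could fail the build when the returned array is non-empty, or an alerting script could page on it.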
Integration Example (Prometheus Exporter):
// Custom metrics endpoint
public function metrics()
{
    exec('php spark jobs:health --json', $output);
    $data = json_decode(implode('', $output), true);
    // Emit nothing if the command did not print valid JSON
    if (! is_array($data)) {
        return;
    }
    foreach ($data['queues'] ?? [] as $queue => $stats) {
        echo "jobs_pending{queue=\"$queue\"} {$stats['status']['pending']}\n";
        echo "jobs_processing{queue=\"$queue\"} {$stats['status']['processing']}\n";
        echo "jobs_success_rate{queue=\"$queue\"} {$stats['last_24h']['success_rate']}\n";
    }
}
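Queue names end up inside a Prometheus label value, where backslashes, double quotes and newlines must be escaped according to the text exposition format. The helpers below are a hypothetical addition for the exporter, not part of the Jobs package.

```php
<?php
// Hypothetical helpers for the exporter; names are illustrative.
function promEscapeLabel(string $value): string
{
    // Backslash first, so the backslashes added for quotes and newlines are not doubled.
    return str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], $value);
}

function promLine(string $metric, string $queue, $value): string
{
    return sprintf('%s{queue="%s"} %s', $metric, promEscapeLabel($queue), $value);
}

echo promLine('jobs_pending', 'default', 42), "\n"; // jobs_pending{queue="default"} 42
```

Ordinary queue names such as default pass through unchanged, so the escaping only matters for unusual names.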
Configuration Summary
All new features are opt-in and backward compatible:
// app/Config/Jobs.php
// Security
public array $allowedShellCommands = []; // Empty = allow all (default)
// Performance
public array $queueRateLimits = []; // Empty = no limits (default)
// Reliability
public ?string $deadLetterQueue = null; // null = disabled (default)
public int $jobTimeout = 300; // 0 = disabled
// Future
public int $batchSize = 1; // Reserved for batch processing
Enabling Everything:
public array $allowedShellCommands = ['/usr/bin/ls', '/usr/bin/cat', '/usr/bin/grep']; // absolute paths, matched via realpath()
public array $queueRateLimits = ['default' => 50, 'high' => 100];
public ?string $deadLetterQueue = 'failed_jobs';
public int $jobTimeout = 300;
Backward Compatibility
All enhancements maintain 100% backward compatibility:
| Feature | Default Behavior | Migration Required |
|---|---|---|
| Shell Whitelist | All commands allowed | No |
| Token Detection | Auto-enabled in logger | No |
| Config Caching | Transparent | No |
| Rate Limiting | Disabled (no limits) | No |
| Dead Letter Queue | Disabled | No |
| Job Timeout | Disabled | No |
| Fluent Chaining | Alternative to setCallbackJob() | No |
| Health Check | New command | No |
Zero Breaking Changes: Existing code continues working without modification.
Testing
All enhancements include comprehensive test coverage:
# Run full test suite
composer test
# Verify all tests pass
# Expected: 96 tests, 318 assertions, 4 skipped
Test Coverage:
RateLimiterTest: Token bucket algorithm, reset, edge cases
ConfigCacheTest: Singleton behavior, clear/set operations
DeadLetterQueueTest: Storage, statistics, metadata
ShellJobTest: Whitelist validation, exception handling
JobLoggerTest: Token pattern detection, masking
JobLifecycleCoordinatorTest: Timeout enforcement, retry integration
CallbackTraitTest: Fluent chaining API
HealthCheckCommandTest: Output formats, queue filtering
Performance Impact
Benchmarks on typical workload (1000 jobs):
| Feature | Overhead | Notes |
|---|---|---|
| Shell Whitelist | <0.1ms/job | Simple array check |
| Token Detection | ~0.5ms/job | 3 regex patterns |
| Config Caching | -15% total time | Reduces config() calls by 95% |
| Rate Limiting | ~0.2ms/job | Cache read + increment |
| Dead Letter Queue | ~5ms/failed job | Only on permanent failure |
| Job Timeout | <0.1ms/job | Signal setup overhead |
Net Impact: ~10-15% performance improvement in high-throughput scenarios (due to config caching).
Migration Guide
From Basic to Enhanced Setup
Step 1: Enable security features
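// Use absolute paths (bare names still work but emit a deprecation log).
// Adjust the paths to where the binaries live on your system.
public array $allowedShellCommands = [
    '/usr/bin/ls', '/usr/bin/cat', '/usr/bin/grep', '/usr/bin/find', '/usr/bin/awk', '/usr/bin/sed',
];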
Step 2: Configure rate limits
public array $queueRateLimits = [
'default' => 100,
'high_priority' => 200,
'background' => 50,
];
Step 3: Enable DLQ
public ?string $deadLetterQueue = 'failed_jobs';
Step 4: Set timeout
public int $jobTimeout = 300; // 5 minutes
Step 5: Update job code to use fluent API (optional)
// Before
$job->setCallbackJob(function($p) {
return (new Job('command', 'notify'))->enqueue('default');
}, ['on' => 'success']);
// After
$job->then((new Job('command', 'notify'))->enqueue('default'));
Step 6: Set up monitoring
# Add to cron
*/5 * * * * cd /app && php spark jobs:health --json > /var/log/jobs-health.json
Troubleshooting
Shell Commands Rejected
Symptom: JobException::forShellCommandNotAllowed()
Solution: Add the command's absolute path to $allowedShellCommands, or set $allowedShellCommands = [] to allow all commands
Rate Limit Issues
Symptom: Jobs queued but not processing
Solution: Increase limit in $queueRateLimits or check cache backend
Timeout False Positives
Symptom: Jobs killed before completion
Solution: Increase $jobTimeout or set per-job override with ->timeout(900)
DLQ Not Storing
Symptom: Failed jobs disappear
Solution: Verify $deadLetterQueue is set and queue exists in $queues
Health Check Empty
Symptom: No data in jobs:health output
Solution: Ensure $logPerformance = true for metrics collection
Future Enhancements
Planned features leveraging this foundation:
Batch Processing: Use $batchSize for efficient bulk operations
Priority Queues: Enhanced priority handling across backends
Job Clustering: Distributed locking for multi-server deployments
Telemetry: OpenTelemetry integration for distributed tracing
Web Dashboard: Real-time monitoring UI using jobs:health JSON API