I have a Bull queue running lengthy video upload jobs that can take anywhere from under a minute to many minutes.
The jobs stop running after the default 30 seconds, so I increased the timeout to several minutes, but it is not being respected. If I set the timeout to 10 ms, the job stops immediately, so the timeout is clearly being taken into account.
Job {
  opts: {
    attempts: 1,
    timeout: 600000,
    delay: 0,
    timestamp: 1634753060062,
    backoff: undefined
  },
  ...
}

Despite the timeout, I am receiving a stalled event and the job starts processing again.
Edit: I assumed "stalled" and timed-out were the same thing, but apparently there is a separate interval at which Bull checks for stalled jobs. In other words, the real question is why jobs are considered "stalled" even though they are busy doing the upload.
Posted on 2022-11-09 13:05:26
The problem seems to be that the operation you are running blocks the event loop, which causes your jobs to stall. You could convert your code to non-blocking code and solve the problem that way.
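One common way to make a CPU-heavy processor non-blocking is to split the work into bounded chunks and yield to the event loop between them, so background timers (such as Bull's internal lock renewal) can still fire. A minimal sketch; `processAllChunks`, `processChunk`, and `chunks` are illustrative names, not part of Bull's API:

```javascript
// Process an array of work items one chunk at a time, yielding to
// the event loop after each chunk via setImmediate. Pending timers
// and I/O callbacks get a chance to run between chunks.
async function processAllChunks(chunks, processChunk) {
  for (const chunk of chunks) {
    processChunk(chunk); // one bounded unit of synchronous work
    await new Promise((resolve) => setImmediate(resolve)); // yield
  }
}
```

Inside a Bull processor you would call such a function instead of running one long synchronous loop, so the worker can keep renewing the job lock while the job runs.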
That said, you can also set the stalled-check interval in the queue settings when creating the queue (more of a quick fix):
const queue = new Bull('queue', {
  port: 6379,
  host: 'localhost',
  db: 0,
  settings: {
    stalledInterval: 60 * 60 * 1000, // change default from 30 sec to 1 hour; set 0 to disable the stalled check
  },
})

According to Bull's docs:
Increasing stalledInterval (or disabling it by setting it to 0) removes the check that ensures the event loop is running, thereby forcing the system to ignore the stalled state.
Also from the docs:
When a worker is processing a job it will keep the job "locked" so other workers can't process it.
It's important to understand how locking works to prevent your jobs from losing their lock - becoming _stalled_ -
and being restarted as a result. Locking is implemented internally by creating a lock for `lockDuration` on interval
`lockRenewTime` (which is usually half `lockDuration`). If `lockDuration` elapses before the lock can be renewed,
the job will be considered stalled and is automatically restarted; it will be __double processed__. This can happen when:
1. The Node process running your job processor unexpectedly terminates.
2. Your job processor was too CPU-intensive and stalled the Node event loop, and as a result, Bull couldn't renew the job lock (see [#488](https://github.com/OptimalBits/bull/issues/488) for how we might better detect this). You can fix this by breaking your job processor into smaller parts so that no single part can block the Node event loop. Alternatively, you can pass a larger value for the `lockDuration` setting (with the tradeoff being that it will take longer to recognize a real stalled job).
As such, you should always listen for the `stalled` event and log this to your error monitoring system, as this means your jobs are likely getting double-processed.
As a safeguard so problematic jobs won't get restarted indefinitely (e.g. if the job processor always crashes its Node process), jobs will be recovered from a stalled state a maximum of `maxStalledCount` times (default: `1`).

https://stackoverflow.com/questions/69651175
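The lock mechanics described above can be simulated without Bull or Redis. A minimal sketch with a scaled-down `lockDuration` (Bull's actual default is 30000 ms), showing how a blocked event loop prevents a renewal timer from firing and the lock lapses:

```javascript
// Simulate Bull's lock renewal: a timer renews the lock every
// lockRenewTime ms; if the event loop is blocked for longer than
// lockDuration, the renewal cannot run and the job would be
// considered stalled.
const lockDuration = 200;               // ms, scaled down for the demo
const lockRenewTime = lockDuration / 2; // Bull uses half of lockDuration

let lastRenewal = Date.now();
const renewer = setInterval(() => { lastRenewal = Date.now(); }, lockRenewTime);

// Block the event loop for longer than lockDuration, like a
// CPU-heavy job processor would.
const start = Date.now();
while (Date.now() - start < lockDuration * 2) { /* busy-wait */ }

const stalled = Date.now() - lastRenewal > lockDuration;
clearInterval(renewer);
console.log(stalled); // → true: the lock lapsed while the loop was blocked
```

With the chunked, non-blocking approach from the answer, the renewal timer would fire between chunks and `stalled` would stay false.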