在运行Sidekiq时,我看到大量作业失败,这些作业都与我的Mongo数据库的连接问题有关。我对这台机器进行了大量负载的压力测试,所以我在失败的情况下,用5秒钟的时间重新尝试了18,000多个任务。有些作业(我猜是那些工作人员能够成功地检索连接线程的作业)处理得很好。我还有很多其他人都犯了这样的错误:
2013-12-26T19:25:35Z 13447 TID-m2biw WARN: {"retry"=>true, "queue"=>"default",
"class"=>"ParseFileWorker", "args"=>[{"filename"=>"/opt/media/AE-67452_36.XML"}],
"jid"=>"5d6c48fa94ed8c8da1b226fc", "enqueued_at"=>1388084903.6113343,
"error_message"=>"Could not connect to a primary node for replica set #
<Moped::Cluster:44552140 @seeds=[<Moped::Node resolved_address=\"127.0.0.1:27017\">]>",
"error_class"=>"Moped::Errors::ConnectionFailure", "failed_at"=>"2013-12-26T19:16:25Z",
"retry_count"=>3, "retried_at"=>2013-12-26 19:25:35 UTC}还有一些超时错误,如以下所示:
Timeout::Error: Waited 0.5 sec注意,我正在运行Rails 4,其中包含从https://github.com/mongoid/mongoid主分支签出的Mongoid代码。据我所读,在完成Sidekiq作业时,Mongoid的早期版本需要显式关闭连接。蒙哥大4号应该自动做到这一点。我不确定它是不是在做。当作业排队太快时,问题似乎有两倍:连接不可用或正在超时。一些工人成功地打开了连接。有些作业必须等到重试才能解析。
我不完全确定这里的最佳解决方案是什么。如果我把工作排得慢一些,一切都会好起来的。不幸的是,这并不是应用程序在现实世界中运行的方式。
我如何确保我的侧翼作业得到一个摩托/蒙哥德连接他们需要完成工作,或等到一个可用的处理?因为这个问题而看到了大量的例外/错误。下面是常见的重复错误的堆栈跟踪:
2013-12-26T20:56:58Z 1317 TID-i70ms ParseFileWorker JID-0fc375d8fd9707e7d3ec3238 INFO: fail: 1.507 sec
2013-12-26T20:56:58Z 1317 TID-i70ms WARN: {"retry"=>true, "queue"=>"default", "class"=>"ParseFileWorker", "args"=>[{"filename"=>"/opt/media/AC-19269_287.XML"}], "jid"=>"0fc375d8fd9707e7d3ec3238", "enqueued_at"=>1388091410.0421875, "error_message"=>"Waited 0.5 sec", "error_class"=>"Timeout::Error", "failed_at"=>2013-12-26 20:56:58 UTC, "retry_count"=>0}
2013-12-26T20:56:58Z 1317 TID-i70ms WARN: Waited 0.5 sec
2013-12-26T20:56:58Z 1317 TID-i70ms WARN: /media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:35:in `block (2 levels) in pop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:31:in `loop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:31:in `block in pop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:30:in `synchronize'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool/timed_stack.rb:30:in `pop'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool.rb:66:in `checkout'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/connection_pool-1.2.0/lib/connection_pool.rb:53:in `with'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:114:in `connection'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:141:in `disconnect'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:156:in `down!'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:442:in `rescue in refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/node.rb:431:in `refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:182:in `block in refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:194:in `each'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:194:in `refresh'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:151:in `nodes'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/cluster.rb:240:in `with_primary'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/primary.rb:55:in `block in with_node'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/selectable.rb:65:in `call'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/selectable.rb:65:in `with_retry'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/read_preference/primary.rb:54:in `with_node'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/gems/moped-2.0.0.beta4/lib/moped/query.rb:127:in `first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:201:in `block (2 levels) in first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:537:in `with_sorting'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:200:in `block in first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:449:in `try_cache'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual/mongo.rb:199:in `first'
/media/apps/tsn-parser-server/vendor/bundle/ruby/2.0.0/bundler/gems/mongoid-fc589421bf8b/lib/mongoid/contextual.rb:19:in `first'
/media/apps/tsn-parser-server/app/models/parser/core.rb:88:in `prepare_parse'**更新**
如果我将sidekiq工作人员的数量减少到大约10个左右,问题就不会发生--仅仅是因为只有10个线程正在打开一个连接到MongoDB的工作人员较少。必须有一种方法来为Mongoid/Moped使用一个连接池。由于Rails负责建立和拆卸连接,所以我不知道如何实现这一点。
**更新2 **
当我把它指向我们的生产服务器时,情况会恶化10倍。几乎每一个线程都是一个Timeout::Error。有些人通过了,但一点也不多。这太可怕了。这个问题至少可以通过限制当MongoDB是本地的10名工人来“解决”。但这不是一个现实的用例,我需要能够连接到一个单独的机器上的数据库。
发布于 2013-12-29 08:17:28
首先要检查的是DB端的mongod连接限制和OS限制,因为它们很好,可能连接池大小可能是限制因素。
事实证明,对于你的大量工人来说,mongoid connection pool size default was 5是相当低的。提高这些as shown in an example here将允许更多的工人同时获得连接。
https://stackoverflow.com/questions/20790295
复制相似问题