I'm using BrB in Ruby 1.9 to share a data source among several worker processes, which I fork with `Process.fork` as follows:

    Thread.abort_on_exception = true

    fork do
      puts "Initializing data source process... (PID: #{Process.pid})"
      data = DataSource.new(files)
      BrB::Service.start_service(:object => data, :verbose => false, :host => host, :port => port)
      EM.reactor_thread.join
    end

The workers are forked like this:

    8.times do |t|
      fork do
        data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)
        puts "Launching #{threads_num} worker threads... (PID: #{Process.pid})"
        threads = []
        threads_num.times { |i|
          threads << Thread.new {
            while true
              begin
                worker = Worker.new(data, config)
              rescue OutOfTargetsError
                break
              rescue Exception => e
                puts "An unexpected exception was caught: #{e.class} => #{e}"
                sleep 5
              end
            end
          }
        }
        threads.each { |t| t.join }
        data.stop_service
        EM.stop
      end
    end

This works perfectly, but after about 10 minutes of running I get the following error:

    bootstrap.rb:47:in `join': deadlock detected (fatal)
        from bootstrap.rb:47:in `block in '
        from bootstrap.rb:39:in `fork'
        from bootstrap.rb:39:in `'

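This error doesn't tell me where the deadlock actually occurs; it only points to my join on the EventMachine thread.
How can I trace back to the point where the program locks up?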
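Solution
It's locking on the join in the parent thread; that information is accurate.
To trace where it's locking in the child threads, try wrapping each thread's work in a timeout block. You'll need to temporarily remove the catch-all rescue, otherwise it will swallow the timeout exception before it can be raised.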
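As a rough illustration of the timeout idea, here is a minimal sketch using Ruby's standard `Timeout` module. The `do_work` and `run_with_timeout` helpers are hypothetical stand-ins (the real code would wrap `Worker.new(data, config)`), and the limits are arbitrary. How much of the stalled call site shows up in the backtrace may vary between Ruby versions, so treat the printed trace as a hint rather than a guarantee.

```ruby
require 'timeout'

# Hypothetical stand-in for the real work (e.g. Worker.new(data, config)).
def do_work(seconds)
  sleep seconds
  :done
end

# Hypothetical helper: run the block under a time limit. If the limit is hit,
# report which thread stalled and print the exception's backtrace.
def run_with_timeout(limit)
  Timeout.timeout(limit) { yield }
rescue Timeout::Error => e
  warn "Timed out after #{limit}s in #{Thread.current.inspect}"
  warn e.backtrace.join("\n")
  :timed_out
end

fast = Thread.new { run_with_timeout(1)   { do_work(0.01) } }
slow = Thread.new { run_with_timeout(0.1) { do_work(5) } }

# Thread#value joins the thread and returns the block's result.
results = [fast.value, slow.value]
```

Note that only `Timeout::Error` is rescued here. A blanket `rescue Exception` inside the wrapped block would catch the timeout too, which is why the catch-all has to go while debugging.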
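Currently the parent thread tries to join all the threads in order, blocking until each one is done. However, each thread only becomes joinable once it hits an OutOfTargetsError. You might avoid the deadlock by using short-lived threads and moving the while loop into the parent. No guarantees, but maybe something like this?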

    8.times do |t|
      fork do
        running = true
        Signal.trap("INT") do
          puts "Interrupt signal received, waiting for threads to finish..."
          running = false
        end
        data = BrB::Tunnel.create(nil, "brb://#{host}:#{port}", :verbose => false)
        puts "Launching max #{threads_num} worker threads... (PID: #{Process.pid})"
        threads = []
        while running
          # Start new threads until we have threads_num running
          until threads.length >= threads_num do
            threads << Thread.new {
              begin
                worker = Worker.new(data, config)
              rescue OutOfTargetsError
                # Nothing left to process; let this thread finish
              rescue Exception => e
                puts "An unexpected exception was caught: #{e.class} => #{e}"
                sleep 5
              end
            }
          end
          # Make sure the parent process doesn't spin too much
          sleep 1
          # Join finished threads (Thread#status is false or nil once a thread has ended)
          finished_threads = threads.reject(&:status)
          threads -= finished_threads
          finished_threads.each(&:join)
        end
        data.stop_service
        EM.stop
      end
    end

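The reaping step at the end of the loop depends on how `Thread#status` behaves, so here is a small standalone sketch of just that part. The two threads are made-up stand-ins, not the real workers: one finishes immediately and one sleeps indefinitely.

```ruby
# Two stand-in threads: one ends right away, one keeps sleeping.
done_thread = Thread.new { :finished }
busy_thread = Thread.new { sleep }

done_thread.join # make sure the first thread has really ended

threads = [done_thread, busy_thread]

# Thread#status is false (normal exit) or nil (died with an exception) for a
# finished thread, and a truthy string such as "run" or "sleep" otherwise.
finished = threads.reject(&:status)
threads -= finished
finished.each(&:join) # returns at once because these threads are already done

busy_thread.kill # clean up the stand-in thread
```

Because only finished threads are ever joined, the parent never blocks on a thread that is still working, which is the point of moving the loop out of the workers.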