Ruby Threads & Mutex: Why does my code fail to fetch JSON in order?

I wrote a crawler that uses 8 threads to download JSON from the Internet:

#encoding: utf-8
require 'net/http'
require 'sqlite3'
require 'zlib'
require 'json'
require 'thread'

$mutex = Mutex.new # Lock of database and $cnt
$cntMutex = Mutex.new # Lock of $threadCnt
$threadCnt = 0 # number of running threads 
$cnt = 0 # number of lines in this COMMIT to database

db = SQLite3::Database.new "price.db"
db.results_as_hash = true
STDOUT.sync = true
start = 10000000    
def fetch(http, url, timeout = 10) 
    # ...
end

def parsePrice( i, db)
        ss = fetch(Net::HTTP.start('p.3.cn',80), 'http://p.3.cn/prices/get?skuid=J_'+i.to_s)
        doc = JSON.parse(ss)[0]
        puts "processing "+i.to_s
        STDOUT.flush
        begin
                $mutex.synchronize {
                        $cnt = $cnt+1
                        db.execute("insert into prices (id, price) VALUES (?,?)", [i,doc["p"].to_f])
                        if $cnt > 20
                                db.execute('COMMIT')
                                db.execute('BEGIN')
                                $cnt = 0
                        end
                }
        rescue SQLite3::ConstraintException
                warn("duplicate id: "+i.to_s)
                $cntMutex.synchronize {
                        $threadCnt -= 1;
                }
                Thread.terminate
        rescue NoMethodError
                warn("Matching failed")
        rescue
                raise
        ensure
        end

        $cntMutex.synchronize {
                $threadCnt -= 1;
        }
end



puts "will now start from " + start.to_s()
db.execute("BEGIN")

Thread.new {
        for ii in start..12000000 do

                sleep 0.1 while $threadCnt > 7

                $cntMutex.synchronize {
                        $threadCnt += 1;
                }
                Thread.new { 
                        parsePrice( ii, db)
                }



        end
        db.execute('COMMIT')
} . join
Then I created a database named price.db:

sqlite3 > create table prices (id INT PRIMARY KEY, price REAL);
To make my code thread-safe, db, $cnt and $threadCnt are all protected by $mutex and $cntMutex.
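As a side note, the reason shared counters need a Mutex at all can be seen in a small standalone sketch (my own, not part of the question): `counter += 1` is a read-modify-write, so concurrent threads can lose updates, while wrapping the increment in Mutex#synchronize makes the total deterministic.

```ruby
# A standalone sketch: 8 threads each increment a shared counter
# 1000 times. With the mutex held around every increment, the
# final value is always exactly 8000.
counter = 0
mutex = Mutex.new

threads = 8.times.map do
  Thread.new do
    1000.times { mutex.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

puts counter  # always 8000 with the mutex held around each increment
```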

However, when I tried to run this script, the following messages were printed:

[lz@lz crawl]$ ruby priceCrawler.rb 
will now start from 10000000
http://p.3.cn/prices/get?skuid=J_10000008http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000008http://p.3.cn/prices/get?skuid=J_10000002http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000008



http://p.3.cn/prices/get?skuid=J_10000002http://p.3.cn/prices/get?skuid=J_10000002

processing 10000002
processing 10000002processing 10000008processing 10000008processing 10000002

duplicate id: 10000002

duplicate id: 10000002processing 10000008
processing 10000008duplicate id: 10000008


duplicate id: 10000008processing 10000008
duplicate id: 10000008
This script seems to skip some ids and to call parsePrice several times with the same id.

So why does this error occur? Any help would be appreciated.

It seems to me that your thread scheduling is wrong. I have modified your code to illustrate some of the possible race conditions you were triggering:
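One Ruby behavior the rewritten code leans on is worth verifying on its own: an `ensure` clause runs even when the matching `rescue` re-raises, which is what makes it safe to decrement $threadCnt there. A quick standalone check (my own sketch):

```ruby
# Order of events when a rescue clause re-raises:
# the ensure clause still runs before the exception propagates.
log = []
begin
  begin
    raise 'boom'
  rescue
    log << :rescued
    raise            # re-raise, as the crawler's rescue clause does
  ensure
    log << :ensured  # runs even though we re-raised
  end
rescue
  log << :outer      # the re-raised exception arrives here last
end
p log  # => [:rescued, :ensured, :outer]
```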

require 'net/http'
require 'sqlite3'
require 'zlib'
require 'json'
require 'thread'

$mutex = Mutex.new # Lock of database and $cnt
$cntMutex = Mutex.new # Lock of $threadCnt
$threadCnt = 0 # number of running threads 
$cnt = 0 # number of lines in this COMMIT to database

db = SQLite3::Database.new "price.db"
db.results_as_hash = true
STDOUT.sync = true
start = 10000000    
def fetch(http, url, timeout = 10) 
  # ...
end

def parsePrice(i, db)
  must_terminate = false

  ss = fetch(Net::HTTP.start('p.3.cn',80), "http://p.3.cn/prices/get?skuid=J_#{i}")
  doc = JSON.parse(ss)[0]
  puts "processing #{i}"
  STDOUT.flush
  begin
    $mutex.synchronize {
      $cnt = $cnt+1
      db.execute("insert into prices (id, price) VALUES (?,?)", [i,doc["p"].to_f])
      if $cnt > 20
        db.execute('COMMIT')
        db.execute('BEGIN')
        $cnt = 0
      end
    }
  rescue SQLite3::ConstraintException
    warn("duplicate id: #{i}")
    must_terminate = true
  rescue NoMethodError
    warn("Matching failed")
  rescue
    # Raising here does not prevent ensure from running.
    # It will raise after we decrement $threadCnt on
    # ensure clause.
    raise
  ensure
    $cntMutex.synchronize {
      $threadCnt -= 1;
    }
  end

  Thread.terminate if must_terminate
end

puts "will now start from #{start}"

# This begin makes no sense for me.
db.execute("BEGIN")

for ii in start..12000000 do
  should_redo = false

  # Instead of sleeping, we acquire the lock and check
  # if we can create another thread. If we can't, we just 
  # release the lock and retry latter (using for-redo).
  $cntMutex.synchronize{
    if $threadCnt <= 7
      $threadCnt += 1;
      Thread.new { parsePrice(ii, db) }
    else
      # We use this flag since we don't know for sure redo's
      # behavior inside a lock.
      should_redo = true
    end

  }

  # Will redo this iteration if we can't create the thread.
  if should_redo
    # Mitigate busy waiting a bit.
    sleep(0.1)
    redo
  end
end

# This commit makes no sense to me.
db.execute('COMMIT')

Thread.list.each { |t| t.join }
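One more observation of my own, beyond the scheduling issue: with `for ii in ...`, Ruby reuses a single ii binding across iterations, so a thread created as Thread.new { parsePrice(ii, db) } may read ii only after the loop has already advanced it, which would also explain duplicated and skipped ids in the log. Passing the value as an argument to Thread.new gives each thread its own copy; the ids below are stand-ins for the skuids:

```ruby
# Thread.new(i) snapshots the loop variable: the block parameter id
# is private to each thread, even though `for` reuses one binding for i.
results = []
m = Mutex.new
threads = []
for i in 1..5 do
  threads << Thread.new(i) do |id|
    m.synchronize { results << id }  # each thread records its own id
  end
end
threads.each(&:join)
p results.sort  # => [1, 2, 3, 4, 5]
```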
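Finally, a hedged alternative design (my own sketch, not part of either post): instead of counting threads by hand with $threadCnt, a fixed pool of workers pulling ids from a thread-safe Queue caps concurrency without any busy waiting, since Queue#pop blocks until an item is available. The 20 fake ids, 8 workers and the :stop sentinel are illustrative choices:

```ruby
# Fill the queue with work, then one :stop sentinel per worker so
# each worker exits cleanly once the real ids are exhausted.
queue = Queue.new
(1..20).each { |id| queue << id }
8.times { queue << :stop }

processed = []
m = Mutex.new
workers = 8.times.map do
  Thread.new do
    while (id = queue.pop) != :stop
      m.synchronize { processed << id }  # stand-in for the DB insert
    end
  end
end
workers.each(&:join)
p processed.size  # => 20
```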