How can I stop the Tempfile created by Creek from being deleted before I'm done with it? (Ruby, garbage collection)

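I'm writing a script that takes an .xlsx file and uses Creek to update the prices and weights of products in my database. The .xlsx file lives on an AWS server, so Creek copies the file down and stores it in a Tempfile while it is in use.

The problem is that at some point the Tempfile seems to be deleted prematurely, and since Creek keeps going back to it each time it iterates through a sheet, the script fails. Interestingly, my coworker's environment runs the script just fine, though I haven't found any difference between what we're running.

Here is the script I've written: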

require 'creek'

class PricingUpdateWorker
  include Sidekiq::Worker

  def perform(filename)
    # This points to the file in the root bucket
    file = bucket.files.get(filename)

    # Make public temporarily to open in Creek
    file.public = true
    file.save

    creek_sheets = Creek::Book.new(file.public_url, remote: true).sheets

    # Close file to public
    file.public = false
    file.save

    creek_sheets.each_with_index do |sheet, sheet_index|
      p "---------- #{sheet.name} ----------"

      sheet.simple_rows.each_with_index do |row, index|
        next if index == 0

        product = Product.find_by_id(row['A'].to_i)
        if product
          if row['D']&.match(/N\/A/) || row['E']&.match(/N\/A/)
            product.delete
            p '*** deleted ***'
          else
            product.price = row['D']&.to_f&.round(2)
            product.weight = row['E']&.to_f
            product.request_for_quote = false
            product.save
            p 'product updated'
          end
        else
          p "#{row['A']} | product not found ***"
        end
      end
    end
  end

  private

  def connection
    @connection ||= Fog::Storage.new(
      provider: 'AWS',
      aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
  end

  def bucket
    # Grab the file from the bucket
    @bucket ||= connection.directories.get 'my-aws-bucket'
  end
end

And the log:

"---------- Sheet 1 ----------"
"product updated"
"product updated"
... I've cut out a bunch more of these...
"product updated"
"product updated"
"---------- Sheet 2 ----------"
rails aborted!
Errno::ENOENT: No such file or directory @ rb_sysopen - /var/folders/9m/mfcnhxmn1bqbm6h91rx_rd8m0000gn/T/file20190920-19247-c6x4zw

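"/var/folders/9m/mfcnhxmn1bqbm6h91rx_rd8m0000gn/T/file20190920-19247-c6x4zw" is the Tempfile, and as you can see it has already been collected, even though I'm still using it and I believe it's still in scope. Any idea what could be causing this? It's especially odd that my coworker can run this just fine.

In case it helps, here is some code from Creek: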

def initialize path, options = {}
      check_file_extension = options.fetch(:check_file_extension, true)
      if check_file_extension
        extension = File.extname(options[:original_filename] || path).downcase
        raise 'Not a valid file format.' unless (['.xlsx', '.xlsm'].include? extension)
      end
      if options[:remote]
        zipfile = Tempfile.new("file")
        zipfile.binmode
        zipfile.write(HTTP.get(path).to_s)
        # I added the line below this one, and it fixes the problem by preventing the file from being marked for garbage collection, though I shouldn't need to take steps like that.
        # ObjectSpace.undefine_finalizer(zipfile)
        zipfile.close
        path = zipfile.path
      end
      @files = Zip::File.open(path)
      @shared_strings = SharedStrings.new(self)
    end
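
The Creek snippet above keeps only zipfile.path and lets the Tempfile object itself go out of scope. Ruby's Tempfile removes its file when the object is garbage collected, which would be consistent with the commented-out undefine_finalizer line making the error go away. An alternative is to hold a reference to the Tempfile for as long as the path is needed. The following is a minimal sketch of that idea, not Creek's code: the RemoteCopy class and its cleanup method are hypothetical names introduced here for illustration.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Creek): stores the Tempfile in an
# instance variable so it stays reachable while the path is in use.
class RemoteCopy
  attr_reader :path

  def initialize(bytes)
    @tempfile = Tempfile.new(['remote', '.xlsx'])
    @tempfile.binmode
    @tempfile.write(bytes)
    @tempfile.close           # closes the handle; the file stays on disk
    @path = @tempfile.path    # remember the path before any unlink
  end

  # Delete the file explicitly once the caller is done with it.
  def cleanup
    @tempfile.unlink
  end
end
```

As long as the RemoteCopy instance is referenced, its Tempfile is not eligible for collection, so the path can be handed to something like Zip::File.open and read repeatedly. Whether this matches the root cause in the asker's environment is an assumption.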

Edit: someone wanted to know how I'm running my code, so here it is.

I run the following rake task by executing bundle exec rails client:pricing_update[client_updated_prices.xlsx] on the command line:

namespace :client do
  desc 'Imports the initial database structure & base data from uploaded .xlsx file'
  task :pricing_update, [:filename] => :environment do |t, args|
    PricingUpdateWorker.new.perform(args[:filename])
  end
end

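I should also mention that I'm running Rails, so Gemfile.lock keeps the gem versions consistent between me and my coworker. My fog version is 2.0.0 and my rubyzip version is 1.2.2.

In the end, the bug doesn't seem to be in the Creek gem at all but in the rubyzip gem, which has trouble with some xlsx files; this appears to depend on how the source file was generated. I created a simple two-sheet spreadsheet in Google Sheets and it works fine, but an arbitrary xlsx file may not.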

require 'creek'

def test_creek(url)
  Creek::Book.new(url, remote: true).sheets.each_with_index do |sheet, index|
    p "----------Name: #{sheet.name} Index: #{index} ----------"
    sheet.simple_rows.each_with_index do |row, i|
        puts "#{row} index: #{i}"
    end
  end
end

test_creek 'https://tc-sandbox.s3.amazonaws.com/creek-test.xlsx'
# works fine; should output:
# "----------Name: Sheet1 Index: 0 ----------"
# {"A"=>"foo ", "B"=>"sheet", "C"=>"one"} index: 0
# "----------Name: Sheet2 Index: 1 ----------"
# {"A"=>"bar", "B"=>"sheet", "C"=>"2.0"} index: 0

test_creek 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
# raises error

Comments:

What happens if you move the two lines after "# Close file to public" to after the loop ends?

@lacostenycoder I'm afraid it makes no difference.

I deleted my answer because, after more testing with a public xlsx file, I was able to get a more basic example to work without error. May I ask how you are running the code? Also, which versions of the Fog and rubyzip gems are you running? Make sure they are the same as what runs on Heroku or on the working machine.

@lacostenycoder I've updated my post with answers to your questions.

If you're running this from rake and not using .perform_async, do you even need to include Sidekiq::Worker? But I don't think that's your problem.

I also left a comment.

Thanks for your answer! Unfortunately, it didn't work for me. It also doesn't explain why the code works on other people's setups, which is part of the puzzle.

@Dinopolis Are the other people's setups using the same versions of everything? We don't know your dependency chain. But I can tell you that I tested similar stripped-down code with a test xlsx file on S3, creating a file locally instead of dealing with a Tempfile, and that worked fine for me.

They're using the same branch and the same gem specs, and the same version of Ruby (2.5.1). The code also works when deployed to Heroku. When I run your code, I get an error telling me the file is not a valid format.

What is the exact error? If it's the encoding, try leaving out "w:ASCII-8BIT" and just pass w to write the file. But when testing on Heroku or on other machines, are you testing with the exact same file on S3?
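
The comments mention creating a file locally instead of dealing with a Tempfile. The code from the deleted answer is not available, so what follows is only a rough sketch of that idea under stated assumptions: with_local_xlsx is a hypothetical helper name, and the bytes are assumed to have been downloaded already (for example with HTTP.get(url).to_s, as in the Creek snippet quoted in the question).

```ruby
require 'tmpdir'

# Hypothetical helper: writes the given bytes to a regular file inside a
# temporary directory, yields the file's path, and removes the directory
# (and the file) once the block has finished.
def with_local_xlsx(bytes, name = 'download.xlsx')
  Dir.mktmpdir('pricing') do |dir|
    path = File.join(dir, name)
    File.binwrite(path, bytes)   # binary write, no encoding conversion
    yield path
  end
end

# Possible usage with Creek (assumes the creek and http gems are loaded):
#
#   with_local_xlsx(HTTP.get(url).to_s) do |path|
#     Creek::Book.new(path).sheets.each do |sheet|
#       sheet.simple_rows.each { |row| puts row }
#     end
#   end
```

Because the file is deleted only when the block exits, its lifetime no longer depends on when the garbage collector runs. All sheet processing has to happen inside the block.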