Algorithm: select as many rows as possible while guaranteeing the density of 1s in each column


Suppose we have a 0-1 matrix such as:

0, 1, 1, 1, 1 
1, 1, 1, 1, 1 
1, 0, 1, 0, 1 
1, 1, 1, 1, 1 
0, 1, 1, 0, 1 
1, 1, 0, 1, 0

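The goal is to select as many rows as possible from this matrix to form a new matrix such that, in the new matrix, every column contains at least 80% 1s.

For example, for the matrix above the result would be: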

0, 1, 1, 1, 1 
1, 1, 1, 1, 1 
1, 0, 1, 0, 1 
1, 1, 1, 1, 1 
1, 1, 0, 1, 0
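
To make the constraint concrete, here is a minimal Ruby sketch that checks it on the example above. The helper name `dense_enough?` and the use of `Rational(4, 5)` for the 80% threshold are assumptions made for illustration; they are not part of the question.

```ruby
# Hypothetical helper: true when every column of `rows` holds at least
# `threshold` (a fraction) of 1s. Rational arithmetic avoids float rounding.
def dense_enough?(rows, threshold = Rational(4, 5))
  return false if rows.empty?
  rows.transpose.all? { |column| column.sum >= threshold * rows.size }
end

matrix = [
  [0, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 0, 1, 0, 1],
  [1, 1, 1, 1, 1],
  [0, 1, 1, 0, 1],
  [1, 1, 0, 1, 0]
]

selection = matrix.values_at(0, 1, 2, 3, 5) # drop the fifth row

puts dense_enough?(matrix)    # all six rows
puts dense_enough?(selection) # the five rows from the example
```

This prints `false` for the full six-row matrix (the first column has only 4 ones out of 6) and `true` for the five-row selection, where every column has exactly 4 ones out of 5.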

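A greedy algorithm clearly does not work for this problem and, as far as I can see, neither does ordinary DP.

In the real problem the matrix will have about 7000 rows and 100 columns. Because there will be some all-1 rows, at least one solution always exists.

Can anyone give me some hints? Thanks.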

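An implementation of the simple answer:

(Over-simplistic, because the search is "timid": it looks only one step ahead before evaluating whether a state is acceptable, and it does not reorder the rows to find those that would help the columns reach 80% sooner.)

Timing of code.rb on 7000 random rows: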

Achieved: 2742
Accepted the following row: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
Achieved: 2743
Achieved: 2743
solution.percents = [81.18847976667881, 80.42289464090412, 81.00619759387531, 80.93328472475392, 80.67808968282901, 80.38643820634341, 81.22493620123952, 81.55304411228582, 81.22493620123952, 80.20415603353992, 80.45935107546481, 81.22493620123952, 80.56872037914691, 80.02187386073642, 81.15202333211812, 82.39154210718192, 80.02187386073642, 80.24061246810062, 81.26139263580022, 80.38643820634341, 80.02187386073642, 80.27706890266131, 80.16769959897921, 81.00619759387531, 80.49580751002551, 81.37076193948232, 81.69886985052861, 80.24061246810062, 81.00619759387531, 80.34998177178271, 80.20415603353992, 81.69886985052861, 81.51658767772511, 80.64163324826832, 80.02187386073642, 80.02187386073642, 80.02187386073642, 80.02187386073642, 80.34998177178271, 80.27706890266131, 80.02187386073642, 80.16769959897921, 80.82391542107182, 81.29784907036091, 81.77178271965002, 80.75100255195042, 81.84469558877142, 80.53226394458622, 80.02187386073642, 80.86037185563251, 80.09478672985782, 81.18847976667881, 81.15202333211812, 80.31352533722202, 82.28217280349982, 82.02697776157491, 81.48013124316441, 80.64163324826832, 80.89682829019321, 81.11556689755741, 81.26139263580022, 80.64163324826832, 80.64163324826832, 80.45935107546481, 80.86037185563251, 80.31352533722202, 80.05833029529711, 81.40721837404301, 81.00619759387531, 81.77178271965002, 80.96974115931461, 81.22493620123952, 81.37076193948232, 80.49580751002551, 80.05833029529711, 80.89682829019321, 81.44367480860372, 80.02187386073642, 80.02187386073642, 81.55304411228582, 80.67808968282901, 80.49580751002551, 81.26139263580022, 80.02187386073642, 80.27706890266131, 80.42289464090412, 80.45935107546481, 81.55304411228582, 81.77178271965002, 80.45935107546481, 81.73532628508931, 80.75100255195042, 83.04775792927451, 80.45935107546481, 80.02187386073642, 80.02187386073642, 81.04265402843602, 81.51658767772511, 80.89682829019321, 81.58950054684651]
min solution.percents = 80.02187386073642
solution has 2743 rows
consider has 4267 rows
went through 3 loops, achievment at end of each loop: [2742, 2743, 2743] rows

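real    5m57.637s
user    5m57.446s
sys     0m0.335s

This looks NP-hard to me. I would go for approximation schemes similar to those used for subset sum and integer linear programming. On the other hand, DP might help; can you show what you tried with DP?

@G.Bach I have thought about DP and also tried to reduce the real problem to one that DP can handle. But for now I am fairly sure, if not certain, that DP will not help :-(

One approach is a bounded search: start from the most promising candidate rows and work down the list to see which other rows can be added. First sort the rows by the number of bits set, then check each row in turn to see whether it can be added to the solution without breaking the constraint. Keep running down the list of rows not yet added until you have gone through the whole list without adding a new row to the solution. This produces one solution.

If you want to try a second pass, count the number of bits set in each column and weight each row's score so that rows with bits set in the low-count columns rank higher (to balance the percentages).

With this as a starting point, you can decide how much extra time to spend searching for a better solution. Other starting points include beginning with all rows selected and looking for the smallest set of rows to remove, possibly giving more weight to rows that have zeros in the worst-scoring columns.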
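
The second-pass weighting described above could look roughly like the following sketch. The inverse-count weighting and the name `weighted_order` are assumptions chosen for illustration; the answer does not specify a formula, and other weightings may work better.

```ruby
# Hypothetical second-pass ordering: columns with few 1s get a larger weight,
# so rows that supply 1s in those scarce columns sort first.
def weighted_order(rows)
  column_counts = rows.transpose.map(&:sum)
  # Assumed weighting: inverse of the column's 1-count (guarding against 0).
  weights = column_counts.map { |count| 1.0 / [count, 1].max }
  # Highest weighted score first.
  rows.sort_by { |row| -row.zip(weights).sum { |bit, weight| bit * weight } }
end

rows = [
  [0, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 0, 1, 0, 1],
  [1, 1, 1, 1, 1],
  [0, 1, 1, 0, 1],
  [1, 1, 0, 1, 0]
]

weighted_order(rows).each { |row| puts row.inspect }
```

The ordered list can then be fed to the same accept/reject loop in place of the plain sort by bit count.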

#!/usr/bin/env ruby
data = [
[0, 1, 1, 1, 1 ],
[1, 1, 1, 1, 1 ],
[1, 0, 1, 0, 1 ],
[1, 1, 1, 1, 1 ],
[0, 1, 1, 0, 1 ],
[1, 1, 0, 1, 0 ]
]

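# Random test data: 990 rows of varying average density plus 10 all-1 rows.
# This assignment replaces the small sample above; comment it out to run the sample.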
data = 990.times.collect do
         limit = rand(1000)
         100.times.collect do
          rand(1000) <= limit ? 1 : 0
        end
      end + 10.times.collect { 100.times.collect { 1 } }

#puts "data = #{data.inspect}"

def sum(list)
  list.inject(0){|res,v| res + v}
end

def column_percent(array)
  multiplier = 100.0 / array.count
  array.transpose.collect{|column| sum(column) * multiplier}
end

sorted_data = data.sort{|a,b| sum(b) <=> sum(a)}

#puts sorted_data.inspect
puts "Data percentages: #{column_percent(data).inspect}"
puts "Average over data: #{column_percent(data).min.inspect}"


solution = [ ]
consider = sorted_data
discarded = [ ]
loops = 0
done_something = true
achieved = [ ]
while (done_something)
  loops += 1
  done_something = false
  while (!consider.empty?)
    row = consider.shift
    #puts "Considering: #{row.inspect}"
    if column_percent(solution + [ row ]).min >= 80.0
      done_something = true
      solution.push row
    else
      discarded.push row
    end
  end
  achieved << solution.count
  consider = discarded
  discarded = [ ]
end

puts "solution: #{solution.inspect}" if solution.count < 10
puts "solution.percents = #{column_percent(solution).inspect}"
puts "min solution.percents = #{column_percent(solution).min.inspect}"
puts "solution has #{solution.count.inspect} rows"
puts "consider has #{consider.count.inspect} rows"
puts "went through #{loops} loops, achievment at end of each loop: #{achieved.inspect} rows"

exit

Data percentages: [66.66666666666667, 83.33333333333334, 83.33333333333334, 66.66666666666667, 83.33333333333334]
Average over data: 66.66666666666667
solution: [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
solution.percents = [100.0, 100.0, 100.0, 100.0, 100.0]
min solution.percents = 100.0
solution has 2 rows
consider has 4 rows
went through 2 loops, achievment at end of each loop: [2, 2] rows


$ time ruby code.rb # 1000 rows

Data percentages: [50.900000000000006, 48.900000000000006, 50.2, 49.7, 47.5, 50.800000000000004, 50.2, 48.900000000000006, 51.300000000000004, 49.1, 49.900000000000006, 48.7, 49.7, 48.900000000000006, 50.300000000000004, 52.400000000000006, 51.0, 49.900000000000006, 50.800000000000004, 49.6, 49.0, 50.1, 49.1, 48.7, 50.800000000000004, 49.0, 49.2, 49.900000000000006, 48.800000000000004, 50.1, 50.2, 49.6, 49.900000000000006, 50.2, 50.900000000000006, 49.2, 51.7, 49.300000000000004, 48.400000000000006, 49.400000000000006, 49.5, 49.6, 47.7, 50.0, 46.900000000000006, 51.0, 50.0, 51.5, 50.5, 49.300000000000004, 49.1, 50.400000000000006, 47.800000000000004, 51.800000000000004, 50.2, 49.400000000000006, 49.400000000000006, 49.0, 51.5, 48.0, 53.7, 49.1, 51.300000000000004, 50.400000000000006, 50.800000000000004, 48.900000000000006, 50.6, 47.0, 50.300000000000004, 49.400000000000006, 50.800000000000004, 51.300000000000004, 52.900000000000006, 50.0, 51.300000000000004, 47.800000000000004, 51.300000000000004, 47.6, 49.900000000000006, 54.5, 49.5, 51.800000000000004, 50.800000000000004, 50.400000000000006, 51.0, 50.1, 47.7, 49.6, 53.300000000000004, 50.2, 49.7, 51.5, 47.900000000000006, 49.7, 48.0, 48.6, 49.6, 48.900000000000006, 50.1, 50.7]
Average over data: 46.900000000000006
solution.percents = [84.02366863905326, 83.13609467455622, 85.50295857988166, 81.65680473372781, 80.17751479289942, 82.54437869822486, 83.13609467455622, 81.95266272189349, 82.54437869822486, 80.4733727810651, 87.27810650887574, 82.84023668639054, 83.4319526627219, 82.54437869822486, 80.76923076923077, 84.31952662721893, 81.36094674556213, 85.79881656804734, 82.24852071005917, 83.72781065088758, 81.65680473372781, 82.24852071005917, 80.76923076923077, 82.54437869822486, 85.20710059171599, 83.72781065088758, 80.17751479289942, 83.72781065088758, 82.84023668639054, 81.95266272189349, 84.61538461538461, 80.17751479289942, 81.95266272189349, 81.36094674556213, 84.02366863905326, 84.61538461538461, 84.02366863905326, 83.72781065088758, 82.24852071005917, 84.31952662721893, 84.02366863905326, 84.02366863905326, 80.17751479289942, 82.84023668639054, 80.4733727810651, 82.84023668639054, 83.13609467455622, 82.84023668639054, 80.17751479289942, 80.17751479289942, 82.84023668639054, 83.72781065088758, 80.17751479289942, 81.95266272189349, 81.95266272189349, 82.84023668639054, 80.76923076923077, 81.95266272189349, 81.95266272189349, 82.84023668639054, 85.20710059171599, 83.4319526627219, 83.72781065088758, 80.17751479289942, 84.31952662721893, 82.54437869822486, 86.09467455621302, 81.95266272189349, 82.54437869822486, 81.95266272189349, 81.95266272189349, 83.72781065088758, 83.4319526627219, 84.61538461538461, 86.68639053254438, 81.06508875739645, 83.4319526627219, 80.76923076923077, 80.76923076923077, 85.79881656804734, 82.84023668639054, 85.79881656804734, 84.31952662721893, 82.24852071005917, 84.02366863905326, 80.76923076923077, 80.17751479289942, 84.9112426035503, 83.72781065088758, 84.61538461538461, 83.13609467455622, 84.61538461538461, 84.61538461538461, 82.54437869822486, 80.76923076923077, 82.84023668639054, 80.4733727810651, 80.17751479289942, 82.84023668639054, 80.17751479289942]
min solution.percents = 80.17751479289942
solution has 338 rows
consider has 662 rows
went through 3 loops, achievment at end of each loop: [337, 338, 338] rows

real    0m7.588s
user    0m7.435s
sys 0m0.142s
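
The script above recomputes every column percentage from scratch (a full transpose) for each candidate row. A possible variant, sketched here on the assumption that the accept/reject rule stays the same, keeps a running count of 1s per column instead. `greedy_select` is a hypothetical name, the `Rational(4, 5)` threshold is an illustrative choice, and the sketch has not been timed on the 7000-row data.

```ruby
# Sketch: the same accept/reject loop, but with one running count of 1s per
# column, so testing a candidate row costs O(columns) rather than a transpose.
def greedy_select(rows, threshold = Rational(4, 5))
  return [] if rows.empty?

  ones = Array.new(rows.first.size, 0)      # 1s per column in the solution
  solution = []
  pending = rows.sort_by { |row| -row.sum } # densest rows first

  loop do
    rejected = []
    added = false
    pending.each do |row|
      size = solution.size + 1
      # Accept only if every column would still meet the threshold.
      if row.each_with_index.all? { |bit, c| ones[c] + bit >= threshold * size }
        row.each_with_index { |bit, c| ones[c] += bit }
        solution << row
        added = true
      else
        rejected << row
      end
    end
    pending = rejected
    break unless added # stop after a pass that adds nothing
  end

  solution
end

sample = [
  [0, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 0, 1, 0, 1],
  [1, 1, 1, 1, 1],
  [0, 1, 1, 0, 1],
  [1, 1, 0, 1, 0]
]

puts "solution has #{greedy_select(sample).size} rows"
```

On the six-row sample this selects the same two all-1 rows as the script above, which shows that the "timid" behaviour is unchanged; only the bookkeeping differs.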