
PHP: Regexp for filtering a table

Tags: php, ruby, regex, ruby-on-rails-3, codeigniter

OK, so I have a table that is output by some open-source software, but it is not output as actual table markup, e.g.

<table>
  <thead>
    <tr>
      <td>Heading</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Content</td>
    </tr>
  </tbody>
</table>
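So I can't build a web scraper to get the data, or at least I'm not sure I can, because the whole thing is wrapped in a <pre> tag. I'm trying to filter it with a regexp instead, something like this: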

text.lines.to_a.each do |line|
  line.sub(/^\| |^\+*-*\+*\-*/) do |match|
    puts "Regexp Match: " << match
  end
  STDIN.getc
  puts "New Line " << line
end
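For example, the output for the first line would be just
+--------------+------------
CSV is the format I'm going for, so I'll use gsub to replace the remaining | with ,

I can use either PHP or Ruby, so any answers are more than welcome.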

For the main job of getting the fields out of the table, use split with the pattern \s*\|\s* on each line:

$table = '+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+';

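// Split the table into lines, whatever the line-ending style.
$lines = preg_split('/\r\n|\r|\n/', $table);
$array = array();
foreach ($lines as $line) {
  // Skip the +-----+ border lines.
  if (!preg_match('/\+-+\+/', $line)) {
    // Trim the outer pipes and spaces, then split on | plus surrounding whitespace.
    $array[] = preg_split('/\s*\|\s*/', trim($line, '| '));
  }
}

print_r($array);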
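This splits the line into an array on each | together with any surrounding whitespace. If you split the raw line, discard the first and last elements of the array, because the pattern also matches the leading and trailing |; the PHP code above avoids that by trimming the outer pipes and spaces before splitting.

Output: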

Array
(
    [0] => Array
        (
            [0] => HEADING 1
            [1] => HEADING 2
            [2] => ETC
            [3] => ANOTHER
            [4] => HEADING3
            [5] => HEADING4
            [6] => SML
        )

    [1] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [2] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [3] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [4] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [5] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [6] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [7] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [8] => Array
        (
            [0] => content
            [1] => more content
            [2] => cont
            [3] => More more
            [4] => content
            [5] => content 2.0
            [6] => litl
        )

    [9] => Array
        (
            [0] => TOTALS        AGENTS:21
            [1] => total
            [2] => total
            [3] => total
            [4] => total
            [5] => total
        )

)
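
The split-on-pipe approach described above can also be sketched in Ruby. This is only a minimal illustration: split_row and parse_rows are hypothetical helper names, not part of either answer, and the sample table is a shortened stand-in for the real one. Note that Ruby's String#split drops trailing empty fields by default, so splitting a raw row leaves only an empty first element.

```ruby
# Hypothetical helper (not from the answers): split one table row into cells.
def split_row(line)
  # Drop the outer pipes and their padding first so no empty edge fields appear.
  inner = line.strip.sub(/\A\|\s*/, '').sub(/\s*\|\z/, '')
  inner.split(/\s*\|\s*/)
end

# Hypothetical helper: parse the whole table, skipping the +----+ border lines.
def parse_rows(table)
  data_lines = table.each_line.reject do |line|
    line.start_with?('+') || line.strip.empty?
  end
  data_lines.map { |line| split_row(line) }
end

# Shortened sample table, assumed to have the same shape as the real output.
sample = <<~TABLE
  +-----------+--------------+------+
  | HEADING 1 | HEADING 2    | SML  |
  +-----------+--------------+------+
  | content   | more content | litl |
  +-----------+--------------+------+
TABLE

# Splitting the raw line leaves an empty first field.
p '| a | b |'.split(/\s*\|\s*/)

# Each parsed row is an array of trimmed cell strings.
parse_rows(sample).each { |row| p row }
```

Stripping the outer pipes before splitting keeps the cell list clean without having to remember which edge elements to discard.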
A Ruby version that uses the builder gem to render the parsed rows as an HTML table:

require 'builder'

table = '+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+';

def parse_table(table)
  rows = []
  table.each_line do |line|
    next if line.match /^\+/
    rows << line.split(/\s*\|\s*/).reject(&:empty?) 
  end
  rows
end

def html_row(xml, columns)
  xml.tr do
    columns.each do |column|
      xml.td column
    end
  end
end

def html_table(rows)
  head_row = rows.first
  body_rows = rows[1..-1]

  xml = Builder::XmlMarkup.new :indent => 2
  xml.table do
    xml.thead do
      html_row xml, head_row
    end
    xml.tbody do
      body_rows.each do |body_row|
        html_row xml, body_row
      end
    end
  end.to_s
end


rows = parse_table(table)
html = html_table(rows)
puts html

Ruby, using scan:

@text = <<END
+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+
END
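# Capture everything between the first "| " and the last "|" of each data row.
s = @text.scan(/^[|]\W(.*)[|]$/)
puts s
# Split each captured row on "|" and strip all whitespace from every cell.
# Note that this also removes spaces inside a cell ("HEADING 1" becomes "HEADING1").
arr = s.flatten.map do |row|
  row.split('|').map { |cell| cell.gsub(/\s+/, '') }
end
arr.each do |i|
  puts i
end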

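Hope this helps :)

The Builder-based code above is a complete Ruby solution, although you need to add a | by hand to the last row of the table. Its output: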

<table>
  <thead>
    <tr>
      <td>HEADING 1</td>
      <td>HEADING 2</td>
      <td>ETC</td>
      <td>ANOTHER</td>
      <td>HEADING3</td>
      <td>HEADING4</td>
      <td>SML</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>TOTALS        AGENTS:21</td>
      <td>total</td>
      <td>total</td>
      <td>total</td>
      <td>total</td>
      <td>total</td>
    </tr>
  </tbody>
</table>
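
The remark about adding a | by hand refers to the TOTALS row: it has one separator fewer than the header, so it produces six cells instead of seven, as the last <tr> above shows. A small check like the following could flag such rows before they are rendered. It is a sketch only: ragged_row_indexes is a hypothetical name, and rows is assumed to be an array of cell arrays such as the one parse_table returns.

```ruby
# Hypothetical helper: indexes of rows whose cell count differs from the header's.
def ragged_row_indexes(rows)
  expected = rows.first.length
  rows.each_index.reject { |i| rows[i].length == expected }
end

# Shortened stand-in for the parsed table; the last row is missing a separator.
rows = [
  ['HEADING 1', 'HEADING 2', 'ETC'],
  ['content', 'more content', 'cont'],
  ['TOTALS        AGENTS:21', 'total']
]

p ragged_row_indexes(rows)
```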

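Output (one cell per line):

HEADING1
HEADING2
ETC
ANOTHER
HEADING3
HEADING4
SML
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
content
morecontent
cont
Moremore
content
content2.0
litl
TOTALSAGENTS:21
total
total
total
total
total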

This might not be as clean as it could be, but it works for this example :) (Ruby)


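Comments:

Use an HTML parser to select the text inside the pre tag, then use substring to extract the data (I am assuming the columns are at fixed positions). If the column widths are fixed within one table but differ from table to table, you can parse the header to work out the width of each column.

@nhahtdh Currently the columns are not fixed width. I wish they were, haha.

If | never appears inside the content, you can simply split on |. By fixed width I mean that the width of each column is fixed (different columns may have different widths, but every row of a given column must have the same width).

What's with all the global variables? What is the point of using them here?

@paile Wow, awesome. So then I just need to build something like a mini scraper that reads the data from a local file and exports it to CSV? Or is there something ready-made for that?

So the output you want is CSV? Take a look at the CSV class in the Ruby standard library.

@paddle Yes, thanks for the help. It was my first time using FasterCSV, so I looked into it, but it appears to be deprecated?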
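
The standard-library CSV class suggested in the comments can turn the parsed rows into CSV text directly. As far as I know, FasterCSV became Ruby's standard CSV library in Ruby 1.9, which would explain why the separate gem looks deprecated. The sketch below assumes the rows are already an array of cell arrays; rows_to_csv is a hypothetical helper name and the sample data is made up for illustration.

```ruby
require 'csv'

# Hypothetical helper: turn an array of row arrays into CSV text.
def rows_to_csv(rows)
  CSV.generate do |csv|
    rows.each { |row| csv << row }
  end
end

# Made-up sample rows; the second cell of the last row contains a comma.
rows = [
  ['HEADING 1', 'HEADING 2', 'SML'],
  ['content', 'more, content', 'litl']
]

puts rows_to_csv(rows)
```

Compared with replacing every | with a comma via gsub, this quotes any cell that itself contains a comma, so the column count stays intact.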