Parsing out multiple lines with LPeg in Lua

I have some text files that contain multi-line blocks like these:

2011/01/01 13:13:13,<AB>, Some Certain Text,=,
[    
certain text
         [
                  0: 0 0 0 0 0 0 0 0 
                  8: 0 0 0 0 0 0 0 0 
                 16: 0 0 0 9 343 3938 9433 8756 
                 24: 6270 4472 3182 2503 1768 1140 836 496 
                 32: 326 273 349 269 144 121 94 82 
                 40: 64 80 66 59 56 47 50 46 
                 48: 64 35 42 53 42 40 41 34 
                 56: 35 41 39 39 47 30 30 39 
                 Total count: 12345
        ]
    certain text
]
some text
2011/01/01 14:14:14,<AB>, Some Certain Text,=,
[
 certain text
   [
              0: 0 0 0 0 0 0 0 0 
              8: 0 0 0 0 0 0 0 0 
             16: 0 0 0 4 212 3079 8890 8941 
             24: 6177 4359 3625 2420 1639 974 594 438 
             32: 323 286 318 296 206 132 96 85 
             40: 65 73 62 53 47 55 49 52 
             48: 29 44 44 41 43 36 50 36 
             56: 40 30 29 40 35 30 25 31 
             64: 47 31 25 29 24 30 35 31 
             72: 28 31 17 37 35 30 20 33 
             80: 28 20 37 25 21 23 25 36 
             88: 27 35 22 23 15 24 34 28
             Total count: 123456 
    ]
    certain text
some text
]

These variable-length blocks are embedded in other text. I want to read out all the numbers after the ":" and store them in separate arrays. In this case there would be two arrays:

Array 1 = {0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 343 3938 9433 8756 6270 4472 3182 2503 1768 1140 836 496 326 273 349 269 144 121 94 82 64 80 66 59 56 47 50 46 64 35 42 53 42 40 41 34 35 41 39 39 47 30 30 39 12345}

Array 2 = {0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 212 3079 8890 8941 6177 4359 3625 2420 1639 974 594 438 323 286 318 296 206 132 96 85 65 73 62 53 47 55 49 52 29 44 44 41 43 36 50 36 40 30 29 40 35 30 25 31 47 31 25 29 24 30 35 31 28 31 17 37 35 30 20 33 28 20 37 25 21 23 25 36 27 35 22 23 15 24 34 28 123456}

I found that LPeg might be a lightweight way to do this, but I am completely new to PEGs and LPeg. Please help.

My solution, using only Lua's string library, is as follows:

local bracket_pattern = "%b[]" --pattern for getting into brackets
local number_pattern = "(%d+)%s+" --pattern for parsing numbers
local output_array = {} --output 2-dimensional array
local i = 1
local j = 1
local tmp_number
local tmp_sub_str

for tmp_sub_str in file_content:gmatch(bracket_pattern) do --iterating through [string]
    table.insert(output_array, i, {}) --adding new [string] group
    for tmp_number in tmp_sub_str:gmatch(number_pattern) do --iterating through numberWHITESPACE
        table.insert(output_array[i], tonumber(tmp_number)) --adding [string] group element (number)
    end
    i = i + 1
end

Edit: this also works with the updated file format.
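
To illustrate why the pattern "(%d+)%s+" skips row labels such as "16:", here is a minimal sketch. The helper name `numbers_in` and the sample strings are made up for illustration; only the pattern itself comes from the answer. A label is followed by a colon rather than whitespace, so it never matches. The same rule means that a number placed directly before a closing bracket, with no whitespace in between, would be missed as well. That does not happen in the sample data, but it is worth knowing.

```lua
-- Hypothetical helper and samples; only the pattern is taken from the answer.
local number_pattern = "(%d+)%s+"

local function numbers_in(text)
  local found = {}
  for digits in text:gmatch(number_pattern) do
    found[#found + 1] = tonumber(digits)
  end
  return found
end

-- "16:" is skipped in both cases because ":" is not whitespace.
local with_space    = numbers_in("[ 16: 0 9 343 ]")
-- Here "343" touches the bracket, so the pattern does not pick it up.
local without_space = numbers_in("[ 16: 0 9 343]")

print(table.concat(with_space, " "))
print(table.concat(without_space, " "))
```

If the input might place a number right before "]", matching "%d+" after stripping the labels (as the next answer does) avoids the problem.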

Try this code, which does not use LPeg:

-- assume T contains the text
local a={}
local i=0
for b in T:gmatch("%b[]") do
        b=b:gsub("%d+:","")
        i=i+1
        local t={}
        local j=0
        for n in b:gmatch("%d+") do
                j=j+1; t[j]=tonumber(n)
        end
        a[i]=t
end
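
As a quick check of how "%b[]" behaves with nested brackets, the loop above can be wrapped in a function and run on a shortened sample. This is only a sketch: `extract_arrays` and the sample text are illustrative names and data, not part of the answer. Because "%b[]" matches a balanced pair starting at the first "[", each outer block yields one array, and the timestamps outside the brackets are never scanned.

```lua
-- Hypothetical wrapper around the same idea: one array per balanced [...] block.
local function extract_arrays(text)
  local arrays = {}
  for block in text:gmatch("%b[]") do
    -- Drop row labels such as "16:" (gsub's second result is discarded).
    local body = block:gsub("%d+:", "")
    local numbers = {}
    for n in body:gmatch("%d+") do
      numbers[#numbers + 1] = tonumber(n)
    end
    arrays[#arrays + 1] = numbers
  end
  return arrays
end

-- Shortened, made-up sample in the same shape as the question's input.
local sample = [[
2011/01/01 13:13:13,<AB>, Some Certain Text,=,
[
certain text
  [
     0: 0 0 9 343
     8: 64 35 42 53
     Total count: 12345
  ]
certain text
]
some text
2011/01/01 14:14:14,<AB>, Some Certain Text,=,
[
  [
     0: 4 212
  ]
]
]]

local arrays = extract_arrays(sample)
for i, numbers in ipairs(arrays) do
  print(i, table.concat(numbers, " "))
end
```

Note that any digits in the free text inside a block would also be collected, so this approach relies on that text being digit-free.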

LPeg version:

local lpeg            = require "lpeg"
local lpegmatch       = lpeg.match
local C, Ct, P, R, S  = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S
local Cg              = lpeg.Cg

local data_to_arrays

do
  local colon    = P":"
  local lbrak    = P"["
  local rbrak    = P"]"
  local digits   = R"09"^1
  local eol      = P"\n\r" + P"\r\n" + P"\n" + P"\r"
  local ws       = S" \t\v"
  local optws    = ws^0
  local getnum   = C(digits) / tonumber * optws
  local start    = lbrak * optws * eol
  local stop     = optws * rbrak
  local line     = optws * digits * colon * optws
                 * getnum * getnum * getnum * getnum
                 * getnum * getnum * getnum * getnum
                 * eol
  local count    = optws * P"Total count:" * optws * getnum * eol
  local inner    = Ct(line^1 * count^-1)
--local inner    = Ct(line^1 * Cg(count, "count")^-1)
  local array    = start * inner * stop
  local extract  = Ct((array + 1)^0)

  data_to_arrays = function (data)
    return lpegmatch (extract, data)
  end
end

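In fact, this works only if there are exactly 8 integers on every line of a data block. That may be a curse or a blessing, depending on how well-formed your input is ;-)
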
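
The eight-integers restriction comes from `getnum` being written out eight times in `line`. If rows can vary in length, one possible relaxation is to use `getnum^1` (one or more numbers) instead. The sketch below is an assumption about how that could look, not part of the answer; it needs the LPeg library to be installed, and the sample string is made up.

```lua
local lpeg = require "lpeg"
local C, Ct, P, R, S = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S

local digits  = R"09"^1
local eol     = P"\r\n" + P"\n" + P"\r"
local optws   = S" \t\v"^0
local getnum  = C(digits) / tonumber * optws
-- One or more numbers per row instead of exactly eight.
local line    = optws * digits * P":" * optws * getnum^1 * eol
local count   = optws * P"Total count:" * optws * getnum * eol
local inner   = Ct(line^1 * count^-1)
local array   = P"[" * optws * eol * inner * optws * P"]"
-- Skip one character at a time until an array matches.
local extract = Ct((array + 1)^0)

-- Made-up sample with rows of different lengths.
local sample = "text\n[\n 0: 1 2 3\n 8: 4 5\n Total count: 15\n]\n"
local arrays = lpeg.match(extract, sample)
print(#arrays, table.concat(arrays[1], " "))
```

The trade-off is that a malformed row is no longer rejected, so the stricter version may be preferable when the log format is guaranteed.
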
And a test file:
data = [[
some text
[    
some text
         [
                  0: 0 0 0 0 0 0 0 0 
                  8: 0 0 0 0 0 0 0 0 
                 16: 0 0 0 9 343 3938 9433 8756 
                 24: 6270 4472 3182 2503 1768 1140 836 496 
                 32: 326 273 349 269 144 121 94 82 
                 40: 64 80 66 59 56 47 50 46 
                 48: 64 35 42 53 42 40 41 34 
                 56: 35 41 39 39 47 30 30 39 
                 Total count: 12345
        ]
    some text
]
some text
[
 some text
   [
              0: 0 0 0 0 0 0 0 0 
              8: 0 0 0 0 0 0 0 0 
             16: 0 0 0 4 212 3079 8890 8941 
             24: 6177 4359 3625 2420 1639 974 594 438 
             32: 323 286 318 296 206 132 96 85 
             40: 65 73 62 53 47 55 49 52 
             48: 29 44 44 41 43 36 50 36 
             56: 40 30 29 40 35 30 25 31 
             64: 47 31 25 29 24 30 35 31 
             72: 28 31 17 37 35 30 20 33 
             80: 28 20 37 25 21 23 25 36 
             88: 27 35 22 23 15 24 34 28 
    ]
    some text
some text
]
]]

local arrays = data_to_arrays (data)

for n = 1, #arrays do
  local ar   = arrays[n]
  local size = #ar
  io.write (string.format ("[%d] = { --[[size: %d items]]\n  ", n, size))
  for i = 1, size do
    io.write (string.format ("%d,%s", ar[i], (i % 5 == 0) and "\n  " or " "))
  end
  if ar.count ~= nil then
    io.write (string.format ("\n  [\"count\"] = %d,", ar.count))
  end
  io.write (string.format ("\n}\n"))
end

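phg has already given a nice LPeg solution for your problem, but here is another one that uses LPeg's re module. Its syntax is closer to BNF and its operators are more regex-like, so this solution may be easier to follow.
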
re = require 're'

function dump(t)
  io.write '{'
  for _, v in ipairs(t) do
    io.write(v, ',')
  end
  io.write '}\n'
end

local textformat = [[
  data_in   <-  block+
  block     <-  text '[' block_content ']'
  block_content <- {| data_arr |} / (block / text)*
  data_arr  <- (text ':' nums whitesp)+
  text      <- whitesp [%w' ']+ whitesp
  nums      <- (' '+ {digits} -> tonumber)+
  digits    <- %d+
  whitesp   <- %s*
]]
local parser = re.compile(textformat, {tonumber = tonumber})
local arr1, arr2 = parser:match(data)

dump(arr1)
dump(arr2)

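I know this is a late reply, but a much smaller grammar is enough. The following pattern finds the opening [, captures every number that is not followed by a :, and continues until it reaches the closing ]. It then repeats for each block until nothing more matches.

local re = require "re"

local patt = re.compile([=[
    data    <- {| block |}+
    block   <- ('[' ((%d+ ':') / { %d+ } -> int / [^]%d]+)+ ']') / ([^[]+ block)
]=], { int = tonumber })

local a = { patt:match[=[ ... ]=] }

Comments:

Hi @lhf, the text file is actually [some text [data array] some text]; `%b[]` captures at the outer []. How can I capture inside the data array?

`%b[]` works nicely, but I would really like to learn LPeg for other cases.

Hi @phg, yes, the input arrays have exactly 8 integers per line. But this text file is over 100 MB. How do I read the file? I tried `assert(io.open(filepath))`. It cannot read the file as a string. Should I read the whole file as a string?

`f = io.open(filename, "r") if f then data = f:read "*all" f:close() end` will read everything into memory. If that does not work, you may need to process the file in chunks.

Hi @phg, yes, that reads the file contents fine. But in the text file the real layout is [some text [data array] some text]. Your LPeg does not work on it, and I could not modify it to match this case. Can you help me?

@Decula Your updated example parses fine here. Can you post the part that does not work?

I just realized that my data arrays come in two different types. One of them has a total count. I added `local lower=R"az"^1 local upper=R"az"^1 local words=lower+upper` and changed `local line=optws*digits*colon*optws*getnum*getnum*getnum*getnum*getnum*getnum*getnum*eol*letter`. It still does not work. I only want to add the total count at the end of the array and handle the timestamp with BNF-style LPeg, but I have not managed to.

@Decula Note: the grammar above is only a rough approximation of how to parse the input, based on the input shown in the original question. Since you know the range of formats better, you should refine the grammar to match it more closely.

Hi @greatwolf, I am still struggling with the BNF grammar. We actually have a lot of odd logs and images to process, and we already have a contractor from Google in Milpitas. We really need a C, C++ and Lua expert like you. I see that you answered most of my questions. If you are interested, please give me a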
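
The suggestion in the comments to process a large file in chunks could be sketched as a line-by-line scan that tracks bracket depth, so the 100 MB file never has to be held in memory. This is an assumption-laden sketch rather than anyone's posted solution: `collect_arrays` is a hypothetical helper, it uses plain Lua patterns instead of LPeg, and it assumes each row ("16: ...") sits on its own line, as in the sample.

```lua
-- Hypothetical helper: consumes any iterator that yields one line at a time
-- (for example io.lines(path)) and returns one array per outermost [...] block.
local function collect_arrays(lines)
  local arrays, current, depth = {}, nil, 0
  for line in lines do
    -- Track nesting so that numbers outside any block (timestamps) are ignored.
    for bracket in line:gmatch("[%[%]]") do
      if bracket == "[" then
        depth = depth + 1
        if depth == 1 then
          current = {}
          arrays[#arrays + 1] = current
        end
      elseif depth > 0 then
        depth = depth - 1
        if depth == 0 then current = nil end
      end
    end
    if current then
      -- Drop a leading row label such as "16:", then keep the remaining numbers.
      local body = line:gsub("^%s*%d+:", "")
      for n in body:gmatch("%d+") do
        current[#current + 1] = tonumber(n)
      end
    end
  end
  return arrays
end

-- Demo on an in-memory string; gmatch stands in for io.lines here.
local demo = "2011/01/01 13:13:13,<AB>, text,=,\n[\n text\n [\n  0: 0 9\n" ..
             "  8: 343 64\n  Total count: 12345\n ]\n text\n]\nsome text\n"
local result = collect_arrays(demo:gmatch("[^\n]+"))
print(#result, table.concat(result[1], " "))
```

For a real file the call would be something like `collect_arrays(io.lines(filepath))`, where `filepath` is whatever path the log lives at.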