Python regex repeating pattern
I am trying to use a regular expression to capture groups of data from the log below. The pattern is:
<item> : <key> = <value> , <key> = <value>, ..., <key> = <value>
20141207,07:15:52,0,>>RATIO: casher#=30, Value=2.579, Units=ratio, Error=N
20141207,07:15:52,0,>>RATIO: casher#=31, Value=4.509, Units=ratio, Error=N
20141207,07:15:52,0,>>RATIO: casher#=32, Value=3.735, Units=ratio, Error=N
20141207,07:15:52,0,>>RATIO: casher#=33, Value=2.401, Units=ratio, Error=N
20141207,07:15:52,0,>>CUSTOMER: casher#=30, Value=50, Units=count
20141207,07:15:52,0,>>CUSTOMER: casher#=31, Value=6, Units=count
20141207,07:15:52,0,>>CUSTOMER: casher#=32, Value=88, Units=count
20141207,07:15:52,0,>>CUSTOMER: casher#=33, Value=33, Units=count
Obviously, the result is not what I expected. Can anyone give me some hints? I will eventually implement this in Python. Thanks.
(?<=>>)(\w+):|([\w#]+)\s*=\s*(\S+?)(?:,|\s)
Try this and grab the captured groups.
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
  >>                       '>>'
)                        end of look-behind
(                        group and capture to \1:
  \w+                      word characters (a-z, A-Z, 0-9, _), 1 or more
                           times (matching as many as possible)
)                        end of \1
:                        ':'
|                        OR
(                        group and capture to \2:
  [\w#]+                   any character of: word characters (a-z, A-Z,
                           0-9, _), '#', 1 or more times (matching as
                           many as possible)
)                        end of \2
\s*                      whitespace (\n, \r, \t, \f, and " "), 0 or more
                         times (matching as many as possible)
=                        '='
\s*                      whitespace (\n, \r, \t, \f, and " "), 0 or more
                         times (matching as many as possible)
(                        group and capture to \3:
  \S+?                     non-whitespace (all but \n, \r, \t, \f, and
                           " "), 1 or more times (matching as few as
                           possible)
)                        end of \3
(?:                      group, but do not capture:
  ,                        ','
 |                        OR
  \s                       whitespace (\n, \r, \t, \f, and " ")
)                        end of grouping
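As a rough sketch of how the pattern above might be applied in Python: each match fills either group 1 (an item) or groups 2 and 3 (a key and its value), so the pairs can be attached to the most recent item seen. The names `PATTERN`, `parse` and `log` are made up for this illustration, and the sample text is just two lines in the shape of the question's log.

```python
import re

# Pattern from the answer; group 1 is the item, groups 2 and 3 are key and value.
PATTERN = re.compile(r'(?<=>>)(\w+):|([\w#]+)\s*=\s*(\S+?)(?:,|\s)')

# Illustrative sample; note the trailing newline after each line.
log = (
    "20141207,07:15:52,0,>>RATIO: casher#=30, Value=2.579, Units=ratio, Error=N\n"
    "20141207,07:15:52,0,>>CUSTOMER: casher#=30, Value=50, Units=count\n"
)

def parse(text):
    """Return a list of (item, {key: value}) tuples in order of appearance."""
    records = []
    for m in PATTERN.finditer(text):
        item, key, value = m.groups()
        if item is not None:
            # A new '>>ITEM:' header starts a new record.
            records.append((item, {}))
        elif records:
            # A key=value pair belongs to the most recent item.
            records[-1][1][key] = value
    return records

records = parse(log)
for item, pairs in records:
    print(item, pairs)
```

One thing to watch: the trailing `(?:,|\s)` has to consume a character, so the last pair of the input is only matched if it is followed by a comma or whitespace (for example a final newline).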
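Your file is a CSV file, so it is easier to use the csv module:
import csv

# newline='' is what the csv module expects for text-mode files (Python 3)
with open('data.txt', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        if row:  # skip blank lines
            # Column 3 holds '>>ITEM: key=value'; each later column holds one 'key=value'
            item, key_and_val = row[3].split(':')
            item = item[2:]  # drop the leading '>>'
            key, val = key_and_val.split('=')
            print(item)
            print(' {} => {}'.format(key.strip(), val.strip()))
            for key_and_val in row[4:]:
                key, val = key_and_val.split('=')
                print(' {} => {}'.format(key.strip(), val.strip()))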
--output:--
RATIO
casher# => 30
Value => 2.579
Units => ratio
Error => N
RATIO
casher# => 31
Value => 4.509
Units => ratio
Error => N
RATIO
casher# => 32
Value => 3.735
Units => ratio
Error => N
RATIO
casher# => 33
Value => 2.401
Units => ratio
Error => N
CUSTOMER
casher# => 30
Value => 50
Units => count
CUSTOMER
casher# => 31
Value => 6
Units => count
CUSTOMER
casher# => 32
Value => 88
Units => count
CUSTOMER
casher# => 33
Value => 33
Units => count
Your pattern also matches key=value even when "item:" is not
present. Is there any advanced technique to exclude those key=value lines?
The following skips lines that have no item:
import csv

with open('data.txt', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        # Only handle rows whose fourth column starts with '>>' (an item)
        if len(row) > 3 and row[3].startswith('>>'):
            item, key_and_val = row[3].split(': ')
            item = item[2:]  # drop the leading '>>'
            key, val = key_and_val.split('=')
            print(item)
            print(' {} => {}'.format(key.strip(), val.strip()))
            for key_and_val in row[4:]:
                key, val = key_and_val.split('=')
                print(' {} => {}'.format(key.strip(), val.strip()))
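If you would rather stay with regular expressions, one possible way to ignore key=value text on lines that have no item is to work in two steps: first require a `>>ITEM:` header on the line, then collect the pairs from the rest of that line only. This is just a sketch; `ITEM_RE`, `PAIR_RE` and `parse_line` are names invented here, and the item-less line in the usage note is a hypothetical input, not something from the log.

```python
import re

ITEM_RE = re.compile(r'>>(\w+):\s*(.*)')           # item name, then the rest of the line
PAIR_RE = re.compile(r'([\w#]+)\s*=\s*([^,\s]+)')  # a single key=value pair

def parse_line(line):
    """Return (item, {key: value}) or None when the line has no '>>ITEM:' part."""
    m = ITEM_RE.search(line)
    if m is None:
        return None  # no item on this line, so its key=value text is ignored
    item, rest = m.groups()
    return item, dict(PAIR_RE.findall(rest))

with_item = parse_line(
    "20141207,07:15:52,0,>>RATIO: casher#=30, Value=2.579, Units=ratio, Error=N"
)
# Hypothetical line that has key=value text but no item
without_item = parse_line("20141207,07:15:52,0,Value=5, Units=count")
```

Here `with_item` holds the item name plus a dict of its pairs, while `without_item` is `None`, so such lines can simply be skipped by the caller.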
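I'm not sure what the problem is, but you can't capture all of the key=value pairs with a single regex. Not in groups, anyway.
I love you @vks. I am weak at this; could you please explain the expression? Thank you very much.
I accepted the answer. Thank you very much for the explanation and the teaching.
Your pattern also matches key=value even when "item:" is not present. Is there any advanced technique to exclude those key=value lines? Anyway, your expression is already good enough.
Thanks @7stud, I will use your code after extracting them into a CSV file ^^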
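To illustrate the remark about groups: in Python's re module a capture group that sits inside a repetition reports only its last repetition, so a single match object cannot hand back every pair. A small sketch, where the sample `line` and the variable names are only for illustration:

```python
import re

line = ">>RATIO: casher#=30, Value=2.579, Units=ratio, Error=N"

# One regex with a repeated group: the whole line matches, but groups 2 and 3
# hold only the final key=value repetition.
repeated = re.match(r'>>(\w+):(?:\s*([\w#]+)=([^,\s]+),?)+', line)
last_only = repeated.groups()

# Collecting every pair needs repeated matching instead, e.g. findall.
pairs = re.findall(r'([\w#]+)=([^,\s]+)', line)

print(last_only)
print(pairs)
```

This is why the answers above either use an alternation with repeated matching or split the line first.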