How do I correctly split the following strings? - python


I have the following file to parse:

Total Virtual Clients       :             10      (1 Machines)
Current Connections         :             10
Total Elapsed Time          :             50 Secs (0 Hrs,0 Mins,50 Secs)

Total Requests              :         337827      (    6687/Sec)
Total Responses             :         337830      (    6687/Sec)
Total Bytes                 :      990388848      (   20571 KB/Sec)
Total Success Connections   :           3346      (      66/Sec)
Total Connect Errors        :              0      (       0/Sec)
Total Socket Errors         :              0      (       0/Sec)
Total I/O Errors            :              0      (       0/Sec)
Total 200 OK                :          33864      (     718/Sec)
Total 30X Redirect          :              0      (       0/Sec)
Total 304 Not Modified      :              0      (       0/Sec)
Total 404 Not Found         :         303966      (    5969/Sec)
Total 500 Server Error      :              0      (       0/Sec)
Total Bad Status            :         303966      (    5969/Sec)

I have a parsing routine that searches the file for these values, but when I do this:

for data in temp:
    line = data.strip().split()
    print(line)

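where temp is my temporary buffer containing these lines, every line is split on each run of whitespace, so the titles break apart into separate words.

What I want is: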

['Total I/O Errors', '0', '0']
['Total 200 OK', '69807', '864']
['Total 30X Redirect', '0', '0']

And so on. How can I achieve this?

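You can use a regular expression like this:

import re
# title, then ':', then the count, then the per-second rate in parentheses
rex = re.compile(r'([^:]+\S)\s*:\s*(\d+)\s*\(\s*(\d+)/Sec\)')
for line in temp:
    match = rex.match(line)
    if match:
        # groups() returns a tuple; convert it to get the list form shown below
        print(list(match.groups()))

This will give you: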

['Total Requests', '337827', '6687']
['Total Responses', '337830', '6687']
['Total Success Connections', '3346', '66']
['Total Connect Errors', '0', '0']
['Total Socket Errors', '0', '0']
['Total I/O Errors', '0', '0']
['Total 200 OK', '33864', '718']
['Total 30X Redirect', '0', '0']
['Total 304 Not Modified', '0', '0']
['Total 404 Not Found', '303966', '5969']
['Total 500 Server Error', '0', '0']
['Total Bad Status', '303966', '5969']

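Note that this only matches lines of the form "title : number (number/Sec)", so the Total Virtual Clients, Current Connections, Total Elapsed Time and Total Bytes lines are skipped. You can adjust the expression to match those lines as well.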
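As a rough sketch of such an adjustment, the pattern below also accepts an optional unit word after the number (such as "Secs") and makes the parenthesised part optional, capturing whatever text it contains. The names LINE_RE and parse_line are made up for illustration, and the pattern is only an assumption based on the sample lines in the question.

```python
import re

# Hypothetical pattern: title, ':', a number, an optional unit word,
# and an optional parenthesised detail captured as free text.
LINE_RE = re.compile(
    r'([^:]+\S)\s*:\s*(\d+)(?:\s+\w+)?\s*(?:\(\s*(.*?)\s*\))?\s*$'
)

def parse_line(line):
    """Return [title, value, detail] or None if the line does not fit.

    detail is None when the line has no parenthesised part.
    """
    match = LINE_RE.match(line)
    return list(match.groups()) if match else None
```

With this version the third field is kept as text (for example "20571 KB/Sec"), so any unit would still have to be stripped afterwards if only the number is wanted.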


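Instead of splitting on whitespace, you need to split on the other delimiters in the format. It might look like this:

for data in temp:
    # skip blank lines and lines that have no parenthesised part
    if ':' not in data or '(' not in data:
        continue
    first, rest = data.split(':', 1)
    second, rest = rest.split('(', 1)
    third = rest.split(')', 1)[0]
    print([x.strip() for x in (first, second, third)])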
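Splitting on the delimiters leaves the third field as text such as "6687/Sec" rather than "6687". A minimal sketch of the same idea using str.partition, which never raises when a delimiter is missing, is shown here. The function name split_fields is hypothetical, and dropping the "/Sec" suffix is an assumption about the output the question asks for.

```python
def split_fields(line):
    """Split 'title : value (detail)' on its delimiters.

    Returns [title, value, detail], or None if the line has no ':'.
    """
    title, colon, rest = line.partition(':')
    if not colon:
        return None
    value, _, detail = rest.partition('(')
    detail = detail.partition(')')[0].strip()
    # Assumption: the caller wants the bare number, so drop a trailing '/Sec'
    if detail.endswith('/Sec'):
        detail = detail[:-len('/Sec')].strip()
    return [title.strip(), value.strip(), detail]
```

Lines without parentheses, such as Current Connections, come back with an empty string as the third field.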


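A regular expression is overkill for parsing this data, but it is a convenient way to express fixed-length fields. For example:

import re
for data in temp:
    match = re.match(r"(.{28}):(.{21})(.*)", data)
    if match:  # blank lines and lines with no third column do not match
        first, second, third = match.groups()
        ...

This means the first field is 28 characters long. The ":" is skipped, the next 21 characters are the second field, and the rest is the third field. The fields keep their padding, so strip each one before using it.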


Cool! Thanks a lot. (I will accept your answer in a few minutes.)

The data appears to have fixed field widths, so it would be better to use fixed slicing.
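The fixed-slicing suggestion could look roughly like the sketch below. The column positions (28 characters of title, a ":" at index 28, then 21 characters for the value) are assumptions taken from the sample file in the question, and slice_fields is a made-up name.

```python
def slice_fields(line):
    """Cut a fixed-width line at the column positions seen in the sample.

    Returns [title, value, detail], or None if ':' is not at column 28.
    """
    # Assumed layout: title in columns 0-27, ':' at column 28,
    # value in columns 29-49, detail from column 50 onwards.
    if len(line) <= 28 or line[28] != ':':
        return None
    return [line[:28].strip(), line[29:50].strip(), line[50:].strip()]
```

Slicing past the end of a short line simply yields an empty string, so lines without a third column need no special handling. The third field still includes the parentheses and the "/Sec" text.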