Python: reading a space-delimited CSV with quoted values

Tags: python, regex, csv

I have a shell command that returns lines like the following:

timestamp=1511270820724797892 eventID=1511270820724797892 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419165205232 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="nginx/1.10.1" Method="GET" RequestURI="/system/varlogmessages/" UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" WebSite="backup-server-new" Domain="backup-server-new" SrcIP="172.20.1.13" SrcPort="80" DstIP="172.18.4.181" DstPort="60065" 

timestamp=1511270820735795372 eventID=1511270820735795372 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419176202992 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="probe" Method="GET" RequestURI="/system/status" WebSite="probe609:8111" Domain="probe609:8111" SrcIP="172.20.2.109" SrcPort="8111" DstIP="172.18.4.96" DstPort="49714"

I tried to read it like this:

for i, row in enumerate(csv.reader(execute(cmd), delimiter=' ', skipinitialspace=True)):
    print(i, len(row))
    if i > 10:
        break
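But this is not correct, because spaces inside quotes are not ignored. For example, channelID="HTTP: Other" is split at the space into two fields, one ending in HTTP: and one starting with Other.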
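
For illustration (this is not part of the original question), here is a minimal sketch of why the split happens. The csv module only treats a quote character as the start of a quoted field when it is the first character of that field. In key="value" input the quote appears mid-field, so it is kept as ordinary text and the delimiter inside it still splits the field. The sample strings are made up for the demonstration.

```python
import csv

line = 'channelID="HTTP: Other" channelDir=false'

# The quote sits in the middle of the field, so csv treats it as ordinary
# text and still splits on the space inside the quotes.
mid_field = next(csv.reader([line], delimiter=' '))

# When the quote opens the field, the space inside it is preserved.
field_start = next(csv.reader(['"HTTP: Other" false'], delimiter=' '))

print(mid_field)
print(field_start)
```

This is also why passing quotechar='"' explicitly would not change anything here: the double quote is already the default quote character.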


What is the correct way to parse this type of input?

It's a bit of a hack, but it strikes me that the rules here are similar to the rules for parsing attributes in an HTML tag.

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2

# Create a parser that simply dumps the tag attributes to an instance variable
class AttrParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        self.attrs = attrs

# Our input
to_parse = 'channelID="HTTP: Other" channelDir=false classID="class-default"'

# Create a parser instance
parser = AttrParser()
# Wrap our input text inside a dummy HTML tag and feed it into the parser
parser.feed('<NOTAG {}>'.format(to_parse))
# Read the results
print(parser.attrs)
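
As a rough sketch of how this could be applied per line, the attribute list can be turned into a dict. The helper name parse_line is hypothetical and not part of the answer above, and the sketch assumes Python 3. One caveat to be aware of: HTMLParser lowercases attribute names, so channelID comes back as channelid.

```python
from html.parser import HTMLParser  # Python 3


class AttrParser(HTMLParser):
    """Collects the attributes of the last start tag seen."""

    def handle_starttag(self, tag, attrs):
        self.attrs = attrs


def parse_line(line):
    # parse_line is a hypothetical helper introduced for this sketch.
    parser = AttrParser()
    # Wrap the line in a dummy tag so its key=value pairs parse as attributes.
    parser.feed('<NOTAG {}>'.format(line))
    return dict(parser.attrs)


row = parse_line('channelID="HTTP: Other" channelDir=false classID="class-default"')
print(row)
```

All values come back as strings, so booleans and integers would still need converting if typed values are required.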

Use a regular expression to find the keys, then use the keys as offsets to slice out each value.

import re

lines = [
    '''timestamp=1511270820724797892 eventID=1511270820724797892 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419165205232 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="nginx/1.10.1" Method="GET" RequestURI="/system/varlogmessages/" UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" WebSite="backup-server-new" Domain="backup-server-new" SrcIP="172.20.1.13" SrcPort="80" DstIP="172.18.4.181" DstPort="60065"''',
    '''timestamp=1511270820735795372 eventID=1511270820735795372 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419176202992 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="probe" Method="GET" RequestURI="/system/status" WebSite="probe609:8111" Domain="probe609:8111" SrcIP="172.20.2.109" SrcPort="8111" DstIP="172.18.4.96" DstPort="49714"''',
]
results = []
for line in lines:
    result = {}
    keys = re.findall(r'\w+=', line)
    for idx, k in enumerate(keys):
        start = line.find(k)
        if idx + 1 >= len(keys):
            end = len(line)
        else:
            end = line.find(keys[idx+1])
        # maxsplit=1 keeps any further '=' characters inside the value
        key, value = line[start:end].strip().split("=", 1)
        if isinstance(value, str):
            if value.lower() == "true":
                value = True
            elif value.lower() == "false":
                value = False
            elif value.isdigit():
                value = int(value)
            else:
                value = value.strip('"')
        result[key] = value
    results.append(result)
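
An alternative that is not mentioned in the thread, offered only as a sketch: the standard library's shlex.split follows shell-style quoting, so a quoted value that contains spaces stays attached to its key. The helper name parse_line and the shortened sample line are assumptions made for the example.

```python
import shlex


def parse_line(line):
    """Parse one line of space-separated key=value pairs into a dict."""
    result = {}
    for token in shlex.split(line):
        # shlex has already removed the quotes, e.g. 'channelID=HTTP: Other'
        key, _, value = token.partition('=')
        if value.lower() in ('true', 'false'):
            value = (value.lower() == 'true')
        elif value.isdigit():
            # Quoting is lost at this point, so "80" also becomes the int 80
            value = int(value)
        result[key] = value
    return result


row = parse_line('timestamp=1511270820724797892 channelID="HTTP: Other" '
                 'channelDir=false SrcPort="80"')
print(row)
```

Unlike the loop above, this sketch cannot tell a quoted number from an unquoted one. If that distinction matters, skip the isdigit branch and keep such values as strings.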

Have you tried quotechar='"'?

My current solution is:
row = {k: v.strip('"') for k, v in re.findall(r'(\S+)=(".*?"|\S+)', row)}

I just realized that the current solution does not work in some cases.

@Donbeo Which one? The one in the comment above?

It seems to work. The only downside is that it makes everything lowercase.