Python: using a regular expression to find text and return a list

python regex

I am trying to use Python's regular expression module (re) to create a list from a text file (.txt). Part of the text looks like this:

146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

How can I use a regex to turn this text into a list in the following format:

{
"host_name": "146.204.224.152", 
"name": "feest6811", 
"time": "21/Jun/2019:15:45:24 -0700", 
"method": "POST /incentivize HTTP/1.1"
},
..
..
..

I tried the following pattern, because I had seen an example that used it:

pattern="(?P<host_name>.*)(\ -\ )(?P<name>\w*)"

for item in re.finditer(pattern,'Text_data',re.VERBOSE):
    print(item.groupdict())

Any suggestions for a regex that handles this text would be appreciated.

Use

(?m)^(?P<host_name>[\d.]+) - (?P<name>\w+) \[(?P<time>[^][]+)] "(?P<method>[^"]+)"

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of a line (because of (?m))
--------------------------------------------------------------------------------
  (?P<host_name>           group and capture to \k<host_name>:
--------------------------------------------------------------------------------
    [\d.]+                   any character of: digits (0-9), '.' (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<host_name>
--------------------------------------------------------------------------------
   -                       ' - '
--------------------------------------------------------------------------------
  (?P<name>                 group and capture to \k<name>:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<name>
--------------------------------------------------------------------------------
                           ' '
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?P<time>                group and capture to \k<time>:
--------------------------------------------------------------------------------
    [^][]+                   any character except: ']', '[' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<time>
--------------------------------------------------------------------------------
  ] "                      '] "'
--------------------------------------------------------------------------------
  (?P<method>              group and capture to \k<method>:
--------------------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<method>
--------------------------------------------------------------------------------
  "                        '"'

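To show how the pattern explained above might be applied, here is a minimal sketch that runs it over the two sample lines from the question and collects one dictionary per match. The names text_data, LOG_PATTERN and parse_log are illustrative and not from the original answer, and re.MULTILINE is passed as a flag in place of the inline (?m), which has the same effect.

```python
import re

# Sample log text taken from the question (two lines).
text_data = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
    '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
)

# Same pattern as in the answer; re.MULTILINE makes ^ match at every line start.
LOG_PATTERN = re.compile(
    r'^(?P<host_name>[\d.]+) - (?P<name>\w+) '
    r'\[(?P<time>[^][]+)] "(?P<method>[^"]+)"',
    re.MULTILINE,
)


def parse_log(text):
    """Return a list with one dict of named groups per matching log line."""
    return [match.groupdict() for match in LOG_PATTERN.finditer(text)]


records = parse_log(text_data)
for record in records:
    print(record)
```

Note that the text is passed as a variable; the attempt in the question passes the string literal 'Text_data', so the pattern is matched against those nine characters instead of the log contents. Lines that do not fit the pattern are skipped without an error, so it may be worth comparing len(records) with the number of lines in the file.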
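It would be better to write a parser for this than to use a regex, since this looks like a web log with a proper structure.

When you say "list format", can you give an example? Do you want to include only the keys or only the values of the dictionary example, or both?

@gmdev Sorry for the wrong wording. What I meant is that I want to return dictionaries from the string.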
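As a rough illustration of the parser suggestion in the first comment, the sketch below splits each line on its fixed delimiters with str.partition and uses no regex. It is only one possible reading of that comment; the function names parse_line and parse_log_lines are hypothetical, and the code assumes every line follows the layout of the sample in the question.

```python
def parse_line(line):
    """Split one log line into a dict, or return None if a delimiter is missing."""
    host_name, sep, rest = line.partition(' - ')
    if not sep:
        return None
    name, sep, rest = rest.partition(' [')
    if not sep:
        return None
    time, sep, rest = rest.partition('] "')
    if not sep:
        return None
    method, sep, _ = rest.partition('"')
    if not sep:
        return None
    return {"host_name": host_name, "name": name, "time": time, "method": method}


def parse_log_lines(text):
    """Parse every line of the text, keeping only the lines that fit the layout."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [record for record in parsed if record is not None]


sample = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    'this line does not follow the layout\n'
)
result = parse_log_lines(sample)
print(result)
```

Unlike the regex, this version does not check what each field contains (for example, that the host is made of digits and dots); it only relies on the position of the delimiters.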