Python Regex: How to capture multiple texts from this string?

I have text from a log file in the following format:

{s:9:\\\"batch_num\\\";s:16:\\\"4578123645712459\\\";s:9:\\\"full_name\\\";s:8:\\\"John Doe\\\";s:6:\\\"mobile\\\";s:12:\\\"123456784512\\\";s:7:\\\"address\\\";s:5:\\\"Redacted"\\\";s:11:\\\"create_time\\\";s:19:\\\"2017-09-10 12:45:01\\\";s:6:\\\"gender\\\";s:1:\\\"1\\\";s:9:\\\"birthdate\\\";s:10:\\\"1996-03-09\\\";s:11:\\\"contact_num\\\";s:1:\\\"0\\\";s:8:\\\"identity\\\";s:1:\\\"2\\\";s:6:\\\"school\\\";N;s:14:\\\"school_city_id\\\";N;s:17:\\\"profile_pic\\\";s:43:\\\"profile\\/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg\\\";s:14:\\\"school_address\\\";N;s:17:\\\"enter_school_date\\\";N;s:10:\\\"speciality\\\";}

Currently, I can only extract batch_num, using this regex:


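(?…

I think I have one for you. In the .jpeg match, group 2 does not include the two slashes (//), which is why they are shown in pink; they are part of the same match group:

Output: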

['4578123645712459',
 'John Doe',
 'profile/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg']

Using multiple regular expressions

batch_num
(?…

A solution that extracts the values elegantly by converting the string to JSON

Step 1: Clean the string

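import re, itertools
# `text` is assumed to hold the raw log string from the question.
# Strip backslashes and semicolons, tidy the quotes, and turn each null marker N into an empty string.
str_text = text.replace('\\','').replace(';','').replace('""','"').replace(':"','"').replace('N',',""')
# Replace every length prefix (s:<digits>) with a comma separator.
str_text = re.sub(r's:\d+', ',', str_text)
# Drop the comma left directly after the opening brace.
str_text = re.sub(r'^\{,', '{', str_text)
# Give the trailing key an empty value.
str_text = re.sub(r'\}$', ':""}', str_text)
# Turn every other comma (1st, 3rd, ...) into a colon so the items become key:value pairs.
str_text = re.sub('(,)', lambda m, c=itertools.count(): m.group() if next(c) % 2 else ':', str_text)
str_text
#'{"batch_num":"4578123645712459","full_name":"John Doe","mobile":"123456784512","address":"Redacted","create_time":"2017-09-10 12:45:01","gender":"1","birthdate":"1996-03-09","contact_num":"0","identity":"2","school":"","school_city_id":"","profile_pic":"profile/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg","school_address":"","enter_school_date":"","speciality":""}'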
Step 2: Convert the string to JSON and extract the values

import json
str_json = json.loads(str_text)
print(str_json['batch_num'])
print(str_json['full_name'])
print(str_json['profile_pic'])
#4578123645712459
#John Doe
#profile/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg
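
One thing to watch in Step 1: replace('N', ...) rewrites every capital N, so a value that contains one would be altered. As a rough alternative sketch (not part of the answer above), the items can be tokenised directly and paired up as key/value. The names parse_serialized, sample and record are made up for illustration, and sample is a shortened copy of the string from the question.

```python
import re

def parse_serialized(text):
    """Pair up the string / N tokens of the log line into a dict (sketch)."""
    # Remove the backslashes that escape the quotes and the slash in the path.
    cleaned = text.replace("\\", "")
    # Each token is either s:<len>:"<value>"; (group 1) or the null marker N; (group 2).
    tokens = [
        None if null else value
        for value, null in re.findall(r's:\d+:"(.*?)";|(N);', cleaned)
    ]
    # Even positions are keys, odd positions are their values;
    # zip silently drops a trailing key that has no value.
    return dict(zip(tokens[0::2], tokens[1::2]))

# Shortened sample in the same escaped format as the question (assumption: raw log text).
sample = r'{s:9:\\\"batch_num\\\";s:16:\\\"4578123645712459\\\";s:9:\\\"full_name\\\";s:8:\\\"John Doe\\\";s:6:\\\"school\\\";N;s:17:\\\"profile_pic\\\";s:43:\\\"profile\\/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg\\\";}'

record = parse_serialized(sample)
print(record["batch_num"])
print(record["full_name"])
print(record["profile_pic"])
```

This still assumes that no value contains the sequence "; and that keys and values strictly alternate, which holds for the sample but is not guaranteed for every log line.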

You could get all 3 matches for the example data using an alternation and a capturing group:

\b(?:batch_num|full_name|profile_pic)\b\\\\\\";s:\d+:\\\\\\"([^"]+)\\\\\\"
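In parts

  • \b(?:batch_num|full_name|profile_pic)\b
    Match one of the options between word boundaries
  • \\\\\\";s:\d+:
    Match \\\";s: followed by 1+ digits and :
  • \\\\\\"
    Match \\\"
  • (
    Capture group 1
    • [^"]+
      Match any char except " 1+ times
  • )
    Close group
  • \\\\\\"
    Match \\\"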

For example

import re

regex = r'\b(?:batch_num|full_name|profile_pic)\b\\\\\\";s:\d+:\\\\\\"([^"]+)\\\\\\"'
test_str = r'''{s:9:\\\"batch_num\\\";s:16:\\\"4578123645712459\\\";s:9:\\\"full_name\\\";s:8:\\\"John Doe\\\";s:6:\\\"mobile\\\";s:12:\\\"123456784512\\\";s:7:\\\"address\\\";s:5:\\\"Redacted"\\\";s:11:\\\"create_time\\\";s:19:\\\"2017-09-10 12:45:01\\\";s:6:\\\"gender\\\";s:1:\\\"1\\\";s:9:\\\"birthdate\\\";s:10:\\\"1996-03-09\\\";s:11:\\\"contact_num\\\";s:1:\\\"0\\\";s:8:\\\"identity\\\";s:1:\\\"2\\\";s:6:\\\"school\\\";N;s:14:\\\"school_city_id\\\";N;s:17:\\\"profile_pic\\\";s:43:\\\"profile\\/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg\\\";s:14:\\\"school_address\\\";N;s:17:\\\"enter_school_date\\\";N;s:10:\\\"speciality\\\";}'''
matches = re.finditer(regex, test_str)
print(re.findall(regex, test_str))
Output

['4578123645712459', 'John Doe', 'profile\\\\/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg']
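
If it helps to know which value belongs to which field, one possible variation (a sketch, not part of the answer above) is to capture the field name as well and build a dict. The names pattern, sample and fields are illustrative; sample is a shortened copy of the question's string with a three-part name swapped in to show that [^"]+ also covers names with a middle name.

```python
import re

# Same pattern as above, except that the field name is captured too (group 1).
pattern = re.compile(
    r'\b(batch_num|full_name|profile_pic)\b\\\\\\";s:\d+:\\\\\\"([^"]+)\\\\\\"'
)

# Shortened sample in the same escaped format; the three-part name is an assumption.
sample = r'{s:9:\\\"batch_num\\\";s:16:\\\"4578123645712459\\\";s:9:\\\"full_name\\\";s:16:\\\"Patty Marsh Holt\\\";s:17:\\\"profile_pic\\\";s:43:\\\"profile\\/2df0d9f29ab3ha65fed4847c8lb1o9sa.jpeg\\\";}'

# findall returns (name, value) tuples; strip the leftover backslashes from the path.
fields = {name: value.replace('\\', '') for name, value in pattern.findall(sample)}
print(fields)
```

Removing the backslashes from each value is a choice made here so that the profile path reads profile/…; drop the replace call to keep the values exactly as they appear in the log.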

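Is the length of the batch number constant?
To get the profile, the pattern is "profile\/[A-Za-z0-9]+".
Does it have to be done with a single regex?
Are all those backslashes really part of the string, or are they an artifact of printing?
What if the full name has a middle name and a last name? For example: Patty Marsh Holt. I want to cover both cases, because sometimes the name value is only a first name and sometimes a full name.
You can match any kind of name: https://regex101.com/r/OBaOY0/3
Nice approach. Just one thing: isn't it recommended to avoid using json as a variable name?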