Python: how to use a regular expression to split input

Suppose my code reads lines directly from an HTML document. Each line looks like this:

<TD>field1</TD><TD><A HREF="http://sample.url.com">field2</TD><TD><EM>field3</EM></TD>

Can this be split into its fields with a regular expression?

If you want to use a regular expression, here is one:

import re

a = "<TD>field1</TD><TD><A HREF=\"http://sample.url.com\">field2</TD><TD><EM>field3</EM></TD>"
REGEX = r'<TD>(\w+)</TD><TD><A HREF="([A-Za-z/:.]+)">(\w+)</TD><TD><EM>(\w+)</EM></TD>'
print(re.findall(REGEX, a))
# Output: [('field1', 'http://sample.url.com', 'field2', 'field3')]

You can do the following:

import re
pattern = re.compile('<TD>(?P<field1>.*?)</TD><TD><A HREF="(?P<url>.*?)">(?P<field2>.*?)</TD><TD><EM>(?P<field3>.*?)</EM></TD>')

html = '<TD>field1</TD><TD><A HREF="http://sample.url.com">field2</TD><TD><EM>field3</EM></TD>'
match = pattern.search(html)
if match:
    field1, url, field2, field3 = match.groups()
    # or you can do field1 = match.group('field1') and so on....
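
Since every line of the document is said to share this layout, the named groups can also be collected per line with `groupdict()`. The following is only a minimal sketch: the `parse_line` helper and the sample `lines` list are hypothetical names introduced for illustration, and the `re.IGNORECASE` flag is an added assumption so that lowercase tags also match.

```python
import re

# Same layout as the pattern above. re.IGNORECASE is an assumption added
# here so that lowercase tags such as <td> match as well.
PATTERN = re.compile(
    r'<TD>(?P<field1>.*?)</TD><TD><A HREF="(?P<url>.*?)">(?P<field2>.*?)</TD>'
    r'<TD><EM>(?P<field3>.*?)</EM></TD>',
    re.IGNORECASE,
)


def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


# Hypothetical input: one matching row and one line that does not match.
lines = [
    '<TD>field1</TD><TD><A HREF="http://sample.url.com">field2</TD><TD><EM>field3</EM></TD>',
    'not a table row',
]
rows = [parse_line(line) for line in lines]
print(rows[0]["url"])
print(rows[1])
```

Returning `None` for non-matching lines lets the caller decide whether to skip or report them.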

I suggest the following simple solution, which returns the fields as a list and lets you choose whether to extract the URL:

import re

s = "<TD>field1</TD><TD><A HREF=\"http://sample.url.com\">field2</TD><TD><EM>field3</EM></TD>"

# If you want to extract the URL
myPattern = re.compile(r'<TD>(\w+)</TD><TD><A HREF=(.+)>(\w+)</TD><TD><EM>(\w+)</EM></TD>')
listOfMatches = list(myPattern.findall(s)[0])
print(listOfMatches) # ['field1', '"http://sample.url.com"', 'field2', 'field3']

# If you don't want to extract the URL
myPattern = re.compile(r'<TD>(\w+)</TD><TD><A HREF=.+>(\w+)</TD><TD><EM>(\w+)</EM></TD>')
listOfMatches = list(myPattern.findall(s)[0])
print(listOfMatches) # ['field1', 'field2', 'field3']
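
Note that `(.+)` in the first pattern captures the URL together with its surrounding quotes. If the bare URL is wanted, one possible variation (a sketch, not the answerer's code) is to match the quotes outside the group and use `[^"]*` inside it. The names `url_pattern` and `fields` are introduced here for illustration only.

```python
import re

s = '<TD>field1</TD><TD><A HREF="http://sample.url.com">field2</TD><TD><EM>field3</EM></TD>'

# [^"]* stops at the closing quote, so the quotes stay outside the group.
url_pattern = re.compile(
    r'<TD>(\w+)</TD><TD><A HREF="([^"]*)">(\w+)</TD><TD><EM>(\w+)</EM></TD>'
)

matches = url_pattern.findall(s)
# findall returns an empty list when nothing matches, so check before indexing.
fields = list(matches[0]) if matches else []
print(fields)  # ['field1', 'http://sample.url.com', 'field2', 'field3']
```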

Avoid regular expressions for HTML where you can and use an HTML parser instead. A parser really will make your life easier (and your code easier to maintain).
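
As a rough illustration of the parser approach, here is a minimal sketch using `html.parser` from the Python standard library. The `RowParser` class and its `fields` and `urls` attributes are hypothetical names chosen for this example, and it assumes every non-empty piece of text in a line counts as a field.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect text fields and link targets from one line of HTML."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)

    def handle_data(self, data):
        # Assumption: any non-blank text between tags is a field.
        text = data.strip()
        if text:
            self.fields.append(text)


line = '<TD>field1</TD><TD><A HREF="http://sample.url.com">field2</TD><TD><EM>field3</EM></TD>'
parser = RowParser()
parser.feed(line)
parser.close()
print(parser.fields)  # ['field1', 'field2', 'field3']
print(parser.urls)    # ['http://sample.url.com']
```

Because this parser reacts to individual tags instead of matching the whole line, it does not depend on the exact tag order and is not thrown off by the unclosed `<A>` tag in the sample.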