Using a regular expression to find a string in a list in Python 3


python regex


How do I get base.php?id=5314 out of the list?

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup
url = 'http://www.fansubs.ru/search.php'
values = {'Content-Type:' : 'application/x-www-form-urlencoded',
      'query' : 'Boku dake ga Inai Machi' }
d = {}
data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
   the_page = response.read()
soup = BeautifulSoup(the_page, 'html.parser')
for link in soup.findAll('a'):
    d[link] = (link.get('href'))
x = (list(d.values()))

You can combine the built-in filter function with a regex. For example:

import re

# ... your code here ...

x = (list(d.values()))
# Raw string so that \. and \? reach the regex engine unchanged
test = re.compile(r"base\.php\?id=", re.IGNORECASE)
results = filter(test.search, x)

Update based on the comments: you can convert the filter result to a list:

print(list(results))

Example result with the following hard-coded list:

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666"]

You get:

['asd/asd/base.php?id=5314', 'something/else/here/base.php?id=666']

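This answer is based on a page about filtering lists. That page shows a few more ways to do the same thing, which may suit you better. Hope it helps.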
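One thing to watch for with the filter approach: link.get('href') returns None for an a tag that has no href, and searching None with a compiled pattern raises TypeError. A minimal sketch of a None-safe version follows, assuming a made-up list of href values in place of the real page:

```python
import re

# Hypothetical href values; None stands for an <a> tag without an href,
# which is what link.get('href') returns in that case.
hrefs = ["index.php", None, "base.php?id=5314", "forum/BASE.PHP?id=666"]

# re.escape makes '.' and '?' match literally instead of acting as operators.
pattern = re.compile(re.escape("base.php?id="), re.IGNORECASE)

# Skip None before searching; pattern.search(None) would raise TypeError.
matches = [h for h in hrefs if h is not None and pattern.search(h)]
print(matches)
```

The list comprehension does the same job as filter(test.search, x) but gives a place to drop the None entries first.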

You can pass a regex directly to find_all, and it will filter on the href for you with href=re.compile(...):

import re

with urllib.request.urlopen(req) as response:
    the_page = response.read()
soup = BeautifulSoup(the_page, 'html.parser')
# Keep only <a> tags whose href contains the literal text 'base.php?id='
d = {link: link["href"] for link in soup.find_all('a', href=re.compile(re.escape('base.php?id=')))}

find_all returns only the a tags whose href attribute matches the regex.

This gives you:

In [21]: d = {link:link["href"] for link in soup.findAll('a', href=re.compile(re.escape('base.php?id=')))}

In [22]: d
Out[22]: {<a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>: 'base.php?id=5314'}


Since you seem to be looking for only one link, just use find:

In [36]: link = soup.find('a', href=re.compile(re.escape('base.php?id=')))

In [37]: link
Out[37]: <a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>

In [38]: link["href"]
Out[38]: 'base.php?id=5314'
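For comparison, the same href filtering can be sketched without BeautifulSoup, using only the standard library's html.parser. This is a swapped-in technique, not what the answer above uses; the HrefCollector class and the sample_html markup are made up for illustration.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values of <a> tags that match a compiled pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and passes attrs as (name, value) pairs.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None and self.pattern.search(href):
            self.hrefs.append(href)


# Made-up markup standing in for the search results page.
sample_html = (
    '<a href="index.php">Home</a>'
    '<a href="base.php?id=5314">Boku dake ga Inai Machi</a>'
    '<a name="top">no href</a>'
)

collector = HrefCollector(re.compile(re.escape("base.php?id=")))
collector.feed(sample_html)
collector.close()
print(collector.hrefs)
```

With BeautifulSoup installed, soup.find or soup.find_all with href=re.compile(...) remains the shorter option.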


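What exactly is your question?

As far as I can tell, he goes through all the a tags on the page and wants to filter for particular href values (stored as a list in x).

If he is only looking for an exact match, a regular expression is overkill. Just use filter(lambda y: 'base.php?id=' in y.lower(), x). Also, when you use a regex for an exact match, you should escape the content with re.escape rather than escaping it by hand, so re.compile(re.escape('base.php?id='), re.IGNORECASE) and so on. This matters even more for user-supplied input.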
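The point about re.escape can be illustrated with a short sketch. The hrefs list below is invented for the demonstration; it compares the plain substring test, an unescaped pattern, and an escaped one:

```python
import re

# Invented sample values for illustration only.
hrefs = ["base.php?id=5314", "base-phpid=1", "BASE.PHP?ID=7"]

# Plain substring test, made case-insensitive with lower().
by_substring = [h for h in hrefs if "base.php?id=" in h.lower()]

# Unescaped: '.' matches any character and 'p?' means "an optional p",
# so the pattern no longer describes the literal text.
unescaped = re.compile("base.php?id=", re.IGNORECASE)
by_unescaped = [h for h in hrefs if unescaped.search(h)]

# Escaped: every character is matched literally.
escaped = re.compile(re.escape("base.php?id="), re.IGNORECASE)
by_escaped = [h for h in hrefs if escaped.search(h)]

print(by_substring)
print(by_unescaped)
print(by_escaped)
```

The unescaped pattern misses the real links, because the literal '?' in the href is never matched, and it accepts 'base-phpid=1' instead. The substring test and the escaped pattern agree with each other.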