Python 3.x: extracting specific data from HTML (Python, web scraping)
I would like to extract specific data from an HTML file like this one. What I want is the last piece of information, i.e. "Sonic le film" from the strong element. At the moment my code looks like this:
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.villedieu-cinema.fr/'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'mceEditable'})
contents = table.find_all('tr')
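Do you know how to get this data? Thank you very much for any advice.
The URL with the table is
https://www.villedieu-cinema.fr/semaine-2020-8-au-14-07-1/
You can use this example to get the information from the table: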
import requests
from bs4 import BeautifulSoup
url = 'https://www.villedieu-cinema.fr/semaine-2020-8-au-14-07-1/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
movies = []
for tr in soup.select('.mceEditable tr'):
    tds = [' '.join(td.get_text(strip=True, separator=' ').split()) for td in tr.select('td')]
    if tds[0]:
        movies.append(tds[0])
    print(tds)
# print all movies:
print()
print(*movies, sep='\n')
Prints:
['', 'Mer 8', 'Jeu 9', 'Ven 10', 'Sam 11', 'Dim 12', 'Lun 13', 'Mar 14']
['Sonic le film', '16h', '18h', '16h', '', '', '', '']
['Les parfums', '21h', '', '21h', '18h', '18h', '21h', '']
['Radioactive', '', '21h VO', '', '21h VF', '', '', '21h VF']
Sonic le film
Les parfums
Radioactive
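If only the title inside the strong element is needed, as the question asks, a CSS selector can target it directly. The following is a minimal sketch rather than a tested solution for the live site: `SAMPLE_HTML` is hypothetical markup modelled on the rows printed above (the real page may be structured differently), and `first_title` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the printed rows;
# the real page's HTML may differ.
SAMPLE_HTML = """
<table class="mceEditable">
  <tr><td></td><td>Mer 8</td><td>Jeu 9</td></tr>
  <tr><td><strong>Sonic le film</strong></td><td>16h</td><td>18h</td></tr>
  <tr><td><strong>Les parfums</strong></td><td>21h</td><td></td></tr>
</table>
"""


def first_title(html):
    """Return the text of the first <strong> inside the table, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # select_one returns the first match in document order, or None.
    strong = soup.select_one('.mceEditable tr td strong')
    return strong.get_text(strip=True) if strong else None


print(first_title(SAMPLE_HTML))  # prints: Sonic le film
```

Checking for `None` before calling `get_text` avoids an `AttributeError` when the selector matches nothing, for example when the request is sent to a page that does not contain the table.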
You could probably try something like
contents.html.find('a')[0].text