Python 3.x: Extracting specific data from an HTML page (web scraping with Python)

I would like to extract specific data from an HTML file like this one:

What I want to get is the last piece of information, i.e. "Sonic le film" from the <strong> element. At the moment my code looks like this:

import requests
from bs4 import BeautifulSoup

url = 'https://www.villedieu-cinema.fr/'
resp = requests.get(url)
# The decoded response body is resp.text (resp.txt does not exist)
soup = BeautifulSoup(resp.text, 'lxml')
# Match the class name without the stray leading space
table = soup.find('table', {'class': 'mceEditable'})
contents = table.find_all('tr')

Do you know how I can get this data? Thanks a lot for any advice.
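A minimal offline sketch of the question's approach with its two likely bugs fixed: the response body is resp.text (not resp.txt), and the class is matched as "mceEditable" without a leading space. It parses a hypothetical HTML snippet modeled on the table described in this thread (the real page markup may differ), and strong_titles is an illustrative helper name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the cinema table; the live page may differ.
SAMPLE_HTML = """
<table class="mceEditable">
  <tr><td></td><td>Mer 8</td><td>Jeu 9</td></tr>
  <tr><td><strong>Sonic le film</strong></td><td>16h</td><td>18h</td></tr>
  <tr><td><strong>Les parfums</strong></td><td>21h</td><td></td></tr>
</table>
"""


def strong_titles(html):
    """Return the text of every <strong> inside the .mceEditable table."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches one CSS class; no surrounding whitespace allowed
    table = soup.find("table", class_="mceEditable")
    if table is None:
        # No matching table (e.g. the home page instead of the weekly page)
        return []
    return [s.get_text(strip=True) for s in table.find_all("strong")]


titles = strong_titles(SAMPLE_HTML)
print(titles)  # ['Sonic le film', 'Les parfums']
```

Against the live site you would pass requests.get(url).text for the page that actually contains the table; that step needs network access and is not shown here.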

The table is at this URL:

https://www.villedieu-cinema.fr/semaine-2020-8-au-14-07-1/

You can use this example to get the information from the table:

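import requests
from bs4 import BeautifulSoup


url = 'https://www.villedieu-cinema.fr/semaine-2020-8-au-14-07-1/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

movies = []
for tr in soup.select('.mceEditable tr'):
    # Text of each cell, with internal whitespace collapsed to single spaces
    tds = [' '.join(td.get_text(strip=True, separator=' ').split()) for td in tr.select('td')]
    # The first cell holds the movie title (it is empty in the header row);
    # the length check guards against rows without <td> cells
    if tds and tds[0]:
        movies.append(tds[0])
    print(tds)

# print all movies:
print()
print(*movies, sep='\n')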

Prints:

['', 'Mer 8', 'Jeu 9', 'Ven 10', 'Sam 11', 'Dim 12', 'Lun 13', 'Mar 14']
['Sonic le film', '16h', '18h', '16h', '', '', '', '']
['Les parfums', '21h', '', '21h', '18h', '18h', '21h', '']
['Radioactive', '', '21h VO', '', '21h VF', '', '', '21h VF']

Sonic le film
Les parfums
Radioactive
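The rows printed above pair naturally with the header row. As a sketch, assuming you collect each tds list instead of only printing it, a small hypothetical helper (rows_to_schedule, not from the original answer) can map every movie to its showtimes per day, dropping empty cells. The rows below are copied from the printed output.

```python
def rows_to_schedule(rows):
    """Map each movie title to {day: showtime}, skipping empty cells."""
    header, *body = rows
    days = header[1:]  # the first header cell is empty
    schedule = {}
    for row in body:
        title, *times = row
        if not title:
            continue
        # Pair each day with its cell and keep only non-empty showtimes
        schedule[title] = {day: t for day, t in zip(days, times) if t}
    return schedule


rows = [
    ['', 'Mer 8', 'Jeu 9', 'Ven 10', 'Sam 11', 'Dim 12', 'Lun 13', 'Mar 14'],
    ['Sonic le film', '16h', '18h', '16h', '', '', '', ''],
    ['Les parfums', '21h', '', '21h', '18h', '18h', '21h', ''],
    ['Radioactive', '', '21h VO', '', '21h VF', '', '', '21h VF'],
]

schedule = rows_to_schedule(rows)
print(schedule['Sonic le film'])  # {'Mer 8': '16h', 'Jeu 9': '18h', 'Ven 10': '16h'}
```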

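You could probably try something like:

contents[1].find('strong').text

Here contents is the list of <tr> rows from the question's code (row 0 is the header), and find() returns a single tag rather than a list, so no [0] index is needed after it.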
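If you would rather not depend on BeautifulSoup or lxml, here is a sketch of a different technique: Python's standard-library html.parser.HTMLParser, collecting only the text inside <strong> elements. StrongTextParser is a hypothetical class name, and the input below is an illustrative row, not the real page.

```python
from html.parser import HTMLParser


class StrongTextParser(HTMLParser):
    """Collect the text content of every <strong> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a <strong>
        self.texts = []   # finished <strong> texts, in document order
        self._buf = []    # text chunks of the current <strong>

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self._buf).strip())
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


parser = StrongTextParser()
parser.feed('<tr><td><strong>Sonic le film</strong></td><td>16h</td></tr>')
print(parser.texts)  # ['Sonic le film']
```

This ignores table structure entirely, so it only works if <strong> is used for the titles and nothing else on the page; the BeautifulSoup approach above is more targeted.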