Python 3.x: extracting specific data from HTML with Python web scraping
Tags: python-3.x, python-requests

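I want to extract specific data from an HTML page like this one:

What I want is the last piece of information, the "Sonic le film" text inside the strong element. At the moment my code looks like this: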

import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.villedieu-cinema.fr/'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'lxml')  # the Response attribute is .text, not .txt
table = soup.find('table', {'class': 'mceEditable'})  # class name without the leading space
contents = table.find_all('tr')

Do you know how to get this data? Thank you very much for any advice.

The URL of the page that contains the table is
https://www.villedieu-cinema.fr/semaine-2020-8-au-14-07-1/

You can use this example to get the information from the table:

import requests
from bs4 import BeautifulSoup


url = 'https://www.villedieu-cinema.fr/semaine-2020-8-au-14-07-1/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

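movies = []
for tr in soup.select('.mceEditable tr'):
    # collapse runs of whitespace inside each cell into single spaces
    tds = [' '.join(td.get_text(strip=True, separator=' ').split()) for td in tr.select('td')]
    # the first cell holds the movie title; it is empty in the header row
    if tds and tds[0]:
        movies.append(tds[0])
    print(tds)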

# print all movies:
print()
print(*movies, sep='\n')
Prints:

['', 'Mer 8', 'Jeu 9', 'Ven 10', 'Sam 11', 'Dim 12', 'Lun 13', 'Mar 14']
['Sonic le film', '16h', '18h', '16h', '', '', '', '']
['Les parfums', '21h', '', '21h', '18h', '18h', '21h', '']
['Radioactive', '', '21h VO', '', '21h VF', '', '', '21h VF']

Sonic le film
Les parfums
Radioactive
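
If you also need the showtimes for a given film, the rows printed above can be turned into a dictionary. This is only a minimal sketch: it assumes the rows have exactly the shape shown in the output (first row is the header with the days, first cell of every other row is the title). The helper name build_schedule is hypothetical and not part of requests or BeautifulSoup.

```python
def build_schedule(rows):
    """Map each film title to {day: showtime}, skipping empty cells."""
    if not rows:
        return {}
    header, *body = rows
    days = header[1:]  # the first header cell is empty, the rest are days
    schedule = {}
    for row in body:
        if not row or not row[0]:
            continue  # skip rows without a title
        title, times = row[0], row[1:]
        schedule[title] = {day: t for day, t in zip(days, times) if t}
    return schedule


# Rows copied from the output above
rows = [
    ['', 'Mer 8', 'Jeu 9', 'Ven 10', 'Sam 11', 'Dim 12', 'Lun 13', 'Mar 14'],
    ['Sonic le film', '16h', '18h', '16h', '', '', '', ''],
    ['Les parfums', '21h', '', '21h', '18h', '18h', '21h', ''],
    ['Radioactive', '', '21h VO', '', '21h VF', '', '', '21h VF'],
]

schedule = build_schedule(rows)
print(schedule['Sonic le film'])
# {'Mer 8': '16h', 'Jeu 9': '18h', 'Ven 10': '16h'}
```

In the scraping loop you would collect each tds list into rows instead of only printing it, then call build_schedule(rows) once at the end.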

You could probably try something like
contents.html.find('a')[0].text
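
With BeautifulSoup itself, a CSS selector may be a simpler way to reach the text of the first strong element in the table. The sketch below runs on a small hand-written fragment, not on the real page: SAMPLE_HTML and the helper first_title are assumptions made for illustration, and the actual markup on the site may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page (assumed structure)
SAMPLE_HTML = """
<table class="mceEditable">
  <tr><td></td><td>Mer 8</td><td>Jeu 9</td></tr>
  <tr><td><a href="#"><strong>Sonic le film</strong></a></td><td>16h</td><td>18h</td></tr>
  <tr><td><a href="#"><strong>Les parfums</strong></a></td><td>21h</td><td></td></tr>
</table>
"""


def first_title(html):
    """Return the text of the first <strong> inside .mceEditable, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one('.mceEditable strong')
    # select_one returns None when nothing matches, so guard before reading text
    return tag.get_text(strip=True) if tag else None


print(first_title(SAMPLE_HTML))
# Sonic le film
```

The None check matters here: in the question the table lookup ran against the home page, where the table may not exist at all.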