Python: scraping a DataFrame from a web page

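I am trying to scrape a set of images from a web page with BeautifulSoup and pandas, but I am stuck on one step.

Because the page was produced with Microsoft Word, its markup is not well formatted. I plan to convert it into a pd.DataFrame in which the path of every image is linked to the appropriate description.

I have managed to print all the information I need, but I cannot load it into a list or a DataFrame.

Can you help?

The code is as follows: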

# Import packages
import requests
from bs4 import BeautifulSoup

# Specify url
url = 'http://mariaberica.it/4.Quadri per sito.htm'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Extracts the response as html: html_doc
html_doc = r.text

# Create a BeautifulSoup object from the HTML, naming the parser explicitly: soup
soup = BeautifulSoup(html_doc, "html.parser")

# Find all 'img' tags (which embed images): a_tags
a_tags = soup.find_all('img')

# Print the src path of each image to the shell
for link in a_tags:
    print(link.get('src'))

This is what I get in the shell. I would like to put it into a list.

4.Quadri%20per%20sito_file/image001.jpg
4.Quadri%20per%20sito_file/image002.jpg
4.Quadri%20per%20sito_file/image003.jpg
4.Quadri%20per%20sito_file/image004.jpg

Can you help?

Thanks.

Just use a list comprehension to put the values into a list.

Here is how:

import requests
from bs4 import BeautifulSoup

url = 'http://mariaberica.it/4.Quadri per sito.htm'
a_tags = BeautifulSoup(requests.get(url).text, "html.parser").find_all('img')
image_src = [l["src"] for l in a_tags]
print(image_src[:2])

Sample output for the first two elements:

['4.Quadri%20per%20sito_file/image001.jpg', '4.Quadri%20per%20sito_file/image002.jpg']
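
The question's stated goal is a pd.DataFrame that links each image path to its description, so the list can be taken one step further. The following is only a minimal sketch under stated assumptions: it assumes the description can be read from each img tag's alt attribute, which may not hold for a page exported from Word (captions may sit in neighbouring text instead). SAMPLE_HTML and images_to_frame are hypothetical names introduced for illustration, and the sample markup is a stand-in rather than the real page. The url column uses urljoin from the standard library to turn the relative src values into absolute URLs.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

# Page URL from the question; here it is only used to resolve relative src paths.
PAGE_URL = "http://mariaberica.it/4.Quadri per sito.htm"

# Hypothetical stand-in for the downloaded page (assumption: the real markup differs).
SAMPLE_HTML = """
<html><body>
<p><img src="4.Quadri%20per%20sito_file/image001.jpg" alt="First painting"></p>
<p><img src="4.Quadri%20per%20sito_file/image002.jpg"></p>
</body></html>
"""


def images_to_frame(html, base_url):
    """Return a DataFrame with one row per <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        {
            "src": img["src"],
            # Assumption: the description lives in alt; None when it is absent.
            "description": img.get("alt"),
            # Resolve the relative path against the page URL.
            "url": urljoin(base_url, img["src"]),
        }
        for img in soup.find_all("img", src=True)
    ]
    return pd.DataFrame(rows, columns=["src", "description", "url"])


df = images_to_frame(SAMPLE_HTML, PAGE_URL)
print(df)
```

For the live page, requests.get(url).text could be passed in place of SAMPLE_HTML. If the descriptions turn out to be in surrounding text rather than in alt, only the "description" line would need to change.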

Can you be more specific about what the problem is? Are you familiar with the basics of lists? Have you consulted the documentation?