Python: Can't get all images from a webpage


python web-scraping


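I want to scrape all the image links here. I'm using requests + BeautifulSoup with Python 3.7. My problem is that the result is 3, while the page has 6 images.

Edit: The server uses cookies to serve me the images I want and the full HTML page. After adding cookie handling and putting the correct URL in my code, it works as needed.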
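A minimal sketch of that cookie handling with `requests.Session`, which stores cookies from responses and sends them back on later requests. It assumes (not confirmed by the page) that the server issues `ASP.NET_SessionId` on the first response and only serves the full page once that cookie is returned. `make_session` and `fetch_page` are hypothetical helper names, not the asker's code.

```python
import requests

# User-Agent copied from the question; whether the server requires it is an assumption.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
)


def make_session() -> requests.Session:
    """Create a Session that keeps cookies between requests."""
    session = requests.Session()
    session.headers["User-Agent"] = USER_AGENT
    return session


def fetch_page(session: requests.Session, url: str) -> str:
    """Load `url`, assuming the server sets ASP.NET_SessionId on the first
    response and only returns the full page when that cookie is sent back."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    if "ASP.NET_SessionId" in response.cookies:
        # The cookie was issued by this response: request again so the
        # Session includes it this time.
        response = session.get(url, timeout=30)
        response.raise_for_status()
    return response.text
```

For example, `html = fetch_page(make_session(), url)`. This avoids hard-coding a browser session ID that will eventually expire, and reusing the same session for the image requests means they carry the cookie as well.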

This is because your code only looks for images inside the table tag:

obj=obj.find('table')

There are only two there.

Also try searching for the other images on the page:

import requests
from bs4 import BeautifulSoup as bs

url = 'https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no={};'
# var = input("Enter the variable to Bring Photos links:")
var = '240100160336'
url = url.format(var)
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Cookie': 'ASP.NET_SessionId=v4kd535hn3d43z0x4ttgzqit',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
}
res = requests.get(url, headers=headers)
obj = bs(res.text, 'html.parser')

# Search for images inside the first table (empty list if there is no table)
objTable = obj.find('table')
imgs = objTable.find_all('img') if objTable else []

# Search for all images in the page (this already includes the table images,
# so adding the two counts would count those twice)
imgs2 = obj.find_all('img')

print(len(imgs), len(imgs2))
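A page-wide `find_all('img')` already returns the images inside the table, so the table search and the page search overlap. A small sketch on made-up markup (not the real page) that separates the two sets without double counting, by comparing the `Tag` objects by identity:

```python
from bs4 import BeautifulSoup

# Made-up markup: one image inside the table, two outside it.
SAMPLE_HTML = """
<html><body>
  <img src="header.png">
  <table><tr><td><img src="photo.jpg"></td></tr></table>
  <img src="footer.png">
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

table = soup.find("table")                          # first <table> only
in_table = table.find_all("img") if table else []
all_imgs = soup.find_all("img")                     # whole document

# Both searches return the same Tag objects from one parse tree,
# so object identity tells us which images are outside the table.
in_table_ids = {id(img) for img in in_table}
outside_table = [img for img in all_imgs if id(img) not in in_table_ids]

print(len(in_table), len(outside_table), len(all_imgs))
```

In practice `soup.find_all('img')` on its own is enough to get every `<img>` tag in the HTML that was actually returned.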

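Edit:

The URL in your code is different from the URL you want to scrape.

The URL in your code is:

https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no={};

The way you build the URL by formatting the variable into it doesn't help, because the trailing semicolon from the template stays in the result. It prints:

https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336;

Please check this link for help with that.

The URL you linked in your post is:

https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336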
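One way to build that URL without stray characters is to let the standard library produce the query string. A minimal sketch; `BASE_URL` and `build_rc_url` are illustrative names, not from the original code:

```python
from urllib.parse import urlencode

BASE_URL = "https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx"


def build_rc_url(rc_no: str) -> str:
    """Build the report URL; urlencode creates the query string,
    so nothing like a trailing ';' ends up in it."""
    return BASE_URL + "?" + urlencode({"rc_no": rc_no})


print(build_rc_url("240100160336"))
```

With requests you can also skip building the string yourself and pass `params={"rc_no": rc_no}` to `requests.get(BASE_URL, ...)`.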

I modified your code slightly and added the correct URL:

import requests
from bs4 import BeautifulSoup as bs
url='https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336'

res=requests.get(url)
obj=bs(res.text, 'html.parser')

# Search for images in the page
imgs=obj.find_all('img')
images = []
for img in imgs:
    images.append(img.get('src'))

print(images)

print(len(images))

Please check whether it works now.
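The `src` values collected this way may be relative paths or empty strings. A sketch that skips empty ones and resolves the rest against the page URL with `urljoin`; the sample markup, including the `GetPhoto.aspx` handler name, is hypothetical:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336"


def absolute_image_urls(html: str, page_url: str) -> list:
    """Return every non-empty <img src>, resolved against the page URL."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = (img.get("src") or "").strip()
        if not src:
            continue  # skip images with a missing or empty src
        urls.append(urljoin(page_url, src))
    return urls


# Made-up markup: a relative path, an empty src, a dynamic handler
# (GetPhoto.aspx is a hypothetical name) and an absolute URL.
SAMPLE_HTML = """
<img src="../Images/logo.png">
<img src="">
<img src="GetPhoto.aspx?id=1">
<img src="https://example.com/a.png">
"""

print(absolute_image_urls(SAMPLE_HTML, PAGE_URL))
```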

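If you run the code and print the links, you'll find that not all images are found. I also noticed that the src of the header image is empty (print imgs and imgs2 to see). And when I printed the page source and searched it for imgs, most of the image tags weren't there.

Look at the image info in your browser or view the page source: you'll notice that 5 images are reported, but only two of them have real absolute paths, while the other three are loaded into the browser dynamically from an ASP script and have no real path.

I put the parameters into the payload and added the headers, but it didn't work. When I replaced the cookie with the one from my browser, it worked. Now how do I generate this cookie? (Read the question edit.)

Did you mean "scrape"? Because "scrap" means you want to throw them away.

Sorry, I didn't know.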
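Since three of the images come from an ASP script rather than a static file, one approach is to request those image URLs with the same `requests.Session` that loaded the page, so the session cookie goes along with them. A sketch under that assumption; `filename_for` and `download_images` are hypothetical helpers, and the `.jpg` fallback extension is a guess (check the response's `Content-Type` if it matters):

```python
import os
from urllib.parse import urlparse

import requests


def filename_for(url: str, index: int) -> str:
    """Pick a local file name for an image URL. Dynamic endpoints such as
    an .aspx handler have no real file name, so use a numbered one."""
    name = os.path.basename(urlparse(url).path)
    if not name or name.lower().endswith((".asp", ".aspx", ".ashx")):
        return f"image_{index}.jpg"  # extension is an assumption
    return name


def download_images(session: requests.Session, urls, out_dir: str) -> list:
    """Fetch each image with the Session that loaded the page, so its
    cookies are sent with the image requests too. Returns saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        response = session.get(url, timeout=30)
        response.raise_for_status()
        path = os.path.join(out_dir, filename_for(url, index))
        with open(path, "wb") as fh:
            fh.write(response.content)
        saved.append(path)
    return saved
```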
