Python: trying to get a URL from an image search

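Can someone help me with this? I am trying to write a program that takes a word as input, finds the first image in the search results, and returns the URL from its img tag, but it does not work.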

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

word = input()

html = urlopen('https://www.google.com/search?q=', word +'&rlz=1C1GCEU_lvLV926LV926&sxsrf=ALeKk01xl0HutDOTshkCUPM5qDFtKyvuKg:1613851219348&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjC0JiloPnuAhWoAxAIHZKdAGUQ_AUoAXoECA4QAw&biw=958&bih=959')

bs = BeautifulSoup(html, 'html.parser')
images = bs.find_all('img', {'src':re.compile('.jpg')})
for image in images: 
    print(image['src']+'\n')

Can anyone tell me what to do?

It looks like some of the images are encoded, but try this. If an image is encoded, you may not be able to find .jpg in its src or href.

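import re
import requests
from bs4 import BeautifulSoup

url = 'https://www.google.com/search?q=guitar'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
# Match any tag whose href contains a literal ".jpg"
images = soup.find_all(href=re.compile(r'\.jpg'))
for image in images:
    print(image.get('href'))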

This pulls out several image URLs:

https://www.google.com/imgres?imgurl=https://cdn.mos.cms.futurecdn.net/Ge25ccbyKQ76Et9bBjFnxk-1200-80.jpg&imgrefurl=https://www.guitarworld.com/gear/types-of-guitar-everything-you-need-to-know&h=675&w=1200&tbnid=1bWm5qMm6P85iM&q=guitar&tbnh=84&tbnw=150&usg=AI4_-kR-ixXbUq1jFtJ-kcukVj6j-7KgTw&vet=1&docid=4ZL7MkOS7tG24M&sa=X&ved=2ahUKEwi0qaL_pvnuAhUCXK0KHYLrCWUQ9QEwJHoECAEQCA
https://www.google.com/imgres?imgurl=https://online.berklee.edu/takenote/wp-content/uploads/2020/07/learn_acoustic_blues_guitar_article_image.jpg&imgrefurl=https://online.berklee.edu/takenote/acoustic-blues-guitar-tips/&h=1200&w=1920&tbnid=QR9aabuUf_XeFM&q=guitar&tbnh=94&tbnw=150&usg=AI4_-kSKaX2goL8QU_gf6aNPMvEK3WF3tw&vet=1&docid=hdq2fzc2ogCnkM&sa=X&ved=2ahUKEwi0qaL_pvnuAhUCXK0KHYLrCWUQ9QEwJXoECAEQCg
https://www.google.com/imgres?imgurl=https://images-na.ssl-images-amazon.com/images/I/41jIw1mUV4L._AC_.jpg&imgrefurl=https://www.amazon.com/Yamaha-FG800-Solid-Acoustic-Guitar/dp/B01C92QHLC&h=500&w=204&tbnid=ESB5AJN1MKnK_M&q=guitar&tbnh=130&tbnw=53&usg=AI4_-kQB83ftunCPyX3cXobwJMp0b1UhAg&vet=1&docid=9Ld6uZPysxav6M&sa=X&ved=2ahUKEwi0qaL_pvnuAhUCXK0KHYLrCWUQ9QEwJnoECAEQDA
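The links above are Google redirect pages rather than direct image files; in each of them the image address appears in the imgurl query parameter. A minimal sketch, assuming that layout holds, for pulling it out with the standard library (the helper name extract_image_url is hypothetical, not part of the answer):

```python
from urllib.parse import urlparse, parse_qs


def extract_image_url(imgres_link):
    """Return the value of the 'imgurl' query parameter, or None if absent."""
    query = parse_qs(urlparse(imgres_link).query)
    values = query.get("imgurl")
    return values[0] if values else None


# Shortened version of the first link printed above
link = ("https://www.google.com/imgres?imgurl=https://cdn.mos.cms.futurecdn.net/"
        "Ge25ccbyKQ76Et9bBjFnxk-1200-80.jpg&imgrefurl=https://www.guitarworld.com/"
        "gear/types-of-guitar-everything-you-need-to-know&h=675&w=1200")
print(extract_image_url(link))
```

Links without an imgurl parameter yield None, so the caller can skip them.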

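First, you are not setting up the request correctly. You need to define a User-Agent header, otherwise your request will be rejected. Then you need to filter the images: since Google serves them from "gstatic.com", filter the response for that host.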

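from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import re

word = input()

# tbm=isch selects image search. The word is appended as-is, so input
# containing spaces or special characters would need URL-encoding first.
url = "https://www.google.com/search?tbm=isch&q=" + word
# Send a browser-like User-Agent; without it the request is rejected
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = Request(url, headers=headers)

page = urlopen(req)

bs = BeautifulSoup(page, 'html.parser')
# Keep only <img> tags whose src points at gstatic.com
images = bs.find_all('img', {'src': re.compile(r'gstatic\.com')})

for img in images:
    print(img['src'])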
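The question asks for only the first image, whereas the loop above prints every match. The sketch below stops at the first matching img tag. To keep it runnable without extra packages it uses the standard library's html.parser instead of BeautifulSoup, which is a swapped-in technique; the class and function names are made up for illustration, and the sample HTML is invented.

```python
import re
from html.parser import HTMLParser


class FirstImageFinder(HTMLParser):
    """Remember the src of the first <img> whose src matches a pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.first_src = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-img tags and anything after the first hit
        if tag != "img" or self.first_src is not None:
            return
        src = dict(attrs).get("src")
        if src and self.pattern.search(src):
            self.first_src = src


def first_image_src(html, pattern=r"gstatic\.com"):
    finder = FirstImageFinder(pattern)
    finder.feed(html)
    return finder.first_src


# Invented sample markup, only to exercise the helper
sample = ('<div><img src="/logo.png">'
          '<img src="https://encrypted-tbn0.gstatic.com/images?q=a">'
          '<img src="https://encrypted-tbn0.gstatic.com/images?q=b"></div>')
print(first_image_src(sample))
```

With BeautifulSoup, calling find instead of find_all with the same arguments should give the same result.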

Note that you can simplify your address to

"https://www.google.com/search?tbm=isch&q=" + word

The other parameters are redundant and may expose your personal data.
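If the search word may contain spaces or non-ASCII characters, the query string can be built with urllib.parse.urlencode rather than plain concatenation. A small sketch of that idea; build_search_url is a hypothetical helper name, and only the two parameters from the simplified address are used:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def build_search_url(word):
    """Build the simplified image-search URL with a safely encoded query."""
    params = {"tbm": "isch", "q": word}
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("red guitar"))
# https://www.google.com/search?tbm=isch&q=red+guitar
```

The resulting string can be passed to Request in place of the concatenated url.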