Python: extracting the main image from posted links and posted pages

python django

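The game plan is to extract the main images and show them as thumbnails on the index page. I've been having a lot of trouble with this feature, and there seem to be no examples of it anywhere online. I found three options:

1. BeautifulSoup: people seem to use this one most often, but I don't see how BeautifulSoup could find the representative image... and I think it would take the most work.
2. python-goose: this looks legit. The docs say it extracts the main image, so I guess I have to take their word for it. The problem is I don't know how to use it in Django.
3. Embedly/…: probably the wrong choice for the functionality I need.

I'm leaning towards python-goose for this project. My question is: how would you approach this? Do you know of any examples, or could you provide some I could look at? For images that users upload to my pages directly I'll probably use sorl-thumbnail (right?), but for posted links...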

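Edit 1: With python-goose, grabbing the (main) image looks very simple. The problem is that I don't know how to use the script in my app: how should I get that image into the right thumbnail and display it on my index.html? Here is my media.py (not sure if it works):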

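import json
from django.http import HttpResponse
from goose import Goose

def extract(request):
    # Django exposes query-string parameters on request.GET (request.args is Flask's API)
    url = request.GET.get('url')
    g = Goose()
    article = g.extract(url=url)
    response = {'image': article.top_image.src}
    # A Django view must return an HttpResponse, not a bare string
    return HttpResponse(json.dumps(response), content_type='application/json')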

Source: the blog's example uses Flask; I'm trying to adapt the script for Django.

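Edit 2: OK, here is my approach. I really think it's right, but unfortunately it gives me nothing: no error and no image, even though the Python syntax is correct... If anyone knows why it isn't working, please let me know.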

models.py

from django.db import models

class Post(models.Model):
    url = models.URLField(max_length=250, blank=True, null=True)

Index.html

{% if posts %}
    {% for post in posts %}
      {{ post.extract}}
{%endfor%}
{%endif%}

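BeautifulSoup is the way to go for this, and it's actually quite simple.

First, an image in HTML looks like this:

<img src="http://www.url.to/image.png">

I don't know how you plan to decide which image to use as the thumbnail, but you can extract the one you want from the list of image URLs.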
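For reference, here is a minimal sketch of the "list of image URLs" step that also turns relative `src` values (such as `/a.png`, `img/b.png` or `//cdn.example.com/c.png`) into absolute URLs with `urllib.parse.urljoin`. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages; with BeautifulSoup the equivalent loop iterates over `soup.find_all('img')`. The names `ImgSrcCollector` and `image_urls` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with a missing or empty src
                self.srcs.append(src)


def image_urls(html: str, page_url: str) -> List[str]:
    """Return every <img> src on the page as an absolute URL (hypothetical helper)."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    # Resolve relative and protocol-relative paths against the page URL
    return [urljoin(page_url, src) for src in collector.srcs]
```

Usage: pass the downloaded page source and the page's own URL, e.g. `image_urls(requests.get(url).text, url)`, then choose one entry from the returned list.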

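Update

I know you said Django, but I'd strongly recommend Flask. It's much simpler and still very capable.

I wrote this; it simply displays the first image of whatever web page you give it: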

from urllib.parse import urljoin  # Resolve relative image paths against the page URL

from bs4 import BeautifulSoup  # Import stuff
import requests
from flask import Flask
from markupsafe import escape  # Installed with Flask; escapes text for safe HTML output

app = Flask(__name__)

def getImages(url):
    r = requests.get(url)  # Download website source
    data = r.text  # Get the website source as text
    soup = BeautifulSoup(data, "html.parser")  # Set up a "soup" which BeautifulSoup can search

    links = []
    for link in soup.find_all('img'):  # Cycle through all 'img' tags
        imgSrc = link.get('src')  # Extract the 'src' from those tags
        if imgSrc:  # Skip <img> tags that have no src
            links.append(urljoin(url, imgSrc))  # Make the source absolute, then append it to 'links'
    return links  # Return 'links'

@app.route('/<site>')
def page(site):
    images = getImages("http://" + site)
    if not images:
        return "No images found", 404  # Avoid an IndexError on pages without images
    image = images[0]  # Here I take the 1st image on the page
    return '<img src="%s">' % escape(image)  # Return the image in an HTML "img" tag

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0")  # Run the Flask webserver

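This starts Flask's development server, listening on all network interfaces (host="0.0.0.0").

To look up a site, append its domain to the server's address as the URL path; that is what the /<site> route captures. Because <site> matches a single path segment, only a bare domain (without further slashes) will work.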

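I was just working on this and decided to use python-goose, but I'm having trouble getting the script into index.html... I'll edit in what I've done; I'd really appreciate it if you could take a look. Doesn't your code print all the images rather than just the one main image? That's why I switched to python-goose.

@haloyoba What do you mean by "main image"? The largest image on the page, or an image of what the page actually looks like when it loads?

Yes, the image that represents the article, probably the largest image.

@haloyoba - OK, then all you need to do is go through the list of image URLs and decide which one is appropriate, probably by checking its dimensions. I don't really know how you'd do that, but that's the approach I'd take.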

from bs4 import BeautifulSoup  # Import stuff
import requests

r = requests.get("http://www.site-to-extract.com/")  # Download website source

data = r.text  # Get the website source as text

soup = BeautifulSoup(data, "html.parser")  # Set up a "soup" which BeautifulSoup can search

links = []

for link in soup.find_all('img'):  # Cycle through all 'img' tags
    imgSrc = link.get('src')  # Extract the 'src' from those tags
    links.append(imgSrc)  # Append the source to 'links'

print(links)  # Print 'links'
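If what you want is the image the page itself designates as representative, many article pages declare one in an Open Graph `<meta property="og:image" content="...">` tag in their `<head>`. Checking that tag is a heuristic not discussed in this thread, so treat it as an assumption about the pages you will see; when a page lacks the tag, the sketch returns `None` and you would fall back to scanning the `<img>` tags. It uses the standard library's `html.parser`, and `OgImageFinder` and `find_og_image` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class OgImageFinder(HTMLParser):
    """Records the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_image is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image" and attr_map.get("content"):
            self.og_image = attr_map["content"]


def find_og_image(html: str, page_url: str) -> Optional[str]:
    """Return the page's og:image as an absolute URL, or None if it declares none."""
    finder = OgImageFinder()
    finder.feed(html)
    finder.close()
    if finder.og_image is None:
        return None
    # og:image is usually absolute already; urljoin leaves absolute URLs unchanged
    return urljoin(page_url, finder.og_image)
```

A combined strategy would try `find_og_image` first and only fall back to picking from the `<img>` list when it returns `None`.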
