
Scraping Imgur links from subreddits in Python


So far, my code successfully pulls the HTML source from the 5 results it fetches for a given subreddit name. Now I want to search for imgur links, whether an album (a URL containing /a/) or a single image. Then I want to lift that link and send it to another class (imgurdl).

Given my current code, what is the best way to do this?

from bs4 import BeautifulSoup
import praw
from urllib2 import urlopen
import urllib2
import sys
from urlparse import urljoin
import config
import imgurdl
import requests

cache = []
soup = BeautifulSoup

def reddit_login():
    # credentials are read from a local config module
    r = praw.Reddit(username=config.username,
                    password=config.password,
                    client_id=config.client_id,
                    client_secret=config.client_secret,
                    user_agent=" v0.3"
                    )
    print("***********logged in successfully***********")
    return r

def get_category_links(subredditName, r):
    print("Grabbing subreddit...")
    submissions = r.subreddit(subredditName).hot(limit=5)
    print("Grabbing comments...")
    #comments = subred.comments(limit = 200)
    for submission in submissions:
        htmlSource = requests.get(submission.url).text
        print(htmlSource)


r = reddit_login()
get_category_links(sys.argv[1], r)

You can get the URL from PRAW, check whether it comes from imgur within the loop itself, and then send it to the appropriate function. That way there is no need to go through the HTML source at all:

for submission in submissions:
    link = submission.url
    if "imgur.com/a/" in link:
        pass  # send to the imgur album downloader
    elif link.endswith(".jpg") or link.endswith(".png"):
        pass  # send to the direct image downloader
    elif "imgur.com/" in link:
        pass  # send to the single-image imgur downloader
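
As a minimal sketch of how that dispatch could slot into get_category_links: the imgurdl function names below (download_album, download_image) are assumptions for illustration, since the question does not show what the asker's imgurdl module actually exposes.

import sys

import praw

import config
import imgurdl  # asker's own module; function names below are assumed


def get_category_links(subreddit_name, r):
    # Dispatch the top 'hot' submissions of a subreddit straight from
    # submission.url, without fetching any HTML.
    for submission in r.subreddit(subreddit_name).hot(limit=5):
        link = submission.url
        if "imgur.com/a/" in link:
            imgurdl.download_album(link)   # hypothetical album handler
        elif link.endswith((".jpg", ".png", ".gif")):
            imgurdl.download_image(link)   # hypothetical direct-image handler
        elif "imgur.com/" in link:
            imgurdl.download_image(link)   # hypothetical single-image handler


if __name__ == "__main__":
    r = praw.Reddit(username=config.username,
                    password=config.password,
                    client_id=config.client_id,
                    client_secret=config.client_secret,
                    user_agent="imgur-scraper v0.3")
    get_category_links(sys.argv[1], r)

Working from submission.url also avoids the requests.get call in the original code, since the link itself already tells you whether it points at imgur.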

What have you tried so far? Why not use Reddit's API? @KevinMGranger I didn't know there was one, and I'm not familiar with it. Do you have a link to the documentation? Would it let me lift the links in a similar fashion? So far I haven't tried anything. To find the imgur links in the HTML source you can use re. @BurningKarl Could you elaborate? Can you do text matching on submission.url inside the for loop?
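
Following up on the re suggestion from the comments, here is a small sketch of matching imgur links either against submission.url directly or against a fetched HTML source. The regular expression is an illustrative assumption, not a vetted pattern for every imgur URL form.

import re

# Rough pattern for imgur album and single-image URLs; an illustrative
# assumption, not an exhaustive match of everything imgur serves.
IMGUR_RE = re.compile(
    r"https?://(?:i\.)?imgur\.com/(a/)?[A-Za-z0-9]+(?:\.(?:jpg|png|gif))?"
)


def classify(link):
    # Return 'album', 'image', or None for a single submission URL.
    match = IMGUR_RE.match(link)
    if not match:
        return None
    return "album" if match.group(1) else "image"


def find_imgur_links(html_source):
    # Pull every imgur URL out of a page's HTML source.
    return [m.group(0) for m in IMGUR_RE.finditer(html_source)]

Matching submission.url with classify() is the simpler route; find_imgur_links() is only needed if you still want to scan the downloaded HTML as in the original code.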