Extracting 1000 URIs from Twitter using Tweepy and Python


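I am trying to extract 1000 unique, fully expanded URIs from Twitter using Tweepy and Python. Specifically, I am interested in links that point directly outside of Twitter (so nothing that leads back to other tweets, retweets, or duplicates).

The code I wrote keeps raising a KeyError for "entities".

It gives me a few URLs before it breaks; some of them are expanded and some are not. I don't know how to go about fixing this.

Please help me.

Note: I have left out my credentials.

Here is my code: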

    # Import the necessary methods from different libraries
    import tweepy
    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    import json

    # Variables that contain the user credentials to access the Twitter API
    access_token = "enter token here"
    access_token_secret = "enter token here"
    consumer_key = "enter key here"
    consumer_secret = "enter key here"

    # Accessing tweepy API
    # api = tweepy.API(auth)

    # This is a basic listener that just prints received tweets to stdout.
    class StdOutListener(StreamListener):
        def on_data(self, data):
            # resource: http://code.runnable.com/Us9rrMiTWf9bAAW3/how-to-stream-data-from-twitter-with-tweepy-for-python
            # Twitter returns data in JSON format - we need to decode it first
            decoded = json.loads(data)

            # resource: http://socialmedia-class.org/twittertutorial.html
            # Print each tweet in the stream to the screen
            # Here we set it to stop after getting 1000 tweets.
            # You don't have to set it to stop, but can continue running
            # the Twitter API to collect data for days or even longer.
            count = 1000

            for url in decoded["entities"]["urls"]:
                count -= 1
                print "%s" % url["expanded_url"] + "\r\n\n"
                if count <= 0:
                    break

        def on_error(self, status):
            print status


    if __name__ == '__main__':
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        stream = Stream(auth, l)

        # This line filters Twitter Streams to capture data by the keyword: YouTube
        stream.filter(track=['YouTube'])

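It looks like the API is hitting a rate limit, so one option is to catch the KeyError. When I do that and print the keys of the offending message, I see [u'limit'], which suggests it is a limit notice rather than a tweet and therefore has no "entities" key. I also added a count display to verify that it really reaches 1000: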
count = 1000 # moved outside of class definition to avoid getting reset

class StdOutListener(StreamListener):
    def on_data(self, data):

        decoded = json.loads(data)

        global count # get the count
        if count <= 0:
            import sys
            sys.exit()
        else:
            try:
                for url in decoded["entities"]["urls"]:
                    count -= 1
                    print count,':', "%s" % url["expanded_url"] + "\r\n\n"

            except KeyError:
                print decoded.keys()

    def on_error(self, status):
        print status


if __name__ == '__main__':

    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    stream.filter(track=['YouTube'])
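
The answer above handles the KeyError and the counter, but the question also asks for unique links that point outside Twitter. A minimal sketch of that filtering step follows, written for Python 3 (unlike the Python 2 code above). The helper name `external_urls`, the `TWITTER_HOSTS` set, and the sample messages are hypothetical and only illustrate the idea; they are not part of Tweepy.

```python
from urllib.parse import urlparse

# Assumption: these hosts count as "inside Twitter" for the purposes of the question.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com", "t.co"}


def external_urls(decoded, seen):
    """Return new, non-Twitter expanded URLs from one decoded stream message."""
    found = []
    # Non-tweet messages such as {"limit": {...}} carry no "entities" key,
    # so fall back to an empty list instead of raising KeyError.
    entries = (decoded.get("entities") or {}).get("urls") or []
    for entry in entries:
        url = entry.get("expanded_url")
        if not url:
            continue
        host = urlparse(url).netloc.lower()
        if host in TWITTER_HOSTS or url in seen:
            continue  # skip links back to Twitter and duplicates
        seen.add(url)
        found.append(url)
    return found


seen = set()
tweet = {"entities": {"urls": [
    {"expanded_url": "https://www.youtube.com/watch?v=abc"},
    {"expanded_url": "https://twitter.com/user/status/1"},
]}}
print(external_urls(tweet, seen))                    # ['https://www.youtube.com/watch?v=abc']
print(external_urls({"limit": {"track": 5}}, seen))  # []
```

Inside `on_data`, the loop over `decoded["entities"]["urls"]` could call a helper like this and decrement `count` by the number of URLs it returns, so that only unique external links count toward the 1000.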

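First of all, do not share your private keys on the internet. Your authorization credentials are now compromised and you should regenerate the keys. As for your question, it is hard to know how to solve it, because I don't know what the "decoded" object looks like. You should print the first item of decoded and stop the script: `print(decoded[0])`. Inspect the object: does it have an entities attribute?

Wow! I didn't mean to do that. Thank you so much. What do you mean by what it looks like?

Wow! Thank you so much. I didn't know that moving it out of the class def would make such a difference.

You're very welcome. Have you tested it? I hope it helps. I only moved it out of the class def because I noticed it was being reset to 1000 every time the class was instantiated. Since then it seems to count correctly :)

It's definitely 1000 now! I spent so long fiddling with it trying to get it to work! You're awesome! Now I have to figure out whether these URLs are fully expanded.

Great, glad to hear it. They look like they are, and they should be, since you are using url["expanded_url"].

Ahh. For some reason, it's still giving me some shortened URLs.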
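
On the shortened URLs mentioned in the last comment: one likely cause is that `expanded_url` only undoes Twitter's own t.co wrapping, so a link that was already shortened by another service when it was tweeted stays shortened. One possible approach, not taken from the thread, is to follow the HTTP redirects yourself. The sketch below is Python 3 and uses only the standard library; `resolve_url` and its `opener` parameter are hypothetical names, and the parameter exists only so the function can be exercised without network access.

```python
import urllib.request


def resolve_url(url, opener=urllib.request.urlopen, timeout=10):
    """Follow HTTP redirects and return the final URL, or the input on failure."""
    # Start with a HEAD request to avoid downloading the page body.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            # geturl() reports the URL reached after any redirects.
            return response.geturl()
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs. Keep the original in both cases.
        return url
```

Calling `resolve_url(url["expanded_url"])` before printing would add one network round trip per link, so it may be worth caching results for URLs that have already been seen. Some servers reject HEAD requests, in which case this sketch simply returns the URL unchanged.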