MongoDB Python Twitter stream not saving to database

python mongodb api twitter streaming

I'm trying to write a Python script to mine data from Twitter, but I'm not having any luck! I don't know what I'm doing wrong.

from pymongo import MongoClient
import json
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener  # tweepy 3.x API; tweepy 4.0 merged StreamListener into Stream
import datetime

# Auth Variables

consumer_key = "INSERT_KEY_HERE"
consumer_key_secret = "INSERT_KEY_HERE"
access_token = "INSERT_KEY_HERE"
access_token_secret = "INSERT_KEY_HERE"

# MongoDB connection info

connection = MongoClient('localhost', 27017)
db = connection.TwitterStream
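collection = db.tweets
# ensure_index() was removed in PyMongo 4 and MongoDB 3.0 dropped the dropDups option;
# create_index() builds the same unique index on "id"
collection.create_index("id", unique=True)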

# Keywords (hashtags) to track

keyword_list = ['#MorningAfter', '#Clinton', '#Trump']


class StdOutListener(StreamListener):
    def on_data(self, data):

        # Load the Tweet into the variable "t"
        t = json.loads(data)

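        # The stream also delivers non-Tweet messages (e.g. delete or limit notices)
        # that lack these fields; skip them instead of raising a KeyError
        if 'id_str' not in t:
            return True

        # Pull important data from the Tweet to store in the database.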
        tweet_id = t['id_str']  # The Tweet ID from Twitter in string format
        text = t['text']  # The entire body of the Tweet
        hashtags = t['entities']['hashtags']  # Any hashtags used in the Tweet
        time_stamp = t['created_at']  # The timestamp of when the Tweet was created
        language = t['lang']  # The language of the Tweet

        # Convert the timestamp string given by Twitter to a date object called "created"
        created = datetime.datetime.strptime(time_stamp, '%a %b %d %H:%M:%S +0000 %Y')

        # Load all of the extracted Tweet data into the variable "tweet" that will be stored into the database
        tweet = {'id': tweet_id, 'text': text, 'hashtags': hashtags, 'language': language, 'created': created}

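        # Save the refined Tweet data to MongoDB. Upserting on "id" means a Tweet
        # that is already stored is updated instead of raising a duplicate-key error
        collection.update_one({'id': tweet_id}, {'$set': tweet}, upsert=True)

        print(tweet_id)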
        return True

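    # Prints the reason for an error (the HTTP status code) to your console
    def on_error(self, status):
        print(status)
        # Returning False disconnects; returning nothing makes tweepy reconnect
        # with back-off. A 401 (Unauthorized) will not fix itself, so stop here.
        if status == 401:
            return False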

l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_key_secret)
auth.set_access_token(access_token, access_token_secret)

stream = Stream(auth, listener=l)
stream.filter(track=keyword_list)
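The parsing done in on_data can also be pulled out into a plain function, which makes it possible to check it against a sample message without credentials or a live stream. This is a minimal sketch, assuming the v1.1 Tweet JSON fields the script already reads (id_str, text, entities.hashtags, lang, created_at); the function name extract_tweet and the sample payload are hypothetical.

```python
import json
from datetime import datetime


def extract_tweet(raw):
    """Hypothetical helper (not part of tweepy): turn one raw stream message
    (a JSON string) into the document the script stores in MongoDB.

    Returns None for messages that are not Tweets (e.g. delete notices),
    mirroring the 'id_str' check in on_data.
    """
    t = json.loads(raw)
    if 'id_str' not in t:
        return None
    # Twitter's created_at format, e.g. "Wed Nov 09 14:05:12 +0000 2016".
    # %z parses the +0000 offset, so the result is a timezone-aware datetime.
    created = datetime.strptime(t['created_at'], '%a %b %d %H:%M:%S %z %Y')
    return {
        'id': t['id_str'],
        'text': t['text'],
        'hashtags': t['entities']['hashtags'],
        'language': t['lang'],
        'created': created,
    }
```

Inside on_data, calling `doc = extract_tweet(data)` and writing to MongoDB only when doc is not None would replace the inline parsing. Using %z instead of the literal +0000 keeps the UTC offset on the datetime; PyMongo stores aware datetimes as UTC BSON dates.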

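This is my script so far. I've done some Googling and compared my code with what others have, but I can't find the source of the problem. The script runs and connects to MongoDB, and the right database gets created, but nothing is stored in it. I have a bit of debug code that should print each Tweet ID, but instead the console just shows 401, over and over, every 5 to 10 seconds. I also tried some basic examples I found while searching for what I want to do, and still nothing happens. Could there be a problem with the database connection? Below are some images of the running database.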
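The 401 in the console comes from on_error, not from the debug print in on_data: tweepy calls on_error when Twitter answers the connection request with a non-200 HTTP status, so on_data, and with it the MongoDB write, never runs. The database still shows up because the index call at startup creates the tweets collection. A bare number is easy to misread, so a small helper can translate it. This is a sketch; describe_stream_error is a hypothetical name, and treating only 5xx codes as worth retrying is an assumed policy, not tweepy's default behavior.

```python
from http import HTTPStatus


def describe_stream_error(status):
    """Hypothetical helper: turn the status code passed to on_error into a
    readable message plus a flag saying whether reconnecting is worthwhile."""
    if status == 420:
        # Not a registered HTTP status; Twitter's streaming API used it for
        # "Enhance Your Calm" (too many connection attempts).
        return "420 Enhance Your Calm", False
    try:
        phrase = HTTPStatus(status).phrase
    except ValueError:
        phrase = "Unknown status"
    # Assumed policy: 4xx means the request itself was rejected (e.g. 401 for
    # bad credentials or a skewed clock), so retrying the same request will
    # not help; 5xx is server-side and may be transient.
    return f"{status} {phrase}", status >= 500
```

In StdOutListener.on_error this could be used as `message, retry = describe_stream_error(status)`, printing message and returning False when retry is False so that tweepy stops reconnecting.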

Any ideas would be greatly appreciated, thanks!

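I finally figured it out! The 401 being printed was the key: it is an authentication error (HTTP 401 Unauthorized). My system clock was wrong, so I had to set it to sync with an internet time server and reset it. OAuth signs every request with a timestamp, so with a wrong local clock Twitter rejects the requests even when the keys are correct.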
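Why the clock matters: OAuth 1.0a (RFC 5849), which tweepy's OAuthHandler uses, puts an oauth_timestamp in every signed request, and a server may reject timestamps that are too far from its own time, which Twitter reports as 401. One way to spot a skewed clock is to compare it with the Date header of any HTTP response. This is a minimal sketch using only the standard library; clock_skew_seconds is a hypothetical helper, and the exact tolerance Twitter allows is not stated here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header, now=None):
    """Hypothetical helper: seconds the local clock is ahead (+) or behind (-)
    the server that sent date_header, an HTTP Date value such as
    'Wed, 09 Nov 2016 14:05:12 GMT'."""
    # A "GMT" Date header parses to a timezone-aware datetime in UTC
    server_time = parsedate_to_datetime(date_header)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - server_time).total_seconds()
```

For a live check, the header can come from `urllib.request.urlopen(url).headers['Date']` for any reachable HTTPS URL; a skew of minutes rather than a second or two points at the clock rather than the keys.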