Python 如何在streamparse spout中使用tweepy接受推特流并将推特传递给bolt?
最近,我开始研究storm,为了更好地使用python,我决定使用streamparse来研究storm。我计划在spout中接受twitter流,并在bolt中执行一些计算。但我想不出我该如何在喷口中编写代码。我已经阅读了各种streamparse教程,但它们都显示了从静态列表发出的元组,并且没有twitter流api提供的流。 这是我的风暴代码:Python 如何在streamparse spout中使用tweepy接受推特流并将推特传递给bolt?,python,twitter,apache-storm,tweepy,streamparse,Python,Twitter,Apache Storm,Tweepy,Streamparse,最近,我开始研究storm,为了更好地使用python,我决定使用streamparse来研究storm。我计划在spout中接受twitter流,并在bolt中执行一些计算。但我想不出我该如何在喷口中编写代码。我已经阅读了各种streamparse教程,但它们都显示了从静态列表发出的元组,并且没有twitter流api提供的流。 这是我的风暴代码: class WordSpout(Spout): def initialize(self, stormconf, context): se
class WordSpout(Spout):
def initialize(self, stormconf, context):
self.words = itertools.cycle(['dog', 'cat','zebra', 'elephant'])
def next_tuple(self):
word = next(self.words)
self.emit([word])
这是我的tweepy代码:
class listener(StreamListener):
def on_status(self,status):
print(status.text)
print "--------------------------------"
return(True)
def on_error(self, status):
print "error"
def on_connect(self):
print "CONNECTED"
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["california"])
如何集成这两个代码?为此,我设置了一个kafka队列,tweepy侦听器使用该队列将status.text写入队列。然后,喷口不断地从队列中读取数据以执行分析。我的代码看起来有点像这样: listener.py:
class MyStreamListener(tweepy.StreamListener):
def on_status(self, status):
# print(status.text)
client = KafkaClient(hosts='127.0.0.1:9092')
topic = client.topics[str('tweets')]
with topic.get_producer(delivery_reports=False) as producer:
# print status.text
sentence = status.text
for word in sentence.split(" "):
if word is None:
continue
try:
word = str(word)
producer.produce(word)
except:
continue
def on_error(self, status_code):
if status_code == 420: # exceed rate limit
return False
else:
print("Failing with status code " + str(status_code))
return False
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
myStream.filter(track=['is'])
喷口文件:
from streamparse.spout import Spout
from pykafka import KafkaClient
class TweetSpout(Spout):
words = []
def initialize(self, stormconf, context):
client = KafkaClient(hosts='127.0.0.1:9092')
self.topic = client.topics[str('tweets')]
def next_tuple(self):
consumer = self.topic.get_simple_consumer()
for message in consumer:
if message is not None:
self.emit([message.value])
else:
self.emit()