Python Tweepy: tracking multiple terms

python python-2.7 twitter

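I am doing content analysis on tweets. I use tweepy to return tweets that match certain criteria, then write N tweets to a CSV file for analysis. Creating the files and getting the data is not a problem, but I would like to reduce the data collection time. Currently I iterate through a list of terms read from a file. Once N is reached (e.g., 500 tweets), it moves on to the next filter term.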

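I would like to put all of my terms (fewer than 400) into a single variable and match against all of them at once. That works too. What I cannot get is a return value from Twitter telling me which term a status matched.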

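import datetime
import sys

import tweepy

# `auth` (a tweepy OAuthHandler) is assumed to be created elsewhere.

class CustomStreamListener(tweepy.StreamListener):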
    def __init__(self, output_file, api=None):
        super(CustomStreamListener, self).__init__()
        self.num_tweets = 0
        self.output_file = output_file

    def on_status(self, status):
        cleaned = status.text.replace('\'','').replace('&amp;','').replace('&gt;','').replace(',','').replace("\n",'')
        self.num_tweets = self.num_tweets + 1
        if self.num_tweets < 500:
            self.output_file.write(topicName + ',' + status.user.location.encode("UTF-8") + ',' + cleaned.encode("UTF-8") + "\n")
            print ("capturing tweet number " + str(self.num_tweets) + " for search term: " + topicName)
            return True
        else:
            # Returning False disconnects the stream; code after it would never run.
            return False

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True #Don't kill the stream

with open('termList.txt', 'r') as f:
  topics = [line.strip() for line in f]

for topicName in topics:
    stamp = datetime.datetime.now().strftime(topicName + '-%Y-%m-%d-%H%M%S')
    with open(stamp + '.csv', 'w+') as topicFile:
        sapi = tweepy.streaming.Stream(auth, CustomStreamListener(topicFile))
        sapi.filter(track=[topicName])

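Specifically, my question is this: if the track variable has multiple entries, how do I find out which one matched? I will also state that I am relatively new to Python and tweepy.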

Thanks in advance for any advice and assistance!

You can check the tweet text against the terms you are tracking. For example:

>>> a = "hello this is a tweet"
>>> terms = [ "this "]
>>> matches = []
>>> for i, term in enumerate( terms ):
...     if term in a:
...         matches.append(i)
...
>>> matches
[0]
>>>

This tells you every term that the particular tweet, a, matched. In this case, that is only the term "this " (index 0).
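
The same idea can be wrapped in a small helper and called from on_status. The following is only a sketch: the name matched_terms is made up for illustration, and the matching rule (every word of a term must appear as a whole word in the text, ignoring case) is an assumption meant to approximate how the track filter treats phrases, not a copy of Twitter's actual logic.

```python
import re


def matched_terms(text, terms):
    """Return the tracked terms that appear in ``text``.

    Assumed rule (hypothetical helper, not part of tweepy): a term matches
    when each of its whitespace-separated words occurs as a whole word in
    the text, ignoring case and word order.
    """
    words = set(re.findall(r"\w+", text.lower()))
    hits = []
    for term in terms:
        term_words = re.findall(r"\w+", term.lower())
        if term_words and all(w in words for w in term_words):
            hits.append(term)
    return hits


terms = ["this", "python tweepy", "banana"]
tweet = "Hello, THIS is a tweet about Tweepy and Python"
print(matched_terms(tweet, terms))  # ['this', 'python tweepy']
```

Inside on_status you could call matched_terms(status.text, topics) and write one CSV row per returned term instead of relying on the global topicName. The list may come back empty, since a tweet can presumably be delivered because of a match outside its visible text (for example in an expanded URL), so it is worth handling that case explicitly.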