How to remove retweets from a collected dataset in Python

python jupyter-notebook jupyter data-mining tweepy

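I have a Twitter dataset that I collected with Python (in a Jupyter notebook), but it contains many duplicate tweets. How can I remove them programmatically with Python (in a Jupyter notebook)?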

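As you iterate over the tweets, you can keep the IDs of the tweets you have already seen in a set and check whether a given tweet has already been written: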

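tweet_set = set()  # IDs of tweets that have already been written
# api and csvWriter are assumed to be set up elsewhere in the notebook.
# Note: Tweepy 4.0 renamed api.search to api.search_tweets.
for tweet in tweepy.Cursor(api.search, q=search_words, count=100,
                           lang="id",
                           since=date_since).items():

    if tweet.id not in tweet_set:
        print(tweet.created_at, tweet.text)
        # On Python 3 the csv module takes str directly, so no .encode() is needed
        csvWriter.writerow([tweet.created_at, tweet.text])

        tweet_set.add(tweet.id)  # remember this tweet ID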
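
If the tweets have already been collected, the same set-based idea can be applied offline. The following is only a minimal sketch: it assumes each row is a (created_at, text) pair like the ones written by the loop above, and that retweets carry the conventional "RT @" prefix in their text. The function name `clean_rows` and the sample rows are made up for illustration.

```python
def clean_rows(rows):
    """Drop retweets and exact duplicate texts from (created_at, text) rows."""
    seen_texts = set()  # texts that have already been kept
    cleaned = []
    for created_at, text in rows:
        if text.startswith("RT @"):  # assumption: retweets start with "RT @"
            continue
        if text in seen_texts:  # exact duplicate of an earlier row
            continue
        seen_texts.add(text)
        cleaned.append((created_at, text))
    return cleaned


# Hypothetical sample data, not taken from a real dataset
rows = [
    ("2020-10-13 10:00:00", "corona update"),
    ("2020-10-13 10:05:00", "RT @someone: corona update"),
    ("2020-10-13 10:07:00", "corona update"),
    ("2020-10-13 10:09:00", "stay safe"),
]
print(clean_rows(rows))
```

Only the first occurrence of each text is kept, so the order of the remaining rows matches the order in which they were collected.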

Can you provide a sample of your CSV file?

search_words = "corona"
date_since = "2020-10-13"

new_search = search_words + " -filter:retweets"
new_search

for tweet in tweepy.Cursor(api.search, q=search_words, count=100,
                           lang="id",
                           since=date_since).items():
    print(tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
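
Note that the loop above builds new_search but still passes search_words as q, so the -filter:retweets operator never reaches the search. As a small sketch (the helper name `without_retweets` is hypothetical), the query could be built like this and the result passed as q= to the search call:

```python
def without_retweets(search_words):
    """Append the standard search operator that excludes retweets."""
    return search_words + " -filter:retweets"


new_search = without_retweets("corona")
print(new_search)
```

This only filters retweets on the server side. Duplicate tweets that arrive for other reasons would still need the set-based check.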
