Python: How to remove retweets from a collected dataset


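I have a Twitter dataset that I collected with Python (in a Jupyter notebook), but it contains many duplicate tweets. How can I remove them programmatically using Python (Jupyter notebook)?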


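As you iterate over the tweets, you can keep the IDs of the tweets you have already seen in a set, and check that set before writing each tweet so that every tweet is written only once.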

tweet_set = set()  # store tweet ids you've already seen before
for tweet in tweepy.Cursor(api.search, q=search_words, count=100,
                           lang="id",
                           since=date_since).items():
    # Write the tweet only the first time its id is seen
    if tweet.id not in tweet_set:
        print(tweet.created_at, tweet.text)
        csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
        tweet_set.add(tweet.id)  # update the set of tweets
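
The same idea can be tried without calling the Twitter API. The following is a minimal sketch, assuming each collected tweet is a dict with the hypothetical keys `id`, `created_at`, and `text`; `dedupe_by_id` and `write_csv` are illustrative names, not part of Tweepy, and the sample data is made up. It writes the text as `str` rather than encoded bytes, because in Python 3 `csv.writer` would otherwise write the `b'...'` representation of the bytes.

```python
import csv
import io


def dedupe_by_id(tweets):
    """Return tweets in original order, keeping the first occurrence of each id."""
    seen_ids = set()
    unique = []
    for tweet in tweets:
        if tweet["id"] not in seen_ids:
            seen_ids.add(tweet["id"])
            unique.append(tweet)
    return unique


def write_csv(tweets, handle):
    """Write the created_at and text columns to an open text-mode file handle."""
    writer = csv.writer(handle)
    for tweet in tweets:
        writer.writerow([tweet["created_at"], tweet["text"]])


# Made-up sample data for illustration only
sample = [
    {"id": 1, "created_at": "2020-10-13 08:00:00", "text": "corona update"},
    {"id": 2, "created_at": "2020-10-13 08:05:00", "text": "stay safe"},
    {"id": 1, "created_at": "2020-10-13 08:00:00", "text": "corona update"},
]

unique_tweets = dedupe_by_id(sample)

# StringIO stands in for open("tweets.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
write_csv(unique_tweets, buffer)
print(buffer.getvalue())
```

Deduplicating on the tweet id only removes exact repeats of the same tweet; retweets have their own ids, so they would need a separate filter.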

Can you provide a sample of your CSV file?
search_words = "corona"
date_since = "2020-10-13"
new_search = search_words + " -filter:retweets"
new_search
for tweet in tweepy.Cursor(api.search, q=search_words, count=100,
                           lang="id",
                           since=date_since).items():
    print(tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
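
Note that the loop above passes `search_words` as `q`, so the `-filter:retweets` operator stored in `new_search` never reaches the API. Below is a hedged sketch of two ways to drop retweets: build the filtered query for the search call, or filter tweets that were already collected. It assumes a retweet either carries a `retweeted_status` attribute or has text beginning with `RT @`; `build_query` and `is_retweet` are hypothetical helpers, and the sample objects are stand-ins rather than real Tweepy objects.

```python
from types import SimpleNamespace


def build_query(search_words):
    """Append the search operator that excludes retweets from the results."""
    return search_words + " -filter:retweets"


def is_retweet(tweet):
    """Heuristic: a retweet has `retweeted_status` or text starting with 'RT @'."""
    return hasattr(tweet, "retweeted_status") or tweet.text.startswith("RT @")


# Stand-in objects for illustration; real results would come from tweepy.Cursor
collected = [
    SimpleNamespace(id=1, text="corona update"),
    SimpleNamespace(id=2, text="RT @someone: corona update"),
    SimpleNamespace(id=3, text="shared", retweeted_status=SimpleNamespace(id=1)),
]

originals = [tweet for tweet in collected if not is_retweet(tweet)]
print(build_query("corona"))               # corona -filter:retweets
print([tweet.id for tweet in originals])   # [1]
```

With the search call from the question, the filtered string would be passed as `q=build_query(search_words)` (or simply `q=new_search`).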