Python: how to sort by frequency from highest to lowest

python python-3.x

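Our goal is to open this Twitter file of tweets and sort the hashtags by frequency in order to report the trending topics. I have asked this before, but I have since changed the code, and it now prints each hashtag with its count. How do I sort the results and write them to another file called trending.txt?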

counts ={}
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    for tag in data:
        for line in data:
            for part in line.capitalize().split():
                if "#" in part:
                    counts[part] = counts.get(part,0) + 1

for w in counts:
    print((w+','+str(counts[w])+'/n'))

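Use a collections.Counter instead of a plain dictionary; it is a specialised dictionary that has the functionality you want ready-made: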

from collections import Counter

counts = Counter()
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    for tag in data:
        for line in data:
            for part in line.capitalize().split():
                if "#" in part:
                    counts[part] += 1

with open('trending.txt', 'w') as trending:  # 'w' opens the file for writing
    for hashtag, count in counts.most_common():
        print(hashtag, count, sep=',', file=trending)

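Counter.most_common() produces (key, count) pairs in sorted order, from most frequent to least frequent. You can limit the number of entries returned by passing in an integer: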

with open('trending.txt', 'w') as trending:  # 'w' opens the file for writing
    # The 10 most popular hashtags
    for hashtag, count in counts.most_common(10):
        print(hashtag, count, sep=',', file=trending)
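
As a quick illustration of the ordering, here is a minimal sketch that uses a few made-up hashtags in place of the real tweet file (the sample tags are assumptions, not data from the question):

```python
from collections import Counter

# Hypothetical sample tags standing in for the hashtags parsed from the file.
tags = ["#python", "#news", "#python", "#music", "#news", "#python"]

counts = Counter(tags)

# All entries, most frequent first.
all_sorted = counts.most_common()
# Only the two most frequent entries.
top_two = counts.most_common(2)

print(all_sorted)  # [('#python', 3), ('#news', 2), ('#music', 1)]
print(top_two)     # [('#python', 3), ('#news', 2)]
```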

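Note that your "for tag in data" loop will only ever iterate once; it reads the first line, after which the inner loop processes the rest of the file. You could use next(data, None) instead of that loop: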
with open("/Users/Adnan/Desktop/twitter_data.txt") as data:
    tag = next(data, None)  # read the first line
    for line in data:
        for part in line.capitalize().split():
            if "#" in part:
                counts[part] += 1
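
To see why the outer loop runs only once, the sketch below uses io.StringIO as a stand-in for the open file (the three sample lines are invented). A file object is a single iterator, so the inner loop consumes everything that remains after the first line:

```python
import io

# Hypothetical three-line file; io.StringIO iterates like an open text file.
sample = "header line\nfirst #tag\nsecond #tag\n"

# Nested loops over the same file object: the inner loop reads every line
# after the first one, so the outer loop body runs exactly once.
data = io.StringIO(sample)
outer_iterations = 0
inner_lines = []
for tag in data:
    outer_iterations += 1
    for line in data:
        inner_lines.append(line.strip())

# The same behaviour, stated explicitly with next().
data = io.StringIO(sample)
first = next(data, None)  # consumes "header line\n"
rest = [line.strip() for line in data]

print(outer_iterations)     # 1
print(inner_lines == rest)  # True
```

If the first line is a real tweet rather than a header, dropping the outer loop entirely (and not calling next()) would keep its hashtags in the count.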

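Last but not least, if you are trying to produce a CSV file (comma-separated data), use the csv module: passing counts.most_common() to the writerows() method of a csv.writer writes all counts in sorted order to the CSV file.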
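
A minimal sketch of that CSV approach, assuming a small hand-made Counter and writing to an in-memory buffer so the example has no side effects. With a real file you would typically open it as open('trending.txt', 'w', newline='') instead:

```python
import csv
import io
from collections import Counter

# Hypothetical counts; in the real script these come from the tweet file.
counts = Counter(["#python", "#news", "#python"])

# In-memory stand-in for the output file; newline='' leaves the row
# terminators written by the csv module untouched.
out = io.StringIO(newline="")
writer = csv.writer(out)

# Each (hashtag, count) tuple becomes one CSV row, most frequent first.
writer.writerows(counts.most_common())

csv_text = out.getvalue()
print(csv_text)
```

The csv module also takes care of quoting, which matters if a token ever contains a comma.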
Use a Counter dict and the csv lib to write the data to your output file:
from collections import Counter
import csv


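with open("/Users/Adnan/Desktop/twitter_data.txt") as data, open("trending.txt", "w", newline="") as out:
    wr = csv.writer(out)
    # Capitalize each line, split it into words, and count the words containing "#"
    counts = Counter(part for line in map(str.capitalize, data)
                     for part in line.split()
                     if "#" in part)
    # Each (tag, count) tuple becomes one CSV row, most frequent first
    wr.writerows(counts.most_common())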

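Using map(str.capitalize, data) applies str.capitalize to every line, which should be somewhat more efficient than calling it explicitly inside a loop. writerows takes an iterable of iterables, so it writes each (tag, count) tuple returned by most_common() as one row of your output file.

@Smac89: This sorts by value, in reverse order. Besides, it also includes the counting operation.

Thank you, this was helpful. A quick question, though: some posts have "Bullying" and others "bullying". How can I count them as the same tag? I don't know if .capitalize() is working correctly.

@Rashid, it should be working. If you add a sample of the file, I'll have a look.

@Rashid, no worries.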
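
On the follow-up about counting "Bullying" and "bullying" as one tag: str.capitalize() lowercases everything after the first character of the line, so differences in case alone should mostly be merged already, and leftover mismatches are likely caused by attached punctuation (for example "#bullying," versus "#bullying"). One possible approach, not taken from the answers above, is to extract hashtags with a regular expression and lowercase them before counting. The pattern and the sample tweets below are assumptions for illustration:

```python
import re
from collections import Counter

# Hypothetical tweets; the real data would be read from the tweet file.
tweets = [
    "Stop #Bullying now!",
    "#bullying is never ok, #Kindness matters",
    "More #KINDNESS, less #bullying.",
]

# A '#' followed by word characters; trailing punctuation is not matched.
HASHTAG = re.compile(r"#\w+")

counts = Counter(
    tag.lower()
    for tweet in tweets
    for tag in HASHTAG.findall(tweet)
)

print(counts.most_common())  # [('#bullying', 3), ('#kindness', 2)]
```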