Python: given a list of URLs, print the 3 most common file names


Given a list of URLs, print the 3 most common file names:

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]
The program should print:

a.txt 3  
b.txt 2  
c.jpg 2
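For reference, a minimal sketch of one way to do this with only the standard library. Using `urllib.parse.urlsplit` is an assumption beyond the question's sample data, added so that only the URL's path component is inspected:

```python
from collections import Counter
from urllib.parse import urlsplit

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]

# The file name is the path segment after the last "/"
names = [urlsplit(u).path.rsplit("/", 1)[-1] for u in url]

# most_common(3) sorts by count; ties keep first-insertion order
for name, count in Counter(names).most_common(3):
    print(name, count)
```

Note that for equal counts `Counter` preserves insertion order, so `c.jpg` comes out before `b.txt` here, even though the question lists them the other way round.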
How about using `re` and `collections`? The latter provides a `Counter`, whose `most_common` extracts your top n hits:

import re
from collections import Counter

pattern = re.compile(r"\w+\.\w+$")
Counter(re.findall(pattern, u)[0] for u in url).most_common(3)
Output:

[('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]
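One caveat with the regex approach: `\w` matches only word characters, so a file name containing a hyphen (a hypothetical case beyond the question's sample data) is silently truncated, and a URL with no matching tail makes `re.findall` return an empty list, so the `[0]` index above would raise `IndexError`:

```python
import re

pattern = re.compile(r"\w+\.\w+$")

# Hyphen is not in \w, so only the tail after it matches
print(re.findall(pattern, "http://example.com/my-file.txt"))  # ['file.txt']

# Trailing slash: no match at all, findall returns []
print(re.findall(pattern, "http://example.com/download/"))    # []
```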
You can use `Counter` from `collections`:

from collections import Counter
res = [a.rsplit('/', 1)[-1] for a in url]
print(Counter(res))
#Counter({'a.txt': 3, 'c.jpg': 2, 'b.txt': 2, 'haha.png': 1})
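One more caveat: splitting the raw URL string keeps any query string attached to the name (the `?version=2` URL below is a hypothetical example, not from the question). Parsing out the path first, e.g. with `urllib.parse.urlsplit`, avoids that:

```python
from urllib.parse import urlsplit

u = "http://example.com/a.txt?version=2"

# Splitting the raw URL drags the query string along
print(u.rsplit('/', 1)[-1])                 # a.txt?version=2

# Splitting only the path component gives the clean file name
print(urlsplit(u).path.rsplit('/', 1)[-1])  # a.txt
```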

Update:

The OP asked for the top three:


import collections
kk = [a.rsplit('/', 1)[-1] for a in url]
print(collections.Counter(kk).most_common(3))
# [('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]

How about `collections.Counter`, with `Counter.most_common(3)` for the top 3?


Working demo:

Comments:

Does this answer your question?
You should include what you have tried. We are here to help, not to do the work for you.
This prints all the counts, but not the top n hits.
@Fourier Thanks! I have updated it.
import collections

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]

# The file name is whatever follows the last "/"
file_names = [i.split('/')[-1] for i in url]
for name, count in collections.Counter(file_names).most_common(3):
    print('{} {}'.format(name, count))