Python function: return only the first 10 results


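I have a dataset with different attributes of football players (name, age, speed, team, etc.).

Now I want to find the ten youngest players.

I have already gathered the name and age of every player with the map() function, but I only want to print the first 10 results.

This is my actual code: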

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("CustomerExpenditure")
sc = SparkContext(conf=conf)

def age(line):
    # Split the CSV row and keep the name (column 0) and age (column 14) as strings.
    fields = line.split(",")
    return (str(fields[0]), str(fields[14]))

file = sc.textFile("file:///Users/carlos/PycharmProjects/NONSQL/Project/FullData.csv")

oldestsPlayers = file.map(age)

# Swap to (age, name) so that sortByKey() orders the pairs by age.
topOldestPlayers = oldestsPlayers.map(lambda x: (x[1], x[0])).sortByKey()

# collect() returns every element of the RDD to the driver.
results = topOldestPlayers.collect()

for result in results:
    print(result)

This is the output:

('33', 'Nathan Rutjes') 
('33', 'Jeppe Curth') 
('33', 'Ognjen Vukojević') 
('33', 'Marco Padalino') 
('33', 'Brian Murphy') 
('33', 'Adrián Cortés') 
('33', 'Yaír Urbina') 
('33', 'Kim Chi Gon')
('33', 'Jacques Faty')
('33', 'Sander Asevedo')
('33', 'Alan Besseiro')
('33', 'Sandro Couteiro')
('33', 'Murilo Sancha')
('33', 'Mateus Couteira')
('33', 'Peixotacinho')
('33', 'Danisco Fachini')
('33', 'Fabiem Jardim')
('33', 'Carlos Travisso')
('33', 'Maksymilian Rogalski')
('33', 'César Valoyes')
('33', 'Dougie Imrie')
('33', 'Darren Jones')
('33', 'Iacopo La Rocca')
('33', 'Dioh Williams')
('33', 'David Fox')
('33', 'Michael Tonge')
('33', 'Paul Green')



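When you use collect(), you bring all of the data back to the driver and only then filter it. With a large dataset, that can exhaust the driver's memory.

results = topOldestPlayers.take(10)

This approach brings back only the first 10 elements instead of everything. Then, if you just want to print them: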

for r in results: print(*r, sep=': ')
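As a variation on the answer above, here is a minimal sketch that asks Spark for the ten youngest players directly with takeOrdered(), so no full sort or collect() is needed. It converts the age to an int so that the ordering is numeric rather than string-based. The helper names parse_player and youngest_players are hypothetical, and the sketch assumes, as the question's code does, that the name is in column 0, the age is in column 14, and that no field contains an embedded comma. Rows without a numeric age (for example a header row, if the file has one) are skipped.

```python
def parse_player(line):
    """Return (name, age) with age as an int, or None if the row has no numeric age."""
    fields = line.split(",")
    # Assumption: column 0 is the name and column 14 is the age, as in the question.
    if len(fields) <= 14 or not fields[14].strip().isdigit():
        return None  # header row or malformed line
    return fields[0], int(fields[14])


def youngest_players(sc, path, n=10):
    """Return the n youngest (name, age) pairs without collecting the whole RDD."""
    players = (
        sc.textFile(path)
        .map(parse_player)
        .filter(lambda p: p is not None)
    )
    # takeOrdered returns only the n smallest elements according to the key.
    return players.takeOrdered(n, key=lambda p: p[1])
```

With the SparkContext from the question, this could be called as youngest_players(sc, "file:///Users/carlos/PycharmProjects/NONSQL/Project/FullData.csv") and the returned list printed with the loop shown above.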

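Have you tried print(results[:10])?

That is an option, but I want to print line by line. If I just use print(results[:10]), this is the output:
[('17', 'Matthijs de Ligt'), ('17', 'Justin Kluivert'), ('17', 'Kai Havertz'), ('17', 'Alexander Isak'), ('17', 'Abdülkadir Ömür'), ('17', 'Misael Domínguez'), ('17', 'Jos Gomes'), ('17', 'Alessandro Bastoni'), ('17', 'Boubacar Kamara'), ('17', 'Zaydou Youuf')]
I want something like this:
Carlos: 17
Pepe: 17
Juan: 18

for r in results[:10]: print(r)
These are Python basics.

Thanks, I know these are basics, but I am teaching myself, so there are still some functions I don't know. Thanks again for your help.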
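For the "name: age" layout asked for in the comments, a small plain-Python sketch may help. It assumes results holds (age, name) tuples like the ones printed in the thread, and format_player is a hypothetical helper name, not part of any library. The three sample tuples are copied from the output quoted earlier.

```python
# Sample (age, name) tuples in the same shape that take(10) returns in this thread.
results = [
    ("17", "Matthijs de Ligt"),
    ("17", "Justin Kluivert"),
    ("17", "Kai Havertz"),
]


def format_player(pair):
    """Turn an (age, name) tuple into a 'name: age' string."""
    age, name = pair
    return f"{name}: {age}"


# Slicing with [:10] is safe even when the list has fewer than 10 items.
lines = [format_player(pair) for pair in results[:10]]
print("\n".join(lines))
```

Building the strings first and printing them afterwards keeps the formatting separate from the output, which makes it easy to change the layout later.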