Python: trouble grouping data from a txt file
I am a beginner programmer, and my project requires me to sort the contents of a text file into groups. The txt file I open looks like this (this is not exactly how the txt file looks; it just came out too messy when I copied and pasted it, and for some reason there is also another column filled with nothing but the word "map"):
MAG UTC DATE-TIME LAT LON DEPTH Region
4.3 2014/03/12 20:16:59 25.423 -109.730 10.0 GULF OF CALIFORNIA
5.2 2014/03/12 20:09:55 36.747 144.050 24.2 JAPAN
5.0 2014/03/12 20:08:25 35.775 141.893 24.5 JAPAN
4.8 2014/03/12 19:59:01 38.101 142.840 17.6 Japan
4.6 2014/03/12 19:55:28 37.400 142.384 24.7 JAPAN
5.0 2014/03/12 19:45:19 -6.187 154.385 62.0 GUINEA
I would like the output to look like this:
[[Japan, '4.3', '5.2', '5.0', '4.8', '4.6'], [Gulf of California, '4.3'], [Guinea, 5.0]]
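My current code:
(vlist[7:] in the first for loop gives the region name, and j[1] in the second for loop gives the magnitude number.)
It produces the following output:
[[Japan, '4.3'], [Gulf of California, 4.3], [Guinea, 5.0]]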
I would like to know why it only picks up the first magnitude number for each region that appears, and not the rest.
Here you go: use itertools.groupby, lambda, map, str.split, str.lower and str.join.
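One detail worth keeping in mind about itertools.groupby: it only merges consecutive items that share a key, which is why the rows need to be sorted by region before grouping. A minimal sketch, using a made-up `rows` list and a hypothetical helper `group` (neither comes from the question):

```python
import itertools

# Illustrative (region, magnitude) pairs; "japan" appears in two separate runs.
rows = [("japan", "5.2"), ("guinea", "5.0"), ("japan", "4.8")]

def group(pairs):
    # Collect the magnitudes of each run of consecutive pairs with the same region.
    return [[region, [mag for _, mag in run]]
            for region, run in itertools.groupby(pairs, key=lambda p: p[0])]

unsorted_result = group(rows)
sorted_result = group(sorted(rows, key=lambda p: p[0]))

print(unsorted_result)  # "japan" shows up twice because its rows are not adjacent
print(sorted_result)    # one entry per region once the rows are sorted
```

Because sorted() is stable, the magnitudes inside each region keep their original relative order.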
If your file looks like this:
MAG UTC DATE-TIME LAT LON DEPTH Region
4.3 2014/03/12 20:16:59 25.423 -109.730 10.0 GULF OF CALIFORNIA
5.2 2014/03/12 20:09:55 36.747 144.050 24.2 JAPAN
5.0 2014/03/12 20:08:25 35.775 141.893 24.5 JAPAN
4.8 2014/03/12 19:59:01 38.101 142.840 17.6 Japan
4.6 2014/03/12 19:55:28 37.400 142.384 24.7 JAPAN
5.0 2014/03/12 19:45:19 -6.187 154.385 62.0 GUINEA
Here is the working code:
>>> import itertools
>>> f = open('file.txt')
>>> [[" ".join(x),list(map(lambda z:z[0],list(y)))] for x,y in itertools.groupby(sorted(list(map(str.split,map(str.lower,list(f)[1:]))),key=lambda x:" ".join(x[6:])),key=lambda x:x[6:])]
[['guinea', ['5.0']], ['gulf of california', ['4.3']], ['japan', ['5.2', '5.0', '4.8', '4.6']]]
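For use in a script rather than at the interactive prompt, the same pipeline could be wrapped in a function along these lines. This is only a sketch: the name `group_magnitudes` and the shortened `sample` list are assumptions made for illustration, and the header line is assumed to have been skipped already.

```python
import itertools

def group_magnitudes(lines):
    """Group magnitudes by lower-cased region name (hypothetical helper)."""
    rows = [line.lower().split() for line in lines if line.strip()]
    # Column 0 is MAG; everything from column 6 onward is the region name.
    pairs = [(" ".join(row[6:]), row[0]) for row in rows]
    pairs.sort(key=lambda p: p[0])  # groupby needs equal keys to be adjacent
    return [[region, [mag for _, mag in run]]
            for region, run in itertools.groupby(pairs, key=lambda p: p[0])]

# A few data lines in the same layout as the file, header already removed.
sample = [
    "4.3 2014/03/12 20:16:59 25.423 -109.730 10.0 GULF OF CALIFORNIA\n",
    "5.2 2014/03/12 20:09:55 36.747 144.050 24.2 JAPAN\n",
    "4.8 2014/03/12 19:59:01 38.101 142.840 17.6 Japan\n",
    "5.0 2014/03/12 19:45:19 -6.187 154.385 62.0 GUINEA\n",
]
result = group_magnitudes(sample)
print(result)
```

With the real file, the lines could be read inside a `with open('file.txt') as f:` block, calling `next(f)` once to skip the header before passing `f` to the function, so that the file is closed afterwards.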
Let me explain:
>>> f = open('file.txt')
>>> k = list(map(str.lower,list(f)[1:])) # lower-case every line, skipping the 1st (header) line
>>> k
['4.3 2014/03/12 20:16:59 25.423 -109.730 10.0 gulf of california\n', '5.2 2014/03/12 20:09:55 36.747 144.050 24.2 japan\n', '5.0 2014/03/12 20:08:25 35.775 141.893 24.5 japan\n', '4.8 2014/03/12 19:59:01 38.101 142.840 17.6 japan\n', '4.6 2014/03/12 19:55:28 37.400 142.384 24.7 japan\n', '5.0 2014/03/12 19:45:19 -6.187 154.385 62.0 guinea\n']
>>> k = list(map(str.split,k)) # split each line on whitespace
>>> k
[['4.3', '2014/03/12', '20:16:59', '25.423', '-109.730', '10.0', 'gulf', 'of', 'california'], ['5.2', '2014/03/12', '20:09:55', '36.747', '144.050', '24.2', 'japan'], ['5.0', '2014/03/12', '20:08:25', '35.775', '141.893', '24.5', 'japan'], ['4.8', '2014/03/12', '19:59:01', '38.101', '142.840', '17.6', 'japan'], ['4.6', '2014/03/12', '19:55:28', '37.400', '142.384', '24.7', 'japan'], ['5.0', '2014/03/12', '19:45:19', '-6.187', '154.385', '62.0', 'guinea']]
>>> k = sorted(k,key = lambda x:" ".join(x[6:])) # sort k by region
>>> k
[['5.0', '2014/03/12', '19:45:19', '-6.187', '154.385', '62.0', 'guinea'], ['4.3', '2014/03/12', '20:16:59', '25.423', '-109.730', '10.0', 'gulf', 'of', 'california'], ['5.2', '2014/03/12', '20:09:55', '36.747', '144.050', '24.2', 'japan'], ['5.0', '2014/03/12', '20:08:25', '35.775', '141.893', '24.5', 'japan'], ['4.8', '2014/03/12', '19:59:01', '38.101', '142.840', '17.6', 'japan'], ['4.6', '2014/03/12', '19:55:28', '37.400', '142.384', '24.7', 'japan']]
>>> [[" ".join(x),list(map(lambda z:z[0],list(y)))] for x,y in itertools.groupby(k,key = lambda x:x[6:])]
[['guinea', ['5.0']], ['gulf of california', ['4.3']], ['japan', ['5.2', '5.0', '4.8', '4.6']]]
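If sorting is not wanted, a dictionary keyed by region is a common alternative to itertools.groupby. The sketch below swaps in collections.defaultdict, which is not what the answer above uses; the function name `group_by_region` and the `data` list are assumptions for illustration.

```python
from collections import defaultdict

def group_by_region(lines):
    """Collect magnitudes per region without sorting (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or incomplete lines
        region = " ".join(fields[6:]).lower()  # "Japan" and "JAPAN" share one key
        groups[region].append(fields[0])       # fields[0] is the MAG column
    # Regular dict ordering (Python 3.7+) keeps regions in first-seen order.
    return [[region, mags] for region, mags in groups.items()]

# The sample file contents from above, header line included.
data = [
    "MAG UTC DATE-TIME LAT LON DEPTH Region\n",
    "4.3 2014/03/12 20:16:59 25.423 -109.730 10.0 GULF OF CALIFORNIA\n",
    "5.2 2014/03/12 20:09:55 36.747 144.050 24.2 JAPAN\n",
    "5.0 2014/03/12 20:08:25 35.775 141.893 24.5 JAPAN\n",
    "4.8 2014/03/12 19:59:01 38.101 142.840 17.6 Japan\n",
    "4.6 2014/03/12 19:55:28 37.400 142.384 24.7 JAPAN\n",
    "5.0 2014/03/12 19:45:19 -6.187 154.385 62.0 GUINEA\n",
]
grouped = group_by_region(data[1:])  # data[1:] drops the header line
print(grouped)
```

The regions come out in the order they first appear in the file rather than alphabetically, which may or may not matter for the project.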
Did you check my code??