Python 3.x: selecting specific fields


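I am trying to select specific fields from my Qdata.txt file and use fields[2] to calculate the average for each year separately. My code only gives the overall average.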

The data file looks like this (first day of a year: 101, last day: 1231):

Date 3703006701500
20000101 21.00 223.00
20000102 20.00 218.00
20001231 7.40 104.00
20010101 6.70 104.00
20130101 8.37 111.63
20131231 45.00 120.98

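Using the first four characters as the grouping key, you can use itertools.groupby. Note that groupby only merges consecutive lines with the same key, so this relies on the file being sorted by date: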

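import itertools

with open("Qdata.txt") as f:
    next(f)  # skip the header line
    # group consecutive lines that share the same first four characters (the year)
    groups = itertools.groupby(f, key=lambda s: s[:4])
    for k, g in groups:
        print(k, [s.split() for s in g])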

This gives you the entries grouped by year for further processing. Output for the sample data:

2000 [['20000101', '21.00', '223.00'], ['20000102', '20.00', '218.00'], ['20001231', '7.40', '104.00']]
2001 [['20010101', '6.70', '104.00']]
2013 [['20130101', '8.37', '111.63'], ['20131231', '45.00', '120.98']]
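
To get from these groups to the per-year averages the question asks for, one option is to average the third column inside each group. The following is only a minimal sketch: the helper name `yearly_averages` and the inline `sample` list are made up for illustration, and it assumes the rows are already sorted by date, since groupby only merges consecutive rows with the same key.

```python
import itertools

def yearly_averages(lines):
    """Return {year: mean of the third column}; assumes lines are sorted by date."""
    rows = (line.split() for line in lines)
    # Drop blank or short lines so that fields[2] always exists.
    rows = (fields for fields in rows if len(fields) >= 3)
    averages = {}
    # The key is the first four characters of the date field, i.e. the year.
    for year, group in itertools.groupby(rows, key=lambda fields: fields[0][:4]):
        values = [float(fields[2]) for fields in group]
        averages[year] = sum(values) / len(values)
    return averages

# Hypothetical sample rows in the same layout as the data file (header removed).
sample = [
    "20000101 21.00 223.00",
    "20000102 20.00 218.00",
    "20001231 7.40 104.00",
    "20010101 6.70 104.00",
    "20130101 8.37 111.63",
    "20131231 45.00 120.98",
]

for year, avg in yearly_averages(sample).items():
    print(year, avg)
```

For the real file, the same function should work on the open file object once the header line has been skipped with next(f).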

You can create a dict (or even a defaultdict) for total and count:

import sys
from collections import defaultdict

total = defaultdict(float)  # sum of fields[2] per year
count = defaultdict(int)    # number of rows per year

with open("Qdata.txt", "r") as td:  # open file Qdata
    next(td, None)  # the first row is the header
    for row in td:
        fields = row.split()
        try:
            year = int(fields[0][:4])
            value = float(fields[2])
        except IndexError:
            # blank or short line: skip it
            continue
        except ValueError:
            print("File is incorrect.")
            sys.exit()
        total[year] += value
        count[year] += 1

# print the average for every year, not only for 2000
for year in sorted(total):
    print("Average in", year, "was:", total[year] / count[year])

Separately for each year? You have to split your input into groups; this may be what you want:

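from collections import defaultdict

year_sums = defaultdict(list)  # year -> list of values taken from fields[2]

with open("Qdata.txt") as td:
    next(td, None)  # skip the header row
    for row in td:
        fields = row.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        year = fields[0][:4]
        year_sums[year].append(float(fields[2]))

for year in year_sums:
    # len() gives the number of values; count() is not a built-in function
    average = sum(year_sums[year]) / len(year_sums[year])
    print("Average in {} was: {}".format(year, average))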

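This is just some sample code that I have not tested, but it should give you an idea of what you can do. year_sums is a defaultdict containing lists of values grouped by year. You can also use it for other statistics if you need them.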
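
As an illustration of the "other statistics" idea, the per-year lists can be passed to the standard library statistics module. This is a hedged sketch rather than part of the original answer: the `rows` sample and the `summarize` helper are hypothetical names introduced here, and the choice of mean, median, min and max is arbitrary.

```python
import statistics
from collections import defaultdict

# Hypothetical sample rows in the same layout as Qdata.txt (header already skipped).
rows = [
    "20000101 21.00 223.00",
    "20000102 20.00 218.00",
    "20001231 7.40 104.00",
    "20010101 6.70 104.00",
    "20130101 8.37 111.63",
    "20131231 45.00 120.98",
]

# Build the same structure as in the answer: year -> list of values from fields[2].
year_sums = defaultdict(list)
for row in rows:
    fields = row.split()
    year_sums[fields[0][:4]].append(float(fields[2]))

def summarize(values):
    """Return a few summary statistics for one year's list of values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

summary = {year: summarize(values) for year, values in year_sums.items()}
for year in sorted(summary):
    print(year, summary[year])
```

Keeping the raw values in a list costs more memory than a running total and count, but it is what makes order-dependent statistics such as the median possible.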
