Get the average of an entire column in a CSV file using simple code (Python)
I've seen similar questions, but never one with a simple, straightforward Pythonic answer. I just want to get the average of the "high" column in my CSV file.
import csv
import numpy as np
with open('2010-Jan-June.csv', 'r', encoding='utf8', newline='') as f:
    highs = []
    for row in csv.DictReader(f, delimiter=','):
        high = int(row['high'])
    print(sum(highs)/len(highs))
My CSV looks like this:
date,high,low,precip
1-Jan,43,41,0
2-Jan,50,25,0
3-Jan,51,25,0
4-Jan,44,25,0
5-Jan,36,21,0
6-Jan,39,20,0
7-Jan,47,21,0.04
8-Jan,30,14,0
9-Jan,30,12,0
Using pandas:
import pandas as pd
# squeeze=True was removed from read_csv in pandas 2.0; select the column instead
avg = pd.read_csv(r'/path/to/2010-Jan-June.csv', usecols=['high'])['high'].mean()
Note that this is entirely doable in plain Python:
import csv
import statistics as stats
with open('2010-Jan-June.csv', newline='') as f:
    # DictReader yields strings, so convert each value before averaging
    avg = stats.mean(float(row['high']) for row in csv.DictReader(f, delimiter=','))
print(avg)
Since you've already imported numpy, you can use it; it's almost as simple as pandas.
Reading from a pasted copy of the sample:
In [36]: txt="""date,high,low,precip
...: 1-Jan,43,41,0
...: 2-Jan,50,25,0
...: 3-Jan,51,25,0
...: 4-Jan,44,25,0
...: 5-Jan,36,21,0
...: 6-Jan,39,20,0
...: 7-Jan,47,21,0.04
...: 8-Jan,30,14,0
...: 9-Jan,30,12,0"""
On Python 3 with numpy 1.14, genfromtxt prefers that you pass the encoding parameter:
In [38]: data = np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None,names=True,
...: encoding=None)
In [39]: data
Out[39]:
array([('1-Jan', 43, 41, 0.  ), ('2-Jan', 50, 25, 0.  ),
       ('3-Jan', 51, 25, 0.  ), ('4-Jan', 44, 25, 0.  ),
       ('5-Jan', 36, 21, 0.  ), ('6-Jan', 39, 20, 0.  ),
       ('7-Jan', 47, 21, 0.04), ('8-Jan', 30, 14, 0.  ),
       ('9-Jan', 30, 12, 0.  )],
      dtype=[('date', '<U5'), ('high', '<i8'), ('low', '<i8'), ('precip', '<f8')])
Or in one line, loading only the one column:
In [44]: np.genfromtxt(txt.splitlines(),delimiter=',',skip_header=1,usecols=[1]).mean()
Out[44]: 41.111111111111114
Here is my attempt at a Pythonic answer using only the csv library:
import csv
with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    # line_num counts the header too, so subtract 1 to get the number of data rows
    print(sum(float(d['high']) for d in reader) / (reader.line_num - 1))
If the file has no data rows, this will divide by zero.
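One way to avoid that division by zero, still using only the csv module, is to count the rows while summing and return None when there are none. This is a minimal sketch: the function name `column_mean` and its parameters are illustrative assumptions, not part of the answer above.

```python
import csv


def column_mean(path, column="high"):
    """Return the mean of `column` in the CSV at `path`, or None if it has no data rows."""
    total = 0.0
    count = 0
    with open(path, newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            total += float(row[column])
            count += 1
    # Guard against a header-only (or empty) file instead of dividing by zero.
    return total / count if count else None
```

Calling `column_mean('names.csv')` would then return either the average or None, leaving it to the caller to decide how to report an empty file.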
+1. That was my mistake.
Change high = int(row['high']) to highs.append(int(row['high'])).
Put a def in front of high_avgs():
@PaulPanzer It's actually there. I was just trying to pull the code out of my function block.
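Putting the comments together, the question's code with `highs.append(...)` and wrapped in a function might look like the sketch below. The name `high_avgs` comes from the comments; the `path` parameter and the return value are assumptions, since the original function is not shown.

```python
import csv


def high_avgs(path):
    """Average the 'high' column; assumes every value in it is an integer."""
    with open(path, 'r', encoding='utf8', newline='') as f:
        highs = []
        for row in csv.DictReader(f, delimiter=','):
            # Append each value; assigning to `high` left the list empty.
            highs.append(int(row['high']))
    return sum(highs) / len(highs)
```

With the list actually populated, `sum(highs) / len(highs)` no longer fails on a file that has data rows.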