Python: reading data as float values with a converter
I have a CSV file named "filename" and want to read its data as float64, except for the "hour" column. I handle this with the pd.read_csv function and a converter:
df = pd.read_csv("../data/filename.csv",
                 delimiter = ';',
                 date_parser = ['hour'],
                 skiprows = 1,
                 converters={'column1': lambda x: float(x.replace ('.','').replace(',','.'))})
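Now I have two points:
First:
The delimiter works with ';', but if I look at my data in Notepad, there are ',' and not ';'. If I use ',' instead, I get: "pandas.parser.CParserError: Error tokenizing data. C error: Expected 7 fields in line 13, saw 9"
Second:
If I want to use the converter for all columns, how can I do that? What is the correct term?
I tried dtype=float in the read function, but I get "AttributeError: 'NoneType' object has no attribute 'dtype'". What is going on? That is why I want to handle it with a converter.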
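One way to apply the same converter to every column except "hour" is to build the converters dict from the header. The following is a minimal sketch, not the asker's code: the inline sample and the helper name to_float are assumptions for illustration. Note that the lambda in the question targets numbers with '.' as thousands separator and ',' as decimal mark, whereas the sample data shown uses ',' for thousands and '.' for decimals, so the helper here only strips commas.

```python
import io
import pandas as pd

# Hypothetical inline sample in the same number format as the question's data.
sample = '''hour,PV,Wind onshore
1,0.0,"12,985.0"
2,0.0,"12,908.9"
'''

def to_float(text):
    # Hypothetical helper: drop the thousands comma, then parse as float.
    return float(text.replace(',', ''))

# Read only the header row to learn the column names.
columns = pd.read_csv(io.StringIO(sample), nrows=0).columns

# One converter per column, skipping 'hour'.
converters = {name: to_float for name in columns if name != 'hour'}

df = pd.read_csv(io.StringIO(sample), converters=converters)
print(df.dtypes)
```

With a real file, the io.StringIO(sample) argument would be replaced by the file path in both calls.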
Data:
,hour,PV,Wind onshore,Wind offshore,PV.1,Wind onshore.1,Wind offshore.1,PV.2,Wind onshore.2,Wind offshore.2
0,1,0.0,"12,985.0","9,614.0",0.0,"32,825.5","9,495.7",0.0,"13,110.3","10,855.5"
1,2,0.0,"12,908.9","9,290.8",0.0,"36,052.3","9,589.1",0.0,"13,670.2","10,828.6"
2,3,0.0,"12,740.9","8,886.9",0.0,"38,540.9","10,087.3",0.0,"14,610.8","10,828.6"
3,4,0.0,"12,485.3","8,644.5",0.0,"40,734.0","10,087.3",0.0,"15,638.3","10,343.7"
4,5,0.0,"11,188.5","8,079.0",0.0,"42,688.0","10,087.3",0.0,"16,809.4","10,343.7"
5,6,0.0,"11,219.0","7,594.2",0.0,"43,333.5","10,025.0",0.0,"18,266.9","10,343.7"

This should work:
In [40]:
import io
import pandas as pd

# text data
temp=''',hour,PV,Wind onshore,Wind offshore,PV.1,Wind onshore.1,Wind offshore.1,PV.2,Wind onshore.2,Wind offshore.2
0,1,0.0,"12,985.0","9,614.0",0.0,"32,825.5","9,495.7",0.0,"13,110.3","10,855.5"
1,2,0.0,"12,908.9","9,290.8",0.0,"36,052.3","9,589.1",0.0,"13,670.2","10,828.6"
2,3,0.0,"12,740.9","8,886.9",0.0,"38,540.9","10,087.3",0.0,"14,610.8","10,828.6"
3,4,0.0,"12,485.3","8,644.5",0.0,"40,734.0","10,087.3",0.0,"15,638.3","10,343.7"
4,5,0.0,"11,188.5","8,079.0",0.0,"42,688.0","10,087.3",0.0,"16,809.4","10,343.7"
5,6,0.0,"11,219.0","7,594.2",0.0,"43,333.5","10,025.0",0.0,"18,266.9","10,343.7"'''
# so read the csv, pass params quotechar and the thousands character
df = pd.read_csv(io.StringIO(temp), quotechar='"', thousands=',')
df
Out[40]:
   Unnamed: 0  hour  PV  Wind onshore  Wind offshore  PV.1  Wind onshore.1  \
0           0     1   0       12985.0         9614.0     0         32825.5
1           1     2   0       12908.9         9290.8     0         36052.3
2           2     3   0       12740.9         8886.9     0         38540.9
3           3     4   0       12485.3         8644.5     0         40734.0
4           4     5   0       11188.5         8079.0     0         42688.0
5           5     6   0       11219.0         7594.2     0         43333.5

   Wind offshore.1  PV.2  Wind onshore.2  Wind offshore.2
0           9495.7     0         13110.3          10855.5
1           9589.1     0         13670.2          10828.6
2          10087.3     0         14610.8          10828.6
3          10087.3     0         15638.3          10343.7
4          10087.3     0         16809.4          10343.7
5          10025.0     0         18266.9          10343.7
In [41]:
# check the dtypes
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 5
Data columns (total 11 columns):
Unnamed: 0         6 non-null int64
hour               6 non-null int64
PV                 6 non-null float64
Wind onshore       6 non-null float64
Wind offshore      6 non-null float64
PV.1               6 non-null float64
Wind onshore.1     6 non-null float64
Wind offshore.1    6 non-null float64
PV.2               6 non-null float64
Wind onshore.2     6 non-null float64
Wind offshore.2    6 non-null float64
dtypes: float64(9), int64(2)
memory usage: 576.0 bytes
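The "Unnamed: 0" column in the output above comes from the empty first header field. If that column is just a row counter, passing index_col=0 should turn it into the index instead. A small sketch on a shortened, hypothetical version of the sample (fewer columns and rows than the real data):

```python
import io
import pandas as pd

# Shortened sample; the real file has more columns and rows.
temp = ''',hour,PV,Wind onshore
0,1,0.0,"12,985.0"
1,2,0.0,"12,908.9"
'''

# quotechar keeps "12,985.0" together as one field;
# thousands=',' strips the comma before the number is parsed.
df = pd.read_csv(io.StringIO(temp), quotechar='"', thousands=',', index_col=0)
print(df.dtypes)
```

The quoted fields also explain why ',' can be both the field delimiter and the thousands separator in the same file: commas inside double quotes are not treated as delimiters.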
It would be faster to first replace the comma separator on all columns of interest and then call convert_objects like this: df.convert_objects(convert_numeric=True)
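For readers on current pandas: convert_objects was deprecated and is no longer available as of pandas 1.0, so the suggestion above needs a substitute. A rough equivalent of "replace the commas, then convert" using pd.to_numeric might look like this; the small DataFrame and the cols list are made up for illustration.

```python
import pandas as pd

# Hypothetical frame where the numeric column was read in as strings.
df = pd.DataFrame({
    'hour': [1, 2],
    'Wind onshore': ['12,985.0', '12,908.9'],
})

# Columns of interest (an assumption; adjust to the real column names).
cols = ['Wind onshore']

for col in cols:
    # Remove the thousands comma as a literal string, then parse as numbers.
    df[col] = pd.to_numeric(df[col].str.replace(',', '', regex=False))

print(df.dtypes)
```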
Can you post sample data? Whether all rows have the same format is another question. If read_csv cannot do the conversion, it is best to convert after reading the data in as strings.

In Notepad the data looks like this:
,hour,PV,Wind onshore,Wind offshore,PV.1,Wind onshore.1,Wind offshore.1,PV.2,Wind onshore.2,Wind offshore.2
0,1,0.0,"12,985.0","9,614.0",0.0,"32,825.5","9,495.7",0.0,"13,110.3","10,855.5"
1,2,0.0,"12,908.9","9,290.8",0.0,"36,052.3","9,589.1",0.0,"13,670.2","10,828.6"
2,3,0.0,"12,740.9","8,886.9",0.0,"38,540.9","10,087.3",0.0,"14,610.8","10,828.6"
3,4,0.0,"12,485.3","8,644.5",0.0,"40,734.0","10,087.3",0.0,"15,638.3","10,343.7"
4,5,0.0,"11,188.5","8,079.0",0.0,"42,688.0","10,087.3",0.0,"16,809.4","10,343.7"
5,6,0.0,"11,219.0","7,594.2",0.0,"43,333.5","10,025.0",0.0,"18,266.9","10,343.7"

OK. If I do that, I get "name 'io' is not defined". Also: how can I change the dtype of all columns after reading with read_csv? ... or of specific columns?

Ignore the io bit, that is only needed for my example. For you the line should be: df = pd.read_csv("../data/filename.csv", delimiter=';', date_parser=['hour'], skiprows=1, quotechar='"', thousands=',')
If the answer resolved your question, you can accept (and upvote) it.
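On the follow-up question in the comments about changing dtypes after reading: DataFrame.astype accepts either a single dtype for all columns or a dict mapping column names to dtypes. A minimal sketch with a made-up frame standing in for the result of read_csv:

```python
import pandas as pd

# Hypothetical frame standing in for the result of read_csv.
df = pd.DataFrame({
    'hour': [1, 2],
    'PV': [0, 0],
    'Wind onshore': [12985.0, 12908.9],
})

# Convert every column to float64.
all_float = df.astype('float64')

# Convert only specific columns; the others keep their dtype.
only_pv = df.astype({'PV': 'float64'})

print(all_float.dtypes)
print(only_pv.dtypes)
```

astype returns a new DataFrame, so the result has to be assigned back if the original variable should change.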