Python Pandas read_csv(): keep 0 as 0 (do not convert it to NaN)
I am trying to read a CSV file; here is a sample of it:
datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:01:00,2,0,0,2.407,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:02:50,2,0,0,2.207,-17.354,0,8192
2012-10-27 15:02:51,1,0,0,2.207,-17.358,0,-8152
2012-10-27 15:02:52,1,0,0,2.207,-17.358,0,648
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-10-27 15:06:04,0,51.195182,4.44427,2.344,-17.289,0,587
2012-12-05 09:21:34,,,,,42.960,1,16430
2012-12-05 09:21:35,,,,,42.962,1,3597
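The problem I have is that in columns containing only integers, 0 is converted to NaN (for example the 'check' and 'status' columns: they hold only integers, but because some values are missing, the column is read as float). I only want empty values to be converted to NaN, not zeros.
This is what I get: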
>>> pd.read_clipboard(sep=',', parse_dates=True, index_col=0)
check lat lon co_alpha atn status bc
datetime
2012-10-27 15:00:59 2 0.000000 0.000000 2.427 NaN NaN NaN
2012-10-27 15:01:00 2 0.000000 0.000000 2.407 NaN NaN NaN
2012-10-27 15:02:49 2 0.000000 0.000000 2.207 -17.358 NaN -16162
2012-10-27 15:02:50 2 0.000000 0.000000 2.207 -17.354 NaN 8192
2012-10-27 15:02:51 1 0.000000 0.000000 2.207 -17.358 NaN -8152
2012-10-27 15:02:52 1 0.000000 0.000000 2.207 -17.358 NaN 648
2012-10-27 15:06:03 NaN 51.195076 4.444407 2.349 -17.289 NaN 4909
2012-10-27 15:06:04 NaN 51.195182 4.444270 2.344 -17.289 NaN 587
2012-12-05 09:21:34 NaN NaN NaN NaN 42.960 1 16430
2012-12-05 09:21:35 NaN NaN NaN NaN 42.962 1 3597
So in the 'check' and 'status' columns there are many NaNs where the file contains 0. In the 'lat' and 'lon' columns, 0 is not converted to NaN.
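To check whether a given pandas installation shows this behaviour, one can count zeros and NaNs per column after reading. The following is only a minimal sketch: it uses `read_csv` on an in-memory copy of a few sample rows instead of `read_clipboard`, and `DataFrame.isna`, which I assume is available (it is in recent pandas releases, but not in the 0.10.x versions discussed here).

```python
import io

import pandas as pd

# A few rows from the sample above, embedded so the sketch is self-contained.
csv_text = """datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-12-05 09:21:34,,,,,42.960,1,16430
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Per-column dtype, number of literal zeros, and number of missing values.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_zero": (df == 0).sum(),
    "n_nan": df.isna().sum(),
})
print(summary)
```

If zeros are being swallowed, `n_zero` for 'check' and 'status' will be 0 and `n_nan` will be larger than the number of empty fields in the file.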
- Using keep_default_na=False and na_values='' did not help. Is there a way to specify that an integer 0 should not be converted to NaN? Or is this a bug?
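- I could use the dtype keyword to specify the dtype of particular columns as int. This keeps 0 as 0, but the problem is that these columns also contain real NaNs (empty values). In that case those values would be converted to 0 as well, since an int column cannot hold NaN. So I have to keep all columns as float dtype.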
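As an aside on the int-versus-NaN limitation described in the second point: newer pandas releases offer a nullable integer dtype, `Int64` (capital I), that can hold both integers and missing values. This is not part of the original question and, as far as I know, does not exist in the 0.10.x versions discussed here, so treat the following as a sketch that assumes a recent pandas.

```python
import io

import pandas as pd

csv_text = """datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-12-05 09:21:34,,,,,42.960,1,16430
"""

# Assumption: the installed pandas supports the nullable "Int64" extension dtype.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    dtype={"check": "Int64", "status": "Int64"},
)

# Zeros stay integer zeros; empty fields become missing values (<NA>).
print(df[["check", "status"]])
print(df.dtypes)
```

The other columns are left to pandas' default inference and remain float.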
Edit: after upgrading to pandas 0.10.1, it works as expected, even without specifying keep_default_na and na_values:
>>> pd.read_clipboard(sep=',', parse_dates=True, index_col=0)
check lat lon co_alpha atn status bc
datetime
2012-10-27 15:00:59 2 0.000000 0.000000 2.427 NaN NaN NaN
2012-10-27 15:01:00 2 0.000000 0.000000 2.407 NaN NaN NaN
2012-10-27 15:02:49 2 0.000000 0.000000 2.207 -17.358 0 -16162
2012-10-27 15:02:50 2 0.000000 0.000000 2.207 -17.354 0 8192
2012-10-27 15:02:51 1 0.000000 0.000000 2.207 -17.358 0 -8152
2012-10-27 15:02:52 1 0.000000 0.000000 2.207 -17.358 0 648
2012-10-27 15:06:03 0 51.195076 4.444407 2.349 -17.289 0 4909
2012-10-27 15:06:04 0 51.195182 4.444270 2.344 -17.289 0 587
2012-12-05 09:21:34 NaN NaN NaN NaN 42.960 1 16430
2012-12-05 09:21:35 NaN NaN NaN NaN 42.962 1 3597
You have to set keep_default_na to False first:
df = pd.read_clipboard(sep=',', index_col=0, keep_default_na=False, na_values='')
In [2]: df
Out[2]:
check lat lon co_alpha atn status bc
datetime
2012-10-27 15:00:59 2 0.000000 0.000000 2.427 NaN NaN NaN
2012-10-27 15:01:00 2 0.000000 0.000000 2.407 NaN NaN NaN
2012-10-27 15:02:49 2 0.000000 0.000000 2.207 -17.358 0 -16162
2012-10-27 15:02:50 2 0.000000 0.000000 2.207 -17.354 0 8192
2012-10-27 15:02:51 1 0.000000 0.000000 2.207 -17.358 0 -8152
2012-10-27 15:02:52 1 0.000000 0.000000 2.207 -17.358 0 648
2012-10-27 15:06:03 0 51.195076 4.444407 2.349 -17.289 0 4909
2012-10-27 15:06:04 0 51.195182 4.444270 2.344 -17.289 0 587
2012-12-05 09:21:34 NaN NaN NaN NaN 42.960 1 16430
2012-12-05 09:21:35 NaN NaN NaN NaN 42.962 1 3597
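The same call can be reproduced without the clipboard. Below is a minimal sketch that passes the answer's keyword arguments to `read_csv` on an in-memory string; the shortened sample and the use of `io.StringIO` are my own choices for illustration, and `na_values` is given as a one-element list rather than a bare string.

```python
import io

import pandas as pd

csv_text = """datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-12-05 09:21:34,,,,,42.960,1,16430
"""

# Only the empty string is treated as missing; the default NA strings are dropped.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    keep_default_na=False,
    na_values=[""],
)

print(df)
```

On a version without the problem from the question, 'check' and 'status' come back as float columns in which the zeros are kept and only the empty fields are NaN.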
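From the docstring:
keep_default_na : bool, default True
    If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they're appended to
na_values : list-like or dict, default None
    Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values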
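The difference between "overridden" and "appended to" is easiest to see on a string column. This is a small sketch with made-up data (the column names `name` and `code` are hypothetical): the literal text `NA` is one of pandas' default NA strings, so it only survives as a string when the defaults are switched off.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is a literal code in the first row, the second row is empty.
csv_text = "name,code\nalpha,NA\nbeta,\ngamma,x\n"

# Defaults kept: na_values is appended to the built-in list, so "NA" and "" are both NaN.
appended = pd.read_csv(io.StringIO(csv_text), na_values=[""])

# Defaults dropped: only "" counts as missing, "NA" stays a normal string.
overridden = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=[""]
)

print(appended["code"].isnull().sum())
print(overridden["code"].isnull().sum())
```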
This doesn't seem to make a difference for me (pandas 0.10.0); I still get NaN instead of 0.
@joris That's strange, maybe upgrade to 0.10.1?
I added my output to my question. But I will try it with 0.10.1.
It works in 0.10.1! Thanks for the suggestion! But it even works without specifying keep_default_na and na_values.
@joris I think I'm going mad, I was sure I had tested that! You are quite right.