Python: NaN result from the mean function
I am trying to get the mean of a row in a pandas DataFrame, but every row gives me a NaN return value. Why am I getting this result, and how can I fix it?
GOOG Key Ratios:
Returns:
Gross Margin % NaN
dtype: float64
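The reason you have a bunch of nan values is that your columns are not homogeneously typed. So, for example, when you try to average across columns it doesn't make sense, because pandas.read_csv only converts a column to numeric where that makes sense, e.g. when there are no string dates or other text in the same column as the numbers.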
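I would also recommend inspecting your data with a simple df.head() before doing any analysis. That will save you a lot of time when you are wondering why your output is "weird".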
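As a rough illustration of that inspection step, the sketch below loads a tiny in-memory stand-in for the CSV and looks at both df.head() and df.dtypes. The csv_text sample and the column name "label" are assumptions made for the example; only the values are copied from the frame shown in this answer.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for 'GOOG Key Ratios.csv' (not the real file).
csv_text = '''label,Y0,Y1
Revenue USD Mil,"1,466","3,189"
Gross Margin %,57.3,54.3
'''

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.head())   # eyeball the values: note the thousands separators
print(df.dtypes)   # columns that mix "1,466" with 57.3 are not parsed as numeric

# A programmatic version of the same check.
is_numeric = pd.api.types.is_numeric_dtype(df["Y0"])
print(is_numeric)
```

If is_numeric is False for a column you expected to hold numbers, the column contains text that read_csv could not parse as numbers.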
That said, you can do the following to convert things to numeric values, although it doesn't necessarily make sense:
In [35]: df = read_csv('GOOG Key Ratios.csv', skiprows=2, index_col=0, names=['Y%d' % i for i in range(11)])  # assumes: from pandas import read_csv

In [36]: df.head()  # not homogeneously typed columns
Out[36]:
                               Y0       Y1       Y2       Y3       Y4  \
NaN                       2003-12  2004-12  2005-12  2006-12  2007-12
Revenue USD Mil             1,466    3,189    6,139   10,605   16,594
Gross Margin %               57.3     54.3     58.1     60.2     59.9
Operating Income USD Mil      342      640    2,017    3,550    5,084
Operating Margin %           23.4     20.1     32.9     33.5     30.6

                               Y5       Y6       Y7       Y8       Y9      Y10
NaN                       2008-12  2009-12  2010-12  2011-12  2012-12      TTM
Revenue USD Mil            21,796   23,651   29,321   37,905   50,175   55,797
Gross Margin %               60.4     62.6     64.5     65.2     58.9     56.7
Operating Income USD Mil    6,632    8,312   10,381   11,742   12,760   12,734
Operating Margin %           30.4     35.1     35.4     31.0     25.4     22.8

In [37]: df.convert_objects(convert_numeric=True).head()  # convert_objects was removed in pandas 1.0; see pandas.to_numeric
Out[37]:
                             Y0     Y1     Y2     Y3     Y4     Y5     Y6     Y7     Y8     Y9    Y10
NaN                         NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
Revenue USD Mil             NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
Gross Margin %             57.3   54.3   58.1   60.2   59.9   60.4   62.6   64.5   65.2   58.9   56.7
Operating Income USD Mil  342.0  640.0    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
Operating Margin %         23.4   20.1   32.9   33.5   30.6   30.4   35.1   35.4   31.0   25.4   22.8
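On current pandas versions, convert_objects no longer exists. A minimal sketch of a comparable coercion, assuming pandas.to_numeric with errors="coerce" is an acceptable substitute, is shown below. The small frame is a hand-built stand-in that reuses a few values from the output above, not the real file.

```python
import pandas as pd

# Hand-built stand-in: every cell is a string, as it would be after reading
# a file whose columns mix plain numbers with comma-formatted numbers.
df = pd.DataFrame(
    {"Y0": ["1,466", "57.3", "342"], "Y1": ["3,189", "54.3", "640"]},
    index=["Revenue USD Mil", "Gross Margin %", "Operating Income USD Mil"],
)

# Coerce each column; anything that does not parse as a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric)
```

As in the transcript above, values such as "1,466" come out as NaN because the thousands separator stops them from parsing, which is why the coercion alone is not a full fix.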
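If you have nan in your output, that generally means you have only nan in your input, modulo bugs.
How can I check whether the input is nan? When I print grossMargin I get numeric values.
By calling pandas.isnull on any object. For example, you can call pandas.isnull(the_frame).
Another way to get NaN in the output is when the dtypes are not what you think they are, so the math operations don't work. Are you sure they are numbers and not strings?
I get a non-null return value. Weird.
data.replace(',', '', regex=True) replaces the commas with nothing, and after that it is easy to convert to a float type!
Glad you like the replace method!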
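The suggestions in these comments can be put together roughly as follows. This is a sketch under assumptions: the frame named data is a hand-built stand-in with values taken from the answer above, and astype(float) is one possible way to do the float conversion that the comment mentions.

```python
import pandas as pd

# Hand-built stand-in for the loaded frame: numbers stored as strings.
data = pd.DataFrame(
    {"Y0": ["1,466", "57.3"], "Y1": ["3,189", "54.3"]},
    index=["Revenue USD Mil", "Gross Margin %"],
)

# Strings are not null, so isnull reports no missing values even though
# the cells are not numeric. This matches the "non-null" observation above.
has_nulls = bool(pd.isnull(data).values.any())
print(has_nulls)

# Strip the thousands separators, then convert every column to float.
cleaned = data.replace(",", "", regex=True).astype(float)

# Row-wise mean, which is what the question was after.
row_means = cleaned.mean(axis=1)
print(row_means)
```

The dates row and the TTM label from the real file would still fail the float conversion, so they would need to be dropped or used as column headers first.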