Python: What is the most efficient way to flatten a DataFrame?
I have a large pandas DataFrame with 8 columns and several NaN values:
0 1 2 3 4 5 6 7 8
1 Google, Inc. (Date 11/07/2016) NaN NaN NaN NaN NaN NaN NaN NaN
2 Apple Inc. (Date 07/01/2016) Amazon (Date 11/01/2016) NaN NaN NaN NaN NaN NaN NaN
3 IBM, Inc. (Date 11/08/2016) NaN NaN NaN NaN NaN NaN NaN NaN
4 Microsoft (Date 11/10/2016) Google, Inc. (Date 11/10/1990) Google, Inc. (Date 11/07/2016) Samsung (Date 05/02/2016) NaN NaN NaN NaN NaN
How can I flatten it like this:
0 companies
1 Google, Inc. (Date 11/07/2016)
2 Apple Inc. (Date 07/01/2016)
3 Amazon (Date 11/01/2016)
4 IBM, Inc. (Date 11/08/2016)
5 Microsoft (Date 11/10/2016)
6 Google, Inc. (Date 11/10/1990)
7 Google, Inc. (Date 11/07/2016)
8 Samsung (Date 05/02/2016)
I read up on this and tried:
df.iloc[:,0]
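The problem is that I lose the information in the other columns, as well as the order. How can I flatten the remaining cells too, without losing data and while keeping the order?

This might work: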
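import pandas as pd

df = pd.DataFrame([
    ["Google, Inc. (Date 11/07/2016)", float("NaN")],
    ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)"]])
# Transpose first so that unstack() walks the original frame row by row
unstacked = df.T.unstack()
# Drop the NaN padding cells
unstacked.dropna(inplace=True)
# Replace the (row, column) MultiIndex with a plain 0..n-1 index
unstacked.reset_index(drop=True, inplace=True)
unstacked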
Output:
0 Google, Inc. (Date 11/07/2016)
1 Apple Inc. (Date 07/01/2016)
2 Amazon (Date 11/01/2016)
dtype: object
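
To see why the transpose matters in the snippet above, here is a minimal sketch. The two-by-two frame and the names `col_major` and `row_major` are made up for illustration and are not part of the original answer. Without `.T`, `unstack()` walks the frame column by column, which would lose the row order the question asks for.

```python
import pandas as pd

# Hypothetical toy frame: the NaN sits in the second row
df = pd.DataFrame([["a", "b"], ["c", float("nan")]])

# unstack() on the frame itself goes down column 0, then column 1
col_major = df.unstack().dropna().tolist()

# Transposing first makes unstack() go along row 0, then row 1
row_major = df.T.unstack().dropna().tolist()

print(col_major)
print(row_major)
```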
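Please take a look at how to provide good examples in a question.

You can stack the columns and, optionally, reset the index. By default, stack drops the NaNs: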
df.stack()
Out:
0 0 Google, Inc. (Date 11/07/2016)
1 0 Apple Inc. (Date 07/01/2016)
1 Amazon (Date 11/01/2016)
2 0 IBM, Inc. (Date 11/08/2016)
3 0 Microsoft (Date 11/10/2016)
1 Google, Inc. (Date 11/10/1990)
2 Google, Inc. (Date 11/07/2016)
3 Samsung (Date 05/02/2016)
dtype: object
df.stack().reset_index(drop=True)
Out:
0 Google, Inc. (Date 11/07/2016)
1 Apple Inc. (Date 07/01/2016)
2 Amazon (Date 11/01/2016)
3 IBM, Inc. (Date 11/08/2016)
4 Microsoft (Date 11/10/2016)
5 Google, Inc. (Date 11/10/1990)
6 Google, Inc. (Date 11/07/2016)
7 Samsung (Date 05/02/2016)
dtype: object
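
For anyone who wants to run this end to end, here is a self-contained sketch. The frame is a reconstruction of the question's data, trimmed to four columns on the assumption that the remaining ones are all NaN. The explicit `.dropna()` and the `rename("companies")` step are my additions: the first so the result does not depend on how a particular pandas version's `stack` treats NaN, the second to get the `companies` column shown in the desired output.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's frame (assumption: extra columns are all NaN)
df = pd.DataFrame([
    ["Google, Inc. (Date 11/07/2016)", np.nan, np.nan, np.nan],
    ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)", np.nan, np.nan],
    ["IBM, Inc. (Date 11/08/2016)", np.nan, np.nan, np.nan],
    ["Microsoft (Date 11/10/2016)", "Google, Inc. (Date 11/10/1990)",
     "Google, Inc. (Date 11/07/2016)", "Samsung (Date 05/02/2016)"],
])

companies = (
    df.stack()                 # row by row, left to right
      .dropna()                # explicit, in case stack kept any NaN
      .reset_index(drop=True)  # discard the (row, column) MultiIndex
      .rename("companies")
)

# One-column DataFrame, as in the desired output
flat = companies.to_frame()
print(flat)
```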
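It looks like @ayhan's answer is better; thanks for the help. What if I want to keep the NaN slots when stacking? Should I use drop=False?
You would have to remove the line unstacked.dropna(inplace=True).
Thanks for the help. What if I want to keep the NaN slots when stacking? Should I use drop=False?
That drop is for dropping the index. Instead, you should use df.stack(dropna=False) to keep the NaNs.
Thanks, I got: AttributeError: 'Series' object has no attribute 'stack'
Did you try it on the original DataFrame you posted here? To get that error, you must be calling the method on a Series.
Yes, it has to be the original.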
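
A small sketch of the two points from the comments, under some assumptions. To keep the NaN cells I use `to_numpy().ravel()` here instead of `df.stack(dropna=False)`; that is a swapped-in technique, chosen only because the handling of `dropna` differs between pandas versions. The names `with_nan`, `first_col` and `series_has_stack` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ["Google, Inc. (Date 11/07/2016)", float("nan")],
    ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)"],
])

# Row-major flatten that keeps every cell, NaN included
# (stand-in for df.stack(dropna=False), see the note above)
with_nan = pd.Series(df.to_numpy().ravel())

# Selecting a single column returns a Series, and a Series has no stack()
# method, which is what triggers the AttributeError from the comments
first_col = df.iloc[:, 0]
series_has_stack = hasattr(first_col, "stack")

print(with_nan)
print(series_has_stack)
```

So if the error shows up, it is worth checking whether `df` was overwritten earlier with something like `df.iloc[:, 0]`.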