Python: merging with an empty DataFrame

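I am trying to merge one DataFrame (df1) with another DataFrame (df2), where df2 may be empty. The merge condition is df1.index == df2.z (df1 is never empty), but I get the error shown below.

Is there any way to make this work?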

In [31]: import pandas as pd

In [32]: df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x':[], 'y':[], 'z':[]})
dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-4e9943198dae> in <module>()
----> 1 dfmb = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')

/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy)
---> 39     return op.get_result()
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in get_result(self)
    185 
    186     def get_result(self):
--> 187         join_index, left_indexer, right_indexer = self._get_join_info()
    188 
    189         ldata, rdata = self.left._data, self.right._data

/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in _get_join_info(self)
    277                 join_index = self.left.index.take(left_indexer)
    278             elif self.left_index:
--> 279                 join_index = self.right.index.take(right_indexer)
    280             else:
    281                 join_index = Index(np.arange(len(left_indexer)))

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in take(self, indexer, axis)
    981 
    982         indexer = com._ensure_platform_int(indexer)
--> 983         taken = np.array(self).take(indexer)
    984 
    985         # by definition cannot propogate freq

IndexError: cannot do a non-empty take from an empty axes.

may be sufficient for your needs.

Another alternative, similar to Joran's:

try:
    dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
except IndexError:
    dfm = df1.reindex_axis(df1.columns.union(df2.columns), axis=1)

I'm not sure which is clearer, but both of the following work:

In [11]: df1.reindex_axis(df1.columns.union(df2.columns), axis=1)
Out[11]:
   a  b  c   x   y   z
0  1  4  1 NaN NaN NaN
1  2  5  2 NaN NaN NaN
2  3  6  3 NaN NaN NaN

In [12]: df1.loc[:, df1.columns.union(df2.columns)]
Out[12]:
   a  b  c   x   y   z
0  1  4  1 NaN NaN NaN
1  2  5  2 NaN NaN NaN
2  3  6  3 NaN NaN NaN

(I prefer the former.)
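The fallback above can also be written as an explicit emptiness check instead of catching IndexError. The following is only a minimal sketch, not the answerer's code: the helper name merge_or_pad is hypothetical, and it uses reindex(columns=...) in place of reindex_axis on the assumption that a recent pandas version is installed.

```python
import pandas as pd


def merge_or_pad(left, right, key):
    """Outer-merge `left` (on its index) with `right` (on column `key`).

    Hypothetical helper: if `right` is empty, the merge is skipped and
    `left` is returned with right's columns added, filled with NaN.
    """
    if right.empty:
        # Union of the column labels; labels missing from `left` become NaN columns.
        return left.reindex(columns=left.columns.union(right.columns))
    return pd.merge(left, right, how='outer', left_index=True, right_on=key)


df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x': [], 'y': [], 'z': []})

dfm = merge_or_pad(df1, df2, key='z')
print(dfm)
```

Note that columns.union sorts the labels, so the column order of the padded result may differ from what a merge would produce.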

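Why not check whether it is empty first?

That takes no time at all, and it is not the problem. In the subsequent code, I want the merged DataFrame to contain the columns of both df1 and df2 (even if some of those columns are None/NaN).

This is not the same as a merge. After merging, I want all columns to be part of dfm.

This works well, but how do I preserve the types? union only copies the column names, not the types. In my case some of the values are datetimes, so I want NaT rather than NaN as the column values.

@orange I would just call pd.to_datetime on those columns.

Note that reindex_axis has been deprecated since Pandas 0.21.0. Use reindex with current versions of pandas.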
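The suggestion to call pd.to_datetime on the affected columns could look roughly like this. It is a sketch under assumptions: the column name when and the explicit dtypes on the empty frame are made up for illustration, and reindex(columns=...) is used in place of the deprecated reindex_axis.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})

# Hypothetical empty frame whose 'when' column is meant to hold datetimes.
df2 = pd.DataFrame({
    'when': pd.Series([], dtype='datetime64[ns]'),
    'z': pd.Series([], dtype='int64'),
})

# Padding via the column union carries over only the labels, not the dtypes.
padded = df1.reindex(columns=df1.columns.union(df2.columns))

# Convert the columns that are datetime-typed in df2, so NaN becomes NaT.
datetime_cols = df2.select_dtypes(include=['datetime64']).columns
for col in datetime_cols:
    padded[col] = pd.to_datetime(padded[col])

print(padded.dtypes)
```

Only the datetime columns are converted here; other dtypes from df2 would need their own handling if they matter downstream.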