Python: merge_asof equivalent using only pandas merge


python pandas numpy dataframe


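I have the following two DataFrames. A holds the data and B holds the weights for that data. B's index is the date from which each weight becomes active, and the "level_1" column holds the entity that the weight applies to.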

import numpy as np
import pandas as pd

# A: data on 10 business days; B: weights on 5 weekly dates
A = pd.DataFrame(index=pd.date_range(start='2016-01-15', periods=10, freq='B'))
B = pd.DataFrame(index=pd.date_range(start='2016-01-01', periods=5, freq='W'))
A["X"] = np.random.rand(A.shape[0])
A["Y"] = np.random.rand(A.shape[0])
A["Z"] = np.random.rand(A.shape[0])

B["X"] = np.random.rand(B.shape[0])
B["Y"] = np.random.rand(B.shape[0])
B["Z"] = np.random.rand(B.shape[0])

# Long format: date index, entity in column "level_1", value in column 0
A = A.stack(dropna=False).reset_index(level=1)
B = B.stack(dropna=False).reset_index(level=1)

In the end I want A with the weights from B applied in a "weight" column.

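What trips me up is that the dates in B's index are not (or not necessarily) in A's index, so I can't simply add B's data as a column of A and then call A['weights'].fillna(method='ffill'). I suppose I could loop, but that would be too slow and too messy.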
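The alignment problem can be reproduced on a tiny hand-made example. The frames `data` and `weights` below are hypothetical stand-ins with a single entity, not the random data above. Column assignment aligns on the index, so when no weight date occurs among the data dates, every aligned value is missing and a forward fill has nothing to carry forward.

```python
import pandas as pd

# Hypothetical frames for illustration only (one entity, "X").
data = pd.DataFrame(
    {"X": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2016-01-15", "2016-01-18", "2016-01-19"]),
)
weights = pd.DataFrame(
    {"X": [0.1, 0.2]},
    index=pd.to_datetime(["2016-01-10", "2016-01-17"]),
)

# Assignment aligns on the index: none of the weight dates is a data date,
# so the new column contains only NaN.
data["weight"] = weights["X"]

# Forward-filling an all-NaN column leaves it all-NaN.
naive = data["weight"].ffill()
print(naive.isna().all())  # True
```

The weight rows therefore have to be brought onto a common timeline with the data rows before any forward fill can work.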

I think you need merge_asof:

OK, following the key hint from jezrael (use a merge), I developed another solution. It performs the task above using only pandas.merge():

A = pd.DataFrame(index=pd.date_range(start='2016-01-15', periods=10, freq='B'))
B = pd.DataFrame(index=pd.date_range(start='2016-01-01', periods=5, freq='W'))
A["X"] = np.random.rand(A.shape[0])
A["Y"] = np.random.rand(A.shape[0])
A["Z"] = np.random.rand(A.shape[0])


B["X"] = np.random.rand(B.shape[0])
B["Y"] = np.random.rand(B.shape[0])
B["Z"] = np.random.rand(B.shape[0])

A = A.stack(dropna=False).reset_index(level=1)
B = B.stack(dropna=False).reset_index(level=1)

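# Outer-merge on the date so the result holds every date from A and from B;
# B's values end up in column '0_y'.
ss = pd.merge(A.reset_index(), B.reset_index(), how='outer', on="index", sort=True).set_index("index")
# A and B share no dates here, so each row has exactly one of the two entity columns set.
ss["level_1"] = ss["level_1_x"].fillna(ss["level_1_y"])
ss = ss.drop(["level_1_x", "level_1_y"], axis=1)
# One column per entity, forward-fill the weights over time, then back to long format.
w = ss.reset_index().pivot(index='index', columns="level_1", values='0_y').ffill().stack(dropna=False).reset_index(level=1)

# Keep only A's dates; this relies on the filtered w having the same index order as A.
A['weight'] = w[w.index.isin(A.index)][0]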
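As a rough sketch of the same idea on numbers that can be checked by hand, the variant below merges on both the date and the entity and forward-fills per entity with `groupby`, instead of pivoting. It is an illustrative alternative, not the code from the answer above; the frames `a` and `b` and the column names `value` and `weight` are made up for the example.

```python
import pandas as pd

# Hypothetical long-format frames: one row per (date, entity).
a = pd.DataFrame({
    "index": pd.to_datetime(["2016-01-15", "2016-01-15", "2016-01-18", "2016-01-18"]),
    "level_1": ["X", "Y", "X", "Y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
b = pd.DataFrame({
    "index": pd.to_datetime(["2016-01-10", "2016-01-10", "2016-01-17", "2016-01-17"]),
    "level_1": ["X", "Y", "X", "Y"],
    "weight": [0.1, 0.2, 0.3, 0.4],
})

# The outer merge keeps every (date, entity) pair from either frame;
# indicator=True adds a "_merge" column recording where each row came from.
merged = pd.merge(a, b, how="outer", on=["index", "level_1"], indicator=True)

# Order each entity chronologically, then carry its last known weight forward.
merged = merged.sort_values(["level_1", "index"])
merged["weight"] = merged.groupby("level_1")["weight"].ffill()

# Drop the rows that exist only in b, restoring the rows of a.
result = (
    merged[merged["_merge"] != "right_only"]
    .drop(columns="_merge")
    .sort_values(["index", "level_1"])
    .reset_index(drop=True)
)
print(result)
```

Filtering on the merge indicator rather than on missing values keeps rows of `a` whose `value` is NaN, which matters if the data was stacked with `dropna=False`.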

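Thanks. Unfortunately I'm on an older version of pandas. Is there a way to do this with pandas.merge() instead of pandas.merge_asof()? I've been trying, without success.

No, it is a new function in pandas 0.19.0; there was nothing similar before.

In the end I did get a version that uses only pandas.merge(). It is more complicated than merge_asof() and probably much slower. Thanks for pointing me in the right direction with merge().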

import numpy as np
import pandas as pd

np.random.seed(1234)
A = pd.DataFrame(index=pd.date_range(start='2016-01-15', periods=10, freq='B'))
B = pd.DataFrame(index=pd.date_range(start='2016-01-01', periods=5, freq='W'))
A["X"] = np.random.rand(A.shape[0])
A["Y"] = np.random.rand(A.shape[0])
A["Z"] = np.random.rand(A.shape[0])


B["X"] = np.random.rand(B.shape[0])
B["Y"] = np.random.rand(B.shape[0])
B["Z"] = np.random.rand(B.shape[0])

A = A.stack(dropna=False).reset_index(level=1)
B = B.stack(dropna=False).reset_index(level=1)

#print (A)
#print (B)

print (pd.merge_asof(A.reset_index(),
                     B.reset_index().rename(columns={0:'weight'}), on='index', by='level_1'))

       index level_1         0    weight
0  2016-01-15       X  0.191519  0.436173
1  2016-01-15       Y  0.357817  0.218792
2  2016-01-15       Z  0.364886  0.184287
3  2016-01-18       X  0.622109  0.802148
4  2016-01-18       Y  0.500995  0.924868
5  2016-01-18       Z  0.615396  0.047355
6  2016-01-19       X  0.437728  0.802148
7  2016-01-19       Y  0.683463  0.924868
8  2016-01-19       Z  0.075381  0.047355
9  2016-01-20       X  0.785359  0.802148
10 2016-01-20       Y  0.712702  0.924868
11 2016-01-20       Z  0.368824  0.047355
12 2016-01-21       X  0.779976  0.802148
13 2016-01-21       Y  0.370251  0.924868
14 2016-01-21       Z  0.933140  0.047355
15 2016-01-22       X  0.272593  0.802148
16 2016-01-22       Y  0.561196  0.924868
17 2016-01-22       Z  0.651378  0.047355
18 2016-01-25       X  0.276464  0.143767
19 2016-01-25       Y  0.503083  0.442141
20 2016-01-25       Z  0.397203  0.674881
21 2016-01-26       X  0.801872  0.143767
22 2016-01-26       Y  0.013768  0.442141
23 2016-01-26       Z  0.788730  0.674881
24 2016-01-27       X  0.958139  0.143767
25 2016-01-27       Y  0.772827  0.442141
26 2016-01-27       Z  0.316836  0.674881
27 2016-01-28       X  0.875933  0.143767
28 2016-01-28       Y  0.882641  0.442141
29 2016-01-28       Z  0.568099  0.674881
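The output above depends on the random seed, so a sketch with fixed numbers may make the lookup rule easier to verify. The frames `left` and `right` and their values are invented for illustration. With default arguments, `merge_asof` searches backward in time, accepts an exact date match, and with `by` only considers rows for the same entity.

```python
import pandas as pd

# Hypothetical data; both frames must be sorted by the "on" column.
left = pd.DataFrame({
    "index": pd.to_datetime(["2016-01-08", "2016-01-10", "2016-01-18", "2016-01-18"]),
    "level_1": ["X", "X", "X", "Y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
right = pd.DataFrame({
    "index": pd.to_datetime(["2016-01-10", "2016-01-17", "2016-01-20"]),
    "level_1": ["X", "X", "Y"],
    "weight": [0.1, 0.3, 0.4],
})

# For each row of left, take the latest row of right with the same entity
# whose date is on or before the row's date.
out = pd.merge_asof(left, right, on="index", by="level_1")
print(out)
```

In this sketch the first row gets no weight because no X weight is active yet on 2016-01-08, the second row matches the 2016-01-10 weight exactly, and the Y row gets no weight because Y's only weight starts on 2016-01-20.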
