Python: reduce a dataframe to only the changed rows
I have an old dataframe and a new dataframe, as shown below:
import pandas as pd
import numpy as np
df_old = pd.DataFrame({
"col1": ["a", "b", "c", "d", "e"],
"col2": [1.0, 2.0, 3.0, 4.0, 5.0],
"col3": [1.0, 2.0, 3.0, 4.0, 5.0]
}, columns=["col1", "col2", "col3"])
df_new = pd.DataFrame({
"col1": ["a", "b", "c", "e", "f"],
"col2": [1.0, 2.0, 3.5, 5.0, 6.0],
"col3": [1.0, 4.2, 3.0, 5.0, 6.0]
}, columns=["col1", "col2", "col3"])
# Expected data
df_changed = pd.DataFrame({
"col1": ["b", "c", "d", "f"],
"col2": [2.0, 3.5, np.nan, 6.0],
"col3": [4.2, 3.0, np.nan, 6.0]
}, columns=["col1", "col2", "col3"])
print(df_old)
print(df_new)
print(df_changed)
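I want the rows that were changed (in col2 or col3), added, or deleted between the old and the new df. In my real data, col1 is unique, so it can serve as the index if needed.
Edit
If I set col1 as the index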
df_old.set_index('col1', inplace=True)
df_new.set_index('col1', inplace=True)
I can run
print(df_new.ne(df_old))
col2 col3
col1
a False False
b False True
c True False
d True True
e False False
f True True
Then I can create a diff df like this
df_diff = df_new.ne(df_old)
df_diff = df_diff[df_diff.col2 | df_diff.col3]
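However, I don't know how to relate this back to the dataframes and their data.

You are not far from the solution. After the set_index and ne operations, call any along the columns to get a Series that is True for every row with at least one True, then reindex df_new so that it contains only the desired rows.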
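g = df_new.set_index('col1')  # index df_new by col1
# subtract the dataframes on the col1 index and use the loc accessor to filter out unchanged rows
# (the middle of the original expression is missing; the == 0 / all(axis=1) test is an assumed completion)
unchanged = (df_old.set_index('col1').sub(g) == 0).all(axis=1)
g.loc[~unchanged][['col2', 'col3']].reset_index()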
col1 col2 col3
0 b 2.0 4.2
1 c 3.5 3.0
2 f 6.0 6.0
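
The output above leaves out the deleted row d. If it also matters why each row appears in the result, one possible variation (not taken from the answers here) is an outer merge with indicator=True. This is only a sketch: the status labels, the report name, and the _old/_new suffixes are illustrative choices, and it assumes col1 is unique in both frames.

```python
import pandas as pd

df_old = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
})
df_new = pd.DataFrame({
    "col1": ["a", "b", "c", "e", "f"],
    "col2": [1.0, 2.0, 3.5, 5.0, 6.0],
    "col3": [1.0, 4.2, 3.0, 5.0, 6.0],
})

# Outer merge on the key; the _merge column records which side each key came from
merged = df_old.merge(df_new, on="col1", how="outer",
                      suffixes=("_old", "_new"), indicator=True)

# Translate the merge indicator into a status label (labels are illustrative)
status = merged["_merge"].astype(str).map(
    {"left_only": "deleted", "right_only": "added", "both": "unchanged"}
)

# For keys present in both frames, flag rows where col2 or col3 differs
in_both = merged["_merge"] == "both"
differs = (merged["col2_old"] != merged["col2_new"]) | (merged["col3_old"] != merged["col3_new"])
merged["status"] = status.mask(in_both & differs, "changed")

# Keep only rows that were changed, added, or deleted
report = merged.loc[merged["status"] != "unchanged",
                    ["col1", "col2_new", "col3_new", "status"]]
print(report)
```

With the sample data, b and c come out as changed, d as deleted, and f as added; deleted rows carry NaN in the _new columns.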
df_old = df_old.set_index('col1')
df_new = df_new.set_index('col1')
s = df_new.ne(df_old).any(axis=1) # get True for rows with at least one True
print(s)
# col1
# a    False
# b     True
# c     True
# d     True
# e    False
# f     True
# dtype: bool
df_changed = df_new.reindex(s.index[s]).reset_index()
print(df_changed)
col1 col2 col3
0 b 2.0 4.2
1 c 3.5 3.0
2 d NaN NaN
3 f 6.0 6.0
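
The steps above can be wrapped in a small helper. This is a minimal sketch, assuming the key column is unique in both frames; the function name changed_rows and its key parameter are hypothetical and not part of pandas.

```python
import pandas as pd


def changed_rows(old: pd.DataFrame, new: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return changed and added rows of `new`, plus all-NaN rows for deleted keys.

    Hypothetical helper; assumes `key` is unique in both frames.
    """
    old_i = old.set_index(key)
    new_i = new.set_index(key)
    # ne aligns on the union of both indexes, so a key missing on
    # either side compares as "not equal" in every column
    mask = new_i.ne(old_i).any(axis=1)
    # reindex keeps changed/added rows and produces NaN rows for deleted keys
    return new_i.reindex(mask.index[mask]).reset_index()


df_old = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
})
df_new = pd.DataFrame({
    "col1": ["a", "b", "c", "e", "f"],
    "col2": [1.0, 2.0, 3.5, 5.0, 6.0],
    "col3": [1.0, 4.2, 3.0, 5.0, 6.0],
})

result = changed_rows(df_old, df_new, "col1")
print(result)
```

One caveat: ne treats NaN as different from NaN, so a row that holds NaN in the same cell of both frames would also be reported as changed.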
What have you tried?
Sorry @nidabdella. See the edit for what I have so far. Most of it I had only just figured out.