Python: Reduce a DataFrame to only the changed rows (Python, Pandas) - Fatal编程技术网


I have an old DataFrame and a new DataFrame, like this:

import pandas as pd
import numpy as np

df_old = pd.DataFrame({
        "col1": ["a", "b", "c", "d", "e"],
        "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
    }, columns=["col1", "col2", "col3"])

df_new = pd.DataFrame({
        "col1": ["a", "b", "c", "e", "f"],
        "col2": [1.0, 2.0, 3.5, 5.0, 6.0],
        "col3": [1.0, 4.2, 3.0, 5.0, 6.0]
    }, columns=["col1", "col2", "col3"])

# Expected data
df_changed = pd.DataFrame({
        "col1": ["b", "c", "d", "f"],
        "col2": [2.0, 3.5, np.nan, 6.0],
        "col3": [4.2, 3.0, np.nan, 6.0]
    }, columns=["col1", "col2", "col3"])

print(df_old)
print(df_new)
print(df_changed)
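I want the rows that were changed (in col2 or col3), added, or deleted between the old and the new df. In my actual data col1 is unique, so it can be used as the index if needed.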
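The three kinds of rows described here (changed, added, deleted) can also be labelled in a single pass with an outer merge on col1. This is a sketch of a different technique from the answers below, assuming col1 is unique in both frames (validate="one_to_one" makes pandas raise if it is not); the names merged, differs and the status labels are illustrative, not part of the original question.

```python
import pandas as pd

df_old = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
})
df_new = pd.DataFrame({
    "col1": ["a", "b", "c", "e", "f"],
    "col2": [1.0, 2.0, 3.5, 5.0, 6.0],
    "col3": [1.0, 4.2, 3.0, 5.0, 6.0],
})
value_cols = ["col2", "col3"]

# Outer join on the unique key; indicator=True adds a "_merge" column that says
# whether each key was found in both frames, only in the old one or only in the new one.
merged = df_old.merge(df_new, on="col1", how="outer", suffixes=("_old", "_new"),
                      indicator=True, validate="one_to_one")

# True where any value column differs (NaN on one side also counts as a difference).
differs = pd.concat(
    [merged[f"{c}_old"].ne(merged[f"{c}_new"]) for c in value_cols], axis=1
).any(axis=1)

# Later assignments win, so "removed"/"added" override the generic "changed" label.
merged["status"] = "unchanged"
merged.loc[differs, "status"] = "changed"
merged.loc[merged["_merge"] == "left_only", "status"] = "removed"
merged.loc[merged["_merge"] == "right_only", "status"] = "added"

print(merged.loc[merged["status"] != "unchanged", ["col1", "status"]])
```

Keeping the _old/_new columns in merged also lets you see both versions of a changed row side by side.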

Edit: if I set col1 as the index

df_old.set_index('col1', inplace=True)
df_new.set_index('col1', inplace=True)
I can run

print(df_new.ne(df_old))

       col2   col3
col1
a     False  False
b     False   True
c      True  False
d      True   True
e     False  False
f      True   True
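The d and f rows come out as True/True because ne aligns the two frames on the union of their indexes and fills the missing side with NaN, and NaN compares unequal to everything, including another NaN. A minimal sketch with a hypothetical key x that holds NaN in both frames shows that side effect and one NaN-safe variant (the tiny frames and the nan_safe name are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical mini frames indexed by the key: "d" exists only in old,
# "f" only in new, and "x" is in both but is NaN on both sides.
old = pd.DataFrame({"col2": [4.0, np.nan]},
                   index=pd.Index(["d", "x"], name="col1"))
new = pd.DataFrame({"col2": [6.0, np.nan]},
                   index=pd.Index(["f", "x"], name="col1"))

# Align explicitly so both frames share the same (union) index.
old_a, new_a = old.align(new, join="outer")

# Plain ne: NaN != NaN, so "x" is flagged even though nothing changed.
raw = new_a.ne(old_a)

# NaN-safe variant: treat "missing on both sides" as equal.
both_missing = new_a.isna() & old_a.isna()
nan_safe = raw & ~both_missing

print(raw)
print(nan_safe)
```

If your real data never contains NaN, the plain ne result is already what you want.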
Then I can create a diff df like this:

df_diff = df_new.ne(df_old)
df_diff = df_diff[df_diff.col2 | df_diff.col3]

However, I don't know how to tie this back to the dataframes and their data.

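You're not far from the solution. After the set_index and ne steps, call any along the columns (any(axis=1)) to get a Series that is True for every row with at least one True, and then reindex df_new so that it contains only the wanted rows.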

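g = df_new.set_index('col1')  # set col1 as df_new's index

# Subtract the dataframes after setting col1 as the index, then use the loc accessor
# to filter out the rows whose differences are all zero (the unchanged rows).
# Because g only holds df_new's rows, a deleted row such as d is not returned.
g.loc[~(df_old.set_index('col1').sub(df_new.set_index('col1')).eq(0).all(axis=1))][['col2', 'col3']].reset_index()
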
  col1  col2  col3
0    b   2.0   4.2
1    c   3.5   3.0
2    f   6.0   6.0
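The subtraction above tests floats for exact equality and, because it filters g, drops the deleted row d. A sketch of a variant that aligns both frames on the union of keys and uses np.isclose to tolerate floating-point noise; np.isclose's default rtol/atol are an assumption you may want to tune for your data. On the question's data it returns the same rows as the expected df_changed:

```python
import numpy as np
import pandas as pd

df_old = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
}).set_index("col1")
df_new = pd.DataFrame({
    "col1": ["a", "b", "c", "e", "f"],
    "col2": [1.0, 2.0, 3.5, 5.0, 6.0],
    "col3": [1.0, 4.2, 3.0, 5.0, 6.0],
}).set_index("col1")

# Align on the union of keys so deleted ("d") and added ("f") rows appear
# as NaN on the side where they are missing.
old_a, new_a = df_old.align(df_new, join="outer")

# isclose tolerates tiny float differences; equal_nan=True treats NaN-vs-NaN
# as "same", while NaN-vs-number still counts as a change.
same = np.isclose(new_a.to_numpy(), old_a.to_numpy(), equal_nan=True)
changed_mask = ~same.all(axis=1)

df_changed = new_a[changed_mask].reset_index()
print(df_changed)
```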
df_old = df_old.set_index('col1')
df_new = df_new.set_index('col1')

s = df_new.ne(df_old).any(axis=1) # get True for rows with at least one True
print(s)
# col1
# a    False
# b     True
# c     True
# d     True
# e    False
# f     True
# dtype: bool

df_changed = df_new.reindex(s.index[s]).reset_index()
print(df_changed)
  col1  col2  col3
0    b   2.0   4.2
1    c   3.5   3.0
2    d   NaN   NaN
3    f   6.0   6.0
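If you also want to see the old value next to the new one, DataFrame.compare (pandas 1.1 or later) produces that kind of report. A sketch, assuming both frames are indexed by col1; compare only accepts identically labelled frames, so they are aligned on the union of keys first:

```python
import pandas as pd

df_old = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
}).set_index("col1")
df_new = pd.DataFrame({
    "col1": ["a", "b", "c", "e", "f"],
    "col2": [1.0, 2.0, 3.5, 5.0, 6.0],
    "col3": [1.0, 4.2, 3.0, 5.0, 6.0],
}).set_index("col1")

# compare() raises on differently labelled frames, so align them first.
old_a, new_a = df_old.align(df_new, join="outer")

# One row per changed key; under each column, "self" is the new value and
# "other" the old value. Cells that are equal on both sides show as NaN.
report = new_a.compare(old_a)
print(report)
```

Unchanged rows (a, e) are dropped by default; pass keep_shape=True to keep every row and column.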

What have you tried?

Sorry @nidabdella. See the edit for what I have so far. I only just figured most of it out.