Python: How do I join two DataFrames and keep certain columns from each?

python pandas dataframe filter merge

I have two DataFrames, names and claims:

import pandas as pd

names = pd.DataFrame({
    'UniqueID': 'A B C D E F'.split(),
    'Name': ['Susie', 'George Foreman', 'Charles', 'Nicole', 'Peter Piper', 'Penelope Cruz'],
    'Address': ['111 3rd St', '123 Bank St', '555 Square Sq', '9 Charlton Ave', 'PO Box 1', 'The White House'],
    'Phone number': ['2032218686', '2032032203', '8048048804', '2232645879', '2564544469', '8005865555']})

  UniqueID            Name          Address Phone number
0        A           Susie       111 3rd St   2032218686
1        B  George Foreman      123 Bank St   2032032203
2        C         Charles    555 Square Sq   8048048804
3        D          Nicole   9 Charlton Ave   2232645879
4        E     Peter Piper         PO Box 1   2564544469
5        F   Penelope Cruz  The White House   8005865555


claims = pd.DataFrame({
    'ClaimNo':range(29,38),
    'ClaimDetails':['Slip and fall','Clmt slipped and fell','Thunderstorms are scary','Hail storm damage',
                   'Property fire','Arson','Shooting','Shooting and fatality','Slip and fall'],
    'PolicyNo':['00058566-0','00056455-5','00058588-8','00011111-2','00088787-0','00045658-0','00012345-6','00065432-1','00088080-4'],
    'UniqueID':'A F F D E A D E E'.split()})

   ClaimNo             ClaimDetails    PolicyNo UniqueID
0       29            Slip and fall  00058566-0        A
1       30    Clmt slipped and fell  00056455-5        F
2       31  Thunderstorms are scary  00058588-8        F
3       32        Hail storm damage  00011111-2        D
4       33            Property fire  00088787-0        E
5       34                    Arson  00045658-0        A
6       35                 Shooting  00012345-6        D
7       36    Shooting and fatality  00065432-1        E
8       37            Slip and fall  00088080-4        E

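I want to create a new DataFrame containing only the rows of names whose UniqueID appears in claims. I'm not sure whether this calls for a merge or a filter. I've been trying different kinds of merges but can't get the result I want, which should look like this: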

  UniqueID           Name          Address Phone number
0        A          Susie       111 3rd St   2032218686
1        D         Nicole   9 Charlton Ave   2232645879
2        E    Peter Piper         PO Box 1   2564544469
3        F  Penelope Cruz  The White House   8005865555

Doesn't this work?

print (pd.merge(names, claims, on='UniqueID'))

Then you can drop the columns you don't need:

data = data.drop(columns="some_column_name")

You can use the merge method. Just make sure the UniqueID column has the same data type in both DataFrames (most likely str in this case).
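As a rough illustration of the data-type point, the sketch below uses two small hypothetical frames (left and right are not from the question) whose keys are integers on one side and strings on the other. It compares the dtypes, then casts the integer key to str before merging.

```python
import pandas as pd

# Hypothetical frames: the same IDs stored as integers on one side
# and as strings on the other.
left = pd.DataFrame({'UniqueID': [1, 2, 3], 'Name': ['a', 'b', 'c']})
right = pd.DataFrame({'UniqueID': ['1', '3'], 'ClaimNo': [29, 30]})

# Compare the key dtypes before merging.
same_dtype = left['UniqueID'].dtype == right['UniqueID'].dtype
print(same_dtype)  # False

# Cast the integer key to str so both sides hold strings, then merge.
left['UniqueID'] = left['UniqueID'].astype(str)
merged = left.merge(right, on='UniqueID')
print(merged)
```

In the question's data both UniqueID columns are already strings, so this check would only confirm that the dtypes match.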

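As mentioned, if this doesn't work, the likely cause is that the two columns have different data types. They may also contain extra whitespace. To fix both, you can do the following: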

df1['UniqueID'] = df1['UniqueID'].astype(str).str.replace(" ","")
df2['UniqueID'] = df2['UniqueID'].astype(str).str.replace(" ","")

Then you can drop the columns you don't need:

new_df = new_df.drop(columns=['ClaimDetails','PolicyNo'])
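One thing to watch with the merge route: an inner merge returns one row per matching claim, so a name with several claims is repeated. A minimal sketch of one way around that, assuming trimmed-down versions of the question's frames, is to reduce claims to its distinct UniqueID values before merging. The variable names per_claim, claim_ids, and one_per_name are only illustrative.

```python
import pandas as pd

# Trimmed-down versions of the question's frames (only the columns needed here).
names = pd.DataFrame({
    'UniqueID': 'A B C D E F'.split(),
    'Name': ['Susie', 'George Foreman', 'Charles', 'Nicole',
             'Peter Piper', 'Penelope Cruz'],
})
claims = pd.DataFrame({
    'ClaimNo': range(29, 38),
    'UniqueID': 'A F F D E A D E E'.split(),
})

# A plain inner merge yields one row per claim, so names are repeated.
per_claim = names.merge(claims, on='UniqueID')
print(len(per_claim))  # 9

# Keep only the distinct keys from claims, so each name matches at most once.
claim_ids = claims[['UniqueID']].drop_duplicates()
one_per_name = names.merge(claim_ids, on='UniqueID')
print(one_per_name)
```

Because claim_ids carries only the key column, nothing from claims needs to be dropped afterwards.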

To me, this seems like the simplest approach:

names[names.UniqueID.isin(claims['UniqueID'].to_numpy())]
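For reference, here is a self-contained sketch of the same filter, assuming trimmed-down versions of the question's frames. The reset_index(drop=True) call is an addition so that the index runs 0 to 3, as in the desired output; the name matched is only illustrative.

```python
import pandas as pd

# Trimmed-down versions of the question's frames.
names = pd.DataFrame({
    'UniqueID': 'A B C D E F'.split(),
    'Name': ['Susie', 'George Foreman', 'Charles', 'Nicole',
             'Peter Piper', 'Penelope Cruz'],
})
claims = pd.DataFrame({
    'ClaimNo': range(29, 38),
    'UniqueID': 'A F F D E A D E E'.split(),
})

# Boolean mask: True where the name's UniqueID occurs anywhere in claims.
mask = names['UniqueID'].isin(claims['UniqueID'])

# Filter, then renumber the index from 0.
matched = names[mask].reset_index(drop=True)
print(matched)
```

Since this only filters names, no rows are duplicated and no columns from claims are pulled in.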

Edit: for anyone else answering, here are the helper dictionary/DataFrame variables I used to answer the OP's question:

data1 = {"UniqueID": {"0": "A", "1": "B", "2": "C", "3": "D", "4": "E", "5": "F"}, "Name": {"0": "Susie", "1": "George Foreman", "2": "Charles", "3": "Nicole", "4": "Peter Piper", "5": "Penelope Cruz"}, "Address": {"0": "111 3rd St", "1": "123 Bank St", "2": "555 Square Sq", "3": "9 Charlton Ave", "4": "PO Box 1", "5": "The White House"}, "Phone number": {"0": 2032218686, "1": 2032032203, "2": 8048048804, "3": 2232645879, "4": 2564544469, "5": 8005865555}}
names = pd.DataFrame.from_dict(data1)

data2 = {"ClaimNo": {"0": 29, "1": 30, "2": 31, "3": 32, "4": 33, "5": 34, "6": 35, "7": 36, "8": 37}, "ClaimDetails": {"0": "Slip and fall", "1": "Clmt slipped and fell", "2": "Thunderstorms are scary", "3": "Hail storm damage", "4": "Property fire", "5": "Arson", "6": "Shooting", "7": "Shooting and fatality", "8": "Slip and fall"}, "PolicyNo": {"0": "00058566-0", "1": "00056455-5", "2": "00058588-8", "3": "00011111-2", "4": "00088787-0", "5": "00045658-0", "6": "00012345-6", "7": "00065432-1", "8": "00088080-4"}, "UniqueID": {"0": "A", "1": "F", "2": "F", "3": "D", "4": "E", "5": "A", "6": "D", "7": "E", "8": "E"}}
claims = pd.DataFrame.from_dict(data2)

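OP: it would be helpful if you provided these variables next time; I had to use pd.read_fwf to read the fixed-width tables into dictionary objects.

Thanks, I ended up using this without .to_numpy(), and it worked fine.

This doesn't actually work: the first command produces a DataFrame that contains duplicates from names (9 rows instead of the 4 I want). @ウィエム That comment is a duplicate. Also, the UniqueID column in both DataFrames is a string with no extra whitespace.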
