Python: number of entries lost when merging DataFrames

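In an exercise, I was asked to merge 3 DataFrames with an inner join (df1+df2+df3=mergedDf). In a follow-up question, I was asked how many entries are lost when performing this three-way merge.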

import pandas as pd

#DataFrame1
df1 = pd.DataFrame(columns=["Goals","Medals"],data=[[5,2],[1,0],[3,1]])
df1.index = ['Argentina','Angola','Bolivia']
print(df1)
           Goals  Medals
Argentina      5       2
Angola         1       0
Bolivia        3       1

#DataFrame2
df2 = pd.DataFrame(columns=["Dates","Medals"],data=[[1,0],[2,1],[2,2]])
df2.index = ['Venezuela','Africa','Argentina']
print(df2)
           Dates  Medals
Venezuela      1       0
Africa         2       1
Argentina      2       2

#DataFrame3
df3 = pd.DataFrame(columns=["Players","Goals"],data=[[11,5],[11,1],[10,0]])
df3.index = ['Argentina','Australia','Belgica']
print(df3)
           Players  Goals
Argentina       11      5
Australia       11      1
Belgica         10      0

#mergedDf
mergedDf = pd.merge(df1,df2,how='inner',left_index=True, right_index=True)
mergedDf = pd.merge(mergedDf,df3,how='inner',left_index=True, right_index=True)
print(mergedDf)
           Goals_x  Medals_x  Dates  Medals_y  Players  Goals_y
Argentina        5         2      2         2       11        5

#Calculate number of lost entries by code
I tried merging everything with an outer join and then subtracting mergedDf, but I don't know how to do it. Can anyone help?

You can pass indicator=True to merge:

df1=pd.DataFrame({'A':[1,2,3],'B':[1,1,1]})
df2=pd.DataFrame({'A':[2,3],'B':[1,1]})
df1.merge(df2,on='A',how='inner')
Out[257]: 
   A  B_x  B_y
0  2    1    1
1  3    1    1
df1.merge(df2,on='A',how='outer',indicator =True)
Out[258]: 
   A  B_x  B_y     _merge
0  1    1  NaN  left_only
1  2    1  1.0       both
2  3    1  1.0       both
mergedf=df1.merge(df2,on='A',how='outer',indicator =True)
Then, with value_counts, you can see how many rows are lost by the inner join, because only the rows marked both are kept when how='inner':

mergedf['_merge'].value_counts()
Out[260]: 
both          2
left_only     1
right_only    0
Name: _merge, dtype: int64
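As a minimal sketch of how the count above turns into a single number, the snippet below reuses the two small frames from this answer and sums the rows whose indicator is not both. The variable names outer, inner and lost are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1]})
df2 = pd.DataFrame({'A': [2, 3], 'B': [1, 1]})

# The outer join keeps every key; _merge records which side each row came from.
outer = df1.merge(df2, on='A', how='outer', indicator=True)
inner = df1.merge(df2, on='A', how='inner')

# Rows not flagged 'both' exist on one side only, so the inner join drops them.
lost = int((outer['_merge'] != 'both').sum())
print(lost)  # 1
```

With unique keys, as here, this matches len(outer) - len(inner).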
For 3 DataFrames, chain two outer merges so that you get two merge indicator columns, then filter on the value both:

df1.merge(df2, on='A',how='outer',indicator =True).rename(columns={'_merge':'merge'}).merge(df3, on='A',how='outer',indicator =True)

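A solution with an outer join and the indicator parameter: after both merges, count the rows that do not have both in both indicator columns a and b, by summing the True values (which are treated as 1s).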
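The code for this approach did not survive on this page, so what follows is only a sketch of what the description suggests, not the answerer's original code. It rebuilds the frames from the question, merges on the index, and names the indicator columns a and b by passing a string to indicator. The names step1, out, in_all and missing are assumptions made for illustration.

```python
import pandas as pd

# Frames from the question, indexed by country.
df1 = pd.DataFrame({'Goals': [5, 1, 3], 'Medals': [2, 0, 1]},
                   index=['Argentina', 'Angola', 'Bolivia'])
df2 = pd.DataFrame({'Dates': [1, 2, 2], 'Medals': [0, 1, 2]},
                   index=['Venezuela', 'Africa', 'Argentina'])
df3 = pd.DataFrame({'Players': [11, 11, 10], 'Goals': [5, 1, 0]},
                   index=['Argentina', 'Australia', 'Belgica'])

# Two outer merges on the index, each with its own named indicator column.
step1 = pd.merge(df1, df2, how='outer', left_index=True, right_index=True,
                 indicator='a')
out = pd.merge(step1, df3, how='outer', left_index=True, right_index=True,
               indicator='b')

# A row survives the three-way inner join only if both indicators say 'both'.
in_all = out['a'].eq('both') & out['b'].eq('both')
missing = int((~in_all).sum())
print(missing)  # 6
```

Rows that come only from df3 have no value in column a, and eq treats those as not equal to both, so they are counted as missing.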

Another solution is to use an inner join and then sum, for each DataFrame, the number of index values that are not in mergedDf.index:

mergedDf = pd.merge(df1,df2,how='inner',left_index=True, right_index=True)
mergedDf = pd.merge(mergedDf,df3,how='inner',left_index=True, right_index=True)
vals = mergedDf.index
print (vals)
Index(['Argentina'], dtype='object')

dfs = [df1, df2, df3]
missing = sum((~x.index.isin(vals)).sum() for x in dfs)
print (missing)
6
Another solution, if the values in each index are unique:

dfs = [df1, df2, df3]
L = [set(x.index) for x in dfs]

#https://stackoverflow.com/a/25324329/2901002
missing = len(set.union(*L) - set.intersection(*L))
print (missing)
6
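To see which entries are lost rather than only how many, the same set arithmetic can list the labels. This is a small sketch using the index labels from the question; the names kept, lost_labels and lost_rows are illustrative.

```python
import pandas as pd

# Index labels of the three frames from the question.
idx1 = pd.Index(['Argentina', 'Angola', 'Bolivia'])
idx2 = pd.Index(['Venezuela', 'Africa', 'Argentina'])
idx3 = pd.Index(['Argentina', 'Australia', 'Belgica'])

sets = [set(i) for i in (idx1, idx2, idx3)]
kept = set.intersection(*sets)                    # labels present in every frame
lost_labels = sorted(set.union(*sets) - kept)     # distinct labels dropped
lost_rows = sum(len(s - kept) for s in sets)      # dropped rows, counted per frame

print(lost_labels)
# ['Africa', 'Angola', 'Australia', 'Belgica', 'Bolivia', 'Venezuela']
print(lost_rows)  # 6
```

The two counts agree here because every lost label appears in exactly one frame. If a label appeared in two frames but not the third, the per-frame count would count it twice while the set difference would count it once, so it is worth deciding which definition of "lost entries" the exercise intends.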

I found a simple but effective solution:

Merge the 3 DataFrames (inner and outer):
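# Df1, Df2 and Df3 are assumed to be the exercise's functions that return the three DataFrames
df1 = Df1()
df2 = Df2()
df3 = Df3()
# '<Common column>' is a placeholder for the key column shared by all three frames
inner = pd.merge(pd.merge(df1,df2,on='<Common column>',how='inner'),df3,on='<Common column>',how='inner')
outer = pd.merge(pd.merge(df1,df2,on='<Common column>',how='outer'),df3,on='<Common column>',how='outer')
# Rows that appear in the outer join but are dropped by the inner join
lost = len(outer) - len(inner)
print(lost)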