Python: How do I separate the columns of a DataFrame whose values are identical in every row?

Tags: python, pandas

I have a DataFrame like this:

import pandas as pd

df1_data = {'sym' :{0:'AAA',1:'BBB',2:'CCC',3:'DDD',4:'DDD',5:'CCC'},
        'id' :{0:'101',1:'102',2:'103',3:'104',4:'105',5:'106'},
        'sal':{0:'1000',1:'1000',2:'1000',3:'1000',4:'1000',5:'1000'},
        'loc':{0:'zzz',1:'zzz',2:'zzz',3:'zzz',4:'zzz',5:'zzz'},
        'name':{0:'abc',1:'abc',2:'abc',3:'pqr',4:'pqr',5:'pqr'}}
df = pd.DataFrame(df1_data)
print(df)

    id  loc name   sal  sym
0  101  zzz  abc  1000  AAA
1  102  zzz  abc  1000  BBB
2  103  zzz  abc  1000  CCC
3  104  zzz  pqr  1000  DDD
4  105  zzz  pqr  1000  DDD
5  106  zzz  pqr  1000  CCC
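I want to check which columns of the DataFrame above contain the same value in every row. Based on that, I want one DataFrame holding the columns that match and another holding the columns that do not.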

Expected output:

matched_df:

   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000

unmatched_df:

    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC

You can compare df with its first row and then check which columns contain only True values.
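A minimal sketch of this first-row comparison, assuming `DataFrame.eq` and `DataFrame.all` are the intended methods (the text here does not name them). The frame is rebuilt from the question's data so the snippet runs on its own; `matched_df` and `unmatched_df` are the names used in the question.

```python
import pandas as pd

# Same data as in the question, written as plain lists.
df = pd.DataFrame({
    "sym": ["AAA", "BBB", "CCC", "DDD", "DDD", "CCC"],
    "id": ["101", "102", "103", "104", "105", "106"],
    "sal": ["1000"] * 6,
    "loc": ["zzz"] * 6,
    "name": ["abc", "abc", "abc", "pqr", "pqr", "pqr"],
})

# Compare every row with the first row; a column is constant
# only if all of its comparisons are True.
mask = df.eq(df.iloc[0]).all()

matched_df = df.loc[:, mask]     # columns with one repeated value
unmatched_df = df.loc[:, ~mask]  # columns whose values vary

print(matched_df.columns.tolist())
print(unmatched_df.columns.tolist())
```

Here `mask` is a boolean Series indexed by column name, so it can be passed directly to `df.loc[:, mask]`.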

Another way to build mask is to compare NumPy arrays:

arr = df.values
mask = (arr == arr[0]).all(axis=0)
print (mask)
[False  True False  True False]

print (df.loc[:, mask])
   loc   sal
0  zzz  1000
1  zzz  1000
2  zzz  1000
3  zzz  1000
4  zzz  1000
5  zzz  1000

print (df.loc[:, ~mask])
    id name  sym
0  101  abc  AAA
1  102  abc  BBB
2  103  abc  CCC
3  104  pqr  DDD
4  105  pqr  DDD
5  106  pqr  CCC
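If the split is needed in more than one place, it could be wrapped in a small helper. The sketch below is not part of the answer: the function name `split_constant_columns` is hypothetical, and it swaps in `DataFrame.nunique` instead of the first-row comparison.

```python
import pandas as pd


def split_constant_columns(frame):
    """Hypothetical helper: return (matched, unmatched) DataFrames.

    A column counts as matched when it holds exactly one distinct value.
    """
    mask = frame.nunique(dropna=False) == 1
    return frame.loc[:, mask], frame.loc[:, ~mask]


# Small illustrative frame (a subset of the question's data).
sample = pd.DataFrame({
    "id": ["101", "102", "103"],
    "loc": ["zzz", "zzz", "zzz"],
    "sal": ["1000", "1000", "1000"],
})

matched_df, unmatched_df = split_constant_columns(sample)
print(matched_df.columns.tolist())    # ['loc', 'sal']
print(unmatched_df.columns.tolist())  # ['id']
```

One difference to be aware of: with `dropna=False`, a column made up entirely of NaN counts as constant, whereas the equality comparison treats NaN as unequal to itself and would put that column in the unmatched group.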

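Superb! I had never done a row-wise comparison on a DataFrame, but it is amazing and very fast... thanks...

@jezrael - I am getting a FutureWarning - /usr/local/lib/python2.7/dist-packages/pandas/core/ops.py:1247: FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (is)) and will change. result = op(x, y) ... Can I avoid this warning? Or is it important?

I think this is a NumPy warning, which you see because [...]. In my opinion it is fine.

@jazreal - OK. How can I avoid this warning?

@kit - Hard question, I don't know. But I am not sure it is a good idea, because pandas gets new solutions quite often.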
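On the question of avoiding the warning: one general option, not suggested in the thread itself, is to silence `FutureWarning` only around the comparison with the standard library's `warnings` module. This is a sketch under that assumption; the small `arr` below is illustrative data, and hiding a warning does not change the behaviour it announces.

```python
import warnings

import numpy as np

# Illustrative object array, similar in spirit to df.values in the answer.
arr = np.array([["101", "zzz"], ["102", "zzz"]], dtype=object)

# Suppress FutureWarning only inside this block; other code is unaffected.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    mask = (arr == arr[0]).all(axis=0)

print(mask)
```

Keeping the filter inside `catch_warnings()` restores the previous warning settings when the block exits, so warnings elsewhere in the program remain visible.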