Python pandas: how to get value='yes' from any row in each column


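I need to combine rows that share the same "name" and "department". Some rows in the table have the value "yes" in different feature columns, as shown below.

The following template gives the input and the expected output: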

name    department  feature1    feature2    feature3
x1         cs                      yes        yes
x1         cs         yes       
x1         ec           
x2         cs         yes          yes
x2         ec                      yes  
The output I need is:

x1         cs        yes            yes       yes
x1         ec           
x2         cs        yes            yes
x2         ec                       yes 
Please suggest a solution using Python and pandas.
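To reproduce the answers that follow, the input table can be rebuilt as a DataFrame. This is only a minimal sketch: it assumes the blank cells in the question are missing values (NaN), which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question.
# Assumption: blank cells are missing values (NaN).
df = pd.DataFrame({
    'name':       ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department': ['cs', 'cs', 'ec', 'cs', 'ec'],
    'feature1':   [np.nan, 'yes', np.nan, 'yes', np.nan],
    'feature2':   ['yes', np.nan, np.nan, 'yes', 'yes'],
    'feature3':   ['yes', np.nan, np.nan, np.nan, np.nan],
})

print(df)
```

If the real data holds empty strings instead of NaN, the boolean comparison with 'yes' used in the answer still applies unchanged.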

You can use:

import numpy as np
import pandas as pd

# Turn every feature column into booleans: True where the cell is 'yes'
cols = df.columns.difference(['name','department'])
df[cols] = df[cols] == 'yes'
print (df)
  name department  feature1  feature2  feature3
0   x1         cs     False      True      True
1   x1         cs      True     False     False
2   x1         ec     False     False     False
3   x2         cs      True      True     False
4   x2         ec     False      True     False
Then group by name and department, aggregate with max, and finally convert the booleans back with replace and a dict:

# max() over booleans is True if any row in the group is True
df = (df.groupby(['name','department'])
        .max()
        .replace({True:'yes',False:np.nan})
        .reset_index())

print (df)
  name department feature1 feature2 feature3
0   x1         cs      yes      yes      yes
1   x1         ec      NaN      NaN      NaN
2   x2         cs      yes      yes      NaN
3   x2         ec      NaN      yes      NaN
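Thanks for the comment. If the feature columns contain only yes and NaN values, a shorter variant also works. After fillna(''), max() returns 'yes' whenever a group contains it, because the empty string sorts before any non-empty string: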

df = df.fillna('').groupby(['name', 'department']).max().reset_index()
print (df)
  name department feature1 feature2 feature3
0   x1         cs      yes      yes      yes
1   x1         ec                           
2   x2         cs      yes      yes         
3   x2         ec               yes         
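Because the feature columns are boolean after the comparison with 'yes', `any()` can stand in for `max()` in the first approach. The sketch below is one possible way to write that. The sample data is assumed from the question (blank cells taken as NaN), and `feature_cols`, `flags` and `merged` are illustrative names that do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question; blank cells are taken to be NaN.
df = pd.DataFrame({
    'name':       ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department': ['cs', 'cs', 'ec', 'cs', 'ec'],
    'feature1':   [np.nan, 'yes', np.nan, 'yes', np.nan],
    'feature2':   ['yes', np.nan, np.nan, 'yes', 'yes'],
    'feature3':   ['yes', np.nan, np.nan, np.nan, np.nan],
})

feature_cols = ['feature1', 'feature2', 'feature3']

# Boolean flags: True where the original cell is exactly 'yes'.
flags = df.assign(**{c: df[c].eq('yes') for c in feature_cols})

# any() is True for a group if at least one of its rows is True.
merged = flags.groupby(['name', 'department'])[feature_cols].any().reset_index()

# Map the booleans back to the 'yes' / NaN convention of the question.
for c in feature_cols:
    merged[c] = merged[c].map({True: 'yes', False: np.nan})

print(merged)
```

On boolean columns `any()` and `max()` give the same result, so the choice between them is mostly about which one reads more clearly.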
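Comments on this answer:

If you are guaranteed that the yes values never overlap within a group, you could also use: df.groupby(['name','department']).sum()

Why not use .any() instead of .max() (first example)? It seems better suited to bool types, and it short-circuits. Note: .any() will work on the raw data without any mapping.

@AChampion - You are right, any can also be used. Thanks.

Thanks Bonifacio, jezrael, @AChampion for the quick replies. I tried the suggested options and they work well. I need to add one more comment to the original question: I have more columns, and I need to maintain the values of those columns.

@sri - There are more columns that need to be handled in another way? Can you explain more?

name department feature1 feature2 feature3 count
x1 cs yes 10
x1 cs yes
x1 ec
x2 cs yes
x2 ec yes 20

EDIT:

To keep the extra columns, you can build a custom dict of aggregation functions with a dict comprehension and pass it to agg:
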
d = {'feature3': ['yes', np.nan, np.nan, np.nan, np.nan], 
     'feature2': ['yes', np.nan, np.nan, 'yes', 'yes'], 
     'name': ['x1', 'x1', 'x1', 'x2', 'x2'], 
     'count': [10.0, 30.0, np.nan, 20.0, 3.0],
     'feature1': [np.nan, 'yes', np.nan, 'yes', np.nan], 
     'department': ['cs', 'cs', 'ec', 'cs', 'ec'], 
     'description': ['xsdepartment1', 'xsdepartment2', np.nan, 'department1', 'department3']}

c = ['name','department','feature1','feature2','feature3','count','description']
df = pd.DataFrame(d, columns = c)
print (df)
  name department feature1 feature2 feature3  count    description
0   x1         cs      NaN      yes      yes   10.0  xsdepartment1
1   x1         cs      yes      NaN      NaN   30.0  xsdepartment2
2   x1         ec      NaN      NaN      NaN    NaN            NaN
3   x2         cs      yes      yes      NaN   20.0    department1
4   x2         ec      NaN      yes      NaN    3.0    department3

cols = df.columns.difference(['name','department','count','description'])

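# Feature flags are aggregated with max; the extra columns keep all of
# their values per group, collected into a tuple
f = lambda x: tuple(x)
d = {x:'max' for x in cols}
d['count'] = f
d['description'] = f
print (d)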
{'feature3': 'max', 
'feature1': 'max', 
'feature2': 'max', 
'description': <function <lambda> at 0x000000000F6FC598>, 
'count': <function <lambda> at 0x000000000F6FC598>}
df[cols] = df[cols] == 'yes'
print (df)
  name department  feature1  feature2  feature3  count    description
0   x1         cs     False      True      True   10.0  xsdepartment1
1   x1         cs      True     False     False   30.0  xsdepartment2
2   x1         ec     False     False     False    NaN            NaN
3   x2         cs      True      True     False   20.0    department1
4   x2         ec     False      True     False    3.0    department3

df = df.groupby(['name', 'department']).agg(d).reset_index()
df[cols] = df[cols].replace({True:'yes',False:np.nan})
print (df)
  name department feature3 feature1 feature2                     description  \
0   x1         cs      yes      yes      yes  (xsdepartment1, xsdepartment2)   
1   x1         ec      NaN      NaN      NaN                          (nan,)   
2   x2         cs      NaN      yes      yes                  (department1,)   
3   x2         ec      NaN      NaN      yes                  (department3,)   

          count  
0  (10.0, 30.0)  
1        (nan,)  
2       (20.0,)  
3        (3.0,)
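The tuples above keep NaN placeholders such as (nan,). If a flat, printable value per group is preferred, one possible variation is to join the non-missing values into a string. This is a sketch under assumptions, not part of the original answer: the helper `join_values` and the names `feature_cols`, `agg_map` and `out` are hypothetical, and turning `count` into text is only one reading of "maintaining" its values.

```python
import numpy as np
import pandas as pd

# Sample data taken from the EDIT above.
df = pd.DataFrame({
    'name':        ['x1', 'x1', 'x1', 'x2', 'x2'],
    'department':  ['cs', 'cs', 'ec', 'cs', 'ec'],
    'feature1':    [np.nan, 'yes', np.nan, 'yes', np.nan],
    'feature2':    ['yes', np.nan, np.nan, 'yes', 'yes'],
    'feature3':    ['yes', np.nan, np.nan, np.nan, np.nan],
    'count':       [10.0, 30.0, np.nan, 20.0, 3.0],
    'description': ['xsdepartment1', 'xsdepartment2', np.nan,
                    'department1', 'department3'],
})

feature_cols = ['feature1', 'feature2', 'feature3']


def join_values(s):
    """Hypothetical helper: join a group's non-missing values as text."""
    return ', '.join(s.dropna().astype(str))


# Feature flags use max; the extra columns use the joining helper.
agg_map = {c: 'max' for c in feature_cols}
agg_map['count'] = join_values
agg_map['description'] = join_values

# Compare with 'yes' first so that max works on booleans.
flags = df.assign(**{c: df[c].eq('yes') for c in feature_cols})
out = flags.groupby(['name', 'department']).agg(agg_map).reset_index()

# Map the booleans back to 'yes' / NaN.
for c in feature_cols:
    out[c] = out[c].map({True: 'yes', False: np.nan})

print(out)
```

Groups whose extra columns hold only missing values end up with an empty string, which may or may not be preferable to the (nan,) tuple depending on how the result is consumed.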