Pipe, sequence of functions, or filter and then summarise in Python (as in dplyr)

Some context: I am an R user, but I am currently switching to Python (with pandas). Suppose I have this data frame:

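import pandas as pd

data = {'participant': ['p1', 'p1', 'p2', 'p3'],
        'metadata': ['congruent_1', 'congruent_2', 'incongruent_1', 'incongruent_2'],
        'reaction': [22000, 25000, 27000, 35000]
        }

df_s1 = pd.DataFrame(data, columns=['participant', 'metadata', 'reaction'])
# Originally df_s1.append([df_s1]*15, ignore_index=True), i.e. 16 copies in total.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same stacking.
df_s1 = pd.concat([df_s1] * 16, ignore_index=True)
df_s1  # display the frame (notebook/REPL)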

I would like to reproduce what I can easily do in R (with piped functions):

That did not work. I only succeeded when I split the code into parts/variables:

x = df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")]
x = x["reaction"].mean()
x

In dplyr, I would write:

ds_s1 %>% 
  filter(metadata == "congruent_1" | metadata == "incongruent_1") %>% 
  summarise(mean(reaction))
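A minimal sketch (not from the thread) of a dplyr-like chain in plain pandas: wrapping the expression in parentheses lets each step sit on its own line, much like `%>%`. It assumes the same toy data as the question, rebuilt here so the block runs on its own.

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

# dplyr: df_s1 %>% filter(...) %>% summarise(mean(reaction))
mean_reaction = (
    df_s1
    .query('metadata == "congruent_1" or metadata == "incongruent_1"')  # filter()
    ["reaction"]                                                         # select the column
    .mean()                                                              # summarise(mean(...))
)
print(mean_reaction)  # 24500.0
```

The parentheses are only there for layout; the chain itself is ordinary method chaining.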

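Note: I would really appreciate a pointer to a concise reference for converting my R code to Python. Several resources exist, but their formats are mixed and their styles vary.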

Thanks!

Do you mean:

df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), "reaction"].mean()
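The single `.loc` call takes a row selector (the boolean mask) and a column label in one step. A small sketch, assuming the question's toy data, checking that it gives the same result as the two-step version from the question:

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

# Boolean Series: True where metadata is one of the two conditions
mask = (df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")

# Two steps, as in the question: filter the rows, then take the column
x = df_s1[mask]
two_step = x["reaction"].mean()

# One step: .loc[row_selector, column_selector]
one_step = df_s1.loc[mask, "reaction"].mean()

print(two_step, one_step)  # 24500.0 24500.0
```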

Or, more simply, with isin:

df_s1.loc[df_s1.metadata.isin(["congruent_1", "incongruent_1"]), "reaction"].mean()
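A quick sketch (toy data assumed, as in the question) showing that `isin` builds the same boolean mask as the chain of `==` comparisons joined with `|`, which is why the two answers agree; `isin` simply scales better as the list of conditions grows.

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

conditions = ["congruent_1", "incongruent_1"]
mask_or = (df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")
mask_isin = df_s1.metadata.isin(conditions)

print(mask_or.equals(mask_isin))  # True: same rows selected either way
print(int(mask_isin.sum()))       # 32 rows (2 matching rows x 16 copies)
print(df_s1.loc[mask_isin, "reaction"].mean())  # 24500.0
```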

Output:

24500.0

Use .loc:

df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), 'reaction'].mean()
Out[117]: 24500.0

Switching to isin, as Quang mentioned, keeps the line of code shorter.

In base R:

mean(ds_s1$reaction[ds_s1$metadata%in%c('congruent_1','incongruent_1')])
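The base-R line above maps almost token for token onto pandas: `$` becomes `[]`, `%in%` becomes `isin`, and subsetting a vector with a logical vector becomes boolean indexing of a Series. A sketch under the assumption that `df_s1` is the question's toy data:

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

# base R: mean(df_s1$reaction[df_s1$metadata %in% c('congruent_1','incongruent_1')])
reaction = df_s1["reaction"]                                      # df_s1$reaction
keep = df_s1["metadata"].isin(["congruent_1", "incongruent_1"])   # %in% c(...)
result = reaction[keep].mean()                                    # vector[logical], then mean()
print(result)  # 24500.0
```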

In addition to the other suggested solutions:

df_s1.query('metadata==["congruent_1","incongruent_1"]').agg({"reaction": "mean"})

 reaction    24500.0
 dtype: float64
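Two details make the `query`/`agg` version work: inside `query()`, comparing a column to a list with `==` behaves like `isin` (the `in` operator is equivalent), and `agg` with a dict of one function per column returns a Series indexed by column name. A sketch on the question's toy data (assumed):

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

# == against a list inside query() acts like isin(); "in" spells the same filter
filtered = df_s1.query('metadata == ["congruent_1", "incongruent_1"]')
same = df_s1.query('metadata in ["congruent_1", "incongruent_1"]')
print(filtered.equals(same))  # True

summary = filtered.agg({"reaction": "mean"})  # Series indexed by column name
print(summary["reaction"])  # 24500.0

# A list of functions per column returns a DataFrame (one row per function) instead
print(filtered.agg({"reaction": ["mean", "max"]}))
```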

With datar (I am the author), you can easily port your code from R to Python:

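from datar.all import *

data = tibble(
    participant=['p1', 'p1', 'p2', 'p3'],
    metadata=['congruent_1', 'congruent_2', 'incongruent_1', 'incongruent_2'],
    reaction=[22000, 25000, 27000, 35000]
)
df_s1 = data >> uncount(15)
df_s1 = df_s1 >> \
    filter((f.metadata == "congruent_1") | (f.metadata == "incongruent_1")) >> \
    group_by(f.metadata) >> \
    summarise(reaction_mean=mean(f.reaction))
print(df_s1)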

Output:

        metadata  reaction_mean
0    congruent_1        22000.0
1  incongruent_1        27000.0
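For comparison, a sketch (not part of the datar answer) of the same group-wise summary in plain pandas, assuming the question's toy data. Named aggregation in `agg` plays the role of `summarise(reaction_mean = mean(reaction))`, and `as_index=False` keeps `metadata` as a column, as in the output above.

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

out = (
    df_s1[df_s1["metadata"].isin(["congruent_1", "incongruent_1"])]  # filter()
    .groupby("metadata", as_index=False)                             # group_by()
    .agg(reaction_mean=("reaction", "mean"))                         # summarise()
)
print(out)
#         metadata  reaction_mean
# 0    congruent_1        22000.0
# 1  incongruent_1        27000.0
```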

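Thanks a lot. Could you explain why .loc is needed? Thanks, everyone!

@Luis .loc selects by index (rows) and columns at the same time. In your case you index the rows with a boolean mask and also want to select a column, so you need .loc.

Thanks! This works quite well. I didn't know about the agg function!

Thank you very much! Is there any chance of a "pipe" flow like in the tidyverse?

Really glad to know about this! Thanks for sharing this code!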
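On the question of a tidyverse-style pipe: pandas has `DataFrame.pipe`, which passes the frame as the first argument to a function (plus any extra arguments), so your own functions can be chained like `%>%`. A minimal sketch, where `keep_conditions` and `mean_reaction` are hypothetical helpers written for illustration and the data is the question's toy frame:

```python
import pandas as pd

# Same toy data as the question, stacked 16 times
data = {"participant": ["p1", "p1", "p2", "p3"],
        "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
        "reaction": [22000, 25000, 27000, 35000]}
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)


def keep_conditions(df, conditions):
    """Hypothetical helper, roughly filter(metadata %in% conditions)."""
    return df[df["metadata"].isin(conditions)]


def mean_reaction(df):
    """Hypothetical helper, roughly summarise(mean(reaction))."""
    return df["reaction"].mean()


# df.pipe(f, *args) calls f(df, *args), so each step reads left to right
result = (
    df_s1
    .pipe(keep_conditions, ["congruent_1", "incongruent_1"])
    .pipe(mean_reaction)
)
print(result)  # 24500.0
```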