Python: merging DataFrames and dividing one DataFrame by another

python pandas dataframe

How can I add df1 and df2 back to the original df and calculate df1/df2?

The data and the code so far:

    concatted  score       date status  apple  banana  orange
0  apple_bana  0.500 2010-02-20   high   True   False   False
1       apple  0.400 2010-02-10   high   True   False   False
2      banana  0.530 2010-01-12   high  False    True   False
3        kiwi  0.532 2010-03-03    low  False   False   False
4        cake  0.634 2010-03-05    low  False   False   False 


fruits = ['apple', 'banana', 'orange']
# one boolean column per fruit, named after the fruit
for fruit in fruits:
    df[fruit] = df['concatted'].str.contains(fruit, regex=True)
# df1: number of "high" rows per date; df2: total number of rows per date
df1 = df.groupby('date')['status'].apply(lambda x: (x == 'high').sum()).reset_index(name='count')
df2 = df['date'].value_counts().sort_index().reset_index(name='total')

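The sample data doesn't really exercise the calculation, since every date appears only once. If you want everything back in the original DataFrame, then transform() is your friend. Also, join the list into a single regular expression for contains(); that removes the need for the loop.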

# check whether any of the fruit names occurs; no loop needed
df['fruit'] = df['concatted'].str.contains('|'.join(fruits), regex=True)

# proportion of "high" rows within each date group
df['prop'] = df.groupby('date')['status'].transform(lambda x: (x == 'high').sum() / len(x))

print(df)

    concatted  score        date status  fruit  prop
0  apple_bana  0.500  02/20/2010   high   True   1.0
1       apple  0.400  02/10/2010   high   True   1.0
2      banana  0.530  01/12/2010   high   True   1.0
3        kiwi  0.532  03/03/2010    low  False   0.0
4        cake  0.634  03/05/2010    low  False   0.0
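
For comparison, here is a minimal sketch of the literal route the question describes: build the two per-date summaries, merge them back onto the original rows, and divide. The names merged and ratio are made up for this illustration, and the sample rows are copied from the question. Treat it as one possible way to do it rather than the recommended one; transform() gets the same numbers without the intermediate frames.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "score": [0.5, 0.4, 0.53, 0.532, 0.634],
    "date": pd.to_datetime(
        ["2010-02-20", "2010-02-10", "2010-01-12", "2010-03-03", "2010-03-05"]
    ),
    "status": ["high", "high", "high", "low", "low"],
})

# df1: number of "high" rows per date; df2: total number of rows per date
df1 = df["status"].eq("high").groupby(df["date"]).sum().reset_index(name="count")
df2 = df.groupby("date").size().reset_index(name="total")

# Merge both summaries back onto the original rows, then divide.
# "merged" and "ratio" are hypothetical names chosen for this sketch.
merged = df.merge(df1, on="date", how="left").merge(df2, on="date", how="left")
merged["ratio"] = merged["count"] / merged["total"]

print(merged)
```

Because every date occurs exactly once in the sample, each ratio is either 0 or 1 here; data with repeated dates would give fractions in between.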

Another approach, keeping the intermediate counts as columns:

import pandas as pd

df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "score": [0.5, 0.4, 0.53, 0.532, 0.634],
    "date": ["2010-02-19T16:00:00.000Z", "2010-02-09T16:00:00.000Z", "2010-01-11T16:00:00.000Z",
             "2010-03-02T16:00:00.000Z", "2010-03-04T16:00:00.000Z"],
    "status": ["high", "high", "high", "low", "low"],
})

fruits = ['apple', 'banana', 'orange']
# True where concatted contains any of the fruit names
df["fruit"] = df["concatted"].str.contains("|".join(fruits))
# per-date counts, broadcast back onto every row
df["highcalc"] = df.groupby('date')['status'].transform(lambda x: (x == 'high').sum())
df["datecount"] = df.groupby('date')["date"].transform("count")
# column-wise division; no row-by-row apply needed
df["finalcalc"] = df["highcalc"] / df["datecount"]
print(df.to_string(index=False))

Output (before the supplementary columns that categorise rows are added):

  concatted  score                      date status  fruit  highcalc  datecount  finalcalc
 apple_bana  0.500  2010-02-19T16:00:00.000Z   high   True         1          1        1.0
      apple  0.400  2010-02-09T16:00:00.000Z   high   True         1          1        1.0
     banana  0.530  2010-01-11T16:00:00.000Z   high   True         1          1        1.0
       kiwi  0.532  2010-03-02T16:00:00.000Z    low  False         0          1        0.0
       cake  0.634  2010-03-04T16:00:00.000Z    low  False         0          1        0.0
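
A lambda passed to transform() runs as Python code once per group, which can get slow on large frames. As a rough sketch of the alternative (an assumption about what "faster" would look like, not something measured here), the sum and the group size can be computed separately with built-in aggregations and then divided column-wise. The names is_high and grouped are made up for this illustration.

```python
import pandas as pd

# Only the columns the calculation needs, copied from the answer above
df = pd.DataFrame({
    "date": ["2010-02-19T16:00:00.000Z", "2010-02-09T16:00:00.000Z",
             "2010-01-11T16:00:00.000Z", "2010-03-02T16:00:00.000Z",
             "2010-03-04T16:00:00.000Z"],
    "status": ["high", "high", "high", "low", "low"],
})

# 1 where status is "high", 0 otherwise (hypothetical helper name)
is_high = df["status"].eq("high").astype(int)
grouped = is_high.groupby(df["date"])

# Built-in aggregations, broadcast back to the original rows
df["highcalc"] = grouped.transform("sum")
df["datecount"] = grouped.transform("size")
df["finalcalc"] = df["highcalc"] / df["datecount"]

print(df)
```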

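Comments:

This is slow because the function passed to transform is complex; it is better to compute the size and the sum independently.

Is there a way to do this if I have multiple conditions? I.e. I have columns for apple, banana and orange, and each is True if that fruit appears in the concatted column.

@arv can you update the question with an example? I'm happy to help :)

Thanks, just done! The purpose of the loop is: for each fruit in the list fruits, keep that fruit's column, then perform the calculation.

@arv I've provided an update to the answer. You can add as many indicator columns as you like :) I've used the components of concatted and fruits for the display, and added bakery. Dig into pandas, there are so many powerful features.

Thank you so much! Sorry, is it possible to do the calculation for every column at once? I.e. for each item in the list ['apple', 'bakery', 'banana', 'cake', 'fruit', 'kiwi', 'tbd'], perform the calculation, producing 3 columns (highcalc, datecount, finalcalc) per item?

@arv your question seems to be evolving :) Yesterday's comment was about categorisation columns; now it is about calculations per categorisation column. All of this is technically possible: do the calculation on the exploded data set, then pivot the result into columns. From a lot of experience with data, though, this is not valid. You don't have source data for each constituent fruit, and assuming that aggregated data can be broken down by category is the root of many data quality problems.

Sorry, that is what my comment yesterday was meant to say, but I was vague.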

import pandas as pd

df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "score": [0.5, 0.4, 0.53, 0.532, 0.634],
    "date": ["2010-02-19T16:00:00.000Z", "2010-02-09T16:00:00.000Z", "2010-01-11T16:00:00.000Z",
             "2010-03-02T16:00:00.000Z", "2010-03-04T16:00:00.000Z"],
    "status": ["high", "high", "high", "low", "low"],
})

# per-date counts, broadcast back onto every row
df["highcalc"] = df.groupby('date')['status'].transform(lambda x: (x == 'high').sum())
df["datecount"] = df.groupby('date')["date"].transform("count")
df["finalcalc"] = df["highcalc"] / df["datecount"]

# one row per distinct concatted value; "cat" starts empty (None keeps the column object-typed)
dfcat = pd.DataFrame({"concatted": df["concatted"].unique(), "cat": None, "truth": True})
fruits = ['apple', 'banana', 'orange']
bakery = ["cake"]
# assign a category only where none has been set yet
dfcat.loc[dfcat["cat"].isna() & dfcat["concatted"].str.contains("|".join(fruits)), "cat"] = "fruit"
dfcat.loc[dfcat["cat"].isna() & dfcat["concatted"].str.contains("|".join(bakery)), "cat"] = "bakery"
dfcat = dfcat.fillna("tbd")
# split "<cat>_<concatted>" into its parts and give each part its own row
dfcat["internal"] = dfcat["cat"] + "_" + dfcat["concatted"]
dfcat["col"] = dfcat["internal"].str.split("_")
dfcat = dfcat.explode("col").drop(columns="internal")

# one indicator column per part, False where the part is absent
dfcat = dfcat.pivot(index="concatted", columns="col", values="truth").reset_index().fillna(False)
df = df.merge(dfcat, on="concatted")
print(df)
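
The last comments mention that the per-item calculation is technically possible by computing on the exploded data and then pivoting into columns. Purely as an illustration of that mechanic, here is a minimal sketch; the names long, part, stats and wide are made up for it, and it splits concatted on "_" without the category labels. The caveat from the comments still applies: the source has no separate data per component, so these numbers only restate the row-level status.

```python
import pandas as pd

# Columns needed for the calculation, copied from the code above
df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "date": ["2010-02-19T16:00:00.000Z", "2010-02-09T16:00:00.000Z",
             "2010-01-11T16:00:00.000Z", "2010-03-02T16:00:00.000Z",
             "2010-03-04T16:00:00.000Z"],
    "status": ["high", "high", "high", "low", "low"],
})

# One row per (original row, component): "apple_bana" becomes "apple" and "bana"
long = df.assign(part=df["concatted"].str.split("_")).explode("part")
long["is_high"] = long["status"].eq("high").astype(int)

# Same three statistics as before, now per date and component
stats = long.groupby(["date", "part"])["is_high"].agg(highcalc="sum", datecount="count")
stats["finalcalc"] = stats["highcalc"] / stats["datecount"]

# Pivot the components into columns: one (statistic, component) column per pair
wide = stats.unstack("part")

print(wide)
```

Combinations of date and component that never occur show up as NaN in wide.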