Python: creating three new columns in a DataFrame
I have the DataFrame below and am trying to create three new columns: "greater", "less", and "count". For each row, the goal is to count how many of the Std values are greater than or less than the mean, and then sum those two counts.
df =
APPL Std_1 Std_2 Std_3 Mean
0 ACCMGR 106.8754 130.1600 107.1861 114.750510
1 ACCOUNTS 121.7034 113.4927 114.5482 116.581458
2 AUTH 116.8585 112.4487 115.2700 114.859050
def make_count(comp_cols, mean_col):
    count_d = {'greater': 0, 'less': 0}
    for col in comp_cols:
        if col > mean_col:
            count_d['greater'] += 1
        elif col < mean_col:
            count_d['less'] += 1
    return count_d['greater'], count_d['less'], (count_d['greater'] + count_d['less'])

def apply_make_count(df):
    a,b,c,*d= df.apply(lambda row: make_count([row['Std_1'], row['Std_2'], row['Std_3']], row['Mean of Std']), axis=1)
    df['greater'],df['less'],df['count']=a,b,c

apply_make_count(df)
Desired output:
df =
APPL Std_1 Std_2 Std_3 Mean greater less count
0 ACCMGR 106.8754 130.1600 107.1861 114.750510 1 2 3
1 ACCOUNTS 121.7034 113.4927 114.5482 116.581458 1 2 3
2 AUTH 116.8585 112.4487 115.2700 114.859050 2 1 3
It looks like you just need:
sub_df = df[['Std_1', 'Std_2', 'Std_3']]
# axis=0 aligns Mean with the row index, so each row is compared with its own mean
df['greater'] = sub_df.gt(df['Mean'], axis=0).sum(axis=1)
df['less'] = sub_df.lt(df['Mean'], axis=0).sum(axis=1)
df['count'] = sub_df.count(axis=1)
APPL Std_1 Std_2 Std_3 Mean greater less count
0 ACCMGR 106.8754 130.1600 107.1861 114.750510 1 2 3
1 ACCOUNTS 121.7034 113.4927 114.5482 116.581458 1 2 3
2 AUTH 116.8585 112.4487 115.2700 114.859050 2 1 3
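
To make the row-wise alignment explicit, here is a minimal self-contained sketch. It rebuilds the sample frame from the question; the helper name add_counts and the std_cols list are my own, not part of the original answer. Passing the Mean column as a Series with axis=0 compares every row with its own mean. Comparing against a bare NumPy array of means would instead line the means up with the columns, which only happens to give the same numbers on a frame with three rows and three Std columns. The sketch also runs on the first two rows alone to show that it does not depend on the frame being square.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "APPL": ["ACCMGR", "ACCOUNTS", "AUTH"],
    "Std_1": [106.8754, 121.7034, 116.8585],
    "Std_2": [130.1600, 113.4927, 112.4487],
    "Std_3": [107.1861, 114.5482, 115.2700],
    "Mean": [114.750510, 116.581458, 114.859050],
})

std_cols = ["Std_1", "Std_2", "Std_3"]  # columns compared against Mean


def add_counts(frame):
    """Return a copy with 'greater', 'less' and 'count' columns added."""
    out = frame.copy()
    sub = out[std_cols]
    # axis=0 matches the Mean Series to the row index (one mean per row).
    out["greater"] = sub.gt(out["Mean"], axis=0).sum(axis=1)
    out["less"] = sub.lt(out["Mean"], axis=0).sum(axis=1)
    # Same definition as make_count in the question: greater + less.
    out["count"] = out["greater"] + out["less"]
    return out


result = add_counts(df)
two_rows = add_counts(df.head(2))  # 2 rows x 3 Std columns, no longer square
print(result)
```

Note that "count" here follows the question's definition (greater plus less), so a value exactly equal to the mean is not counted; sub_df.count(axis=1) counts every non-missing value instead.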
Try:
df['greater'] = (df.iloc[:, 1:4].values > df[['Mean']].values).sum(axis=1)
df['less'] = (df.iloc[:, 1:4].values < df[['Mean']].values).sum(axis=1)
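
The double brackets around 'Mean' matter here: they give a two-dimensional array with one column, which NumPy broadcasts across each row. A small sketch with plain NumPy arrays (the variable names are mine, the numbers are the sample data from the question) shows the shapes involved:

```python
import numpy as np

# Std_1..Std_3 values from the question, one row per APPL.
values = np.array([
    [106.8754, 130.1600, 107.1861],
    [121.7034, 113.4927, 114.5482],
    [116.8585, 112.4487, 115.2700],
])
means = np.array([114.750510, 116.581458, 114.859050])

# Reshape the means from (3,) to (3, 1) so that row i of `values`
# is compared with means[i]. This is the shape df[['Mean']].values has.
col_means = means[:, np.newaxis]

greater = (values > col_means).sum(axis=1)
less = (values < col_means).sum(axis=1)
print(greater, less)
```

With the one-dimensional means array (the shape df['Mean'].values has), the comparison would broadcast along the columns instead, so the reshape is what makes it row-wise.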
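You get the error because you added ", *d" to the original solution you were given. Your version also reads row['Mean of Std'], while the column in the DataFrame shown is named Mean.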
# the way you rewrote it
a,b,c,*d= df.apply(lambda row: make_count([row['Std_1'], row['Std_2'], row['Std_3']], row['Mean of Std']), axis=1)
df['greater'], df['less'], df['count'] = a, b, c
# the code you were provided
a, b, c = df.apply(lambda row: make_count([row['Std_1'], row['Std_2'], row['Std_3']], row['Mean']), axis=1)
df['greater'], df['less'], df['count'] = list(zip(a, b, c))
That is the solution you were provided here.
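
One caveat: unpacking the result of apply into exactly a, b, c only works when the DataFrame has exactly three rows, because each name receives one row's tuple. If you want to keep the row-wise make_count approach for any number of rows, one option (a sketch, not the code from the original answer) is to let apply expand the returned tuple into columns with result_type="expand". The shortened make_count below is my own rewrite with the same return values, and the two-row frame is just the first two sample rows.

```python
import pandas as pd


def make_count(comp_cols, mean_col):
    """Count values above and below mean_col; return (greater, less, total)."""
    greater = int(sum(v > mean_col for v in comp_cols))
    less = int(sum(v < mean_col for v in comp_cols))
    return greater, less, greater + less


# First two sample rows from the question (deliberately not three rows).
df = pd.DataFrame({
    "APPL": ["ACCMGR", "ACCOUNTS"],
    "Std_1": [106.8754, 121.7034],
    "Std_2": [130.1600, 113.4927],
    "Std_3": [107.1861, 114.5482],
    "Mean": [114.750510, 116.581458],
})

# result_type="expand" turns each returned tuple into one row of a new frame.
counts = df.apply(
    lambda row: make_count([row["Std_1"], row["Std_2"], row["Std_3"]], row["Mean"]),
    axis=1,
    result_type="expand",
)
counts.columns = ["greater", "less", "count"]
df = df.join(counts)
print(df)
```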
Additionally:
On this three-row DataFrame, the original solution you were given is the fastest:
%timeit(apply_make_count(df))
1.93 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The new solutions:
def test():
    df['greater'] = (df.iloc[:, 1:4].values > df[['Mean']].values).sum(axis=1)
    df['less'] = (df.iloc[:, 1:4].values < df[['Mean']].values).sum(axis=1)
    df['count'] = df.iloc[:, 1:4].count(1)

%timeit(test())
2.6 ms ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
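def test2():
    # Timed as originally posted. Comparing against df.Mean.values lines the
    # means up with the columns, so it matches the row-wise result only on
    # this 3x3 sample.
    sub_df = df[['Std_1', 'Std_2', 'Std_3']]
    df['greater'] = sub_df.gt(df.Mean.values).sum(1) # same as (sub_df > df.Mean.values).sum(1)
    df['less'] = sub_df.lt(df.Mean.values).sum(1)
    df['count'] = sub_df.count(1)

%timeit(test2())
2.82 ms ± 263 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)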
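
These timings come from a three-row frame, where fixed overhead dominates, so the ranking may not carry over to larger data. If it matters for your case, it is worth measuring on a frame closer to your real size. Below is a rough sketch using the standard timeit module outside IPython; the frame is random data, and n_rows, the function names and the repeat count are arbitrary choices of mine. It only counts the "greater" column to keep it short, and I have deliberately not included any timing results.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 2_000  # hypothetical size, adjust to match your data
std_cols = ["Std_1", "Std_2", "Std_3"]

# Random stand-in data; Mean is the row-wise mean of the Std columns.
big = pd.DataFrame(rng.normal(115, 8, size=(n_rows, 3)), columns=std_cols)
big["Mean"] = big[std_cols].mean(axis=1)


def row_wise(frame):
    # One Python-level call per row, like the apply/make_count approach.
    return frame.apply(
        lambda row: int(sum(row[c] > row["Mean"] for c in std_cols)), axis=1
    )


def broadcast(frame):
    # NumPy broadcasting against an (n, 1) column of means.
    return (frame[std_cols].values > frame[["Mean"]].values).sum(axis=1)


def aligned(frame):
    # pandas comparison aligned on the row index.
    return frame[std_cols].gt(frame["Mean"], axis=0).sum(axis=1)


for fn in (row_wise, broadcast, aligned):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```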