Python 3.x: fastest way to create a column in pandas based on multiple conditions
I am currently using this function:
```python
def age_groupf(row):
    if row['Age'] <= 19:
        val = '15-19'
    elif row['Age'] <= 24:
        val = '20-24'
    elif row['Age'] <= 29:
        val = '25-29'
    elif row['Age'] <= 34:
        val = '30-34'
    elif row['Age'] <= 39:
        val = '35-39'
    elif row['Age'] <= 44:
        val = '40-44'
    elif row['Age'] <= 49:
        val = '45-49'
    elif row['Age'] <= 54:
        val = '50-54'
    elif row['Age'] <= 59:
        val = '55-59'
    else:
        val = '60 and more'
    return val
```
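The question does not show how `age_groupf` is called. Assuming it is applied row by row with `DataFrame.apply(..., axis=1)`, the usual pattern for a function that takes a `row`, the sketch below illustrates why it is slow: pandas calls the Python function once per row and passes each row as a Series. The sample ages are hypothetical, and the function here is a condensed loop that behaves the same as the if/elif chain.

```python
import pandas as pd

def age_groupf(row):
    # Condensed but equivalent form of the if/elif chain:
    # return the label of the first upper bound the age does not exceed
    for upper, label in [(19, '15-19'), (24, '20-24'), (29, '25-29'),
                         (34, '30-34'), (39, '35-39'), (44, '40-44'),
                         (49, '45-49'), (54, '50-54'), (59, '55-59')]:
        if row['Age'] <= upper:
            return label
    return '60 and more'

# Hypothetical sample data; the real data would come from the TXT files
df = pd.DataFrame({'Age': [15, 19, 20, 34, 35, 59, 60, 83]})

# axis=1 calls age_groupf once per row, so the cost grows with
# the number of Python-level function calls (one per row)
df['age_group'] = df.apply(age_groupf, axis=1)
print(df)
```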
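It seems to work, but it is very slow. I have several 100 MB TXT files, so I need something faster.
Use `pd.cut` with predefined bins and labels, for example: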
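```python
import numpy as np
import pandas as pd
bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
labels = [f'{x}-{y-1}' if y != np.inf else f'{x} and more' for x, y in zip(bins, bins[1:])]
# right=False gives intervals [15, 20), [20, 25), ... so age 20 maps to '20-24'
df['age_group'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
```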
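One detail matters when replacing the if/elif chain with `pd.cut`: by default it uses right-closed intervals, `(15, 20]`, `(20, 25]`, and so on, so age 20 would be labelled '15-19' and age 15 would fall outside every bin (NaN). Passing `right=False` gives `[15, 20)`, `[20, 25)`, ..., which matches the thresholds `<= 19`, `<= 24`, and so on. The sketch below compares both settings on hypothetical sample ages, and also shows one way (replacing the first edge with `-np.inf` while keeping the same labels) to reproduce the original function's behaviour for ages below 15.

```python
import numpy as np
import pandas as pd

# Hypothetical sample ages, including bin edges where off-by-one errors show up
df = pd.DataFrame({'Age': [15, 19, 20, 24, 25, 59, 60, 83]})

bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
labels = [f'{x}-{y-1}' if y != np.inf else f'{x} and more' for x, y in zip(bins, bins[1:])]

# Default right=True: intervals (15, 20], (20, 25], ... so 15 -> NaN and 20 -> '15-19'
default_cut = pd.cut(df['Age'], bins=bins, labels=labels)

# right=False: intervals [15, 20), [20, 25), ... matching the if/elif thresholds
fixed_cut = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

print(pd.DataFrame({'Age': df['Age'], 'default': default_cut, 'right=False': fixed_cut}))

# The if/elif chain labels ages below 15 as '15-19'; an open first edge does the same
open_bins = [-np.inf] + bins[1:]
all_ages = pd.cut(pd.Series([3, 15, 60]), bins=open_bins, labels=labels, right=False)
```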
See `pd.cut`, or alternatively `np.select`.
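A minimal sketch of the `np.select` alternative, assuming the same `Age` column (the sample data is hypothetical). `np.select` evaluates every condition as a vectorized boolean array and, for each row, takes the choice of the first condition that is True, so it mirrors the if/elif order directly; `default` plays the role of the `else` branch. Unlike the `pd.cut` version with bins starting at 15, ages below 15 get '15-19' here, exactly as in the original function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [3, 15, 19, 20, 34, 35, 59, 60, 83]})  # hypothetical sample

# Upper bounds of each group, in the same order as the if/elif chain
uppers = [19, 24, 29, 34, 39, 44, 49, 54, 59]
conditions = [df['Age'] <= u for u in uppers]
choices = [f'{u - 4}-{u}' for u in uppers]  # '15-19', '20-24', ..., '55-59'

# For each row, np.select takes the choice of the first True condition;
# rows matching none of them get the default, like the final 'else'
df['age_group'] = np.select(conditions, choices, default='60 and more')
print(df)
```

`pd.cut` returns a categorical column, which is usually more memory-efficient for a handful of repeated labels on large files; `np.select` returns plain strings and is easier to extend when the conditions are not simple numeric ranges.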