Python: How to apply a string method to multiple columns of a DataFrame

python string pandas

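I have a DataFrame with multiple string columns. I want to use a string method that is valid for a Series on several columns of the DataFrame at once. This is what I would like to be able to write: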

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
df

Out[15]: 
      A     B
0  123f  789f
1  456f  901f

df = df.str.rstrip('f')
df
Out[16]: 
     A    B
0  123  789
1  456  901

Obviously, this doesn't work, because str operations are only valid on pandas Series objects. What is the appropriate, most pandas-y way to do this?
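For reference, the failure described above can be reproduced with a short sketch. It only assumes the sample frame from the question; the names col_a and frame_error are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

# A single column is a Series, so the .str accessor is available.
col_a = df['A'].str.rstrip('f')

# The DataFrame itself has no .str accessor, so this raises AttributeError.
try:
    df.str.rstrip('f')
    frame_error = None
except AttributeError as exc:
    frame_error = exc

print(col_a.tolist())
print(type(frame_error).__name__)
```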

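The rstrip function works with a Series, so you can use apply:

df = df.apply(lambda x: x.str.rstrip('f'))

Or create a Series with stack and finally unstack:

df = df.stack().str.rstrip('f').unstack()

Or use applymap:

df = df.applymap(lambda x: x.rstrip('f'))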
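The three approaches named in this answer can be compared side by side. The following is a minimal sketch on the question's sample data, not a definitive implementation. One assumption to flag: newer pandas releases provide DataFrame.map as the replacement for applymap, so the sketch picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

# 1. Column-wise: each column arrives as a Series, so .str works.
by_apply = df.apply(lambda col: col.str.rstrip('f'))

# 2. Reshape to one long Series, strip, then restore the original shape.
by_stack = df.stack().str.rstrip('f').unstack()

# 3. Element-wise: call Python's own str.rstrip on every cell.
# DataFrame.map replaces applymap in newer pandas; fall back for older versions.
elementwise = df.map if hasattr(df, 'map') else df.applymap
by_cell = elementwise(lambda value: value.rstrip('f'))

# All three should produce the same stripped values.
expected = [['123', '789'], ['456', '901']]
for result in (by_apply, by_stack, by_cell):
    assert result.values.tolist() == expected
```

The element-wise variant calls str.rstrip directly, so it would fail on cells that are not strings (for example missing values), whereas the .str accessor passes missing values through.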

Finally, if you need to apply the function to only some columns, use any one of the following:

# list the columns to process
cols = ['A']
df[cols] = df[cols].apply(lambda x: x.str.rstrip('f'))
df[cols] = df[cols].stack().str.rstrip('f').unstack()
df[cols] = df[cols].applymap(lambda x: x.rstrip('f'))
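Restricting the operation to selected columns matters most when the frame also holds non-string data, since the .str accessor is only available on string columns. A small sketch, assuming a hypothetical numeric column n that is not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['123f', '456f'],
    'B': ['789f', '901f'],
    'n': [1, 2],  # hypothetical numeric column, not in the original data
})

# Only the listed string columns are processed; 'n' is left untouched.
cols = ['A', 'B']
df[cols] = df[cols].apply(lambda col: col.str.rstrip('f'))

print(df)
```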

You can use replace with regex=True to mimic the behavior of rstrip; it can be applied to the entire DataFrame:

df.replace(r'f$', '', regex=True)
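One detail worth checking on your own data: the pattern f$ removes a single trailing f, while rstrip('f') removes every trailing f. On the question's sample data the two agree, but they differ when a value ends in a run of the character. A sketch on hypothetical data that shows the difference and a pattern (f+$) that matches rstrip more closely:

```python
import pandas as pd

# Hypothetical data with repeated trailing characters.
df = pd.DataFrame({'A': ['12ff', '34f'], 'B': ['56', 'f78f']})

one_f = df.replace(r'f$', '', regex=True)    # removes a single trailing 'f'
all_f = df.replace(r'f+$', '', regex=True)   # removes every trailing 'f'
via_str = df.apply(lambda col: col.str.rstrip('f'))

print(one_f)
print(all_f)
```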

Since rstrip takes a set of characters to strip, you can easily extend this:

df.replace(r'[abc]+$', '', regex=True)
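When the characters to strip include regex metacharacters, writing the character class by hand is error-prone. The helper below is a hypothetical convenience (rstrip_pattern is not a pandas function) that builds the pattern with re.escape; it is a sketch under that assumption, shown on made-up data.

```python
import re
import pandas as pd

def rstrip_pattern(chars):
    """Build a regex matching a trailing run of any character in `chars`."""
    return '[' + re.escape(chars) + ']+$'

# Hypothetical data containing regex metacharacters.
df = pd.DataFrame({'A': ['1.5.-', '2abc'], 'B': ['x--', 'y']})

stripped = df.replace(rstrip_pattern('.-'), '', regex=True)
reference = df.apply(lambda col: col.str.rstrip('.-'))

print(stripped)
```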

You can use a dictionary comprehension and feed it to the pd.DataFrame constructor:

res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
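Because the comprehension builds plain Python lists, the original index is not carried over automatically. A sketch, assuming a hypothetical non-default index, that passes index=df.index to keep the row labels:

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']},
                  index=['x', 'y'])  # hypothetical non-default index

res = pd.DataFrame(
    {col: [value.rstrip('f') for value in df[col]] for col in df},
    index=df.index,  # without this, the result gets a fresh 0..n-1 index
)

print(res)
```

Like any direct call to str.rstrip, this assumes every cell is a string.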

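At present, Pandas str methods are inefficient. Regular expressions are even less efficient, but easier to extend. As always, you should test with your own data.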

# Benchmarking on Python 3.6.0, Pandas 0.19.2

def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def jez2(df):
    return df.applymap(lambda x: x.rstrip('f'))

def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})

def user3483203(df):
    return df.replace(r'f$', '', regex=True)

df = pd.concat([df]*10000)

%timeit jez1(df)         # 33.1 ms per loop
%timeit jez2(df)         # 29.9 ms per loop
%timeit jpp(df)          # 13.2 ms per loop
%timeit user3483203(df)  # 42.9 ms per loop
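The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The labels, the elementwise helper, and the use of ignore_index=True are choices made for this sketch, and no timings are quoted because they depend on the machine and pandas version.

```python
import timeit
import pandas as pd

small = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
big = pd.concat([small] * 10000, ignore_index=True)

def elementwise(frame, func):
    # DataFrame.map in newer pandas, applymap in older versions.
    method = frame.map if hasattr(frame, 'map') else frame.applymap
    return method(func)

candidates = {
    'apply + .str': lambda d: d.apply(lambda col: col.str.rstrip('f')),
    'element-wise': lambda d: elementwise(d, lambda v: v.rstrip('f')),
    'dict comprehension': lambda d: pd.DataFrame(
        {col: [v.rstrip('f') for v in d[col]] for col in d}),
    'regex replace': lambda d: d.replace(r'f$', '', regex=True),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs of 3 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: func(big), number=3, repeat=3))
    timings[name] = best / 3 * 1000
    print(f'{name:20s} {timings[name]:8.1f} ms per call')
```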
