Python one-liner to create a string column from multiple columns

python pandas

Consider the following code:

import pandas as pd
df = pd.DataFrame({'col_1': [1, 2, 3, 4],
                   'col_2': ['a', 'b', 'c', 'd'],
                   'col_3': ['hey', 'ho', 'banana', 'go']})

col = (df['col_1'].astype(str) + '_' +
       df['col_2'].astype(str) + '_' +
       df['col_3'].astype(str))

col
Out[12]: 
0       1_a_hey
1        2_b_ho
2    3_c_banana
3        4_d_go
dtype: object

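Can anyone think of a one-liner that takes the array

col_names = ['col_1', 'col_2', 'col_3']

as input and produces

col

That is,

col = something_smart(col_names)

Obviously, it should also work for a different subset, for example:

different_col_set = ['col_2', 'col_3']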

something_smart(different_col_set)
Out[13]: 
0         a_hey
1          b_ho
2      c_banana
3          d_go
dtype: object

The point is that col_names is really an array containing any subset of the DataFrame's column names.

Option 1] Use apply with '_'.join:

In [5521]: df[col_names].astype(str).apply('_'.join, axis=1)
Out[5521]:
0       1_a_hey
1        2_b_ho
2    3_c_banana
3        4_d_go
dtype: object

And

Option 2] In this case, using reduce is faster than apply (in Python 3, import it first with from functools import reduce):

In [5527]: reduce(lambda x, y: x + '_' + y, [df[c].astype(str) for c in col_names])
Out[5527]:
0       1_a_hey
1        2_b_ho
2    3_c_banana
3        4_d_go
dtype: object

In [5528]: reduce(lambda x, y: x + '_' + y, [df[c].astype(str) for c in different_col_set])
Out[5528]:
0       a_hey
1        b_ho
2    c_banana
3        d_go
dtype: object
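
For reuse, the reduce approach above can be wrapped in a small function. This is only a minimal sketch: the name join_columns and the sep parameter are hypothetical and not part of pandas, and the code assumes Python 3, where reduce has to be imported from functools.

```python
from functools import reduce  # reduce is not a builtin in Python 3

import pandas as pd


def join_columns(df, col_names, sep='_'):
    """Concatenate the given columns row-wise into one string Series.

    `join_columns` is a hypothetical helper name, not a pandas function.
    """
    # Cast each selected column to str, then fold the Series together pairwise.
    parts = [df[c].astype(str) for c in col_names]
    return reduce(lambda x, y: x + sep + y, parts)


df = pd.DataFrame({'col_1': [1, 2, 3, 4],
                   'col_2': ['a', 'b', 'c', 'd'],
                   'col_3': ['hey', 'ho', 'banana', 'go']})

col = join_columns(df, ['col_1', 'col_2', 'col_3'])
subset = join_columns(df, ['col_2', 'col_3'])
print(col.tolist())     # ['1_a_hey', '2_b_ho', '3_c_banana', '4_d_go']
print(subset.tolist())  # ['a_hey', 'b_ho', 'c_banana', 'd_go']
```

Note that an empty col_names raises a TypeError here, because reduce is called without an initial value.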

This is similar to

reduce(lambda x, y: x.astype(str) + '_' + y.astype(str), [df[x] for x in col_names])

Timings

In [5556]: df.shape
Out[5556]: (10000, 3)

In [5553]: %timeit reduce(lambda x, y: x + '_' + y, [df[c].astype(str) for c in col_names])
10 loops, best of 3: 21.7 ms per loop

In [5554]: %timeit reduce(lambda x, y: x.astype(str) + '_' +y.astype(str), [df[x] for x in col_names])
10 loops, best of 3: 22.3 ms per loop

In [5555]: %timeit df[col_names].astype(str).apply('_'.join, axis=1)
1 loop, best of 3: 254 ms per loop
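
To repeat this comparison outside IPython, something like the following sketch could be used. The test data is an assumption (the question's four rows repeated to reach 10,000 rows, since the thread does not show how the larger frame was built), and the helper names with_reduce and with_apply are made up for illustration. Absolute timings will differ by machine and pandas version.

```python
import timeit
from functools import reduce

import pandas as pd

# Assumed test data: the question's 4-row frame repeated to 10,000 rows.
base = pd.DataFrame({'col_1': [1, 2, 3, 4],
                     'col_2': ['a', 'b', 'c', 'd'],
                     'col_3': ['hey', 'ho', 'banana', 'go']})
df = pd.concat([base] * 2500, ignore_index=True)
col_names = ['col_1', 'col_2', 'col_3']


def with_reduce():
    # Vectorised string concatenation, one column at a time.
    return reduce(lambda x, y: x + '_' + y,
                  [df[c].astype(str) for c in col_names])


def with_apply():
    # Row-wise join: calls '_'.join once per row.
    return df[col_names].astype(str).apply('_'.join, axis=1)


# Check that both approaches agree before comparing their speed.
same = with_reduce().tolist() == with_apply().tolist()
print('results match:', same)

for fn in (with_reduce, with_apply):
    # Best of 3 repeats, 3 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f'{fn.__name__}: {seconds * 1000:.1f} ms per call')
```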

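Really fast (and nice :) and it works like a charm. Thanks a lot. I'll accept your answer in 9 minutes.

Might be faster.

- I'm sure you'll add timings ;) Yes, it is indeed faster.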
