Merging DataFrame columns of strings into one single column in Pandas


I have columns in a DataFrame (imported from a CSV) containing text like this:

"New york", "Atlanta", "Mumbai"
"Beijing", "Paris", "Budapest"
"Brussels", "Oslo", "Singapore"
I want to collapse/merge all the columns into a single column, like this:

New york Atlanta Mumbai
Beijing Paris Budapest
Brussels Oslo Singapore

How can I do this in pandas?

Suppose you have a DataFrame like this:

>>> df
          0        1          2
0  New york  Atlanta     Mumbai
1   Beijing    Paris   Budapest
2  Brussels     Oslo  Singapore
Then simply using the pd.DataFrame.apply method works well:

>>> df.apply(" ".join, axis=1)
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object
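Note that I had to pass axis=1 so that the function is applied across the columns (row by row) rather than down the rows (column by column). With axis=0 you get: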

>>> df.apply(" ".join, axis=0)
0    New york Beijing Brussels
1           Atlanta Paris Oslo
2    Mumbai Budapest Singapore
dtype: object
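
For readers who want to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample above rather than read from a CSV, and the astype(str) step is an addition of mine (on the assumption that some columns might not hold strings); it is not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question (default integer column labels 0, 1, 2).
df = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", "Budapest"],
    ["Brussels", "Oslo", "Singapore"],
])

# axis=1 hands each row to " ".join; astype(str) guards against non-string cells.
merged = df.astype(str).apply(" ".join, axis=1)
print(merged.tolist())
```

If every column is already a string column, the astype(str) call can be dropped and the line matches the answer exactly.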
A faster (but uglier) version is:

df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')

On a larger (10k x 5) DataFrame:

%timeit df.apply(" ".join, axis=1)
10 loops, best of 3: 112 ms per loop

%timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
100 loops, best of 3: 4.48 ms per loop
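
Note that .ix was removed in pandas 1.0, so the str.cat line above will not run on current versions. A rough modern equivalent, assuming the same integer column labels, is to pass the remaining columns to str.cat directly as a DataFrame. I have not re-timed it, so the numbers above may not carry over.

```python
import pandas as pd

# Same sample frame as in the answer, with integer column labels 0, 1, 2.
df = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", "Budapest"],
    ["Brussels", "Oslo", "Singapore"],
])

# Use the first column as the base Series; the other columns are passed as a
# DataFrame and concatenated row by row with the separator.
merged = df[0].str.cat(df.iloc[:, 1:], sep=" ")
print(merged.tolist())
```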

Here are some more approaches:

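import pandas as pd

def pir(df):
    # Insert a column of spaces between the original columns, then sum each
    # row so the strings are concatenated. The positions assume three columns.
    df = df.copy()
    df.insert(2, 's', ' ', allow_duplicates=True)
    df.insert(1, 's', ' ', allow_duplicates=True)
    return df.sum(1)

def pir2(df):
    # Turn each row into a tuple via a MultiIndex, then join the tuple elements.
    df = df.copy()
    return pd.MultiIndex.from_arrays(df.values.T).to_series().str.join(' ').reset_index(drop=True)

def pir3(df):
    # Concatenate column by column on the underlying NumPy object array.
    a = df.values[:, 0].copy()
    for j in range(1, df.shape[1]):
        a += ' ' + df.values[:, j]
    return pd.Series(a)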
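
As a quick sanity check on the idea behind pir3, the sketch below concatenates the columns on a NumPy object array and compares the result with the apply version. The function name join_columns_numpy and the sep parameter are hypothetical, and the explicit to_numpy(dtype=object) conversion is my assumption to keep the string addition element-wise regardless of the column dtype.

```python
import pandas as pd

def join_columns_numpy(df, sep=" "):
    """Hypothetical helper: join all columns of df row by row using NumPy."""
    values = df.to_numpy(dtype=object)   # 2-D array of Python strings
    out = values[:, 0].copy()            # start from the first column
    for j in range(1, values.shape[1]):
        out = out + sep + values[:, j]   # element-wise string concatenation
    return pd.Series(out, index=df.index)

df = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", "Budapest"],
    ["Brussels", "Oslo", "Singapore"],
])

fast = join_columns_numpy(df)
slow = df.apply(" ".join, axis=1)
print(fast.tolist() == slow.tolist())
```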

Timing

pir3 seems to be the fastest even on a small df.

pir3 is still the fastest on a larger df of 30,000 rows.


For completeness:

In [160]: df1.add([' '] * (df1.columns.size - 1) + ['']).sum(axis=1)
Out[160]:
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object
Explanation:

In [162]: [' '] * (df.columns.size - 1) + ['']
Out[162]: [' ', ' ', '']
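
To make the explanation concrete, the following sketch shows the intermediate frame that the add step produces. The variable names suffixes, padded and merged are mine. For the final step I join each row with an empty separator instead of calling sum(axis=1) as the answer does; that substitution is an assumption made so the sketch relies only on plain string joining.

```python
import pandas as pd

df = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", "Budapest"],
    ["Brussels", "Oslo", "Singapore"],
])

# One suffix per column: a space after every column except the last.
suffixes = [" "] * (df.columns.size - 1) + [""]

# DataFrame.add aligns the list with the columns, so every cell gets the
# suffix belonging to its column.
padded = df.add(suffixes)
print(padded.iloc[0].tolist())

# Stand-in for .sum(axis=1): glue the padded cells of each row together.
merged = padded.apply("".join, axis=1)
print(merged.tolist())
```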
Timing against a 300K-row DF:

In [68]: df = pd.concat([df] * 10**5, ignore_index=True)

In [69]: df.shape
Out[69]: (300000, 3)

In [76]: %timeit df.apply(" ".join, axis=1)
1 loop, best of 3: 5.8 s per loop

In [77]: %timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
10 loops, best of 3: 138 ms per loop

In [79]: %timeit pir(df)
1 loop, best of 3: 499 ms per loop

In [80]: %timeit pir2(df)
10 loops, best of 3: 174 ms per loop

In [81]: %timeit pir3(df)
10 loops, best of 3: 115 ms per loop

In [159]: %timeit df.add([' '] * (df.columns.size - 1) + ['']).sum(axis=1)
1 loop, best of 3: 478 ms per loop

Conclusion: the current winner is pir3.

If you prefer something more explicit, start from a DataFrame df that looks like this:

>>> df
          A         B          C
0  New york   Beijing   Brussels
1   Atlanta     Paris       Oslo
2    Mumbai  Budapest  Singapore

You can create a new column like this:

df['result'] = df['A'] + ' ' + df['B'] + ' ' + df['C']

In this case the result is stored in the 'result' column of the original DataFrame:

          A         B          C                     result
0  New york   Beijing   Brussels  New york Beijing Brussels
1   Atlanta     Paris       Oslo         Atlanta Paris Oslo
2    Mumbai  Budapest  Singapore  Mumbai Budapest Singapore
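
The explicit + style can be generalised to any list of columns. The sketch below is one way to do that; the helper name concat_columns, its sep parameter, and the astype(str) conversion (added on the assumption that some columns may be numeric) are all mine and not part of the answer.

```python
import pandas as pd

def concat_columns(df, columns, sep=" "):
    """Hypothetical helper: join the named columns row by row with `sep`."""
    result = df[columns[0]].astype(str)
    for name in columns[1:]:
        # Series + str + Series concatenates element-wise, as in the answer.
        result = result + sep + df[name].astype(str)
    return result

df = pd.DataFrame({
    "A": ["New york", "Atlanta", "Mumbai"],
    "B": ["Beijing", "Paris", "Budapest"],
    "C": ["Brussels", "Oslo", "Singapore"],
    "N": [1, 2, 3],  # numeric column, only here to show the astype(str) step
})

df["result"] = concat_columns(df, ["A", "B", "C"])
with_number = concat_columns(df, ["A", "N"], sep="-")
print(df["result"].tolist())
print(with_number.tolist())
```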

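You can save a few characters with df.T.apply(" ".join).

Nitpick: pir3 should concatenate df.values[:, 0] and df.values[:, 1].

@juanpa.arrivillaga What do you mean? pd.concat? Or np.concatenate? Neither of those can combine strings. I would have to use join. Unless I am misunderstanding you.

I meant the + operator. Anyway, you have fixed it now.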