Python: pack DataFrame columns into a list column

python pandas

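I need to pack DataFrame columns into a single column that contains lists. For example, given:

    a   b   c
0  81  88   1
1  42   7  23
2   8  37  63
3  18  22  20

I want to produce a list column: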

    list_col
0  [81,88,1]
1  [42,7,23]
2  [8,37,63]
3  [18,22,20]

If I try

df.apply(list, axis=1)

Python returns the same DataFrame.

And if I try

>>> df.apply(lambda r:{'list_col':list(r)},axis=1)
    a   b   c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN

that does not work either.

Even the brute-force approach

>>> df['list_col'] = ''
>>> for i in df.index:
    df.ix[i,'list_col'] = list(df.ix[i,df.columns[:-1]])

raises an error:

Traceback (most recent call last):
  File "<pyshell#45>", line 2, in <module>
    df.ix[i,'list_col'] = list(df.ix[i,df.columns[:-1]])
  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 88, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 158, in _setitem_with_indexer
    len(self.obj[labels[0]]) == len(value) or len(plane_indexer[0]) == len(value)):
TypeError: object of type 'int' has no len()

This gives me what I want, but is there perhaps a more direct way?
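
For readers on current pandas: the .ix indexer used in the loop above was removed in pandas 1.0, so that loop cannot run as written. Below is a minimal sketch of the same row-by-row idea, assuming the three columns a, b and c from the example. The names value_cols and rows_as_lists are illustrative only. It builds all the lists first and assigns them in one step instead of writing cell by cell.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})

# Fix the source columns up front so the new column is never read back in.
value_cols = ["a", "b", "c"]

# Build one Python list per row, then assign the whole column at once.
rows_as_lists = [list(row) for row in df[value_cols].itertuples(index=False)]
df["list_col"] = rows_as_lists

print(df)
```

This is still a Python-level loop, so it is mainly useful for small frames or when each row needs extra processing.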

Simply assigning list(df.values) as the new column does it:
df['list_col'] = list(df.values)

df
    a   b   c      list_col
0  81  88   1   [81, 88, 1]
1  42   7  23   [42, 7, 23]
2   8  37  63   [8, 37, 63]
3  18  22  20  [18, 22, 20]

A vectorized method, df.values.tolist(), does much the same thing and is faster on large frames.
Timings on a 4M-row DataFrame:
In [69]: df.shape
Out[69]: (4000000, 3)

In [70]: %timeit list(df.values)
1 loop, best of 3: 2.04 s per loop

In [71]: %timeit df.values.tolist()
1 loop, best of 3: 993 ms per loop
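
The timings above come from an IPython session. A rough way to repeat the comparison in a plain script is sketched below, assuming a smaller random frame (100,000 rows rather than 4M) so that it finishes quickly. The names big, t_list and t_tolist are illustrative, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Random integer frame with the same three columns as the example.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 100, size=(100_000, 3)),
                   columns=["a", "b", "c"])

# Total time for three runs of each conversion.
t_list = timeit.timeit(lambda: list(big.values), number=3)
t_tolist = timeit.timeit(lambda: big.values.tolist(), number=3)

print(f"list(df.values):    {t_list:.3f} s for 3 runs")
print(f"df.values.tolist(): {t_tolist:.3f} s for 3 runs")
```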

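Great, exactly what I needed. Thanks! But why doesn't df.apply(list, axis=1) work?

It is confusing. df.apply is applied to each value in the array, so df.apply(list, axis=1) is equivalent to applying list() to each value in the array, i.e. 81=[81], 88=[88], ... individually. So it has no effect.

Actually, no. apply works on the whole row (when the axis parameter is 1). If you use df.apply(sum, axis=1) you get the sum of each row. Also try df.apply(lambda r: ','.join([str(e) for e in r]), axis=1) and you get one result per row.

The reason df.apply(list, 1) fails is that df.apply tries to coerce the result into a DataFrame or Series object at the end, because that is usually the result you are looking for. There is no way to tell it to leave the result unchanged without using an iterator or generator approach. A more verbose alternative that still uses df.apply is df.apply(lambda x: pd.Series({"list_col": list(x)}), 1), which gets around the reduction problem by returning a Series of lists.

@AndoJurai, technically it is not an ndarray but an array of row value lists, e.g. array(row(list of values)).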
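
The row-wise calls mentioned in the comments can be checked with a short sketch, assuming the example frame from the question. The names row_sums, row_text and packed are illustrative. The bare df.apply(list, axis=1) call is left out on purpose, because how apply handles list-like return values has varied between pandas versions.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})

# With axis=1 the function receives each row as a Series.
row_sums = df.apply(sum, axis=1)
row_text = df.apply(lambda r: ",".join(str(e) for e in r), axis=1)

# Wrapping the list in a one-element Series keeps it intact as a cell value;
# the result is a DataFrame with a single column named list_col.
packed = df.apply(lambda r: pd.Series({"list_col": list(r)}), axis=1)

print(row_sums)
print(row_text)
print(packed)
```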

The df.values.tolist() variant in full:

In [55]: df
Out[55]:
    a   b   c
0  81  88   1
1  42   7  23
2   8  37  63
3  18  22  20

In [56]: df['list_col'] = df.values.tolist()

In [57]: df
Out[57]:
    a   b   c      list_col
0  81  88   1   [81, 88, 1]
1  42   7  23   [42, 7, 23]
2   8  37  63   [8, 37, 63]
3  18  22  20  [18, 22, 20]
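
The two assignments print the same way, but the cells hold different objects. The sketch below, assuming the example frame from the question, shows that list(df.values) yields one NumPy array per row while df.values.tolist() yields plain Python lists of Python ints. The names value_cols, as_arrays and as_lists are illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"a": [81, 42, 8, 18],
                   "b": [88, 7, 37, 22],
                   "c": [1, 23, 63, 20]})
value_cols = ["a", "b", "c"]

# list() only splits the 2-D array along its first axis: each item is a 1-D ndarray.
as_arrays = list(df[value_cols].values)

# tolist() converts all the way down to nested Python lists of Python scalars.
as_lists = df[value_cols].values.tolist()

print(type(as_arrays[0]))  # a NumPy array type
print(type(as_lists[0]))   # a plain list

df["list_col"] = as_lists
print(df)
```

If later code expects real lists (for example for JSON serialization or list concatenation), the tolist() form is the safer choice.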
