Python: generating rows from the values in multiple columns

python pandas

I have about 11 million rows and 21 columns, like this:

area_id_number, c000, c001, c002 ...
01293091302390,    2,    2,    0 ...
01293091302391,    2,    0,    0 ...
01293091302392,    3,    1,    1 ...

I would like to end up with one row per counted unit, like this:

value_id, area_id_number, value_type
       1, 01293091302390, c000
       2, 01293091302390, c000
       3, 01293091302390, c001
       4, 01293091302390, c001
       5, 01293091302391, c000
       6, 01293091302391, c000
       7, 01293091302392, c000
       8, 01293091302392, c000
       9, 01293091302392, c000
      10, 01293091302392, c001
      11, 01293091302392, c002
 ...

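I have not found a way to do this yet. I looked into unpack/pivot/deaggregate, but could not find the right solution using those terms.

The second part of this question: will I run into memory problems? Are there any efficiency considerations I should keep in mind? I should end up with about 140 million rows.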
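As a rough back-of-the-envelope sketch for the memory question, the sizes can be estimated from the row counts given above. The assumptions here are not from the original post: 20 of the 21 columns hold int8 counts, a compact result stores an int32 row position plus an int8 column code per output row, and an object-dtype result stores one pointer per cell. Temporary arrays and pandas overhead are not counted.

```python
import numpy as np

n_in = 11_000_000     # input rows (from the question)
n_counts = 20         # assumption: 21 columns = 1 id column + 20 count columns
n_out = 140_000_000   # expected output rows (from the question)

def gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / 2**30

# Input counts, assuming they fit in int8
input_bytes = n_in * n_counts * np.dtype(np.int8).itemsize

# Compact output: int32 row position + int8 column code per output row
compact_bytes = n_out * (np.dtype(np.int32).itemsize + np.dtype(np.int8).itemsize)

# Object-dtype output: two columns holding one pointer per row
object_bytes = n_out * 2 * np.dtype(object).itemsize

print(f"input counts  : {gib(input_bytes):.2f} GiB")
print(f"compact output: {gib(compact_bytes):.2f} GiB")
print(f"object output : {gib(object_bytes):.2f} GiB")
```

Under these assumptions the result itself is manageable, and the index arrays created along the way are likely to dominate peak memory, so processing the table in chunks of rows may be worth considering.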

The main work is done by ndarray.repeat(). I do not have enough memory to test with 11 million rows, but here is the code.

First, create the test data:

import numpy as np
import pandas as pd

# Create sample data
nrows = 500000
ncols = 21

nones = int(70e6)
ntwos = int(20e6)
nthrees = int(10e6)

rint = np.random.randint

counts = np.zeros((nrows, ncols), dtype=np.int8)
counts[rint(0, nrows, nones), rint(0, ncols, nones)] = 1
counts[rint(0, nrows, ntwos), rint(0, ncols, ntwos)] = 2
counts[rint(0, nrows, nthrees), rint(0, ncols, nthrees)] = 3

columns = ["c%03d" % i for i in range(ncols)]
index = ["%014d" % i for i in range(nrows)]

df = pd.DataFrame(counts, index=index, columns=columns)
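Because the processing step relies on ndarray.repeat() with one count per element, a minimal sketch of that behaviour may help. The labels and counts arrays below are made up for illustration.

```python
import numpy as np

labels = np.array(["c000", "c001", "c002"])  # illustrative labels
counts = np.array([2, 0, 1])                 # illustrative per-element counts

# Each element is repeated as many times as its matching count;
# a count of 0 drops the element entirely.
repeated = labels.repeat(counts)
print(repeated)  # ['c000' 'c000' 'c002']
```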

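Here is the processing code:

# Row and column positions of every nonzero count
idx, col = np.where(df.values)
# The count stored in each of those cells
n = df.values[idx, col]
# Repeat each row label and column name as many times as its count
idx2 = df.index.values[idx.repeat(n)]
col2 = df.columns.values[col.repeat(n)]
df2 = pd.DataFrame({"id":idx2, "type":col2})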
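To check the approach against the question's own data, here is a small sketch that applies the same steps to the three sample rows. The column names area_id_number, value_type and value_id come from the question; treating value_id as a simple running number starting at 1 is an assumption, as is keeping the ids as strings so the leading zero survives.

```python
import numpy as np
import pandas as pd

# The three sample rows from the question, with ids kept as strings
sample = pd.DataFrame(
    [[2, 2, 0], [2, 0, 0], [3, 1, 1]],
    index=["01293091302390", "01293091302391", "01293091302392"],
    columns=["c000", "c001", "c002"],
)

values = sample.to_numpy()
rows, cols = np.nonzero(values)  # positions of nonzero counts, in row-major order
n = values[rows, cols]           # the counts at those positions

result = pd.DataFrame({
    "area_id_number": sample.index.to_numpy()[rows.repeat(n)],
    "value_type": sample.columns.to_numpy()[cols.repeat(n)],
})
# Assumption: value_id is just a running number starting at 1
result.insert(0, "value_id", np.arange(1, len(result) + 1))
print(result)
```

Since np.nonzero walks the array row by row, the result comes out ordered by area_id_number and then by column, which reproduces the eleven rows listed in the question.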

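Why do you want to do this?

Hopefully my edit improves the question. By the way, this is an XY problem! I am going to generate random points within these geometries.

海瑞, this works great. It is far beyond my programming level, and I will do my best to learn from it. Thank you very much.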