Python: concatenate numpy arrays into two arrays using pandas, numpy or other tools
I generate a series of numpy arrays, for example:
a 0.1 0.2 0.01 0.2
b 0.3 0.1 0.2 0.01
....
That is:
import random
N = 5
data = [[random.random() for i in range(N)] for j in range(N)]
names = ['a','b','c','d','e']
I want to reshape it into this:
name value
a 0.01
b 0.03
c 0.01
d 0.2
e 0.04
a 0.2
b 0.01
....
(The order of the data does not matter.)
I tried transposing the DataFrame:
df = pd.DataFrame(data)
df = df.transpose()
df.columns = names
But the result was not the two-column layout I wanted.
Do you know how to reformat the numpy array / pandas DataFrame to get two columns of data?

Is this what you want?
In [11]: df
Out[11]:
a b c d e
0 0.791796 0.428642 0.887860 0.803709 0.860545
1 0.230401 0.105232 0.617007 0.557678 0.590459
2 0.448462 0.314422 0.207188 0.785642 0.022271
3 0.075631 0.707029 0.111538 0.769387 0.174297
4 0.707566 0.299966 0.197642 0.145841 0.231135
In [12]: df.stack().reset_index(level=0, drop=True).reset_index()
Out[12]:
index 0
0 a 0.791796
1 b 0.428642
2 c 0.887860
3 d 0.803709
4 e 0.860545
5 a 0.230401
6 b 0.105232
7 c 0.617007
8 d 0.557678
9 e 0.590459
10 a 0.448462
11 b 0.314422
12 c 0.207188
13 d 0.785642
14 e 0.022271
15 a 0.075631
16 b 0.707029
17 c 0.111538
18 d 0.769387
19 e 0.174297
20 a 0.707566
21 b 0.299966
22 c 0.197642
23 d 0.145841
24 e 0.231135
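The same reshaping can be sketched as a self-contained script. The column labels `name` and `value` and the small fixed sample are assumptions chosen to match the layout asked for in the question. `DataFrame.melt` is shown only as a possible alternative and is not part of the answer above.

```python
import pandas as pd

# Small fixed sample standing in for the random data (assumed values).
df = pd.DataFrame(
    [[0.1, 0.3, 0.5],
     [0.2, 0.4, 0.6]],
    columns=["a", "b", "c"],
)

# stack() moves the column labels into the inner index level; dropping the
# outer (row-number) level leaves one label per value, in row-by-row order.
stacked = df.stack().reset_index(level=0, drop=True)
long_df = stacked.rename_axis("name").reset_index(name="value")

# melt() produces the same two columns, ordered column by column instead.
melted = df.melt(var_name="name", value_name="value")

print(long_df)
print(melted)
```

Since the question says the order of the data does not matter, either variant should fit; `stack` keeps the values of one original row together, while `melt` groups them by name.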
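You only need to concatenate all the columns of df together. Because the columns have different names, they first have to be given the same name; otherwise pandas will add new columns to the concat result.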
import random
import pandas as pd
N = 5
data = [[random.random() for i in range(N)] for j in range(N)]
names = ['a','b','c','d','e']
df = pd.DataFrame(data)
df.columns = names
df = df.transpose()
print(df)
# 0 1 2 3 4
# a 0.643042 0.061476 0.415979 0.209272 0.394414
# b 0.175363 0.580336 0.056173 0.468121 0.388956
# c 0.096257 0.570860 0.516667 0.892087 0.956790
# d 0.082906 0.340805 0.466074 0.010123 0.293006
# e 0.430240 0.759413 0.083779 0.442159 0.434603
df_col = [df[[i]] for i in df.columns]  # split df into one-column frames
for col in df_col:
    col.columns = ['value']  # give every piece the same column name
res = pd.concat(df_col)  # stack the pieces vertically
res.index.names = ['name']
print(res)
# value
# name
# a 0.643042
# b 0.175363
# c 0.096257
# d 0.082906
# e 0.430240
# a 0.061476
# b 0.580336
# c 0.570860
# d 0.340805
# e 0.759413
# a 0.415979
# b 0.056173
# c 0.516667
# d 0.466074
# e 0.083779
# a 0.209272
# b 0.468121
# c 0.892087
# d 0.010123
# e 0.442159
# a 0.394414
# b 0.388956
# c 0.956790
# d 0.293006
# e 0.434603
You can repeat the column names and flatten the DataFrame's values:
import numpy as np
import pandas as pd

# random DataFrame with a fixed seed so the output is reproducible
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print(df)
A B C D E
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
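The step described above could look like the following minimal sketch. It assumes that `np.tile` and `ndarray.ravel` are the intended tools, which the text does not state, and the output column labels `name` and `value` are likewise assumptions taken from the question.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list("ABCDE"))

# Repeat the full list of column names once per row: A..E, A..E, ...
names = np.tile(df.columns.to_numpy(), len(df))
# Flatten the values row by row so they line up with the tiled names.
values = df.to_numpy().ravel()

# Building the frame from a dict lets each column keep its own dtype.
long_df = pd.DataFrame({"name": names, "value": values})
print(long_df.head(10))
```

Both arrays have length rows × columns, and because `ravel` walks the values row by row, the k-th value is paired with the k-th tiled name.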
Timings (len(df) = 1M):
If you need a numpy array as output:
Added:
The code that generates "data" is incomplete.
Nice solution! It scales well too. Note, however, that np.column_stack does not preserve dtypes.
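The dtype remark can be illustrated with a short sketch. The sample names and values below are made up for illustration. The point is only that `np.column_stack` returns a single array, which can hold one dtype, whereas a DataFrame built from a dict keeps one dtype per column.

```python
import numpy as np
import pandas as pd

names = np.array(["A", "B", "C"])
values = np.array([8, 0, 2])

# column_stack returns one 2-D array, so both columns must share a dtype:
# here the integers are converted to strings.
stacked = np.column_stack((names, values))
print(stacked.dtype.kind)  # kind code of the common dtype

# A DataFrame built from a dict stores each column separately,
# so the integer column stays an integer column.
frame = pd.DataFrame({"name": names, "value": values})
print(frame.dtypes)
```

If the numeric values are needed later, the DataFrame form avoids having to convert the strings back.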