Python: Convert a DataFrame with graph node and edge levels into a square matrix
My Google-fu has failed me. I have a pandas DataFrame in the following format:
Level 1 Level 2 Level 3 Level 4
-------------------------------------
A B C NaN
A B D E
A B D F
G H NaN NaN
G I J K
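It basically contains the nodes of a graph, where the levels describe outgoing edges from a lower level to the next higher level. I want to transform the DataFrame, or create a new DataFrame, of the form: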
A B C D E F G H I J K
---------------------------------------------
A | 0 1 0 0 0 0 0 0 0 0 0
B | 0 0 1 1 0 0 0 0 0 0 0
C | 0 0 0 0 0 0 0 0 0 0 0
D | 0 0 0 0 1 1 0 0 0 0 0
E | 0 0 0 0 0 0 0 0 0 0 0
F | 0 0 0 0 0 0 0 0 0 0 0
G | 0 0 0 0 0 0 0 1 1 0 0
H | 0 0 0 0 0 0 0 0 0 0 0
I | 0 0 0 0 0 0 0 0 0 1 0
J | 0 0 0 0 0 0 0 0 0 0 1
K | 0 0 0 0 0 0 0 0 0 0 0
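A cell containing 1 denotes an outgoing edge from the corresponding row to the corresponding column. Is there a Pythonic way to do this in pandas that avoids loops and conditionals?

Try the following code: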
import numpy as np
import pandas as pd
df = pd.DataFrame({'level_1': ['A', 'A', 'A', 'G', 'G'], 'level_2': ['B', 'B', 'B', 'H', 'I'],
                   'level_3': ['C', 'D', 'D', np.nan, 'J'], 'level_4': [np.nan, 'E', 'F', np.nan, 'K']})
Your input DataFrame is:
level_1 level_2 level_3 level_4
0 A B C NaN
1 A B D E
2 A B D F
3 G H NaN NaN
4 G I J K
The solution is:
# Get unique values from input dataframe and filter out 'nan' values
list_nodes = []
for i_col in df.columns.tolist():
    # v == v is False only for NaN, so this keeps the real node labels
    list_nodes.extend(filter(lambda v: v == v, df[i_col].unique().tolist()))
# Deduplicate, since a node may appear in more than one level
list_nodes = sorted(set(list_nodes))
# Initialize your result dataframe with zeros
df_res = pd.DataFrame(0, columns=list_nodes, index=list_nodes)
# Get 'index-column' pairs from input dataframe ('nan's are excluded)
list_indexes = []
for i_col in range(df.shape[1] - 1):
    list_indexes.extend(list(set([tuple(i) for i in df.iloc[:, i_col:i_col + 2]
                                  .dropna(axis=0).values.tolist()])))
# Use 'index-column' pairs to fill the result dataframe
# (.at is used because DataFrame.set_value was removed in pandas 1.0)
for i_list_indexes in list_indexes:
    df_res.at[i_list_indexes[0], i_list_indexes[1]] = 1
The final result is:
A B C D E F G H I J K
A 0 1 0 0 0 0 0 0 0 0 0
B 0 0 1 1 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 1 1 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 1 1 0 0
H 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 1 0
J 0 0 0 0 0 0 0 0 0 0 1
K 0 0 0 0 0 0 0 0 0 0 0
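
Since the question asks for something without explicit loops and conditionals, here is a minimal sketch of one possible vectorized variant. It assumes the level columns are ordered left to right and that an edge exists only between values in adjacent columns of the same row. The names edges, nodes and adjacency are illustrative only, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'level_1': ['A', 'A', 'A', 'G', 'G'],
    'level_2': ['B', 'B', 'B', 'H', 'I'],
    'level_3': ['C', 'D', 'D', np.nan, 'J'],
    'level_4': [np.nan, 'E', 'F', np.nan, 'K'],
})

# Pair each column with its right-hand neighbour and flatten both sides,
# giving one (source, target) row per adjacent pair of cells.
edges = pd.DataFrame({
    'source': df.iloc[:, :-1].to_numpy().ravel(),
    'target': df.iloc[:, 1:].to_numpy().ravel(),
}).dropna().drop_duplicates()  # drop pairs touching NaN, keep each edge once

# All node labels present anywhere in the input, including isolated ones.
nodes = sorted(pd.Series(df.to_numpy().ravel()).dropna().unique())

# Count edges per (source, target), then expand to the full square matrix.
adjacency = (
    pd.crosstab(edges['source'], edges['target'])
    .reindex(index=nodes, columns=nodes, fill_value=0)
)

print(adjacency)
```

Because duplicate edges are dropped before pd.crosstab is called, every non-zero cell is exactly 1. If you would rather have edge multiplicities, leave out the drop_duplicates() call.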