Python 折叠列并插入新行?
我的数据:Python 折叠列并插入新行?,python,pandas,Python,Pandas,我的数据: df Out[79]: INC Theme Theme_Hat TRAIN_TEST 0 123 A NaN TRAIN 1 124 A NaN TRAIN 2 125 A NaN TRAIN 3 126 A NaN TRAIN 4 127 A NaN TRAIN 5 128 A N
df
Out[79]:
INC Theme Theme_Hat TRAIN_TEST
0 123 A NaN TRAIN
1 124 A NaN TRAIN
2 125 A NaN TRAIN
3 126 A NaN TRAIN
4 127 A NaN TRAIN
5 128 A NaN TRAIN
6 129 A NaN TRAIN
7 130 A NaN TRAIN
8 131 B NaN TRAIN
9 132 B B TEST
10 133 B A TEST
11 134 B A TEST
12 135 B A TEST
我试图将Theme\u Hat
列折叠到Theme
列中,同时维护TRAIN\u TEST
指示器。我在下面的循环中使用了,但我的直觉告诉我肯定还有一些熊猫式的解决方案。下面的尝试没有达到我想要的输出,因为测试
在整个df
过程中不断重复,而不是保留列车
信息。这是我想要的输出:
Out[81]:
INC Theme TRAIN_TEST
0 123 A TRAIN
1 124 A TRAIN
2 125 A TRAIN
3 126 A TRAIN
4 127 A TRAIN
5 128 A TRAIN
6 129 A TRAIN
7 130 A TRAIN
8 131 B TRAIN
9 132 B TRAIN
10 132 B TEST
11 133 B TRAIN
12 133 A TEST
13 134 B TRAIN
14 134 A TEST
15 135 B TRAIN
16 135 A TEST
以下是我迄今为止所做的工作:
# copy so we can reference the original dataframe as rows are inserted into df
df2 = df.copy(deep = True)
no_nulls = df2[df2['Theme_Hat'].notnull()]
# get rid of the Theme_Hat column for final dataframe (since we're migrating that info into Theme)
df.drop('Theme_Hat', inplace = True, axis = 1)
# I'm sure there's some pandas built-in functionality that
# can handle this better than a for loop
for idx in no_nulls.index:
# reference the unchanged df2 for INC, Theme_Hat, and TRAIN_TEST info
new_row = pd.DataFrame({"INC": df2.loc[idx, 'INC'],
"Theme": df2.loc[idx, 'Theme_Hat'],
"TRAIN_TEST": df2.loc[idx, 'TRAIN_TEST']}, index = [idx+1])
print(new_row, '\n\n')
# insert the new row right after the row at the current index
df = pd.concat([df.ix[:idx], new_row, df.ix[idx+1:]]).reset_index(drop = True)
您可以使用:
使用,通过和NaN
使用pd.lreshape
,默认情况下自动删除NaNs
。然后,您可以合并考虑中的两列,以在单个列中合并它们的值。最后,根据INC
列值对它们进行排序
pd.lreshape(df, {'Theme': ['Theme','Theme_Hat']}).sort_values('INC').reset_index(drop=True)
print (pd.melt(df, id_vars=['INC','TRAIN_TEST'], value_name='Theme')
.drop('variable', axis=1)
.dropna(subset=['Theme']))
INC TRAIN_TEST Theme
0 123 TRAIN A
1 124 TRAIN A
2 125 TRAIN A
3 126 TRAIN A
4 127 TRAIN A
5 128 TRAIN A
6 129 TRAIN A
7 130 TRAIN A
8 131 TRAIN B
9 132 TEST B
10 133 TEST B
11 134 TEST B
12 135 TEST B
22 132 TEST B
23 133 TEST A
24 134 TEST A
25 135 TEST A
pd.lreshape(df, {'Theme': ['Theme','Theme_Hat']}).sort_values('INC').reset_index(drop=True)