Python: collapse a column and insert new rows? (python, pandas)


My data:

df
Out[79]: 
    INC Theme Theme_Hat TRAIN_TEST
0   123     A       NaN      TRAIN
1   124     A       NaN      TRAIN
2   125     A       NaN      TRAIN
3   126     A       NaN      TRAIN
4   127     A       NaN      TRAIN
5   128     A       NaN      TRAIN
6   129     A       NaN      TRAIN
7   130     A       NaN      TRAIN
8   131     B       NaN      TRAIN
9   132     B         B       TEST
10  133     B         A       TEST
11  134     B         A       TEST
12  135     B         A       TEST
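
For anyone who wants to run the snippets in this thread, here is a minimal sketch that rebuilds the frame shown above. The values are read straight off the printed output; the assumption is that INC is an integer column and the other three hold strings or NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the printed output above.
# Assumption: INC is an integer column; the rest are strings / NaN.
df = pd.DataFrame({
    "INC": range(123, 136),                            # 123 .. 135, 13 rows
    "Theme": ["A"] * 8 + ["B"] * 5,                    # rows 0-7 are A, 8-12 are B
    "Theme_Hat": [np.nan] * 9 + ["B", "A", "A", "A"],  # only rows 9-12 are filled
    "TRAIN_TEST": ["TRAIN"] * 9 + ["TEST"] * 4,
})

print(df)
```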
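I'm trying to collapse the Theme_Hat column into the Theme column while maintaining the TRAIN_TEST indicator. I used a for loop with pd.concat below, but my gut tells me there must be a more pandas-style solution. The attempt below does not produce the output I want, because TEST keeps being repeated throughout df instead of the TRAIN information being preserved. This is my desired output: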

Out[81]: 
    INC Theme TRAIN_TEST
0   123     A      TRAIN
1   124     A      TRAIN
2   125     A      TRAIN
3   126     A      TRAIN
4   127     A      TRAIN
5   128     A      TRAIN
6   129     A      TRAIN
7   130     A      TRAIN
8   131     B      TRAIN
9   132     B      TRAIN
10  132     B      TEST
11  133     B      TRAIN
12  133     A      TEST
13  134     B      TRAIN
14  134     A      TEST
15  135     B      TRAIN
16  135     A      TEST
Here is what I have done so far:

import pandas as pd

# copy so we can reference the original dataframe as rows are inserted into df
df2 = df.copy(deep = True)
no_nulls = df2[df2['Theme_Hat'].notnull()]

# get rid of the Theme_Hat column for final dataframe (since we're migrating that info into Theme)
df.drop('Theme_Hat', inplace = True, axis = 1)

# I'm sure there's some pandas built-in functionality that 
# can handle this better than a for loop
for idx in no_nulls.index:
    # reference the unchanged df2 for INC, Theme_Hat, and TRAIN_TEST info
    new_row = pd.DataFrame({"INC": df2.loc[idx, 'INC'], 
                            "Theme": df2.loc[idx, 'Theme_Hat'],
                            "TRAIN_TEST": df2.loc[idx, 'TRAIN_TEST']}, index = [idx+1])
    print(new_row, '\n\n')

    # insert the new row right after the row at the current index
    # (.loc replaces .ix, which was removed in pandas 1.0)
    df = pd.concat([df.loc[:idx], new_row, df.loc[idx+1:]]).reset_index(drop = True)

You can use pd.melt, then remove the helper variable column with drop and the NaN rows with dropna:

print (pd.melt(df, id_vars=['INC','TRAIN_TEST'], value_name='Theme')
         .drop('variable', axis=1)
         .dropna(subset=['Theme']))

    INC TRAIN_TEST Theme
0   123      TRAIN     A
1   124      TRAIN     A
2   125      TRAIN     A
3   126      TRAIN     A
4   127      TRAIN     A
5   128      TRAIN     A
6   129      TRAIN     A
7   130      TRAIN     A
8   131      TRAIN     B
9   132       TEST     B
10  133       TEST     B
11  134       TEST     B
12  135       TEST     B
22  132       TEST     B
23  133       TEST     A
24  134       TEST     A
25  135       TEST     A

Alternatively, use pd.lreshape, which drops NaNs automatically by default. It merges the two columns under consideration so that their values end up in a single column. Finally, sort the rows by the INC column values:

pd.lreshape(df, {'Theme': ['Theme','Theme_Hat']}).sort_values('INC').reset_index(drop=True)