Python: reshaping a DataFrame

python pandas dataframe

Suppose there is a DataFrame like this:

import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], columns=['A', 'B', 'A1', 'B1'])

I want a DataFrame with only the columns A and B, with the A1 and B1 values stacked below them.

What doesn't work:

new_rows = int(df.shape[1]/2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')

Of course, I could loop over the data and build a new list, but there must be a better way. Any ideas?
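For reference, the reshape attempt above does not give the stacked layout because order='F' reads and writes the values column by column, so values from A end up next to values from A1. A minimal NumPy-only sketch that does stack the pairs, assuming the columns come in consecutive pairs (A, B, then A1, B1), might look like this; the names n_rows, n_groups, stacked and result are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

n_rows = len(df)
n_groups = df.shape[1] // 2  # number of (A, B) column pairs, here 2

# View the values as (row, group, column-within-group), move the group
# axis to the front, then flatten the first two axes so that each group
# is stacked below the previous one.
stacked = (df.to_numpy()
             .reshape(n_rows, n_groups, 2)
             .transpose(1, 0, 2)
             .reshape(-1, 2))

result = pd.DataFrame(stacked, columns=['A', 'B'])
```

This relies purely on column position, so it only applies when every pair sits next to each other in the original column order.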

For the id column, you can use:
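A minimal sketch of how this might look, assuming pd.lreshape (the function referred to elsewhere on this page) for the reshaping and np.repeat for the id column. Both choices are assumptions, and the column lists a_cols and b_cols are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])

# Columns that should end up in the same output column
a_cols = ['A', 'A1']
b_cols = ['B', 'B1']

# Stack each group of columns into a single output column
df1 = pd.lreshape(df, {'A': a_cols, 'B': b_cols})

# Number the blocks: 1 for rows that came from (A, B), 2 for (A1, B1)
df1['id'] = np.repeat(np.arange(1, len(a_cols) + 1), len(df))
```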

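Edit:

lreshape is currently undocumented, and it may be removed in the future.

A possible solution is to merge all 3 functions into one, maybe melt, but this is not implemented yet. Maybe in some new version of pandas. Then my answer will be updated.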

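I solved this in three steps:

1. Create a new DataFrame, df2, holding only the data you want to add to the initial DataFrame df.
2. Delete from df the data that will be added below it (the data that was used to build df2).
3. Append df2 to df.

Like so: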

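# step 1: create a new DataFrame from the columns to be moved
df2 = df[['A1', 'B1']].copy()
df2.columns = ['A', 'B']  # rename so the columns line up with df

# step 2: delete that data from the original
df = df.drop(columns=['A1', 'B1'])

# step 3: append df2 below df
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df = pd.concat([df, df2], ignore_index=True)

Note that when appending you need to specify ignore_index=True, so that the appended rows get new index values instead of keeping their old index.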

Your end result should be your original DataFrame with the data rearranged the way you wanted:

In [16]: df
Out[16]:
    A   B
0   1   2
1   5   6
2   9  10
3   3   4
4   7   8
5  11  12

Use pd.concat() like so:

#Split into separate tables
df_1 = df[['A', 'B']]
df_2 = df[['A1', 'B1']]
df_2.columns = ['A', 'B'] # Make column names line up

# Add the ID column
df_1 = df_1.assign(id=1)
df_2 = df_2.assign(id=2)

# Concatenate
pd.concat([df_1, df_2])
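The same idea can be extended to any number of column groups. The sketch below is one possible generalization, assuming every group lists its columns in the same order as the output names; the helper stack_column_groups and its parameters are hypothetical, not part of pandas:

```python
import pandas as pd


def stack_column_groups(df, groups, names):
    """Stack several groups of columns below each other.

    groups: list of column-name lists, e.g. [['A', 'B'], ['A1', 'B1']]
    names:  output column names shared by every group
    """
    pieces = []
    for group_id, cols in enumerate(groups, start=1):
        piece = df[cols].copy()
        piece.columns = names              # make column names line up
        piece = piece.assign(id=group_id)  # record which group the rows came from
        pieces.append(piece)
    # ignore_index=True gives the result a fresh 0..n-1 index
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'A1', 'B1'])
result = stack_column_groups(df, [['A', 'B'], ['A1', 'B1']], ['A', 'B'])
```

Passing ignore_index=True is a design choice here: without it, the concatenated result keeps the index labels 0, 1, 2 from each piece, so they repeat.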

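The pd.wide_to_long function is built almost exactly for this situation, where many of the same variable prefixes end in different digit suffixes. The only difference here is that the first set of variables has no suffix, so the columns need to be renamed first.

The only issue with pd.wide_to_long is that it must have an identification variable, i, unlike melt. reset_index is used to create this uniquely identifying column, which is dropped later. I think this might get corrected in the future.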

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
0   1   2  1
1   5   6  1
2   9  10  1
3   3   4  2
4   7   8  2
5  11  12  2

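@Moritz - I see. Personally, I would just do this in a for loop. Though @jezrael's lreshape solution is probably better suited to this.

This is a poor solution. Why not use pd.wide_to_long? It is built for exactly this situation.

I added a more robust answer that generalizes to nearly the same situation you are in.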
