Python: Binarize a DataFrame column and split the other column's values accordingly

python pandas

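I'm looking for a way to binarize x and split the values of y according to x. For each distinct value of x, the output I'm trying to achieve has an indicator column plus a column that holds y where x equals that value and 0 elsewhere. Example input: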

import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2],
    'y': [1.2, 3.4, 5.2, 4.8, 5.4, 5.9, 4.3, 2.1, 1.2, 6.7, 2.9, 7.3]
})

To get this result, I basically created new columns:

df2['x1'] = (df.x == 1).astype(int)
df2['y1'] = df2.x1 * df.y

and so on, but I'm hoping there is a better way to do this.
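
For reference, the manual approach described above can be written as a loop over the distinct values of x. This is only a sketch of the baseline the question wants to improve on; the names `columns` and `manual` and the `x0`/`y0` style column labels are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2],
    'y': [1.2, 3.4, 5.2, 4.8, 5.4, 5.9, 4.3, 2.1, 1.2, 6.7, 2.9, 7.3],
})

# Build one indicator column and one masked-y column per distinct value of x.
columns = {}
for value in sorted(df['x'].unique()):
    indicator = (df['x'] == value).astype(int)
    columns[f'x{int(value)}'] = indicator            # 1 where x == value, else 0
    columns[f'y{int(value)}'] = indicator * df['y']  # y where x == value, else 0.0

# Hypothetical result name; columns follow insertion order.
manual = pd.DataFrame(columns, index=df.index)
```

The answers below reach the same shape without an explicit Python loop.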

Interleave

A different concat concept

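# dtype=int keeps the indicators as 0/1 (newer pandas returns booleans by default)
d = pd.get_dummies(df.x, dtype=int)
pd.concat(
    {'x': d, 'y': d.mul(df.y, axis=0)},
    axis=1
).swaplevel(0, 1, axis=1).sort_index(axis=1)  # axis passed by keyword for newer pandas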

    0       1       2     
    x    y  x    y  x    y
0   0  0.0  1  1.2  0  0.0
1   0  0.0  1  3.4  0  0.0
2   0  0.0  1  5.2  0  0.0
3   0  0.0  1  4.8  0  0.0
4   1  5.4  0  0.0  0  0.0
5   1  5.9  0  0.0  0  0.0
6   1  4.3  0  0.0  0  0.0
7   1  2.1  0  0.0  0  0.0
8   0  0.0  0  0.0  1  1.2
9   0  0.0  0  0.0  1  6.7
10  0  0.0  0  0.0  1  2.9
11  0  0.0  0  0.0  1  7.3
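
If flat column names like the x1 and y1 from the question are preferred over the two-level header, the levels can be joined afterwards. The following is a sketch under that assumption; `wide` and `flat` are illustrative names, and the `x0`/`y0` naming pattern is a guess at what the asker wants.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2],
    'y': [1.2, 3.4, 5.2, 4.8, 5.4, 5.9, 4.3, 2.1, 1.2, 6.7, 2.9, 7.3],
})

d = pd.get_dummies(df['x'], dtype=int)
wide = (
    pd.concat({'x': d, 'y': d.mul(df['y'], axis=0)}, axis=1)
    .swaplevel(0, 1, axis=1)
    .sort_index(axis=1)
)

# Each column label is a (value, name) tuple after swaplevel; join it into one string.
flat = wide.copy()
flat.columns = [f'{name}{value}' for value, name in wide.columns]
```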

A creative alternative

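import numpy as np

# i: integer code for each row, u: unique values of x in order of first appearance
i, u = pd.factorize(df.x)
r = np.arange(len(df))
# One (indicator, y) pair per unique value of x
out = np.zeros((len(df), len(u), 2))
out[r, i, 0] = 1
out[r, i, 1] = df.y

pd.DataFrame(out.reshape(len(df), -1), df.index)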
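
The frame built this way has plain integer column labels, and the groups appear in the order `pd.factorize` first sees them (1, 0, 2 for this data). One possible way to label the columns, sketched here with illustrative names (`codes`, `uniques`, `labeled`), is to build a `MultiIndex` from the unique values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2],
    'y': [1.2, 3.4, 5.2, 4.8, 5.4, 5.9, 4.3, 2.1, 1.2, 6.7, 2.9, 7.3],
})

codes, uniques = pd.factorize(df['x'])
rows = np.arange(len(df))

out = np.zeros((len(df), len(uniques), 2))
out[rows, codes, 0] = 1                  # indicator
out[rows, codes, 1] = df['y'].to_numpy() # y value for the matching group

# Column order mirrors the reshape: for each unique value, first 'x' then 'y'.
columns = pd.MultiIndex.from_product([uniques, ['x', 'y']])
labeled = pd.DataFrame(out.reshape(len(df), -1), index=df.index, columns=columns)
```

Calling `labeled.sort_index(axis=1)` afterwards would put the groups in ascending order if that matters.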

groupby with concat

# Each group keeps only its own rows, so the other groups' cells come out as NaN
pd.concat({x: y.assign(x=1) for x, y in df.groupby('x')}, axis=1)

      0         1         2     
      x    y    x    y    x    y
0   NaN  NaN  1.0  1.2  NaN  NaN
1   NaN  NaN  1.0  3.4  NaN  NaN
2   NaN  NaN  1.0  5.2  NaN  NaN
3   NaN  NaN  1.0  4.8  NaN  NaN
4   1.0  5.4  NaN  NaN  NaN  NaN
5   1.0  5.9  NaN  NaN  NaN  NaN
6   1.0  4.3  NaN  NaN  NaN  NaN
7   1.0  2.1  NaN  NaN  NaN  NaN
8   NaN  NaN  NaN  NaN  1.0  1.2
9   NaN  NaN  NaN  NaN  1.0  6.7
10  NaN  NaN  NaN  NaN  1.0  2.9
11  NaN  NaN  NaN  NaN  1.0  7.3
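
This variant leaves NaN where the other answers produce 0. If zeros are wanted, a `fillna(0)` on the result is probably enough; the sketch below assumes that and uses the illustrative names `wide` and `filled`.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2],
    'y': [1.2, 3.4, 5.2, 4.8, 5.4, 5.9, 4.3, 2.1, 1.2, 6.7, 2.9, 7.3],
})

# One sub-frame per group; rows that belong to other groups become NaN on concat.
wide = pd.concat(
    {key: group.assign(x=1) for key, group in df.groupby('x')},
    axis=1,
)

# Replace the missing cells with 0 so the result matches the indicator-style output.
filled = wide.fillna(0)
```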

@尤卡 Aha, let it happen. :-) Man, I really think it's amazing that you can come up with something this good.

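# Interleaved layout in a flat 2-D array: for group code i,
# column 2*i holds the indicator and column 2*i + 1 holds y
i, u = pd.factorize(df.x)
r = np.arange(len(df))
out = np.zeros((len(df), len(u) * 2))
out[r, i * 2] = 1
out[r, i * 2 + 1] = df.y

pd.DataFrame(out, df.index)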

      0    1    2    3    4    5
0   1.0  1.2  0.0  0.0  0.0  0.0
1   1.0  3.4  0.0  0.0  0.0  0.0
2   1.0  5.2  0.0  0.0  0.0  0.0
3   1.0  4.8  0.0  0.0  0.0  0.0
4   0.0  0.0  1.0  5.4  0.0  0.0
5   0.0  0.0  1.0  5.9  0.0  0.0
6   0.0  0.0  1.0  4.3  0.0  0.0
7   0.0  0.0  1.0  2.1  0.0  0.0
8   0.0  0.0  0.0  0.0  1.0  1.2
9   0.0  0.0  0.0  0.0  1.0  6.7
10  0.0  0.0  0.0  0.0  1.0  2.9
11  0.0  0.0  0.0  0.0  1.0  7.3

crosstab

s = pd.crosstab([df.x, df.y], df.x)
# Scale each indicator row by its y value, taken from the second index level
s1 = s.mul(s.index.get_level_values(1).to_numpy(), axis=0)
pd.concat([s, s1], axis=1, keys=['x', 'y'])

       x          y          
x      0  1  2    0    1    2
x y                          
0 2.1  1  0  0  2.1  0.0  0.0
  4.3  1  0  0  4.3  0.0  0.0
  5.4  1  0  0  5.4  0.0  0.0
  5.9  1  0  0  5.9  0.0  0.0
1 1.2  0  1  0  0.0  1.2  0.0
  3.4  0  1  0  0.0  3.4  0.0
  4.8  0  1  0  0.0  4.8  0.0
  5.2  0  1  0  0.0  5.2  0.0
2 1.2  0  0  1  0.0  0.0  1.2
  2.9  0  0  1  0.0  0.0  2.9
  6.7  0  0  1  0.0  0.0  6.7
  7.3  0  0  1  0.0  0.0  7.3