Python: Finding all possible unique combinations of 3 columns in pandas
I am trying to find all possible combinations of 3 variable columns in pandas. A sample df looks like this:
Variable_Name Variable1 Variable2 Variable3
0 X 6.0% 8.0% 10.0%
1 Y 3.0% 4.0% 5.0%
2 Z 1.0% 3.0% 5.0%
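Each combination may only use a variable's own values; a value cannot be moved to another variable. For example, using 4.0% for "X" is incorrect, because 4.0% belongs to "Y".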
I tried itertools.combinations, itertools.product and itertools.permutations, but they gave every possible combination across all of the values.
I would like the result to look like this, giving 27 possible combinations:
Y X Z
0 3.0% 6.0% 1.0%
1 3.0% 6.0% 3.0%
2 3.0% 6.0% 5.0%
3 3.0% 8.0% 1.0%
4 3.0% 8.0% 3.0%
5 3.0% 8.0% 5.0%
6 3.0% 10.0% 1.0%
7 3.0% 10.0% 3.0%
8 3.0% 10.0% 5.0%
9 4.0% 8.0% 3.0%
10 4.0% 8.0% 1.0%
11 4.0% 8.0% 5.0%
12 4.0% 6.0% 1.0%
13 4.0% 6.0% 3.0%
14 4.0% 6.0% 5.0%
15 4.0% 10.0% 1.0%
16 4.0% 10.0% 3.0%
17 4.0% 10.0% 5.0%
18 5.0% 10.0% 5.0%
19 5.0% 10.0% 1.0%
20 5.0% 10.0% 3.0%
21 5.0% 8.0% 1.0%
22 5.0% 8.0% 3.0%
23 5.0% 8.0% 5.0%
24 5.0% 6.0% 1.0%
25 5.0% 6.0% 3.0%
26 5.0% 6.0% 5.0%
Any help would be greatly appreciated.
Let's try successively cross-merging the values of each variable:
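from functools import reduce

import pandas as pd

df = pd.DataFrame({'Variable_Name': {0: 'X', 1: 'Y', 2: 'Z'},
                   'Variable1': {0: '6.0%', 1: '3.0%', 2: '1.0%'},
                   'Variable2': {0: '8.0%', 1: '4.0%', 2: '3.0%'},
                   'Variable3': {0: '10.0%', 1: '5.0%', 2: '5.0%'}})

# Save the variable names for later use
variable_names = df['Variable_Name']

# Put each variable option on its own row (long format)
new_df = df.set_index('Variable_Name').stack() \
    .droplevel(1, 0) \
    .reset_index()

# Build a collection of DataFrames, one per variable
dfs = tuple(new_df[new_df['Variable_Name'].eq(v)]
            .drop(columns=['Variable_Name']) for v in variable_names)

# Successively cross-merge them (how='cross' needs pandas >= 1.2)
new_df = reduce(lambda left, right: pd.merge(left, right, how='cross'), dfs)

# Fix the column names
new_df.columns = variable_names

# Remove the leftover columns axis name ('Variable_Name')
new_df = new_df.rename_axis(None, axis=1)

# Display
print(new_df.to_string())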
Output:
X Y Z
0 6.0% 3.0% 1.0%
1 6.0% 3.0% 3.0%
2 6.0% 3.0% 5.0%
3 6.0% 4.0% 1.0%
4 6.0% 4.0% 3.0%
5 6.0% 4.0% 5.0%
6 6.0% 5.0% 1.0%
7 6.0% 5.0% 3.0%
8 6.0% 5.0% 5.0%
9 8.0% 3.0% 1.0%
10 8.0% 3.0% 3.0%
11 8.0% 3.0% 5.0%
12 8.0% 4.0% 1.0%
13 8.0% 4.0% 3.0%
14 8.0% 4.0% 5.0%
15 8.0% 5.0% 1.0%
16 8.0% 5.0% 3.0%
17 8.0% 5.0% 5.0%
18 10.0% 3.0% 1.0%
19 10.0% 3.0% 3.0%
20 10.0% 3.0% 5.0%
21 10.0% 4.0% 1.0%
22 10.0% 4.0% 3.0%
23 10.0% 4.0% 5.0%
24 10.0% 5.0% 1.0%
25 10.0% 5.0% 3.0%
26 10.0% 5.0% 5.0%
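For comparison, the itertools.product attempt from the question can produce the same 27 rows, provided each variable's row is passed as a separate iterable, so that product picks exactly one value per variable. This is a minimal sketch of that swapped-in approach, not either answerer's method; the names rows and combos are illustrative.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})

# One row per variable; each row holds only that variable's own options.
rows = df.set_index('Variable_Name')

# product(*rows) takes one value from each row, so "X" only ever
# receives values from X's row: 3 * 3 * 3 = 27 combinations.
combos = pd.DataFrame(list(product(*rows.values.tolist())),
                      columns=list(rows.index))

print(len(combos))  # 27
```

The row order matches the cross-merge output above, because both iterate the first variable in the outermost position.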
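You can use a cross join. In pandas (1.2 and later), pass the argument how='cross' to pd.merge() or pd.DataFrame.join(). Before the cross join, however, you need to put each variable into its own DataFrame in long (unpivoted) format, because your table is in wide (pivoted) format.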
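One way to do the wide-to-long step is DataFrame.melt, followed by one cross merge per variable. This is a minimal sketch under that assumption (melt is swapped in here; neither answer uses it), with illustrative names long_df, piece and result.

```python
import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})

# Wide -> long: one row per (Variable_Name, option) pair.
long_df = df.melt(id_vars='Variable_Name', value_name='value')

# Cross-join the per-variable slices one after another.
result = None
for name, group in long_df.groupby('Variable_Name', sort=False):
    # A single column named after the variable, e.g. "X".
    piece = group[['value']].rename(columns={'value': name})
    result = piece if result is None else result.merge(piece, how='cross')

print(result.shape)  # (27, 3)
```

sort=False keeps the variables in their original order (X, Y, Z), so the columns come out in the same order as the input rows.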
Written as a loop over the variables, the code looks like this:
import pandas as pd

variables = df['Variable_Name'].unique()
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']
# Start with the first variable's options, transposed into a single column
cross_join_df = df.loc[df['Variable_Name'] == variables[0], columns_to_cross].T
for var in variables[1:]:
    # Transpose this variable's row into a column and cross-join it on
    to_join_df = df.loc[df['Variable_Name'] == var, columns_to_cross].T
    cross_join_df = pd.merge(cross_join_df, to_join_df, how='cross')
cross_join_df.columns = variables
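The answer also mentions pd.DataFrame.join(). A minimal sketch of the same loop using join(how='cross') with functools.reduce, assuming pandas 1.2 or later; the list name pieces is illustrative, and each piece is named after its variable up front so no final column rename is needed.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'Variable_Name': ['X', 'Y', 'Z'],
                   'Variable1': ['6.0%', '3.0%', '1.0%'],
                   'Variable2': ['8.0%', '4.0%', '3.0%'],
                   'Variable3': ['10.0%', '5.0%', '5.0%']})
columns_to_cross = ['Variable1', 'Variable2', 'Variable3']

# One single-column frame per variable: transpose its row, name the column.
pieces = [
    df.loc[df['Variable_Name'] == var, columns_to_cross].T.set_axis([var], axis=1)
    for var in df['Variable_Name'].unique()
]

# join(how='cross') builds the Cartesian product of left and right.
cross_join_df = reduce(lambda left, right: left.join(right, how='cross'), pieces)

print(cross_join_df.shape)  # (27, 3)
```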
@HenryEcker That was a mistake; it has been corrected.