Python pandas pivot: pivoting with multiple columns
I initially started with the following DataFrame. The dataset relates to users answering multiple questions; each question has multiple answer choices, and a user can select more than one answer.
movie_id, user_id, rated_value, question_id, answer_id, genre, user_gender, user_ethnicity
101, 345, 3.5, 1, 1, comedy, male, white
101, 345, 3.5, 1, 2, comedy, male, white
101, 345, 3.5, 2, 1, comedy, male, white
125, 345, 4.5, 1, 4, drama, male, white
101, 233, 4.0, 1, 3, comedy, female, black
101, 233, 4.0, 2, 2, comedy, female, black
125, 233, 3.0, 1, 1, drama, female, black
125, 233, 3.0, 2, 2, drama, female, black
125, 333, 3.0, 1, 1, comedy, male, asian
125, 333, 3.0, 2, 2, comedy, male, asian
I want to flatten this table by pivoting. Without bringing in genre, user_gender, and user_ethnicity, I can do this successfully as follows:
pivoted_df = df_to_pivot.assign(val=1).pivot_table(
    index=['movie_id', 'user_id', 'rated_value'],
    columns=['question_id', 'answer_id'],
    values=['val'],  # aggregate the marker column added by assign(val=1)
    fill_value=0)
Then I combine the question and answer ids so that the columns read 1_1, 1_2, and so on:
pivoted_df.columns = pivoted_df.columns.droplevel()
pivoted_df.columns = ['{}_{}'.format(l1, l2).strip() for l1, l2 in pivoted_df.columns.values]
pivoted_df = pivoted_df.reset_index()
movie_id user_id rated_value 1_1 1_2 1_3 1_4 …
But when I try to add genre, user_gender, and user_ethnicity:
pivoted_df = df_to_pivot.assign(val=1).pivot_table(
    index=['movie_id', 'user_id', 'rated_value'],
    columns=['question_id', 'answer_id', 'genre', 'user_gender', 'user_ethnicity'],
    values=['question_id', 'answer_id', 'genre', 'user_gender', 'user_ethnicity'],
    fill_value=0)
this does not really work.
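One structural reason this kind of call cannot produce the desired layout is that every name listed in `columns` becomes a level of a single column MultiIndex, so each output column stands for one combination of question, answer, genre, gender, and ethnicity rather than for one category. The sketch below illustrates this on the sample data; it assumes the marker column `val` is used for `values` (the attempt above passes the key columns themselves), and the name `wide` is only illustrative.

```python
import pandas as pd

cols = ["movie_id", "user_id", "rated_value", "question_id", "answer_id",
        "genre", "user_gender", "user_ethnicity"]
rows = [
    (101, 345, 3.5, 1, 1, "comedy", "male", "white"),
    (101, 345, 3.5, 1, 2, "comedy", "male", "white"),
    (101, 345, 3.5, 2, 1, "comedy", "male", "white"),
    (125, 345, 4.5, 1, 4, "drama", "male", "white"),
    (101, 233, 4.0, 1, 3, "comedy", "female", "black"),
    (101, 233, 4.0, 2, 2, "comedy", "female", "black"),
    (125, 233, 3.0, 1, 1, "drama", "female", "black"),
    (125, 233, 3.0, 2, 2, "drama", "female", "black"),
    (125, 333, 3.0, 1, 1, "comedy", "male", "asian"),
    (125, 333, 3.0, 2, 2, "comedy", "male", "asian"),
]
df_to_pivot = pd.DataFrame(rows, columns=cols)

# Assumption: aggregate the marker column `val`, not the key columns.
wide = df_to_pivot.assign(val=1).pivot_table(
    index=["movie_id", "user_id", "rated_value"],
    columns=["question_id", "answer_id", "genre", "user_gender", "user_ethnicity"],
    values="val",
    fill_value=0,
)

# Each column label is a 5-tuple (question, answer, genre, gender, ethnicity),
# not a standalone "comedy" or "male" indicator.
print(wide.columns.nlevels)
print(wide.columns.tolist()[:3])
```

Because the attributes end up nested inside the question/answer columns, a pivot alone is unlikely to give separate comedy, drama, male, female columns.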
My goal is to have genre, user_gender, and user_ethnicity spread out into columns like the others:
movie_id user_id rated_value 1_1 1_2 1_3 1_4 … comedy, drama … male, female, black, white, asian
output:
movie_id, user_id, rated_value , 1_1, 1_2, 1_3, 1_4, comedy, drama, male, female, white, black, asian
101, 345, 3.5, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0
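The goal is to get one row per movie_id, user_id pair, with everything else reflected as 1s and 0s.

Combine question_id and answer_id into a single column, then use pd.get_dummies: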
df['QandA'] = df['question_id'].astype(str) + '_' + df['answer_id'].astype(str)
pd.get_dummies(df, columns=['QandA','genre','user_gender','user_ethnicity'])
Output:
movie_id user_id rated_value question_id answer_id QandA_1_1 QandA_1_2 QandA_1_3 QandA_1_4 QandA_2_1 QandA_2_2 genre_comedy genre_drama user_gender_female \
0 101 345 3.5 1 1 1 0 0 0 0 0 1 0 0
1 101 345 3.5 1 2 0 1 0 0 0 0 1 0 0
2 101 345 3.5 2 1 0 0 0 0 1 0 1 0 0
3 125 345 4.5 1 4 0 0 0 1 0 0 0 1 0
4 101 233 4.0 1 3 0 0 1 0 0 0 1 0 1
5 101 233 4.0 2 2 0 0 0 0 0 1 1 0 1
6 125 233 3.0 1 1 1 0 0 0 0 0 0 1 1
7 125 233 3.0 2 2 0 0 0 0 0 1 0 1 1
8 125 333 3.0 1 1 1 0 0 0 0 0 1 0 0
9 125 333 3.0 2 2 0 0 0 0 0 1 1 0 0
user_gender_male user_ethnicity_asian user_ethnicity_black user_ethnicity_white
0 1 0 0 1
1 1 0 0 1
2 1 0 0 1
3 1 0 0 1
4 0 0 1 0
5 0 0 1 0
6 0 0 1 0
7 0 0 1 0
8 1 1 0 0
9 1 1 0 0
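The output above still has one row per answer. To reach the one-row-per-(movie_id, user_id) layout the question asks for, one option is to group the indicator columns and take the maximum. The following is a minimal sketch of that idea on the sample data, not part of the original answer; the names `dummies` and `flat` are illustrative, and `prefix=''` with `prefix_sep=''` is used only to get bare labels such as `1_1` and `comedy`.

```python
import pandas as pd

cols = ["movie_id", "user_id", "rated_value", "question_id", "answer_id",
        "genre", "user_gender", "user_ethnicity"]
rows = [
    (101, 345, 3.5, 1, 1, "comedy", "male", "white"),
    (101, 345, 3.5, 1, 2, "comedy", "male", "white"),
    (101, 345, 3.5, 2, 1, "comedy", "male", "white"),
    (125, 345, 4.5, 1, 4, "drama", "male", "white"),
    (101, 233, 4.0, 1, 3, "comedy", "female", "black"),
    (101, 233, 4.0, 2, 2, "comedy", "female", "black"),
    (125, 233, 3.0, 1, 1, "drama", "female", "black"),
    (125, 233, 3.0, 2, 2, "drama", "female", "black"),
    (125, 333, 3.0, 1, 1, "comedy", "male", "asian"),
    (125, 333, 3.0, 2, 2, "comedy", "male", "asian"),
]
df = pd.DataFrame(rows, columns=cols)

# Combine question and answer ids into one label, e.g. "1_2".
df["QandA"] = df["question_id"].astype(str) + "_" + df["answer_id"].astype(str)

# One-hot encode; the raw id columns are dropped so they do not get aggregated.
dummies = pd.get_dummies(
    df.drop(columns=["question_id", "answer_id"]),
    columns=["QandA", "genre", "user_gender", "user_ethnicity"],
    prefix="", prefix_sep="", dtype=int,
)

# Collapse to one row per key: an indicator is 1 if any row in the group had it.
flat = dummies.groupby(["movie_id", "user_id", "rated_value"], as_index=False).max()
print(flat)
```

Taking `max` treats each indicator as "present in at least one row", which matches the 1/0 layout of the desired output; `sum` would give counts instead.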

I think you need pd.get_dummies:
pd.get_dummies(df, columns=['genre','user_gender','user_ethnicity'])
Output:
movie_id user_id rated_value question_id answer_id genre_comedy genre_drama user_gender_female user_gender_male user_ethnicity_asian user_ethnicity_black \
0 101 345 3.5 1 1 1 0 0 1 0 0
1 101 345 3.5 1 2 1 0 0 1 0 0
2 101 345 3.5 2 1 1 0 0 1 0 0
3 125 345 4.5 1 4 0 1 0 1 0 0
4 101 233 4.0 1 3 1 0 1 0 0 1
5 101 233 4.0 2 2 1 0 1 0 0 1
6 125 233 3.0 1 1 0 1 1 0 0 1
7 125 233 3.0 2 2 0 1 1 0 0 1
8 125 333 3.0 1 1 1 0 0 1 1 0
9 125 333 3.0 2 2 1 0 0 1 1 0
user_ethnicity_white
0 1
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 0
9 0
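
Can you post your desired output DataFrame? Also, your example does not actually run for me.
I think I had an incorrect column name. I want question_id and answer_id to use the column values as the column headers. Maybe I can do it in two steps: first pivot the questions and answers as I did successfully, then use get_dummies.
This still has multiple rows for the same movie per user_id and user rating.
@无效 Assuming 1_1 and 1_2 are the combinations of question id and answer id?
Hope this helps. I'm sure you can take it from here. Happy coding!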
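
The two-step idea raised in the comments (pivot the question/answer pairs first, then one-hot encode the remaining attributes) could look roughly like the sketch below. It is an illustration only: it assumes genre, user_gender, and user_ethnicity are constant within each (movie_id, user_id, rated_value) group, and the names `keys`, `qa`, `attrs`, and `result` are made up for this example.

```python
import pandas as pd

cols = ["movie_id", "user_id", "rated_value", "question_id", "answer_id",
        "genre", "user_gender", "user_ethnicity"]
rows = [
    (101, 345, 3.5, 1, 1, "comedy", "male", "white"),
    (101, 345, 3.5, 1, 2, "comedy", "male", "white"),
    (101, 345, 3.5, 2, 1, "comedy", "male", "white"),
    (125, 345, 4.5, 1, 4, "drama", "male", "white"),
    (101, 233, 4.0, 1, 3, "comedy", "female", "black"),
    (101, 233, 4.0, 2, 2, "comedy", "female", "black"),
    (125, 233, 3.0, 1, 1, "drama", "female", "black"),
    (125, 233, 3.0, 2, 2, "drama", "female", "black"),
    (125, 333, 3.0, 1, 1, "comedy", "male", "asian"),
    (125, 333, 3.0, 2, 2, "comedy", "male", "asian"),
]
df = pd.DataFrame(rows, columns=cols)
keys = ["movie_id", "user_id", "rated_value"]

# Step 1: one 0/1 column per question_answer pair, one row per key.
df["QandA"] = df["question_id"].astype(str) + "_" + df["answer_id"].astype(str)
qa = pd.crosstab([df[k] for k in keys], df["QandA"]).clip(upper=1)

# Step 2: one-hot encode the per-key attributes (first row of each key is used).
attrs = df.drop_duplicates(subset=keys).set_index(keys)[
    ["genre", "user_gender", "user_ethnicity"]
]
attr_dummies = pd.get_dummies(attrs, prefix="", prefix_sep="", dtype=int)

# Join both parts on the shared (movie_id, user_id, rated_value) index.
result = qa.join(attr_dummies).reset_index()
print(result)
```

Keeping the two steps separate makes it easy to check each half on its own, at the cost of relying on the attributes really being identical across a group's rows.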