Python: how to one-hot encode a column containing arrays of strings
I would like to know how to one-hot encode a column that contains arrays of strings. I am trying to get from df to df2:
import pandas as pd
# This is the original data frame
df = pd.DataFrame({'menu': [['Italian', 'Greek'], ['Japanese'],
['Italian','Greek', 'Japanese']], 'price': ['$$', '$$', '$']})
df.head()
# This is the desired result
df2 = pd.DataFrame({'menu': [['Italian', 'Greek'], ['Japanese'],
['Italian','Greek', 'Japanese']],
'price': ['$$', '$$', '$'],
'Italian': [1,0,1],
'Greek': [1,0,1],
'Japanese': [0,1,1]
})
df2.head()
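You can use pd.get_dummies, Series.apply, DataFrame.join and Series.stack:
# apply(pd.Series) spreads each list across columns, stack() turns them into one long Series
# groupby(level=0).sum() collapses back to one row per original index
# (the older .sum(level=0) spelling was removed in pandas 2.0)
df.join(pd.get_dummies(df.menu.apply(pd.Series).stack()).groupby(level=0).sum())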
Output:
                         menu price  Greek  Italian  Japanese
0            [Italian, Greek]    $$      1        1         0
1                  [Japanese]    $$      0        0         1
2  [Italian, Greek, Japanese]     $      1        1         1
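As an alternative that is not part of the original answer, the same result can probably be reached with Series.explode (available from pandas 0.25), which avoids building the intermediate wide frame that apply(pd.Series) creates. The sketch below is a minimal example on the question's data; the variable names exploded, dummies and result are only illustrative.

```python
import pandas as pd

# Same data frame as in the question
df = pd.DataFrame({
    'menu': [['Italian', 'Greek'], ['Japanese'], ['Italian', 'Greek', 'Japanese']],
    'price': ['$$', '$$', '$'],
})

# explode() yields one row per list element and repeats the original index
exploded = df['menu'].explode()

# One indicator column per distinct string, then collapse back to one row
# per original index; max() keeps the values at 0/1 even if a list
# contains the same string twice
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

# Attach the indicator columns to the original frame
result = df.join(dummies)
print(result)
```

Using max() instead of sum() is a design choice here: sum() would count repeated entries, whereas max() gives a strict presence indicator.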