Python: counting values for two categories in a DataFrame into a pivot table
I found a reference that does something similar, but not exactly what I need. What I have is a DataFrame in the following format:
Tweets Classified FreqWord
calm director day science meetings nasal talk cutting edge remote sensing research drought veg fluorescence calm love Positive drought
love thought drought Positive drought
reign mother kerr funny none tried make come back drought Positive drought
wonder could help thai market b post reuters drought devastates south europe crops Negative drought
wonder could help thai market b post reuters drought devastates south europe crops Negative crops
wonder could help thai market b post reuters drought devastates south europe crops Negative crops
wonder could help thai market b post reuters drought devastates south europe crops Negative business
every child safe drinking water thank uk aid providing suppo ensure children rights drought Positive drought
every child safe drinking water thank uk aid providing suppo ensure children rights drought Positive water
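What I need is a pivot table of this DataFrame, with Classified as the index and FreqWord as the columns. Each value should be the number of tweets with that classification for that frequent word. In short, something like the following: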
Classified drought crops business water
Positive 5 0 0 1
Negative 1 2 1 0
Also note that the real dataset has more FreqWord and Classified values than shown here.

You can do it like this:
pd.crosstab(df.Classified, df.FreqWord)
Output:
FreqWord business crops drought water
Classified
Negative 1 2 1 0
Positive 0 0 4 1
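To try this end to end, here is a minimal sketch that rebuilds only the two relevant columns from the nine rows in the question (the Tweets column is left out because crosstab does not use it). The variable name counts is just for illustration.

```python
import pandas as pd

# The (Classified, FreqWord) pairs from the nine sample rows in the question.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive",
                   "Negative", "Negative", "Negative", "Negative",
                   "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought",
                 "drought", "crops", "crops", "business",
                 "drought", "water"],
})

# Rows come from Classified, columns from FreqWord, cells are row counts.
# Pairs that never occur are filled with 0.
counts = pd.crosstab(df["Classified"], df["FreqWord"])
print(counts)
```

Because crosstab builds the row and column labels from whatever values are present, it should work unchanged when the data has more FreqWord or Classified values.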
Or use get_dummies:
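df_out = (pd.get_dummies(df[['Classified', 'FreqWord']], columns=['FreqWord'])
            .set_index('Classified')
            .groupby(level=0, sort=False).sum())  # replaces .sum(level=0), removed in pandas 2.0
# Strip the 'FreqWord_' prefix that get_dummies adds to each column name
df_out.columns = df_out.columns.str.split('_').str[1]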
Output:
business crops drought water
Classified
Positive 0 0 4 1
Negative 1 2 1 0
And, if you wish, you can call reset_index:
df_out.reset_index()
Classified business crops drought water
0 Positive 0 0 4 1
1 Negative 1 2 1 0
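A third route, not part of the answer above but equivalent in result, is to count the pairs with groupby and spread them with unstack. The sketch below again uses only the two relevant columns from the sample rows. The names counts and flat are illustrative. It also clears the FreqWord label from the column axis before resetting the index, which gives a flat table.

```python
import pandas as pd

# The (Classified, FreqWord) pairs from the nine sample rows in the question.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive",
                   "Negative", "Negative", "Negative", "Negative",
                   "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought",
                 "drought", "crops", "crops", "business",
                 "drought", "water"],
})

# Count rows per (Classified, FreqWord) pair, then move FreqWord into columns.
# fill_value=0 covers pairs that never occur.
counts = df.groupby(["Classified", "FreqWord"]).size().unstack(fill_value=0)

# Remove the "FreqWord" name on the column axis and make Classified a column.
flat = counts.rename_axis(columns=None).reset_index()
print(flat)
```

Note that groupby sorts the keys by default, so the Negative row comes before the Positive row here.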
Awesome work @Scott! That was simple. I was almost pulling my hair out over this!
What a comprehensive answer.
@马祖: Thank you.