Python: counting comma-separated strings in a dataframe into new columns
I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sara', 'Paul', 'Guest'],
                   'Interaction': ['share,like,share,like,like,like',
                                   'love,like,share,like,love,like',
                                   'share,like,share,like,like,like,share,like,share,like,like,hug',
                                   'share,like,care,like,like,like']})
Name Interaction
0 John share,like,share,like,like,like
1 Sara love,like,share,like,love,like
2 Paul share,like,share,like,like,like,share,like,sha...
3 Guest share,like,care,like,like,like
I want to create additional columns that count each individual interaction as an int.
What I did:
df['likes'] = df[df['Interaction'] == 'like'].groupby('Name')['Interaction'].transform(lambda x: x[x.str.contains('like')].count())
I did the same for share, care, and so on.
But it doesn't work:
Name Interaction likes shares
0 John share,like,share,like,like,like NaN NaN
1 Sara love,like,share,like,love,like NaN NaN
2 Paul share,like,share,like,like,like,share,like,sha... NaN NaN
3 Guest share,like,care,like,like,like NaN NaN
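How can I count each interaction as an int, and then find the total for each row in a final column?
Thanks.
You can split the strings on ',', explode them, and count the values: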
df.join(df['Interaction'].str.split(',')
.explode()
.groupby(level=0).value_counts()
.unstack(fill_value=0))
Output:
Name Interaction care hug like love share
0 John share,like,share,like,like,like 0 0 4 0 2
1 Sara love,like,share,like,love,like 0 0 3 2 1
2 Paul share,like,share,like,like,like,share,like,sha... 0 1 7 0 4
3 Guest share,like,care,like,like,like 1 0 4 0 1
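The question also asks for a per-row total, which the snippet above does not add. A minimal sketch of one way to get it, assuming the count table is first stored in a variable (the names counts, total and result are illustrative, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sara', 'Paul', 'Guest'],
    'Interaction': ['share,like,share,like,like,like',
                    'love,like,share,like,love,like',
                    'share,like,share,like,like,like,share,like,share,like,like,hug',
                    'share,like,care,like,like,like'],
})

# Same split / explode / value_counts pipeline as above, kept in a variable
# (hypothetical name) so a total can be added before joining.
counts = (df['Interaction'].str.split(',')
          .explode()
          .groupby(level=0).value_counts()
          .unstack(fill_value=0))

# Row-wise sum over the per-interaction columns gives the total per row.
counts['total'] = counts.sum(axis=1)

result = df.join(counts)
print(result)
```

Computing the total on the count table before the join keeps the sum restricted to the numeric interaction columns.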
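First, str.split the column on commas with expand=True to get a DataFrame, stack it to get a Series, and use str.get_dummies to create a column for each distinct word, with a 1 where the word matches the value in the Series. Finally, sum on level=0 to get back to the original shape, and join the result to the original DataFrame. Note that sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.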
df = df.join( df['Interaction'].str.split(',', expand=True)
                 .stack()
                 .str.get_dummies()
                 .groupby(level=0).sum()  # replaces sum(level=0), removed in pandas 2.0
             )
print(df)
Name Interaction care hug like love share
0 John share,like,share,like,like,like 0 0 4 0 2
1 Sara love,like,share,like,love,like 0 0 3 2 1
2 Paul share,like,share,like,like,like,share,like,sha... 0 1 7 0 4
3 Guest share,like,care,like,like,like 1 0 4 0 1
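To see what each step in that chain contributes, here is a small sketch on a two-row Series. The data and the intermediate names (wide, long_form, dummies, counts) are made up for illustration, and the final step uses groupby(level=0).sum() so it should also run on pandas 2.0 and later:

```python
import pandas as pd

# Toy input, not the data from the question.
s = pd.Series(['share,like,like', 'love,like'])

# One column per position in the comma-separated string; short rows are padded.
wide = s.str.split(',', expand=True)

# Long form: a Series indexed by (original row, position).
long_form = wide.stack()

# One 0/1 indicator column per distinct word.
dummies = long_form.str.get_dummies()

# Collapse back to one row per original row by summing the indicators.
counts = dummies.groupby(level=0).sum()
print(counts)
```

Each word contributes one indicator row, so summing per original row turns the indicators into counts.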
Let's use pd.crosstab:
s = df.Interaction.str.split(',').explode()
df = df.join(pd.crosstab(s.index,s))
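For readers unfamiliar with crosstab, a short sketch of what the intermediate table looks like on a two-row frame. The sample data and the names table and out are assumptions for illustration only:

```python
import pandas as pd

# Toy frame, not the data from the question.
df = pd.DataFrame({'Name': ['John', 'Sara'],
                   'Interaction': ['share,like,like', 'love,like']})

# After explode, each word keeps the index label of the row it came from.
s = df['Interaction'].str.split(',').explode()

# Cross-tabulate row label against word: one row per original row,
# one column per distinct word, cells hold occurrence counts.
table = pd.crosstab(s.index, s)

# The table shares the original index labels, so a plain join lines it up.
out = df.join(table)
print(out)
```

Because crosstab fills absent combinations with 0, no separate fill step is needed here.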