How to split text data and count occurrences in a pandas DataFrame?

I have data in a DataFrame in the following format:

import pandas as pd

df = pd.DataFrame([
    [42,{"tags":["illustration","logo","design","ui"]}],
    [81,{"tags":["typography","icon","vector","ux"]}],
    [98,{"tags":["branding","app"]}],
    [52,{"tags":["animation","web","flat"]}],
    [17,{"tags":["type","lettering"]}],
    [37,{"tags":["illustration","typography","branding","typography","branding"]}],
    [63,{"tags":["logo","icon","app","web","lettering"]}],
    [47,{"tags":["ui","ux"]}],
    [6,{"tags":["design","vector","icon","flat","lettering","branding","app"]}],
    [53,{"tags":["ui","ux","lettering","branding","app","animation","web","flat"]}],
    [64,{"tags":["branding","app","typography","branding"]}],
    [89,{"tags":["typography","branding","ux","lettering","branding"]}]
],columns=["_id","tags"])

I want to count how many "_id"s have a given number of tags (that is, the distribution of the tag counts), so for the data above it would be:

Number of posts    Number of tags
     3                 2
     1                 3
     3                 4
     3                 5
     1                 7
     1                 8

How should I process tags stored in this format for this task?

Thanks

Use Counter to count the length of each tags list, then pass the keys and values as lists to the DataFrame constructor:

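from collections import Counter

# Map each tag-list length to the number of rows that have that length
c = Counter(len(x['tags']) for x in df['tags'])

# Store the result under a new name so the original df is not overwritten
out = pd.DataFrame({'Number of posts': list(c.values()),
                    'Number of tags': list(c.keys())})
print(out)
   Number of posts  Number of tags
0                3               4
1                3               2
2                1               3
3                3               5
4                1               7
5                1               8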
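The Counter result comes out in first-seen order, while the table in the question is ordered by number of tags. A minimal sketch of one way to get that ordering, using four rows from the question as sample data (the variable name `out` is only illustrative):

```python
from collections import Counter

import pandas as pd

# A few rows from the question, kept small for illustration.
df = pd.DataFrame(
    [
        [42, {"tags": ["illustration", "logo", "design", "ui"]}],
        [98, {"tags": ["branding", "app"]}],
        [47, {"tags": ["ui", "ux"]}],
        [52, {"tags": ["animation", "web", "flat"]}],
    ],
    columns=["_id", "tags"],
)

# Map each tag-list length to the number of rows with that length.
c = Counter(len(x["tags"]) for x in df["tags"])

# sorted(c.items()) orders the (tag count, post count) pairs by tag count.
out = pd.DataFrame(sorted(c.items()), columns=["Number of tags", "Number of posts"])
out = out[["Number of posts", "Number of tags"]]
print(out)
```

Sorting the `(key, value)` pairs before building the frame avoids a separate `sort_values` call afterwards; either route should give the same table.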

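Or use apply:

# Same idea as above, but computed with apply and value_counts
out = (df['tags'].apply(lambda x: len(x['tags']))
                 .value_counts()
                 .rename_axis('Number of tags')
                 .reset_index(name='Number of posts')
                 [['Number of posts','Number of tags']])
print(out)
   Number of posts  Number of tags
0                3               5
1                3               4
2                3               2
3                1               8
4                1               7
5                1               3

Comments:

Thank you very much. I just noticed that some of the data may not be in the format I specified; it may or may not include some other information. Please check here: pastebin.com/Pv4mXN8e. Could you tell me how to change the code in this case? Thanks @jezrael. Sorry, I am offline now and only on my phone, so I will try to find a solution later, so that I can get a DataFrame with only the tags (as in the original question). When I run either approach, I get the same error: TypeError: string indices must be integers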
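The TypeError mentioned in the comments ("string indices must be integers") typically means some cells hold a string rather than a dict, so `x['tags']` indexes into a string. The actual format of the pastebin data is not shown here, so the following is only a sketch under assumptions: string cells are assumed to be JSON text, extra keys are ignored, and cells without a usable "tags" list count as 0 tags. The helper `count_tags` and the sample rows are hypothetical.

```python
import json

import pandas as pd


def count_tags(cell):
    """Return the number of tags in one cell, or 0 if none can be found."""
    # Assumption: a string cell contains JSON text such as '{"tags": [...]}'.
    if isinstance(cell, str):
        try:
            cell = json.loads(cell)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return 0
    if isinstance(cell, dict):
        tags = cell.get("tags", [])
        return len(tags) if isinstance(tags, (list, tuple)) else 0
    return 0


# Hypothetical sample mixing the shapes the comments hint at.
df = pd.DataFrame(
    [
        [42, {"tags": ["illustration", "logo", "design", "ui"]}],
        [81, '{"tags": ["typography", "icon"]}'],          # dict stored as a string
        [98, {"tags": ["branding", "app"], "extra": 1}],   # extra information
        [52, {"other": "no tags here"}],                   # no "tags" key
    ],
    columns=["_id", "tags"],
)

n_tags = df["tags"].apply(count_tags)
out = (
    n_tags.value_counts()
    .rename_axis("Number of tags")
    .reset_index(name="Number of posts")
    .sort_values("Number of tags", ignore_index=True)
    [["Number of posts", "Number of tags"]]
)
print(out)
```

Whether rows without tags should count as 0 or be dropped depends on the real data; filtering with `n_tags[n_tags > 0]` before `value_counts()` would exclude them.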