Python: counting hashtag frequencies in a DataFrame

python pandas dataframe

I am trying to count the frequency of hashtag words in the "text" column of my DataFrame.

index        text
1            ello ello ello ello #hello #ello
2            red green blue black #colours
3            Season greetings #hello #goodbye 
4            morning #goodMorning #hello
5            my favourite animal #dog

My word_freq code performs a frequency count on every word in the text column, but I only want the hashtag frequencies returned.

For example, after running the code on the DataFrame above, it should return:

#hello        3
#goodbye      1
#goodMorning  1
#ello         1
#colours      1
#dog          1

Is there a way to slightly adjust my word_freq code so that it counts only hashtag words and returns them in the format shown above? Thanks in advance.

Find all the hashtag words in the text column, then count them:
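The method names and code for this step are missing from the page. A minimal sketch of one way to do it, assuming `Series.str.findall`, `Series.explode` and `Series.value_counts` (my choice of methods, not confirmed by the answer text), with the sample data copied from the question:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "text": [
            "ello ello ello ello #hello #ello",
            "red green blue black #colours",
            "Season greetings #hello #goodbye",
            "morning #goodMorning #hello",
            "my favourite animal #dog",
        ]
    },
    index=[1, 2, 3, 4, 5],
)

# findall returns a list of hashtags for each row, explode puts each hashtag
# on its own row, and value_counts tallies them. Rows without any hashtag
# become NaN after explode and are ignored by value_counts.
counts = df["text"].str.findall(r"#\w+").explode().value_counts()
print(counts)
```

`explode` needs pandas 0.25 or later. The order of hashtags that share the same count may differ between pandas versions.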

Another idea is to combine two other methods:
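The two methods meant here are not named on the page either. As a guess only, the sketch below assumes `Series.str.split(expand=True)` followed by `DataFrame.stack`, the same chain that appears in the more verbose answer further down, and then keeps the words that start with `#`:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "text": [
            "ello ello ello ello #hello #ello",
            "red green blue black #colours",
            "Season greetings #hello #goodbye",
            "morning #goodMorning #hello",
            "my favourite animal #dog",
        ]
    }
)

# split(expand=True) puts every word in its own column; stack flattens the
# columns into a single Series. dropna removes the padding values left for
# shorter rows, in case the installed pandas version keeps them in stack.
words = df["text"].str.split(expand=True).stack().dropna()

# Keep only the words that start with '#', then count them
counts = words[words.str.startswith("#")].value_counts()
print(counts)
```

Splitting on whitespace keeps any punctuation attached to a hashtag (for example `#hello,`), so the regex-based approach is the safer choice for messy text.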

Result:

print(counts)
#hello          3
#dog            1
#colours        1
#ello           1
#goodMorning    1
#goodbye        1
Name: text, dtype: int64

Another option is to use str.extractall, which drops the # from the result, and then value_counts as well:

s = df['text'].str.extractall(r'(?<=#)(\w+)')[0].value_counts()
print(s)
hello          3
colours        1
goodbye        1
ello           1
goodMorning    1
dog            1
Name: 0, dtype: int64

This is a slightly more verbose solution, but it does the job:
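# Split every description into words, stack them into one Series,
# count each word, and convert the counts to a dict
dictionary_count = data_100.TicketDescription.str.split(expand=True).stack().value_counts().to_dict()

# Example contents of dictionary_count
dictionary_count = {'accessgtgtjust': 1,
                    'sent': 1,
                    'investigate': 1,
                    'edit': 1,
                    '#prd': 1,
                    'getting': 1}

# Collect the keys that contain a '#'
ert = [i for i in dictionary_count if '#' in i]

ert
Out[238]: ['#prd']

# Every other key is unwanted
unwanted = set(dictionary_count) - set(ert)

# Delete the unwanted keys so that only the hashtag counts remain
for unwanted_key in unwanted:
    del dictionary_count[unwanted_key]

dictionary_count
Out[241]: {'#prd': 1}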
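The filtering steps above can also be collapsed into a single dict comprehension. This is only a sketch: the dictionary below is a hand-written stand-in for `dictionary_count`, `hashtag_count` is a name introduced here for illustration, and it uses `startswith('#')` rather than the `'#' in i` test above, so words with a `#` in the middle are excluded.

```python
# Hand-written stand-in for the dictionary_count shown above
dictionary_count = {
    "accessgtgtjust": 1,
    "sent": 1,
    "investigate": 1,
    "edit": 1,
    "#prd": 1,
    "getting": 1,
}

# Build a new dict holding only the entries whose key starts with '#'
# (hashtag_count is a hypothetical name, not from the original answer)
hashtag_count = {
    word: count
    for word, count in dictionary_count.items()
    if word.startswith("#")
}
print(hashtag_count)  # {'#prd': 1}
```

Building a new dict avoids deleting keys from `dictionary_count` in place, so the full word counts stay available.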

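Please include what you have tried: did you filter the words in each cell and keep only those that start with #?
Welcome. The rules require you to show your own attempt at adapting the code and to post it. This has no MCVE. You cannot just post a specification of the code you want written for you.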