Python equivalent of WordPress sanitize_title
I need a Python equivalent of WordPress sanitize_title. Title:
'mygubbi raises $25 mn seed funding from bigbasket co founder others'
WordPress gives:
"mygubbi-raises-2-5-mn-seed-funding-bigbasket-co-founder-others"
Python slugify gives:
"mygubbi-raises-2-5-mn-seed-funding-from-bigbasket-co-founder-others"
I used the python-slugify library.
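Should I just remove words such as "from", "in" and "to"? Where can I find these stop words?
There is a Python module called nltk that makes exactly this possible. Scroll down a little on this website to the heading "Removing stop words", where there are some examples of doing this with the module.
python-slugify has a stopwords parameter that can be used with nltk as follows: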
from slugify import slugify
from nltk.corpus import stopwords
text = 'mygubbi raises $25 mn seed funding from bigbasket co founder others'
# Drop NLTK's English stop words while building the slug
print(slugify(text, stopwords=stopwords.words('english')))
This will print:
mygubbi-raises-25-mn-seed-funding-bigbasket-co-founder-others
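If you would rather not pull in NLTK, the same idea (split the title into words, drop the stop words, join the rest with hyphens) can be sketched in plain Python. This is a minimal sketch, not the actual algorithm used by python-slugify or WordPress: the function name slugify_without_stopwords and the short STOP_WORDS set are hypothetical and chosen only for illustration, so swap in stopwords.words('english') or your own list for real use.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list for illustration only;
# NLTK's English list is much longer.
STOP_WORDS = frozenset({"a", "an", "and", "from", "in", "of", "on", "the", "to"})


def slugify_without_stopwords(text, stop_words=STOP_WORDS):
    """Hypothetical helper: lowercase, strip accents, drop stop words, join with '-'."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Every run of characters other than a-z / 0-9 acts as a word separator.
    words = re.split(r"[^a-z0-9]+", ascii_text.lower())
    # Discard empty fragments (from leading/trailing separators) and stop words.
    kept = [w for w in words if w and w not in stop_words]
    return "-".join(kept)


title = "mygubbi raises $25 mn seed funding from bigbasket co founder others"
print(slugify_without_stopwords(title))
# mygubbi-raises-25-mn-seed-funding-bigbasket-co-founder-others
```

With this tiny list the sample title gives the same slug as the nltk version above; a longer list such as NLTK's will strip more words from other titles, so check the results on your own data.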
After installing nltk, you can install additional corpora, one of which is stopwords. To do this, run its built-in download utility as follows:
import nltk
nltk.download()
Select Corpora, scroll down to stopwords, and click the Download button.