
Python equivalent of WordPress sanitize_text


I need a Python equivalent of WordPress sanitize_text.

Title:

'mygubbi raises $25 mn seed funding from bigbasket co founder others'
WordPress gives:

"mygubbi-raises-2-5-mn-seed-funding-bigbasket-co-founder-others"
Python slugify gives:

"mygubbi-raises-2-5-mn-seed-funding-from-bigbasket-co-founder-others"
I used the python-slugify library.


Should I just remove words such as "from", "in" and "to"? Where can I find a list of these stop words?
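The idea in the question, dropping a fixed list of words while building the slug, can be sketched with the standard library alone. This is a minimal sketch, not WordPress's algorithm and not python-slugify's: the stop-word set contains only the three words named in the question, and the helper name slug_without_stop_words is hypothetical. It also handles plain ASCII only; python-slugify additionally transliterates Unicode characters, which this sketch does not.

```python
import re

# Hypothetical stop-word set: only the words named in the question.
# This is not claimed to be the list WordPress uses.
STOP_WORDS = frozenset({"from", "in", "to"})


def slug_without_stop_words(title: str, stop_words=STOP_WORDS) -> str:
    """Lowercase the title, split it on runs of non-alphanumeric
    characters, drop any word found in stop_words, and join the
    remaining words with hyphens."""
    words = re.split(r"[^a-z0-9]+", title.lower())
    # re.split can yield empty strings at the edges; skip them.
    kept = [w for w in words if w and w not in stop_words]
    return "-".join(kept)


if __name__ == "__main__":
    title = "mygubbi raises $25 mn seed funding from bigbasket co founder others"
    print(slug_without_stop_words(title))
    # mygubbi-raises-25-mn-seed-funding-bigbasket-co-founder-others
```

A hand-picked set like this is easy to reason about but will miss other function words ("the", "of", "and"), which is why a curated list such as NLTK's is usually the better source.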

There is a Python module called nltk that lets you do exactly this.

Scroll down a little on this site to the heading "Removing stop words"; it has several examples of doing this with the module.

The library has a stopwords parameter that can be used together with nltk, as follows:

from slugify import slugify
from nltk.corpus import stopwords

text = 'mygubbi raises $25 mn seed funding from bigbasket co founder others'
# Drop NLTK's English stop words (such as "from") while building the slug
print(slugify(text, stopwords=stopwords.words('english')))
This prints:

mygubbi-raises-25-mn-seed-funding-bigbasket-co-founder-others
After installing nltk, you can install additional corpora, one of which is stopwords. To do so, run its built-in download utility as follows:

import nltk

# With no arguments, this opens NLTK's interactive downloader
nltk.download()

Select Corpora, scroll down to stopwords, and then click the Download button.
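On a server or in a script, the interactive downloader is awkward; nltk.download also accepts a package identifier, so nltk.download("stopwords") fetches just that corpus. The sketch below wraps this in a helper that loads the English list and downloads it only when it is missing. The helper name load_english_stopwords is hypothetical, and returning an empty set on failure (nltk not installed, no network) is a design choice of this sketch, not NLTK behaviour.

```python
from typing import Set


def load_english_stopwords() -> Set[str]:
    """Return NLTK's English stop-word list as a set.

    If the 'stopwords' corpus is not installed yet, try to download it
    non-interactively. Return an empty set if nltk is missing or the
    download fails, so callers can fall back to their own list.
    """
    try:
        import nltk
        from nltk.corpus import stopwords
    except ImportError:
        return set()

    try:
        return set(stopwords.words("english"))
    except LookupError:
        # Corpus not found locally: fetch only 'stopwords', without the GUI.
        try:
            ok = nltk.download("stopwords", quiet=True)
        except OSError:
            ok = False
        if not ok:
            return set()
        return set(stopwords.words("english"))


if __name__ == "__main__":
    words = load_english_stopwords()
    print(len(words), sorted(w for w in words if w in {"from", "in", "to"}))
```

The returned set can be passed straight to python-slugify, e.g. slugify(text, stopwords=load_english_stopwords()). NLTK's English list already contains "from", "in" and "to", the words the question asks about.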