Different results from the same NLTK-based Python code on different computers
I wrote the following code. It works fine on my computer, but on other computers it returns null. Can you help me solve this problem?
import string
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

def preprocess(sentence):
    sentence = sentence.lower()
    specialChrs = {'\xc2', ''}
    pattern = r'''(?x)  # set flag to allow verbose regexps
        ([A-Z]\.)+      # abbreviations, e.g. U.S.A.
      | \$?\d+%?
      | \$?\d+(,|.\d+)*
      | \w+([-'/]\w+)*  # words w/ optional internal hyphens/apostrophe
      | /\m+([-'/]\w+)*
    '''
    tokenizer = RegexpTokenizer(pattern)
    tokens = tokenizer.tokenize(sentence)
    print(tokens)
    realToken = [e for e in tokens if len(e) >= 3 and len(e) < 10]
    stopWords = set(stopwords.words('english'))
    stop_words = [w for w in realToken if w not in stopWords]
    filtered_words = [w for w in stop_words if w not in specialChrs]
    print(filtered_words)
    # final_words = [w for w in filtered_words if not w[0]=='0' and w[1]=='x']
    return filtered_words

text = 'I have one generalized rule, where in shellscript I check for all need packages, if any package does not exist, then install it other wise skip to next check. As I need to check and execute few other python as well shellscripts, I am using it. Is using shellscript for this is bad idea?'
preprocess(text)
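A minimal sketch of the symptom, using only Python's `re` module. NLTK's `RegexpTokenizer` with its default `gaps=False` returns what `re.findall` returns for the pattern, and when a pattern contains capturing groups `(...)`, `findall` returns the contents of the groups instead of the matched text. The pattern below is the question's, with one assumption: the `/\m+...` branch is dropped so the sketch runs, because `\m` is not a valid escape and Python 3.7+ rejects it.

```python
import re

# The question's pattern minus the "/\m+..." branch ("\m" raises re.error
# on Python 3.7+; Python 2 silently read it as a literal "m").
QUESTION_PATTERN = r'''(?x)
    ([A-Z]\.)+           # abbreviations
  | \$?\d+%?
  | \$?\d+(,|.\d+)*
  | \w+([-'/]\w+)*       # words with optional internal hyphens/apostrophes
'''

sentence = "Is using shellscript for this is bad idea?".lower()

# With capturing groups, findall yields one tuple of group contents per
# match; none of the groups participate in a plain word, so all are ''.
tokens = re.findall(QUESTION_PATTERN, sentence)
print(tokens[:2])  # [('', '', ''), ('', '', '')]

# Each tuple has length 3, so the length filter from preprocess keeps them all.
real_tokens = [e for e in tokens if len(e) >= 3 and len(e) < 10]
print(len(tokens), len(real_tokens))  # 8 8
```

Because a 3-tuple passes the `len(e) >= 3` check and is never equal to a stop-word string, every empty tuple survives the filters, which is presumably the "null" output the question describes.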
You need to change the regular expression as follows to solve your problem:
`pattern = r'''(?x) # set flag to allow verbose regexps
(?:[A-Z]\.)+ # abbreviations, e.g. U.S.A.
| \$?\d+(?:\.\d+)?%?
| \w+(?:-\w+)* # words with optional internal hyphens
|/\m+(?:[-'/]\w+)*
'''`
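A minimal sketch for checking the fix without NLTK installed: it runs the answer's pattern through `re.findall`, the call `RegexpTokenizer` makes by default, and then applies the filtering steps from `preprocess`. Because every group is non-capturing `(?:...)`, `findall` returns the matched text itself. Two assumptions: the `/\m+...` branch is dropped because Python 3.7+ rejects the `\m` escape, and `STOP_WORDS` is a small hypothetical stand-in for `set(stopwords.words('english'))`.

```python
import re

# The answer's pattern (non-capturing groups), minus the "/\m+..." branch.
ANSWER_PATTERN = r'''(?x)
    (?:[A-Z]\.)+             # abbreviations, e.g. U.S.A.
  | \$?\d+(?:\.\d+)?%?       # currency and percentages
  | \w+(?:-\w+)*             # words with optional internal hyphens
'''

# Hypothetical stand-in for set(stopwords.words('english')), so the sketch
# runs without downloading the NLTK stopwords corpus.
STOP_WORDS = {"is", "for", "this", "a", "the", "i"}

sentence = "Is using shellscript for this is bad idea?".lower()

tokens = re.findall(ANSWER_PATTERN, sentence)
print(tokens)  # ['is', 'using', 'shellscript', 'for', 'this', 'is', 'bad', 'idea']

# Same length and stop-word filters as preprocess.
real_tokens = [e for e in tokens if len(e) >= 3 and len(e) < 10]
filtered = [w for w in real_tokens if w not in STOP_WORDS]
print(filtered)  # ['using', 'bad', 'idea']
```

Note that `preprocess` lowercases the sentence before tokenizing, so the `[A-Z]\.` abbreviation branch can never match; lowercasing the tokens afterwards would let it take effect.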
On a Windows system (your friend's), strings are handled differently.
@alvas What about linux2? That is where I tested it. What should I do?
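The thread does not settle what differs between the machines, so one way to narrow it down is to print the interpreter, platform, and NLTK version on each computer and compare. A minimal sketch; the helper name `environment_report` is hypothetical, and it relies only on `platform.python_version()`, `sys.platform`, and `nltk.__version__`, so it should run under both Python 2 and 3. If the NLTK versions differ, that is a plausible culprit: it is an assumption not confirmed in this thread that `RegexpTokenizer` in older NLTK 2.x releases rewrote capturing groups as non-capturing, while NLTK 3 passes the pattern to `findall` unchanged.

```python
import platform
import sys


def environment_report():
    """Collect the details most likely to differ between two machines."""
    try:
        import nltk
        nltk_version = nltk.__version__
    except ImportError:
        nltk_version = None  # NLTK is not installed for this interpreter
    return {
        "python": platform.python_version(),
        # e.g. 'win32', 'linux' on Python 3, or 'linux2' on Python 2
        "platform": sys.platform,
        "nltk": nltk_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```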