Python code using the NLTK library gives different results on different computers


I wrote the following code. It works fine on my computer, but on other computers it returns null. Can you help me solve this problem?

import string
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

def preprocess(sentence):
    sentence = sentence.lower()
    specialChrs={'\xc2',''} 
    pattern=pattern = r'''(?x)               # set flag to allow verbose regexps
              ([A-Z]\.)+         # abbreviations, e.g. U.S.A.
              | \$?\d+%?
              | \$?\d+(,|.\d+)*
              | \w+([-'/]\w+)*    # words w/ optional internal hyphens/apostrophe
              |/\m+([-'/]\w+)*
            '''
    tokenizer = RegexpTokenizer(pattern)
    tokens = tokenizer.tokenize(sentence)
    print tokens
    realToken= [e for e in tokens if  len(e)>= 3 and len(e)<10]
    stopWords = set(stopwords.words('english'))
    stop_words = [w for w in realToken if not w in stopWords]
    filtered_words = [w for w in stop_words if not w in specialChrs]
    print filtered_words
   # final_words = [w for w in filtered_words if not w[0]=='0' and w[1]=='x']
    return filtered_words


str='I have one generalized rule, where in shellscript I check for all need packages, if any package does not exist, then install it other wise skip to next check. As I need to check and execute few other python as well shellscripts, I am using it. Is using shellscript for this is bad idea?'
preprocess(str)
This question has already been answered elsewhere.

To solve your problem, you need to change your regular expression like this:

`pattern = r'''(?x)          # set flag to allow verbose regexps
            (?:[A-Z]\.)+        # abbreviations, e.g. U.S.A.
         | \$?\d+(?:\.\d+)?%?
         | \w+(?:-\w+)*        # words with optional internal hyphens
         |/\m+(?:[-'/]\w+)*
      '''`
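
A likely reason the non-capturing groups `(?:...)` matter: this assumes the other computer has an NLTK release whose `RegexpTokenizer` hands the pattern directly to `re.findall`. When a pattern contains capturing groups, `re.findall` returns the group contents instead of the whole match, so the tokens come back as tuples of mostly empty strings. The sketch below reproduces that effect with the standard `re` module only; the sample text and the two simplified patterns are made up for illustration and are not the patterns from the question.

```python
import re

# Hypothetical sample text, chosen only to exercise both alternatives.
text = "price is $12.50 for well-known items"

# Same alternatives, once with capturing groups and once with (?:...) groups.
capturing = r"\$?\d+(\.\d+)?%?|\w+(-\w+)*"
non_capturing = r"\$?\d+(?:\.\d+)?%?|\w+(?:-\w+)*"

# With capturing groups, findall yields one tuple per match holding the
# contents of each group, not the text of the whole match.
with_groups = re.findall(capturing, text)

# With non-capturing groups, findall yields the full matched strings.
without_groups = re.findall(non_capturing, text)

print(with_groups)
print(without_groups)
```

If the two computers have different NLTK versions installed, comparing `nltk.__version__` on both would be a quick way to check whether this is what is happening.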

On Windows (your friend's system), strings are handled differently.

@alvas What about linux2? That is the platform I tested it on. What should I do?