Different results from the same NLTK-related Python code on different computers

python anaconda

I wrote the following code. It works fine on my computer but returns null on other computers. Could you help me solve this problem?

import string
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

def preprocess(sentence):
    sentence = sentence.lower()
    specialChrs={'\xc2',''} 
    pattern=pattern = r'''(?x)               # set flag to allow verbose regexps
              ([A-Z]\.)+         # abbreviations, e.g. U.S.A.
              | \$?\d+%?
              | \$?\d+(,|.\d+)*
              | \w+([-'/]\w+)*    # words w/ optional internal hyphens/apostrophe
              |/\m+([-'/]\w+)*
            '''
    tokenizer = RegexpTokenizer(pattern)
    tokens = tokenizer.tokenize(sentence)
    print tokens
    realToken= [e for e in tokens if  len(e)>= 3 and len(e)<10]
    stopWords = set(stopwords.words('english'))
    stop_words = [w for w in realToken if not w in stopWords]
    filtered_words = [w for w in stop_words if not w in specialChrs]
    print filtered_words
   # final_words = [w for w in filtered_words if not w[0]=='0' and w[1]=='x']
    return filtered_words


str='I have one generalized rule, where in shellscript I check for all need packages, if any package does not exist, then install it other wise skip to next check. As I need to check and execute few other python as well shellscripts, I am using it. Is using shellscript for this is bad idea?'
preprocess(str)

Your question has already been answered.
You need to change the regular expression as follows to solve your problem:
pattern = r'''(?x)          # set flag to allow verbose regexps
            (?:[A-Z]\.)+        # abbreviations, e.g. U.S.A.
         | \$?\d+(?:\.\d+)?%?
         | \w+(?:-\w+)*        # words with optional internal hyphens
         |/\m+(?:[-'/]\w+)*
      '''
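
The substantive change in the pattern above is that every group `( ... )` became a non-capturing group `(?: ... )`. Here is a minimal sketch of why that matters, using only the standard `re` module. As far as I know, `RegexpTokenizer` relies on `re.findall` in recent NLTK releases, so a different NLTK version on the other computer is a plausible cause, but that is an assumption and not something the thread confirms. The sample text is made up, and the `/\m` branch is left out because recent Python 3 versions reject `\m` as an unknown escape.

```python
import re

# Hypothetical sample text, chosen to hit every branch of the pattern.
text = "The U.S.A. price is $12.50 for a well-known shell-script"

# Capturing groups: re.findall returns one tuple of group contents per match.
capturing = r"""(?x)
      ([A-Z]\.)+            # abbreviations, e.g. U.S.A.
    | \$?\d+(\.\d+)?%?      # currency and percentages
    | \w+(-\w+)*            # words with optional internal hyphens
"""

# Non-capturing groups: re.findall returns the full matched text.
non_capturing = r"""(?x)
      (?:[A-Z]\.)+
    | \$?\d+(?:\.\d+)?%?
    | \w+(?:-\w+)*
"""

group_tuples = re.findall(capturing, text)
tokens = re.findall(non_capturing, text)

print(group_tuples[:2])  # tuples of group contents, not the tokens themselves
print(tokens)            # the whole matched strings
```

Both patterns match the same spans of text. Only what `findall` hands back differs, which is why the later length and stop-word filters can behave so differently.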

On a Windows system (your friend's), strings are handled differently.
@alvas What about linux2? That is the platform I tested on. What should I do?
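
Since the comments turn on which platform each machine runs, it may help to print the same short report on both computers and compare. This is only a sketch: the function name `environment_report` is made up for illustration, and it assumes the NLTK version is among the things worth comparing.

```python
import platform
import sys


def environment_report():
    """Collect details that commonly differ between two machines."""
    report = {
        "python": platform.python_version(),
        "platform": sys.platform,  # e.g. 'linux2' on Python 2 under Linux
        "default_encoding": sys.getdefaultencoding(),
    }
    try:
        import nltk
        report["nltk"] = nltk.__version__
    except ImportError:
        report["nltk"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("{}: {}".format(key, value))
```

If the two reports differ only in the `nltk` line, the tokenizer's handling of the pattern is the first thing to check.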