Python stop-word removal not working correctly
Any idea why my stop-word removal is not working correctly? It replaces things by mistake: sometimes it strips, say, the "a" out of "an", and it does not treat "it's" as a single word.
import re

# splitted_tweets is built elsewhere (not shown in the question)
stop_words=open("stopwords.txt")
stop_words=stop_words.read().split("\n")
print stop_words
for line in splitted_tweets:
    #print line
    #print "***************************************"
    if (line.__contains__("text='")):
        start_index=line.index("text='")+6
        end_index=line.index("',", start_index)
        tweet=line[start_index:end_index]
        print tweet
        print "**********"
        tweet_words = re.sub("[^\w]", " " , tweet).split()
        print tweet_words
        for word in stop_words:
            if word in tweet_words:
                print word
                tweet=tweet.replace(word, "")
        print "?????????????????????????"
        print tweet
Here is some sample output:
['RT', 'sayingsforgirls', 'Do', 'not', 'touch', 'MY', 'iPhone', 'It', 's', 'not', 'an', 'usPhone', 'it', 's', 'not', 'a', 'wePhone', 'it', 's', 'not', 'an', 'ourPhone', 'it', 's', 'an', 'iPhone']
a
an
it
not
?????????????????????????
RT @syingsforgirls: Do touch MY iPhone. It's n usPhone, 's wePhone, 's n ourPhone, 's n iPhone.
Do not touch MY iPhone. It's not an usPhone, it's not a wePhone, it's not an ourPhone, it's an iPhone.
**********
['Do', 'not', 'touch', 'MY', 'iPhone', 'It', 's', 'not', 'an', 'usPhone', 'it', 's', 'not', 'a', 'wePhone', 'it', 's', 'not', 'an', 'ourPhone', 'it', 's', 'an', 'iPhone']
a
an
it
not
?????????????????????????
Do touch MY iPhone. It's n usPhone, 's wePhone, 's n ourPhone, 's n iPhone.
RT @BrianaaSymonee: she says imma dog, but it takes one to know one...
**********
['RT', 'BrianaaSymonee', 'she', 'says', 'imma', 'dog', 'but', 'it', 'takes', 'one', 'to', 'know', 'one']
but
it
she
to
?????????????????????????
RT @BrianaaSymonee: says imma dog, takes one know one...
she says imma dog, but it takes one to know one...
**********
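The output above is consistent with `str.replace` acting on substrings rather than on whole words: removing "a" also removes the "a" inside "an" and inside "sayingsforgirls". The following is a minimal sketch of that effect, not the asker's exact program. It is written for Python 3 (the question uses Python 2 print statements), and `remove_substrings` is a hypothetical helper that only mimics the inner loop of the question.

```python
def remove_substrings(text, stop_words):
    """Mimic the question's loop: delete every occurrence of each stop word,
    wherever it appears, even in the middle of a longer word."""
    for word in stop_words:
        text = text.replace(word, "")
    return text


# "a" is removed from inside the user name as well
handle = remove_substrings("sayingsforgirls", ["a"])
print(handle)

# "an" loses its "a" before the loop ever reaches the stop word "an"
phrase = remove_substrings("not an usPhone", ["a", "an", "not"])
print(phrase)
```

Because "a" is processed first, "an" has already become "n" by the time the loop gets to "an", which would explain the "n usPhone" fragments in the sample output.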
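Slightly O/T, but line.__contains__("text='") would normally be written "text='" in line.
Thanks, but that line is working!
...I didn't say it wasn't, but if you follow the conventions of the language you are using, your code becomes easier to read, understand and debug.
Please take a look at the question and cut it down. What result do you expect?
I'm with @jornsharpe. If you want a proper answer, you need to explain exactly what the problem is.
@Kasramvd If you look at the problem I mentioned, take "it's": it gets split into "it" and "s", which is incorrect. I just want it treated as one word.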
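One possible way to address both complaints (partial-word removal and "it's" being split) is to tokenize with a pattern that keeps internal apostrophes and then filter whole tokens, instead of calling `replace` on the raw string. This is only a sketch under stated assumptions: it targets Python 3, `STOP_WORDS` is a made-up list standing in for the contents of stopwords.txt, `remove_stop_words` is a hypothetical helper, and matching is made case-insensitive, which the original code did not do.

```python
import re

# Assumption: a small stand-in for the list loaded from stopwords.txt
STOP_WORDS = {"a", "an", "it", "it's", "not", "but", "she", "to"}

# A token is a run of word characters, optionally joined by apostrophes,
# so "it's" stays one token instead of becoming "it" and "s".
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def remove_stop_words(tweet, stop_words=STOP_WORDS):
    """Return the tweet's tokens with stop words dropped.

    Whole tokens are compared, so "a" never touches the "a" inside "an".
    """
    tokens = TOKEN_RE.findall(tweet)
    return [t for t in tokens if t.lower() not in stop_words]


kept = remove_stop_words("It's not an usPhone, it's an iPhone.")
print(kept)  # ['usPhone', 'iPhone']
```

If a string is needed rather than a list, the tokens can be rejoined with `" ".join(kept)`, at the cost of losing the original punctuation. Whether "it's" belongs in the stop-word list at all depends on the file the asker is using.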