Python 正则表达式函数的替代方法

Python 正则表达式函数的替代方法,python,regex,list,indexing,append,Python,Regex,List,Indexing,Append,这是我的密码 import re with open('newfiles.txt') as f: k = f.read() p = re.compile(r'\w+|[^\w\-\s]') originaltext = p.findall(k) uniquelist = [] for word in originaltext: if word not in uniquelist: uniquelist.append(word) indexes = ' '.join(st

这是我的密码

import re
with open('newfiles.txt') as f:
   k = f.read()
p = re.compile(r'\w+|[^\w\-\s]')
originaltext = p.findall(k)
uniquelist = []
for word in originaltext:
   if word not in uniquelist:
       uniquelist.append(word)
indexes = ' '.join(str(uniquelist.index(word)+1) for word in originaltext)
print('Here are the index positions of the text file : ' + indexes)
它获取一个文本文件(两个带有标点符号的随机句子),然后输出每个单词/标点符号出现的位置。如果某个位置重复两次,则显示第一个出现的位置。在本程序中,标点符号被视为单个单词

我试着玩代码,然后试着简化它。使用regex函数,我只需要两行代码来查找和分隔单词和标点符号,因此非常高效。但是,有人知道一种比使用正则表达式更简单、更简单的方法吗?如果我问你,如果你回答了,请不要改变代码的其他部分,只是用另一种方法来实现相同的功能(显示单词的索引),而不是使用正则表达式。很明显,时间会更长,所以这无关紧要

newfiles.txt

Parkour, also known as freerunning, is a relatively new sport founded by Sebastian Foucan, who showed off his skills in the James Bond movie "Casino Royale", which was released in 2006. Parkour is running, jumping over obstacles, or climbing over buildings and walls.
It is daring, breathtaking and at times terrifying, and now it is also an official sport in the UK, making the UK the first country in the world to recognise it. This means that people can teach parkour in schools.
Some people are worried about the sport being too dangerous, but the founder says that it is as safe as any sport, comparing to rugby, wrestling, surfing or climbing, but, - if you do not do it in the right way, you can get hurt.
Here are the index positions of the text file : 1 2 3 4 5 6 2 7 8 9 10 11 12 13 14 15 2 16 17 18 19 20 21 22 23 24 25 26 27 28 26 2 29 30 31 21 32 33 1 7 34 2 35 36 37 2 38 39 36 40 41 42 33 43 7 44 2 45 41 46 47 48 2 41 49 50 7 3 51 52 11 21 22 53 2 54 22 53 22 55 56 21 22 57 58 59 50 33 60 61 62 63 64 65 66 21 67 33 68 63 69 70 71 22 11 72 73 74 2 75 22 76 77 62 50 7 5 78 5 79 11 2 80 58 81 2 82 2 83 38 39 2 75 2 84 85 86 87 86 50 21 22 88 89 2 85 64 90 91 33
输出

Parkour, also known as freerunning, is a relatively new sport founded by Sebastian Foucan, who showed off his skills in the James Bond movie "Casino Royale", which was released in 2006. Parkour is running, jumping over obstacles, or climbing over buildings and walls.
It is daring, breathtaking and at times terrifying, and now it is also an official sport in the UK, making the UK the first country in the world to recognise it. This means that people can teach parkour in schools.
Some people are worried about the sport being too dangerous, but the founder says that it is as safe as any sport, comparing to rugby, wrestling, surfing or climbing, but, - if you do not do it in the right way, you can get hurt.
Here are the index positions of the text file : 1 2 3 4 5 6 2 7 8 9 10 11 12 13 14 15 2 16 17 18 19 20 21 22 23 24 25 26 27 28 26 2 29 30 31 21 32 33 1 7 34 2 35 36 37 2 38 39 36 40 41 42 33 43 7 44 2 45 41 46 47 48 2 41 49 50 7 3 51 52 11 21 22 53 2 54 22 53 22 55 56 21 22 57 58 59 50 33 60 61 62 63 64 65 66 21 67 33 68 63 69 70 71 22 11 72 73 74 2 75 22 76 77 62 50 7 5 78 5 79 11 2 80 58 81 2 82 2 83 38 39 2 75 2 84 85 86 87 86 50 21 22 88 89 2 85 64 90 91 33

谢谢

写“可读代码”真的很难。我仍然不明白为什么要这样做,但这是一个很好的挑战:)我情不自禁地改变了你构建独特集合的方式(使用OrderedICT):


写“可读代码”真的很难。我仍然不明白为什么要这样做,但这是一个很好的挑战:)我情不自禁地改变了你构建独特集合的方式(使用OrderedICT):


请提供
newfiles.txt
中的一些示例数据。包括当前输入和输出。@请检查问题-我已添加it@MYGz我想你可以把它作为引语读得更清楚,但没关系:)出于好奇,你为什么反对正则表达式?这是迄今为止解决此问题最有效的方法,而且可以说它比非正则表达式解决方案“更简单”。请提供
newfiles.txt
中的一些示例数据。包括当前的输入和输出。@ppasler检查问题-我已添加it@MYGz我想你可以把它当作引语读得更好,但别担心:)出于好奇,你为什么反对regex?到目前为止,它是解决这个问题的最有效的方法,而且可以说它比非正则表达式解决方案“更简单”。