Python 2.7: selecting the words in a list that come before a number

python list select

I have a text file, a.txt, which contains:

Hydrocortisone 10 MG/ML Topical Cream
Tretinoin 0.25 MG/ML Topical Cream
Benzoyl Peroxide 50 MG/ML Topical Lotion
Ketoconazole 20 MG/ML Medicated Shampoo
etc

I need a way to select whatever words come before the first number on each line and write them to another file, b.txt:

Hydrocortisone
Tretinoin 
Benzoyl Peroxide
Ketoconazole
etc

I have a basic idea of how to find and replace within a file, but my grasp of Python is so limited it is almost laughable, so my initial thought was:

infile = open('a.txt')
outfile = open('b.txt', 'w')
# ...and so on up to twenty, plus words that commonly occur after the numbers, such as 'topical'
replacements = {'1': '', '2': '', 'topical': ''}
for line in infile:
    for src, target in replacements.iteritems():
        line = line.replace(src, target)
    outfile.write(line)
infile.close()
outfile.close()

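But all this does is delete whatever is listed in replacements. There are thousands of variations, so I cannot list them all.

Sorry for not being clearer, and thanks for your help.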

Why not loop over the words and use isdigit() to find the first number? Something like:

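writef = open('b.txt', 'w')
with open('a.txt') as f:
    while True:
        line = f.readline()
        if not line:
            break
        words = line.split()
        for i in range(len(words)):
            # A token such as "10" or "0.25" marks the first number on the line.
            if words[i].replace('.', '', 1).isdigit():
                # Write every word before the number, not just the last one.
                writef.write(' '.join(words[:i]) + '\n')
                # Stop at the first number and move on to the next line.
                break
writef.close()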
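
For reference, the same isdigit() idea can be wrapped in a small function that returns all the words before the first numeric token. This is only a sketch: the name words_before_number and the choice to return the whole line when it contains no number are assumptions, not part of the answer above.

```python
def words_before_number(line):
    """Return the words that precede the first numeric token in line."""
    words = line.split()
    for i, word in enumerate(words):
        # Drop at most one decimal point so that "0.25" counts as a number.
        if word.replace('.', '', 1).isdigit():
            return ' '.join(words[:i])
    # Assumption: a line without any number is returned as-is
    # (with its whitespace normalized).
    return ' '.join(words)


print(words_before_number('Benzoyl Peroxide 50 MG/ML Topical Lotion'))
```

Returning the joined slice words[:i] keeps multi-word names such as "Benzoyl Peroxide" intact.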

Try this; it splits each line on the number and takes the name part:

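import re

# Matches a number such as 10, 50 or 0.25; the parentheses make it a capturing group.
exp = re.compile(r'(\d+\.?\d+)')

with open('mainfile.txt') as f, open('names.txt', 'w') as out:
    for line in f:
        line = line.strip()
        if len(line):
            try:
                # Everything before the first number is the name part.
                out.write('{}\n'.format(re.split(exp, line)[0].strip()))
            except Exception:
                print('Could not parse {}'.format(line))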

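The regular expression \d+\.?\d+ means:

```
\d+
```
one or more digits
```
\.?
```
an optional . (note that . has a special meaning in regular expressions, so we escape it when we mean a literal .)
```
\d+
```
followed by one or more digits

The () around it makes it a capturing group; the result looks like this: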

>>> import re
>>> x = r'(\d+\.?\d+)'
>>> l = 'Benzoyl Peroxide 50 MG/ML Topical Lotion'
>>> re.split(x, l)
['Benzoyl Peroxide ', '50', ' MG/ML Topical Lotion']
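
One caveat: \d+\.?\d+ needs at least two digits, so a single-digit strength such as 5 would not be matched. A possible variant, sketched here with the assumed pattern \d+(?:\.\d+)? and a hypothetical helper name_part, makes the decimal part optional as a whole. The third sample line is made up for illustration and does not come from the question.

```python
import re

# Assumed variant: digits, optionally followed by a decimal point and more digits.
NUMBER = re.compile(r'\d+(?:\.\d+)?')


def name_part(line):
    """Return the text before the first number in line, stripped of whitespace."""
    # maxsplit=1 stops after the first number; element 0 is everything before it.
    return NUMBER.split(line, maxsplit=1)[0].strip()


for sample in ('Benzoyl Peroxide 50 MG/ML Topical Lotion',
               'Tretinoin 0.25 MG/ML Topical Cream',
               'Retinol 5 MG/ML Topical Cream'):  # hypothetical single-digit line
    print(name_part(sample))
```

If a line contains no number at all, split returns the whole line as its only element, so name_part simply gives the stripped line back.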

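What have you tried so far? Sorry, but StackOverflow does not work this way. You have to put in some research effort, post the code you tried, explain why it does not work, and ask a specific question.

Regex can do this, though it may be a bit advanced for this particular case.

Regex, man, regex.

Brilliant, it works perfectly. Thank you all so much for the explanations; they have really helped me learn.

Hi, sorry, I must be doing something dense, because I get SyntaxError: 'break' outside loop. Sorry, and thanks for the help.

@lobe Oh, that was my fault, I forgot to add the while :)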