Regex: how do I exclude lines that contain a hyphen? Python (3.6) RE


regex python-3.x


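From the input, I want to print the lines that start with the lowercase string hum and end with 001. In addition, I want to exclude lines that contain a hyphen (that is, drop the last two lines of my current output). My guess was to add [^-\s] to the regex to exclude lines with hyphens, but it doesn't work.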

Input

humabddd001
humhudiwhde001
rehfhfepfhfpehr001oifdjv
iurehfehofewoh001
jfeijjjrefoefojrefoj001
humfiowhewiwihowfhiowfeoewo991
hum0001ofejofrjwoorejfoejfo001
foiwejowjfojfojwofwofjew9032i92i
humifhihweoowhefiwofowfo001
Humerfhofrorr001
HUmhuhdeowhdoewh000001
HUMwifoiewjow001
0001fhdisuhum
hUmfhweoofhwfoh001
humhum001hum
humhumhufih001
humifwje001001
hum30204-439-0942-4029-0001
humouio--hohohoho0001

My code

import re
hand = open('D:/Python/Test.txt')
x = hand
for j in x:
     h = re.findall('hum.*\S+001+$',j)
#    h = re.findall('hum+\S+001+$',j)
     if(len(h)>0):
          print(h)

My current output

['humabddd001']
['humhudiwhde001']
['hum0001ofejofrjwoorejfoejfo001'] 
['humifhihweoowhefiwofowfo001']
['humhumhufih001']
['humifwje001001']
['hum30204-439-0942-4029-0001']
['humouio--hohohoho0001']

Use this regexp:

^hum[^-]*001$

Output:

['humabddd001']
['humhudiwhde001']
['hum0001ofejofrjwoorejfoejfo001']
['humifhihweoowhefiwofowfo001']
['humhumhufih001']
['humifwje001001']
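A minimal, self-contained check of ^hum[^-]*001$ against the question's sample input. SAMPLE below is simply the input pasted into a string (an assumption for illustration; the question reads it from a file), and PATTERN.search is used per line:

```python
import re

# The question's sample input, pasted in so the check runs without a file.
SAMPLE = """humabddd001
humhudiwhde001
rehfhfepfhfpehr001oifdjv
iurehfehofewoh001
jfeijjjrefoefojrefoj001
humfiowhewiwihowfhiowfeoewo991
hum0001ofejofrjwoorejfoejfo001
foiwejowjfojfojwofwofjew9032i92i
humifhihweoowhefiwofowfo001
Humerfhofrorr001
HUmhuhdeowhdoewh000001
HUMwifoiewjow001
0001fhdisuhum
hUmfhweoofhwfoh001
humhum001hum
humhumhufih001
humifwje001001
hum30204-439-0942-4029-0001
humouio--hohohoho0001"""

# Anchored at both ends; [^-]* refuses to step over any hyphen.
PATTERN = re.compile(r'^hum[^-]*001$')

matches = [line for line in SAMPLE.splitlines() if PATTERN.search(line)]
for m in matches:
    print(m)
```

Note that [^-] still accepts spaces, so a line such as "hum abc001" would pass; the [^-\s] variant discussed further down rules out whitespace as well.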

As @Patrick Haugh said, you don't need a regex for this. Using startswith, endswith and not in properly would be perfect.

I wouldn't use a regex here at all. Your requirements are fully covered by existing string methods and aren't complex enough to need a regex:

with open('Test.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('hum') and line.endswith('001') and '-' not in line:
            print(line)

Prints

humabddd001
humhudiwhde001
hum0001ofejofrjwoorejfoejfo001
humifhihweoowhefiwofowfo001
humhumhufih001
humifwje001001

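The problem is that you are adding the negated character class [^-\s] to a pattern that already contains .*\S+. The .* part is a greedy dot pattern that matches any character except a newline, so .*\S+ matches any run of characters except newlines and then a final non-whitespace character (here the + after \S is redundant). The hyphens are consumed by .* before a character class placed after it ever gets a say.

Another problem is that re.findall searches for matches anywhere in the string, but you only want matches at the start of the line. So you need to add a ^ anchor at the start of the pattern, or use the re.match method.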
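To make both problems concrete, here is a small sketch. The question does not show exactly where [^-\s] was added, so the placement right after .* is an assumption for illustration; the last two lines show the ^ anchor and re.match fixes on a made-up line "xxhumabc001":

```python
import re

line = 'hum30204-439-0942-4029-0001'

# The question's pattern: the greedy '.*' swallows the hyphens, so the line matches.
original = re.findall(r'hum.*\S+001+$', line)

# Assumed placement of the [^-\s] guess: it only constrains ONE character
# right before 001, while '.*' still consumes every hyphen.
with_class = re.findall(r'hum.*[^-\s]001$', line)

# Unanchored: findall finds 'hum' anywhere in the line, not just at the start.
unanchored = re.findall(r'hum[^-\s]*001$', 'xxhumabc001')

# Anchored with ^, or via re.match (which only tries position 0).
anchored = re.findall(r'^hum[^-\s]*001$', 'xxhumabc001')
matched = re.match(r'hum[^-\s]*001$', 'xxhumabc001')

print(original)
print(with_class)
print(unanchored)
print(anchored)
print(matched)
```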
Here is a fix for your approach:

import re
results = [j.rstrip('\n') for j in x if re.search(r'^hum[^-\s]*001$', j)]  # strip the newline each file line carries => ['humabddd001', 'humhudiwhde001', 'hum0001ofejofrjwoorejfoejfo001', 'humifhihweoowhefiwofowfo001', 'humhumhufih001', 'humifwje001001']
Details

^ - start of string
hum - a literal substring
[^-\s]* - 0 or more chars other than - or whitespace
001 - a 001 literal substring
$ - end of string
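The breakdown above maps one-to-one onto a re.VERBOSE version of the same pattern. This is just a restyling sketch (the candidate strings are made up for illustration), which also shows that $ still matches when a line read from a file keeps its trailing newline:

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries
# its explanation from the breakdown.
PATTERN = re.compile(r"""
    ^          # start of string
    hum        # literal substring "hum"
    [^-\s]*    # 0 or more chars other than '-' or whitespace
    001        # literal substring "001"
    $          # end of string (also matches just before a final newline)
""", re.VERBOSE)

candidates = ['humabddd001', 'humabddd001\n', 'humouio--hohohoho0001', 'hum abc001', 'humhum001hum']
checks = {c: bool(PATTERN.search(c)) for c in candidates}
for c, ok in checks.items():
    print(repr(c), ok)
```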

As Patrick said, you don't really need a regex here unless you want to handle all Unicode whitespace easily. Otherwise, you can use:

lines = [j.rstrip('\n') for j in x]  # drop trailing newlines so endswith('001') can match
no_regex_results = [j for j in lines if j.startswith('hum') and j.endswith('001') and '-' not in j and ' ' not in j]

It is a bit longer and does not handle Unicode whitespace.
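A small sketch of that Unicode-whitespace difference, using a non-breaking space (U+00A0) as the test case. The str.isspace() variant is a swapped-in alternative, not part of the answer; it relies on the fact that, for str patterns, \s matches the characters str.isspace() treats as whitespace:

```python
import re

# Made-up test lines: plain, non-breaking space, ASCII space, hyphen.
lines = ['humabc001', 'hum\u00a0abc001', 'hum abc001', 'hum-abc001']

# Regex: \s in a str pattern covers Unicode whitespace such as U+00A0.
regex_results = [j for j in lines if re.search(r'^hum[^-\s]*001$', j)]

# The answer's no-regex version only rules out the ASCII space.
space_only = [j for j in lines
              if j.startswith('hum') and j.endswith('001') and '-' not in j and ' ' not in j]

# Hypothetical no-regex alternative: reject any character str.isspace() flags.
isspace_based = [j for j in lines
                 if j.startswith('hum') and j.endswith('001') and '-' not in j
                 and not any(ch.isspace() for ch in j)]

print(regex_results)
print(space_only)
print(isspace_based)
```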
This is not really a regex problem:
if line.startswith('hum') and line.endswith('001') and '-' not in line:
    print(line)
