Python: match a word only if it is not followed or preceded by < or >


python regex


I am trying to avoid matching words that are followed or preceded by an XML tag.

import re

strTest = "<random xml>hello this was successful price<random xml>"

for c in re.finditer(r'(?<![<>])(\b\w+\b)(?<!=[<>])(\W+)',strTest):
     c1 = c.group(1)
     c2 = c.group(2)
     if ('<' != c2[0]) and ('<' != c.group(1)[len(c.group(1))-1]):
          print c1

Desired result:

this
was
successful

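I have been experimenting with negative lookahead and negative lookbehind assertions. I am not sure this is the right approach, and I would greatly appreciate any help.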

First, to answer your question directly:

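I do it by examining each "word", meaning a run of characters that are (mostly) letters, "<" or ">". As the regex hands each one to some_only, I look for either of the latter two characters. If neither is present, I print the "word".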

>>> import re
>>> strTest = "<random xml>hello this was successful price<random xml>"
>>> def some_only(matchobj):
...     if '<' in matchobj.group() or '>' in matchobj.group():
...         pass
...     else:
...         print (matchobj.group())
...         pass
... 
>>> ignore = re.sub(r'[<>\w]+', some_only, strTest)
this
was
successful
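If a list is more useful than printed output, the same tokenizing idea can be sketched with re.findall instead of calling re.sub for its side effect. This is only a variant of the approach above, not a different technique; the function name words_without_brackets is my own and does not come from the answer.

```python
import re

def words_without_brackets(text):
    # Same token pattern as above: runs of word characters, "<" or ">".
    tokens = re.findall(r'[<>\w]+', text)
    # Keep only the tokens that contain neither bracket.
    return [tok for tok in tokens if '<' not in tok and '>' not in tok]

strTest = "<random xml>hello this was successful price<random xml>"
print(words_without_brackets(strTest))  # ['this', 'was', 'successful']
```

Like the original, this relies on a tag touching the word next to it, so that the bracket ends up inside the same token as that word.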

I'll give it a try. Since we are already doing more than just a regex here, put the matches into a list and drop the first and last items:

import re

strTest = "<random xml>hello this was successful price<random xml>"

thelist = []

for c in re.finditer(r'(?<![<>])(\b\w+\b)(?<!=[<>])(\W+)',strTest):
     c1 = c.group(1)
     c2 = c.group(2)
     if ('<' != c2[0]) and ('<' != c.group(1)[len(c.group(1))-1]):
          thelist.append(c1)

thelist = thelist[1:-1]

print (thelist)

Personally I would try to parse the XML, but since you have already written this code, this small modification should do it.
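A note on the pattern itself: (?<!=[<>]) is a negative lookbehind for an "=" followed by a bracket, not a lookahead, which is why the extra filtering and slicing are needed. The negative lookahead syntax is (?!...). A minimal sketch of the lookaround-only version, assuming (as in the sample) that a tag touches the neighbouring word with no space in between:

```python
import re

strTest = "<random xml>hello this was successful price<random xml>"

# (?<![<>])  the word must not be preceded by "<" or ">"
# \b\w+\b    a whole word; the boundaries prevent partial-word matches
# (?![<>])   the word must not be followed by "<" or ">"
pattern = re.compile(r'(?<![<>])\b\w+\b(?![<>])')

words = pattern.findall(strTest)
print(words)  # ['this', 'was', 'successful']
```

Words inside a tag that touch neither bracket (for example the middle word of a three-word tag) would still match, so this is no substitute for a real parser.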

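A simple way is to use a list, but I am assuming that the word before or after a tag is attached to it, with no space between the word and the tag: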

test = "<random xml>hello this was successful price<random xml>"

test = test.split()

new_test = []
for val in test:
    if "<" not in val and ">" not in val:
        new_test.append(val)

print(new_test)

My solution... I don't think regex is needed at all; you can solve it with a one-line list comprehension:

words = [w for w in test.split() if "<" not in w and ">" not in w]
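One caveat with splitting on whitespace: a tag that contains more than two space-separated parts leaks its middle parts, because those parts contain no bracket. The sketch below shows the leak and one possible workaround that splits on the tags first. The helper name plain_words and the tag pattern <[^<>]*> are my own simplifications, not a real XML parser.

```python
import re

sample = "<random extra xml>hello this was successful price<random xml>"

# The whitespace-split filter lets "extra" through.
naive = [w for w in sample.split() if "<" not in w and ">" not in w]
print(naive)  # ['extra', 'this', 'was', 'successful']

def plain_words(text):
    # Split the text on anything that looks like a tag.
    chunks = re.split(r'<[^<>]*>', text)
    last = len(chunks) - 1
    words = []
    for i, chunk in enumerate(chunks):
        tokens = chunk.split()
        # A tag precedes every chunk except the first one.
        if tokens and i > 0 and not chunk[0].isspace():
            tokens = tokens[1:]
        # A tag follows every chunk except the last one.
        if tokens and i < last and not chunk[-1].isspace():
            tokens = tokens[:-1]
        words.extend(tokens)
    return words

print(plain_words(sample))  # ['this', 'was', 'successful']
```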


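You do not use regex to parse XML. Ever. Use an XML parser. Python has one. Or install ... Use an XML parser.

Could be: match what you do not want, but what you need.

I really like this solution, but I would like to use only the stdlib. How would I do this with xml.etree.ElementTree? By the way, I am running Python 2.7.

@Bman425, basically the same: import xml.etree.ElementTree as ET; tree = ET.fromstring(strTest); print tree.text.split(" ")[1:-1]

By the way, this answer could probably use some work to make it more widely applicable, for example walking down the tree to find elements and combining .tail and .text; the OP's sample input clearly does not match their actual intent.

Agreed. My concern is that this could easily go beyond the OP's skill level. Really: simple question, simple answer.

This works well for the example I gave, but I am worried it will not scale well. I agree that I should try using an XML parser.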
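For the stdlib route discussed in the comments, here is a rough sketch with xml.etree.ElementTree that walks the tree and looks at both .text and .tail. It assumes well-formed XML, which the sample string in the question is not, so the input below is a hypothetical stand-in; the function name words_not_touching_tags is also my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed stand-in for the sample in the question.
doc = "<root><tag/>hello this was successful price<tag/></root>"

def words_not_touching_tags(xml_string):
    root = ET.fromstring(xml_string)
    words = []
    for elem in root.iter():
        # .text is the text right after the element's start tag,
        # .tail is the text right after its end tag.
        for chunk in (elem.text, elem.tail):
            if not chunk:
                continue
            tokens = chunk.split()
            # Inside the root, every chunk sits between two tags, so a
            # chunk without leading whitespace starts with a word that
            # touches a tag.
            if tokens and not chunk[0].isspace():
                tokens = tokens[1:]
            # Likewise for a chunk without trailing whitespace.
            if tokens and not chunk[-1].isspace():
                tokens = tokens[:-1]
            words.extend(tokens)
    return words

print(words_not_touching_tags(doc))  # ['this', 'was', 'successful']
```

Unlike slicing [1:-1] on a single text node, this drops a word only when it actually touches a tag, and it covers text that follows child elements as well.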

Output of the split-based loop above:

['this', 'was', 'successful']
