Find words below (and between) a string in Python


python regex


I have the following text:

<div style="margin-left:10px;margin-right:10px;">
<!-- start of lyrics -->
There are times when I've wondered<br />
And times when I've cried<br />
When my prayers they were answered<br />
At times when I've lied<br />
But if you asked me a question<br />
Would I tell you the truth<br />
Now there's something to bet on<br />
You've got nothing to lose<br />
<br />
When I've sat by the window<br />
And gazed at the rain<br />
With an ache in my heart<br />
But never feeling the pain<br />
And if you would tell me<br />
Just what my life means<br />
Walking a long road<br />
Never reaching the end<br />
<br />
God give me the answer to my life<br />
God give me the answer to my dreams<br />
God give me the answer to my prayers<br />
God give me the answer to my being
<!-- end of lyrics -->
</div>

You shouldn't use regular expressions to parse HTML.

It looks like you're scraping a website. You can use lxml together with XPath:

Python 2.7.5+ (default, Sep 19 2013, 13:48:49) 
>>> html = """<div style="margin-left:10px;margin-right:10px;">
... <!-- start of lyrics -->
... There are times when I've wondered<br />
... And times when I've cried<br />
... When my prayers they were answered<br />
... At times when I've lied<br />
... But if you asked me a question<br />
... Would I tell you the truth<br />
... Now there's something to bet on<br />
... You've got nothing to lose<br />
... <br />
... When I've sat by the window<br />
... And gazed at the rain<br />
... With an ache in my heart<br />
... But never feeling the pain<br />
... And if you would tell me<br />
... Just what my life means<br />
... Walking a long road<br />
... Never reaching the end<br />
... <br />
... God give me the answer to my life<br />
... God give me the answer to my dreams<br />
... God give me the answer to my prayers<br />
... God give me the answer to my being
... <!-- end of lyrics -->
... </div>"""
>>> import lxml.html
>>> html = lxml.html.fromstring(html)
>>> html.text_content()
"\n\nThere are times when I've wondered\nAnd times when I've cried\nWhen my prayers they were answered\nAt times when I've lied\nBut if you asked me a question\nWould I tell you the truth\nNow there's something to bet on\nYou've got nothing to lose\n\nWhen I've sat by the window\nAnd gazed at the rain\nWith an ache in my heart\nBut never feeling the pain\nAnd if you would tell me\nJust what my life means\nWalking a long road\nNever reaching the end\n\nGod give me the answer to my life\nGod give me the answer to my dreams\nGod give me the answer to my prayers\nGod give me the answer to my being\n\n"
>>>
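The session above never actually uses XPath: it calls text_content() on the whole fragment. A minimal sketch of the XPath half, assuming the lyrics always sit in the div with exactly the style attribute from the question; the surrounding page, the extra paragraph, and the variable names are made up for illustration:

```python
import lxml.html

snippet = """<html><body>
<p>Some other text</p>
<div style="margin-left:10px;margin-right:10px;">
<!-- start of lyrics -->
There are times when I've wondered<br />
And times when I've cried<br />
<br />
God give me the answer to my being
<!-- end of lyrics -->
</div>
</body></html>"""

page = lxml.html.fromstring(snippet)

# xpath() always returns a list, so take the first matching <div>
lyrics_div = page.xpath('//div[@style="margin-left:10px;margin-right:10px;"]')[0]

# text_content() joins the text of the div and its descendants; comments are
# not text nodes, so the start/end markers do not appear in the result
lines = [line.strip() for line in lyrics_div.text_content().splitlines()]
lyrics = [line for line in lines if line]  # drop the blank stanza separators

for line in lyrics:
    print(line)
```

Selecting the div first keeps text from the rest of the page (here the made-up `<p>`) out of the result.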


Try this:

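import re

with open(r'<file_path>', 'r') as file:
    for line in file:
        # Skip lines that start with a tag, e.g. <div ...>, <!-- ... --> or <br />
        if re.match(r'<', line) is None:
            # Keep the text before the first tag; split() also copes with lines without "<"
            print(line.split('<', 1)[0].rstrip())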
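Because the loop above needs a real file, here is a sketch that applies essentially the same rule to any iterable of lines, so it can be tried on an in-memory string via io.StringIO. `lyric_lines` is a hypothetical helper name. It uses `split('<', 1)[0]` rather than `line[:line.find('<')]`, because `find()` returns -1 when a line contains no `<`, and slicing with -1 silently cuts off the last character.

```python
import io
import re


def lyric_lines(lines):
    """Yield the text of each line that does not start with a tag.

    Lines beginning with "<" are skipped; for the others, only the text
    before the first "<" (for example a trailing <br />) is kept.
    """
    for line in lines:
        if re.match(r'<', line) is None:
            yield line.split('<', 1)[0].rstrip()


sample = io.StringIO(
    '<div style="margin-left:10px;margin-right:10px;">\n'
    '<!-- start of lyrics -->\n'
    "There are times when I've wondered<br />\n"
    '<br />\n'
    'God give me the answer to my being\n'
    '<!-- end of lyrics -->\n'
    '</div>\n'
)

print(list(lyric_lines(sample)))
```

Since a file object is itself an iterable of lines, the same function can be called as `lyric_lines(open(path))`.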

Edit: using urllib to pull the lyrics straight from the web:

For this particular piece of HTML, I don't see why re.findall wouldn't work. Four lines of actual code plus the text produce the output:

from re import findall

html = """
<div style="margin-left:10px;margin-right:10px;">
<!-- start of lyrics -->
There are times when I've wondered<br />
And times when I've cried<br />
When my prayers they were answered<br />
At times when I've lied<br />
But if you asked me a question<br />
Would I tell you the truth<br />
Now there's something to bet on<br />
You've got nothing to lose<br />
<br />
When I've sat by the window<br />
And gazed at the rain<br />
With an ache in my heart<br />
But never feeling the pain<br />
And if you would tell me<br />
Just what my life means<br />
Walking a long road<br />
Never reaching the end<br />
<br />
God give me the answer to my life<br />
God give me the answer to my dreams<br />
God give me the answer to my prayers<br />
God give me the answer to my being
<!-- end of lyrics -->
</div>
"""

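# (?m) makes ^ match at the start of every line. Capture each line's text up to
# the first "<": lines that begin with a tag (<div>, <!-- -->, a lone <br />)
# give no match, and the last lyric line, which has no <br />, is still found.
raw = findall(r'(?m)^([^<\n]+)', html)

for line in raw:
    print(line)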
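A note on removing the trailing tag with `str.strip('<br />')`: `strip` treats its argument as a set of characters, not as a suffix, so it removes any of `<`, `b`, `r`, space, `/` and `>` from both ends of the line. That happens to be harmless for these lyrics, but it would damage a line that ends in "r" or "b", for instance. A small sketch of the difference, using a made-up line; `str.removesuffix` needs Python 3.9 or later:

```python
line = "Give it to her<br />"

# strip() keeps removing trailing characters while they are in the set "<br />",
# so after the tag is gone it also eats the final "r" of "her"
print(line.strip('<br />'))

# removesuffix() (Python 3.9+) removes the exact substring once, only at the end
print(line.removesuffix('<br />'))
```

On older Python versions, `line[:-len('<br />')] if line.endswith('<br />') else line` has the same effect as `removesuffix`.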


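re.findall and re.search certainly still work, and they would work in this case too, so you are just not using the right regular expression. Since you haven't posted what you're doing, it's hard for people to help you.

Sorry, I took the lyrics from this HTML: view-source:Kewl

But you should have mentioned that in the question. Anyway, glad you found a solution.

Mine works on a flat file. I designed it that way for efficiency.

No, I haven't found a solution :(

Solved. Thanks a lot.

Sorry, the HTML code is as follows: . How do I parse it?

If you load the page, you can use the following:

page.xpath('//div[@style="margin-left:10px;margin-right:10px;"]')[0].text_content()

But that is already a different question. Take a look at the questions with the … and … tags.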
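The point from the comments, that re.search works on this fragment with a suitable pattern, can be sketched as follows. It assumes the lyrics are always wrapped in the `<!-- start of lyrics -->` and `<!-- end of lyrics -->` comments shown in the question; that holds for this snippet, not for HTML in general, which is why a parser is still the safer choice for real pages.

```python
import re

html = """<div style="margin-left:10px;margin-right:10px;">
<!-- start of lyrics -->
There are times when I've wondered<br />
And times when I've cried<br />
<br />
God give me the answer to my being
<!-- end of lyrics -->
</div>"""

# re.DOTALL lets "." match newlines, so (.*?) can span the whole block between
# the two comment markers; the non-greedy "?" stops at the first end marker
match = re.search(r'<!-- start of lyrics -->(.*?)<!-- end of lyrics -->', html, re.DOTALL)

lyrics = []
if match:
    block = match.group(1)
    # Split on the <br /> tags, strip the newlines around each piece,
    # and drop the pieces that are empty (the stanza breaks)
    pieces = [piece.strip() for piece in re.split(r'<br\s*/?>', block)]
    lyrics = [piece for piece in pieces if piece]

print(lyrics)
```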

There are times when I've wondered
And times when I've cried
When my prayers they were answered
At times when I've lied
But if you asked me a question
Would I tell you the truth
Now there's something to bet on
You've got nothing to lose
When I've sat by the window
And gazed at the rain
With an ache in my heart
But never feeling the pain
And if you would tell me
Just what my life means
Walking a long road
Never reaching the end
God give me the answer to my life
God give me the answer to my dreams
God give me the answer to my prayers
God give me the answer to my being

from io import BytesIO
from urllib.request import urlopen

from lxml import etree

# Rip file from URL
resultado = urlopen('http://www.azlyrics.com/lyrics/ironmaiden/noprayerforthedying.html')
html = resultado.read()  # raw bytes of the page
# Parse html to etree
parser = etree.HTMLParser()
tree = etree.parse(BytesIO(html), parser)
# Apply the xpath rule: text nodes directly inside the lyrics <div>
e = tree.xpath("//div[@style='margin-left:10px;margin-right:10px;']/text()")
# print output
for i in e:
    print(i.strip())
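The script above needs network access and depends on the site still serving that page with the same markup. A sketch of the same XPath rule applied to a local string instead, so it can be tried offline; `etree.HTML()` parses a string with lxml's HTML parser and returns the root element. Skipping whitespace-only text nodes is an addition here, not part of the original script.

```python
from lxml import etree

html = """<html><body>
<div style="margin-left:10px;margin-right:10px;">
<!-- start of lyrics -->
There are times when I've wondered<br />
And times when I've cried<br />
<br />
God give me the answer to my being
<!-- end of lyrics -->
</div>
</body></html>"""

tree = etree.HTML(html)

# /text() returns each text node that is a direct child of the <div> separately:
# the text before the first comment, and the tail text after each <br /> or comment
nodes = tree.xpath("//div[@style='margin-left:10px;margin-right:10px;']/text()")

# Strip the newlines and skip the nodes that contained only whitespace
lyrics = [node.strip() for node in nodes if node.strip()]
print(lyrics)
```

Unlike `text_content()`, `/text()` only looks at the div's own text children, so text inside nested elements (say, a `<b>` inside the div) would not be included.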
