Python: unable to extract an address from certain HTML elements

python python-3.x web-scraping


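I have written a script in Python to extract an address from a block of HTML elements. The address sits between br tags. However, when I run the script, I get a list containing only the <br/> tags as output, with none of the address text.

How can I get the full address?

The HTML elements I am trying to collect the address from: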

<div class="ACA_TabRow ACA_FLeft">
 Mailing
 <br/>
 1961 MAIN ST #186
 <br/>
 WATSONVILLE, CA, 95076
 <br/>
 United States
 <br/>
</div>


What I have tried so far:

from bs4 import BeautifulSoup
import re

html = """
<div class="ACA_TabRow ACA_FLeft">
 Mailing
 <br/>
 1961 MAIN ST #186
 <br/>
 WATSONVILLE, CA, 95076
 <br/>
 United States
 <br/>
</div>
"""
soup = BeautifulSoup(html,"lxml")
items = soup.find(class_="ACA_TabRow").find(string=re.compile("Mailing")).find_next_siblings()
print(items)

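I would go on to check whether the stripped strings within the div start with Mailing:

soup = BeautifulSoup(html, "lxml")
items = soup.find(class_="ACA_TabRow")

# stripped_strings yields each text node in the div with surrounding whitespace removed
for i, item in enumerate(items.stripped_strings):
    # stop if the first string is not the "Mailing" label
    if i == 0 and not item.startswith('Mailing'):
        break
    # every string after the label is one line of the address
    if i != 0:
        print(item)

Output: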
1961 MAIN ST #186
WATSONVILLE, CA, 95076
United States
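
The loop above can also be wrapped in a small helper that returns the address lines instead of printing them. This is only a sketch: the function name `extract_address_lines` and its `label` parameter are hypothetical, and it uses the built-in "html.parser" rather than "lxml" so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="ACA_TabRow ACA_FLeft">
 Mailing
 <br/>
 1961 MAIN ST #186
 <br/>
 WATSONVILLE, CA, 95076
 <br/>
 United States
 <br/>
</div>
"""


def extract_address_lines(html, label="Mailing"):
    """Return the text lines that follow `label` in the first ACA_TabRow div.

    Hypothetical helper for illustration; returns an empty list when the
    div is missing or does not start with the expected label.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find(class_="ACA_TabRow")
    if div is None:
        return []
    # stripped_strings skips whitespace-only text nodes and trims the rest
    strings = list(div.stripped_strings)
    if not strings or not strings[0].startswith(label):
        return []
    # everything after the label is treated as one address line each
    return strings[1:]


print(extract_address_lines(HTML))
```

Returning a list leaves it to the caller to decide how to join the lines, for example with a newline or a space.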

It seems I have found a better solution:
from bs4 import BeautifulSoup
import re

html = """
<div class="ACA_TabRow ACA_FLeft">
 Mailing
 <br/>
 1961 MAIN ST #186
 <br/>
 WATSONVILLE, CA, 95076
 <br/>
 United States
 <br/>
</div>
"""
soup = BeautifulSoup(html,"lxml")
items = soup.find(class_="ACA_TabRow").find(string=re.compile("Mailing")).find_parent()
find_text = ' '.join([item.strip() for item in items.strings])
print(find_text)
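
For comparison, here is a sketch that stays closer to the original attempt. As far as I can tell, find_next_siblings() called without arguments returns only tags, which would explain why the original attempt produced nothing but <br/> elements. Iterating over `.next_siblings` instead yields the text nodes as well, so they can be filtered by type. The variable names are illustrative, and "html.parser" is used here in place of "lxml" as an assumption to keep the example dependency-free.

```python
import re

from bs4 import BeautifulSoup, NavigableString

HTML = """
<div class="ACA_TabRow ACA_FLeft">
 Mailing
 <br/>
 1961 MAIN ST #186
 <br/>
 WATSONVILLE, CA, 95076
 <br/>
 United States
 <br/>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Locate the text node that carries the "Mailing" label
label = soup.find(class_="ACA_TabRow").find(string=re.compile("Mailing"))

# .next_siblings yields both the <br/> tags and the text nodes between them;
# keep only the text nodes that are not pure whitespace
address_lines = [
    sibling.strip()
    for sibling in label.next_siblings
    if isinstance(sibling, NavigableString) and sibling.strip()
]

address = " ".join(address_lines)
print(address)
```

Unlike joining every string in the div, this variant leaves out the Mailing label itself and does not produce a trailing space.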
