How do I extract specific content with BeautifulSoup in Python?

Tags: python, web, beautifulsoup, screen-scraping

I'm new to Python, and I'm writing a small scraper with Python and BeautifulSoup to get an address from a web page. I have attached a picture of it.


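I got the complete page content with BeautifulSoup, but I don't know how to extract the "full address" value. I can see that it is inside a "div", but I don't know what to do next.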

links=soup.find_all('div')

Thanks a lot.

You can parse the data with json:

#!/usr/bin/env python 

from bs4 import BeautifulSoup
import json

data = '''
</div>
    </div>
    <div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St  3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St  3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true">
'''

soup = BeautifulSoup(data, 'html.parser')
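# Select only the div whose data-payload attribute holds the JSON
for div in soup.find_all('div', attrs={'data-integration-name': 'redux-container'}):
    info = json.loads(div.get('data-payload'))
    for location in info['props']['locations']:
        # location['full_address'] holds the complete address string
        print(location['address'])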
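
Since the question asks for the "full address" rather than the street line, the same approach can be wrapped in a small function that returns the full_address values. This is only a sketch under assumptions: SAMPLE_HTML is a shortened, hypothetical stand-in for the markup above, extract_full_addresses is an illustrative name, and the real page may be structured differently.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the markup shown in the answer.
SAMPLE_HTML = '''
<div data-integration-name="redux-container"
     data-payload='{"props":{"locations":[{"address":"5 Crosby St  3rd Floor","full_address":"5 Crosby St  3rd Floor, New York, 10013, New York, USA"}]}}'>
</div>
'''


def extract_full_addresses(html):
    """Return the full_address of every location in redux-container divs."""
    soup = BeautifulSoup(html, 'html.parser')
    addresses = []
    # CSS attribute selector; matches the same divs as
    # find_all('div', attrs={'data-integration-name': 'redux-container'})
    for div in soup.select('div[data-integration-name="redux-container"]'):
        payload = json.loads(div['data-payload'])
        for location in payload['props']['locations']:
            addresses.append(location['full_address'])
    return addresses


print(extract_full_addresses(SAMPLE_HTML))
```

Returning a list instead of printing inside the loop makes the result easier to reuse, for example to write the addresses to a file.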

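(Please add the code as text rather than as a picture.)

I added it. Thanks.

The 'data-payload' attribute is JSON, so use json.loads. If you are not familiar with HTML, there are some good basics to read; if you keep your question in mind while reading, you may start to see the solution. You may also want to spend some time getting to know the tools available to you. Read them carefully.

It says: KeyError: 'locations'

@Laura For my solution to work, I assumed the data you are trying to parse is exactly the same as the data in your post. Is your real data the same as what you posted? It looks like 'locations' does not exist in your data. Also, check which keys your data actually contains.

Restrict the search to div elements that have the data-integration-name="redux-container" attribute.

@Laura The answer is meant more as a demonstration of how to use json and how to parse this kind of data with it. You should now be able to do some research and apply it to different data. Also, as mentioned in the comments, you should read the docs very carefully.

@coder It still shows the same error. But thanks a lot! I will go back and read the bs4 docs.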
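
The KeyError: 'locations' discussed above suggests that the payload being parsed does not have the shape the answer assumes. One way to investigate is to print the keys at each level and fall back to an empty list when a key is missing. The sketch below uses only the json module; find_locations and both payload strings are hypothetical and are not taken from the asker's real page.

```python
import json


def find_locations(payload_text):
    """Parse a data-payload string; return its locations list, or [] if absent."""
    info = json.loads(payload_text)
    # Show which keys exist, so a missing 'locations' is easy to spot.
    print('top-level keys:', sorted(info))
    props = info.get('props') or {}
    print('props keys:', sorted(props))
    # dict.get avoids the KeyError raised by props['locations']
    return props.get('locations', [])


# Hypothetical payloads: one with the expected shape, one without it.
good = '{"name":"LocationsMapList","props":{"locations":[{"address":"5 Crosby St  3rd Floor"}]}}'
other = '{"name":"SomethingElse","props":{"items":[]}}'

print(find_locations(good))
print(find_locations(other))
```

If the real page contains several redux-container divs, running a check like this on each payload shows which one, if any, carries the location data.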