Python Beautiful Soup: find a tag inside a tag by its string? Nth child?
I'm having some trouble scraping the HTML below:
res = <div class="gunDetails">
<h4>Specifications</h4>
<ul class="features">
<li><label>Make:</label><span itemprop="brand">Gamo</span></li>
<li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
<li><label>Licence:</label><span>No Licence</span></li>
<li><label>Orient.:</label><span>Ambidextrous</span></li>
<li><label>Scope:</label><span>Unknown 3-9x32</span></li>
<li><label>Origin:</label><span>Spanish</span></li>
<li><label>Cased:</label><span>Other</span></li>
<li><label>Trigger:</label><span>1</span></li>
<li><label>Condition:</label><span itemprop="itemCondition">Used</span></li>
</ul>
</div>
Output:
Gamo
Coyote Black Tactical
No Licence
Ambidextrous
Unknown 3-9x32
Spanish
Other
1
Used
Is it possible to create a variable for each label's text?
Something like:
gun_make = gun_details.findAll('label', String="Make:")
print(gun_make).text
Here is the full code:
from bs4 import BeautifulSoup
import requests
import csv
all_links=[]
labels = []
spans = []
url="https://www.guntrader.uk/dealers/redcar/spencers-sporting-guns/guns?page={}"
for page in range(1,3):
    res=requests.get(url.format(page)).text
    soup=BeautifulSoup(res,'html.parser')
    for link in soup.select('a[href*="/dealers/redcar/spencers-sporting-guns/guns/shotguns"]'):
        all_links.append("https://www.guntrader.uk" + link['href'])
print(len(all_links))
for a_link in all_links:
    res = requests.get(a_link).text
    soup = BeautifulSoup(res, 'html.parser')
    gun_details = soup.select('div.gunDetails')
    for l in gun_details.select('label'):
        labels.append(l.text.replace(':',''))
    for s in gun_details.select('span'):
        spans.append(s.text)
my_dict = dict(zip(labels, spans))
with open('gundealer.csv','w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=None)
    for key in my_dict.keys():
        csvfile.write(f"{key},{my_dict[key]}\n")
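This part seems to work on its own and gives the correct(ish) output:
The output:
Make: Gamo
But I don't know what I'm doing to the initial response from the loop that stops the snippet above from working.
Let's try this: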
res = """ <div class="gunDetails">
<h4>Specifications</h4>
<ul class="features">
<li><label>Make:</label><span itemprop="brand">Gamo</span></li>
<li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
<li><label>Licence:</label><span>No Licence</span></li>
<li><label>Orient.:</label><span>Ambidextrous</span></li>
<li><label>Scope:</label><span>Unknown 3-9x32</span></li>
<li><label>Origin:</label><span>Spanish</span></li>
<li><label>Cased:</label><span>Other</span></li>
<li><label>Trigger:</label><span>1</span></li>
<li><label>Condition:</label><span itemprop="itemCondition">Used</span></li>
</ul>
</div>
"""
from bs4 import BeautifulSoup as bs
import csv
labels = []
spans = []
soup = bs(res, 'html.parser')
# select_one returns a single Tag (or None); select returns a list of Tags
gun_details = soup.select_one('div.gunDetails')
for l in gun_details.select('label'):
    labels.append(l.text.replace(':',''))
for s in gun_details.select('span'):
    spans.append(s.text)
# pair each label with the span at the same position
my_dict = dict(zip(labels, spans))
with open('mycsvfile.csv','w', newline='') as csvfile:
    # csv.writer quotes values that contain commas
    writer = csv.writer(csvfile)
    for key, value in my_dict.items():
        writer.writerow([key, value])
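If the goal is one variable per label, as the question title asks, a minimal sketch might look like the following. It assumes the markup always pairs a `<label>` with a sibling `<span>` inside the same `<li>`; the helper name `value_for` is made up for illustration, and the trimmed HTML is just a subset of the snippet from the question. The last part shows a position-based lookup with the CSS `:nth-of-type()` selector, which only works if the order of the list items never changes.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question
html = """
<div class="gunDetails">
  <ul class="features">
    <li><label>Make:</label><span itemprop="brand">Gamo</span></li>
    <li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
    <li><label>Licence:</label><span>No Licence</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
gun_details = soup.select_one("div.gunDetails")


def value_for(container, label_text):
    """Hypothetical helper: text of the <span> that follows the <label>
    whose string equals label_text, or None if either tag is missing."""
    label = container.find("label", string=label_text)
    if label is None:
        return None
    span = label.find_next_sibling("span")
    return span.get_text(strip=True) if span is not None else None


# Lookup by the label's string (note: the keyword is lowercase `string`)
gun_make = value_for(gun_details, "Make:")
gun_model = value_for(gun_details, "Model:")

# Lookup by position: the <span> inside the third <li> of the list
third = gun_details.select_one("ul.features > li:nth-of-type(3) > span")
gun_licence = third.get_text(strip=True) if third is not None else None

print(gun_make, gun_model, gun_licence, sep=" | ")
```

Looking up by label string is usually the safer of the two, since a page that omits one field shifts every later position.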
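Comments:
What do you want the output to look like?
Really, I'd like to be able to identify each item individually and print its text. I was even wondering how I could get the whole "Make:": "Gamo", "Model:": "Coyote Black Tactical", and so on. I've updated the question with my full code to see if anyone can spot where I'm going wrong.
It's still not clear; can you post the exact expected output?
Hi Jack, thank you. That works best in this case. One problem I'm having is fitting it into the loop over range(1,3) pages in my full script above. I keep getting the error 'NoneType' object has no attribute 'select'; I think it's the way I'm calling the links in the for loop. I'll edit the full script again to show the current version with your update included.
There are 24 links in there; I tried a random selection of them and didn't get an error. Try to find the exact link where the error occurs.
@AndrewGlass - So, did you find the problematic URL?
@AndrewGlass - Sure, but you'd have to post it as a separate question (SO policy) and I'll take a look. Also, if you're done with this one, you should accept its answer (if it's acceptable).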
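On the 'NoneType' object has no attribute 'select' error from the comments: `select_one` returns `None` when a page has no `div.gunDetails`, so one plausible cause is a single link whose page lacks that block. The sketch below is an assumption about how that could be guarded, not a diagnosis of the actual site. The function name `extract_details` and the two inline "pages" are invented stand-ins for the downloaded HTML. It also builds a fresh dictionary per page, pairing each label with the span in the same `<li>`, so values from different guns cannot get mixed up in shared lists.

```python
from bs4 import BeautifulSoup


def extract_details(html):
    """Return {label: value} for one page, or None if the page has no
    div.gunDetails block (hypothetical helper, not from the original code)."""
    soup = BeautifulSoup(html, "html.parser")
    gun_details = soup.select_one("div.gunDetails")
    if gun_details is None:
        return None
    details = {}
    for li in gun_details.select("li"):
        label = li.find("label")
        span = li.find("span")
        if label is not None and span is not None:
            key = label.get_text(strip=True).rstrip(":")
            details[key] = span.get_text(strip=True)
    return details


# Invented stand-ins for requests.get(a_link).text on two different links
pages = {
    "page-with-details": (
        '<div class="gunDetails"><ul class="features">'
        "<li><label>Make:</label><span>Gamo</span></li>"
        "</ul></div>"
    ),
    "page-without-details": "<html><body><p>Listing removed</p></body></html>",
}

all_guns = []
skipped = []
for url, html in pages.items():
    details = extract_details(html)
    if details is None:
        skipped.append(url)  # remember the link so it can be inspected later
        continue
    all_guns.append(details)

print(all_guns)
print(skipped)
```

Printing the skipped links is a quick way to find the exact URL that triggers the error.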
Resulting label/value pairs:
Make Gamo
Model Coyote Black Tactical
Licence No Licence
Orient. Ambidextrous
Scope Unknown 3-9x32
Origin Spanish
Cased Other
Trigger 1
Condition Used
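Once there is one dictionary per gun, `csv.DictWriter` can write one row per gun instead of one line per label. This is only a sketch: the `guns` list below is made-up sample data shaped like `my_dict`, and an in-memory `io.StringIO` buffer stands in for the real file so the example runs without touching the disk. The field names are collected from all dictionaries because `DictWriter` needs an explicit list of columns.

```python
import csv
import io

# Made-up per-gun dictionaries, shaped like my_dict from the answer
guns = [
    {"Make": "Gamo", "Model": "Coyote Black Tactical", "Condition": "Used"},
    {"Make": "Example", "Model": "Sample, with comma"},  # no "Condition" label
]

# Collect every label that occurs, keeping first-seen order
fieldnames = []
for gun in guns:
    for key in gun:
        if key not in fieldnames:
            fieldnames.append(key)

# Stand-in for: open('gundealer.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()    # first row: the column names
writer.writerows(guns)  # missing labels are filled with restval

csv_text = buffer.getvalue()
print(csv_text)
```

Values that contain a comma are quoted automatically, which the manual `csvfile.write(f"{key},{value}\n")` approach does not do.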