Python Beautiful Soup - find a tag within a tag by its string? Nth child?

python web-scraping

I'm having some trouble scraping the HTML below:

 res =   <div class="gunDetails">
    <h4>Specifications</h4>
    <ul class="features">
        <li><label>Make:</label><span itemprop="brand">Gamo</span></li>
        <li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
        <li><label>Licence:</label><span>No Licence</span></li>
        <li><label>Orient.:</label><span>Ambidextrous</span></li>
        <li><label>Scope:</label><span>Unknown&nbsp;3-9x32</span></li>
        <li><label>Origin:</label><span>Spanish</span></li>
        <li><label>Cased:</label><span>Other</span></li>
        <li><label>Trigger:</label><span>1</span></li>
        <li><label>Condition:</label><span itemprop="itemCondition">Used</span></li>
    </ul>
  </div>

Output:

Gamo
Coyote Black Tactical
No Licence
Ambidextrous
Unknown 3-9x32
Spanish
Other
1
Used

Is it possible to create a variable for each label's text? Something like:

gun_make = gun_details.findAll('label', String="Make:")
print(gun_make).text

Here is the full code:

from bs4 import BeautifulSoup
import requests
import csv

all_links=[]
labels = []
spans = []
url="https://www.guntrader.uk/dealers/redcar/spencers-sporting-guns/guns?page={}"

for page in range(1,3):
  res=requests.get(url.format(page)).text
  soup=BeautifulSoup(res,'html.parser')
  for link in soup.select('a[href*="/dealers/redcar/spencers-sporting-guns/guns/shotguns"]'):
    all_links.append("https://www.guntrader.uk" + link['href'])


print(len(all_links))
for a_link in all_links:
  res = requests.get(a_link).text
  soup = BeautifulSoup(res, 'html.parser')
  gun_details = soup.select('div.gunDetails')
  for l in gun_details.select('label'):
   labels.append(l.text.replace(':',''))
  for s in gun_details.select('span'):
   spans.append(s.text)

my_dict = dict(zip(labels, spans))
with open('gundealer.csv','w') as csvfile:
 writer = csv.DictWriter(csvfile, fieldnames=None)
 for key in my_dict.keys():
   csvfile.write(f"{key},{my_dict[key]}\n")

This section seems to work on its own and gives the correct(ish) output:

Make: Gamo

But I can't tell what I'm doing to the initial response from the loop that stops the snippet above from working.

Let's try this:

res =  """ <div class="gunDetails">
    <h4>Specifications</h4>
    <ul class="features">
        <li><label>Make:</label><span itemprop="brand">Gamo</span></li>
        <li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
        <li><label>Licence:</label><span>No Licence</span></li>
        <li><label>Orient.:</label><span>Ambidextrous</span></li>
        <li><label>Scope:</label><span>Unknown&nbsp;3-9x32</span></li>
        <li><label>Origin:</label><span>Spanish</span></li>
        <li><label>Cased:</label><span>Other</span></li>
        <li><label>Trigger:</label><span>1</span></li>
        <li><label>Condition:</label><span itemprop="itemCondition">Used</span></li>
    </ul>
  </div>
"""

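from bs4 import BeautifulSoup as bs
import csv

labels = []
spans = []
soup = bs(res, 'html.parser')
# select_one returns a single Tag (or None); select would return a list
gun_details = soup.select_one('div.gunDetails')
# Collect the label texts (without the colon) and the span texts in document order
for l in gun_details.select('label'):
    labels.append(l.text.replace(':',''))
for s in gun_details.select('span'):
    spans.append(s.text)

# Pair each label with its value, e.g. {'Make': 'Gamo', ...}
my_dict = dict(zip(labels, spans))
# Write one "label,value" row per entry; csv.writer handles any quoting
with open('mycsvfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, value in my_dict.items():
        writer.writerow([key, value])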
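
If you would rather have one variable per label, as the question asks, a minimal sketch is to find the label by its exact string and then read the span next to it. The helper name value_for and the shortened HTML are only for illustration; the string argument of find assumes a reasonably recent Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML from the question
html = """<div class="gunDetails">
  <ul class="features">
    <li><label>Make:</label><span itemprop="brand">Gamo</span></li>
    <li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
    <li><label>Licence:</label><span>No Licence</span></li>
  </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
gun_details = soup.select_one("div.gunDetails")


def value_for(container, label_text):
    """Return the text of the <span> that follows the <label> with this exact text.

    Hypothetical helper; returns None if the label or its span is missing.
    """
    label = container.find("label", string=label_text)
    if label is None:
        return None
    span = label.find_next_sibling("span")
    return span.get_text(strip=True) if span is not None else None


gun_make = value_for(gun_details, "Make:")
gun_model = value_for(gun_details, "Model:")
print(gun_make)
print(gun_model)
```

Note that the label text has to match exactly, including the trailing colon.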

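What do you want the output to look like?

Really, I want to be able to identify each item as an individual item and print its text. I'd even like to know how I could get the whole item and split it at Make: to give me just the span output.

I'm not sure what exactly you're looking for, but you could create a dictionary with key/value pairs like "Make:": "Gamo", "Model:": "Coyote Black Tactical", and so on.

I've updated it with my full code to see if anyone can spot where I'm going wrong.

It's still not clear; can you post the exact expected output?

Hi Jack, thanks for this. It works best in this case. One problem I'm having is incorporating it into the loop over range(1,3) pages in my full script above: I keep getting the error 'NoneType' object has no attribute 'select'. I think it's the way I'm calling the links in the for loop. I'll edit the full script again to show the current script with your updates included.

You have 24 links in there; I tried a random selection of them and didn't get an error. Try to find the exact link where the error occurs.

@AndrewGlass - So, did you find the problematic url?

@AndrewGlass - Sure, but you have to post it as a separate question (SO policy) and I'll take a look. Also, if you're done with this one, you should accept its answer (if it's acceptable).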
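
The 'NoneType' object has no attribute 'select' error discussed above is what you get when select_one finds no div.gunDetails on a page and returns None. Assuming that is what happens for one of the links (the thread never confirms which), a rough sketch of a guard looks like this. The function name extract_specs and both stand-in pages are made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_specs(html):
    """Return {label: value} for one page, or {} if div.gunDetails is absent."""
    soup = BeautifulSoup(html, "html.parser")
    gun_details = soup.select_one("div.gunDetails")
    if gun_details is None:
        # select_one found nothing; calling .select on None would raise
        # AttributeError: 'NoneType' object has no attribute 'select'
        return {}
    specs = {}
    # Pair label and span inside each <li> so a missing span cannot shift values
    for li in gun_details.select("li"):
        label = li.find("label")
        span = li.find("span")
        if label is not None and span is not None:
            specs[label.get_text(strip=True).rstrip(":")] = span.get_text(strip=True)
    return specs


# Two stand-in pages: one with the block, one without (both made up)
page_with_details = (
    '<div class="gunDetails"><ul>'
    "<li><label>Make:</label><span>Gamo</span></li>"
    "</ul></div>"
)
page_without_details = "<html><body><p>Listing removed</p></body></html>"

for name, html in [("ok", page_with_details), ("missing", page_without_details)]:
    specs = extract_specs(html)
    if not specs:
        print(f"{name}: no div.gunDetails found, skipping")
    else:
        print(f"{name}: {specs}")
```

In the real loop you would print the URL instead of the stand-in name, which also tells you exactly which link is the problematic one.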

Output of the answer's code:

Make    Gamo
Model   Coyote Black Tactical
Licence No Licence
Orient. Ambidextrous
Scope   Unknown 3-9x32
Origin  Spanish
Cased   Other
Trigger 1
Condition   Used
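
The title also asks about picking the Nth child. One way to do that, sketched here against a shortened copy of the question's HTML, is a CSS selector with :nth-of-type, or plain list indexing on the result of select. This is only an illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML from the question
html = """<ul class="features">
  <li><label>Make:</label><span itemprop="brand">Gamo</span></li>
  <li><label>Model:</label><span itemprop="model">Coyote Black Tactical</span></li>
  <li><label>Licence:</label><span>No Licence</span></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# CSS positions are 1-based: the second <li> holds the model
second_value = soup.select_one("ul.features > li:nth-of-type(2) > span").get_text()

# The same idea with plain list indexing, which is 0-based
items = soup.select("ul.features > li")
third_value = items[2].span.get_text()

# Attribute selectors work where the markup provides itemprop
brand = soup.select_one('span[itemprop="brand"]').get_text()

print(second_value, third_value, brand, sep=" | ")
```

Selecting by position breaks if the site reorders the list, so looking a value up by its label is usually the safer choice when the labels are stable.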