Scraping a site with Python and BeautifulSoup


I'm trying to learn Python so that I can scrape a website's lunch menu with BeautifulSoup. I make this request:

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
The response looks like this:

<div class="lunchRow">
<div class="lunchRowDay"><h3>Monday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Meatballs</div>
<div class="lunchRowItemActual">Soup</div>
</div>
</div>
<div class="lunchRow">
<div class="lunchRowDay"><h3>Tuesday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Chicken</div>
<div class="lunchRowItemActual">Pork</div>
<div class="lunchRowItemActual">Fish</div>
</div>
</div>

First, get all the lunchRow divs by their class name and save them to a variable, like this:

rows = soup.find_all('div', attrs={'class': 'lunchRow'})
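Then we can loop over them to get each day and its items, as shown here. We take the first (and only) lunchRowDay element, then find all the lunchRowItemActual elements in the current row: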

for row in rows:
  print(row.find('div', attrs={'class': 'lunchRowDay'}).text)
  actuals = row.find_all('div', attrs={'class': 'lunchRowItemActual'})
  for actual in actuals:
    print(actual.text)
This prints:

Monday
Meatballs
Soup
Tuesday
Chicken
Pork
Fish

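Rather than printing them, you will most likely want to put them in a dict, using the lunch day as the key and a list of the lunchRowItemActual values as the value, but that is up to you.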
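One possible way to build that dict is sketched below. It assumes the markup looks exactly like the sample in the question; the names html, build_menu and menu are made up for illustration, and in practice you would pass r.text instead of the hard-coded string.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in practice this would be r.text.
html = """
<div class="lunchRow">
<div class="lunchRowDay"><h3>Monday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Meatballs</div>
<div class="lunchRowItemActual">Soup</div>
</div>
</div>
<div class="lunchRow">
<div class="lunchRowDay"><h3>Tuesday</h3></div>
<div class="lunchRowItem"><div class="lunchRowItemActual">Chicken</div>
<div class="lunchRowItemActual">Pork</div>
<div class="lunchRowItemActual">Fish</div>
</div>
</div>
"""


def build_menu(markup):
    """Return a dict mapping each day to a list of its lunch items (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    menu = {}
    for row in soup.find_all("div", attrs={"class": "lunchRow"}):
        # Each row is assumed to hold exactly one lunchRowDay element.
        day = row.find("div", attrs={"class": "lunchRowDay"}).get_text(strip=True)
        items = row.find_all("div", attrs={"class": "lunchRowItemActual"})
        menu[day] = [item.get_text(strip=True) for item in items]
    return menu


menu = build_menu(html)
print(menu)
```

With this in place, menu["Tuesday"] gives the list of Tuesday's items. If a row could be missing its lunchRowDay element, the find call would return None and need a guard.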

soup.select is a good way to do this kind of thing.

Then use get_text() to get the text.

A list comprehension applies get_text() to the whole list.

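rows = soup.select("div.lunchRow")
for row in rows:
    # The items are siblings of lunchRowDay, not children, so search from the row.
    print(row.select_one("div.lunchRowDay").get_text())
    items = [item.get_text() for item in row.select("div.lunchRowItemActual")]
    print(items)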

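Get all the lunchRow divs and search inside each one for the h3 and the lunchRowItemActual elements.

for row in soup.select("div.lunchRow"):
    # The h3 holds the day name; the items live in the same row.
    print(row.select_one("h3").get_text())
    items = [item.get_text() for item in row.select("div.lunchRowItemActual")]
    print(items)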