Python 3.x: How can I get back all of the items instead of only the first one?
I'm still quite new to Python and am learning to program. I'm trying to web-scrape the titles and artists from this page and arrange them in a table format. I've been able to extract the items with bs4/requests as follows:
for title in soup.find_all('div', attrs={'class':'chart-list-item__title'}):
    print(title.text)
for artist in soup.find_all('div', attrs={'class':'chart-list-item__artist'}):
    print(artist.text)
But when I try to assign the object to a variable, it only returns the first item:
title1 = title.text
print(title1)
How can I get all of the titles back? Here is the full code:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
for title in soup.find_all('div', attrs={'class':'chart-list-item__title'}):
    print(title.text)
for artist in soup.find_all('div', attrs={'class':'chart-list-item__artist'}):
    print(artist.text)
title1 = title.text
print(title1)
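A note on why title1 holds a single value: once a for loop finishes, its loop variable still refers to only one element (the last one iterated), so an assignment made after the loop cannot see the others. The sketch below illustrates this with plain strings standing in for the tag texts; the names titles_found, last_only, and all_titles are made up for the illustration and are not part of the original code.

```python
# Stand-in data: plain strings play the role of the texts of the tags
# returned by find_all() (an assumption made so the sketch runs offline).
titles_found = ["\nNobody's Home\n", "\nMy Arms Stay Open All Night\n", "\nStatue Of A Fool\n"]

for title in titles_found:
    pass  # the loop body runs once per item

# After the loop, `title` refers to a single item: the last one iterated.
last_only = title.strip()

# Building a list keeps every item instead of only one.
all_titles = [t.strip() for t in titles_found]

print(last_only)
print(all_titles)
```

With the real soup object, the same idea would apply by collecting title.text inside the loop or in a list comprehension over soup.find_all(...).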
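You can use the zip function to combine the data. i.text.strip() removes the surrounding whitespace, including the trailing newline (\n).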
import pandas as pd
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
title = [i.text.strip() for i in (soup.find_all('div', attrs={'class':'chart-list-item__title'}))]
artist = [i.text.strip() for i in (soup.find_all('div', attrs={'class':'chart-list-item__artist'}))]
print(list(zip(artist,title)))
Output when the data is stored in a DataFrame with pandas: see the Title/Artist table further below.
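The code that builds the DataFrame is not shown in the answer, so the following is only a minimal sketch of one way it might be done, assuming the title and artist lists have the same length. The three sample rows are copied from the output quoted in this thread so that the sketch runs without a network request.

```python
import pandas as pd

# Sample values standing in for the scraped `title` and `artist` lists.
title = ["Nobody's Home", "My Arms Stay Open All Night", "Statue Of A Fool"]
artist = ["Clint Black", "Tanya Tucker", "Ricky Van Shelton"]

# Each dict key becomes a column; rows are paired up by position,
# so both lists must have the same length.
df = pd.DataFrame({"Title": title, "Artist": artist})
print(df)
```

If a CSV file is wanted afterwards, df.to_csv with a file name of your choice would be one option.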
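Use the class chart-list-item to define the loop, then specify which fields you want to fetch inside the loop. The script below should produce the rank, artist, and album name: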
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
for item in soup.find_all(class_="chart-list-item"):
    rank = item.find(class_="chart-list-item__rank").get_text(strip=True)
    artist = item.find(class_="chart-list-item__artist").get_text(strip=True)
    album = item.find(class_="chart-list-item__title-text").get_text(strip=True)
    print(rank,artist,album)
The output looks like this:
1 Clint Black Nobody's Home
2 Tanya Tucker My Arms Stay Open All Night
3 Ricky Van Shelton Statue Of A Fool
4 Alabama Southern Star
5 Keith Whitley It Ain't Nothin'
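One caveat with the per-item loop: item.find(...) returns None when an item has no matching element, and calling get_text on None raises an AttributeError. The sketch below shows one possible guard. The HTML fragment is a hypothetical stand-in that only imitates the class names used above (the real page markup may differ), and the helper name field is made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment imitating the class names used above;
# the second item deliberately lacks an artist element.
html = """
<div class="chart-list-item">
  <div class="chart-list-item__rank">1</div>
  <div class="chart-list-item__artist">Clint Black</div>
  <span class="chart-list-item__title-text">Nobody's Home</span>
</div>
<div class="chart-list-item">
  <div class="chart-list-item__rank">2</div>
  <span class="chart-list-item__title-text">My Arms Stay Open All Night</span>
</div>
"""

def field(item, class_name):
    """Return the stripped text of the first matching descendant, or None."""
    tag = item.find(class_=class_name)
    return tag.get_text(strip=True) if tag is not None else None

soup = BeautifulSoup(html, "html.parser")

# One (rank, artist, title) tuple per chart item; missing fields become None.
rows = [
    (
        field(item, "chart-list-item__rank"),
        field(item, "chart-list-item__artist"),
        field(item, "chart-list-item__title-text"),
    )
    for item in soup.find_all(class_="chart-list-item")
]
print(rows)
```

Keeping each row together as a tuple also avoids the misalignment that can happen when titles and artists are collected in two separate lists and one entry is missing.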
Thanks, this is a great solution! I learned a lot from reading it. This is a great approach!
    Title                           Artist
0   Nobody's Home                   Clint Black
1   My Arms Stay Open All Night     Tanya Tucker
2   Statue Of A Fool                Ricky Van Shelton
3   Southern Star                   Alabama
4   It Ain't Nothin'                Keith Whitley
5   It's You Again                  Skip Ewing
6   When I Could Come Home To You   Steve Wariner
7   Many A Long & Lonesome Highway  Rodney Crowell
8   That Just About Does It         Vern Gosdin
9   Start All Over Again            The Desert Rose Band
10  Out Of Your Shoes               Lorrie Morgan
11  On Second Thought               Eddie Rabbitt
12  One Man Woman                   The Judds
13  Till I Can't Take It Anymore    Billy Joe Royal
14  Overnight Success               George Strait
15  Where've You Been               Kathy Mattea