通过开放url的请求
我对所有关于链接url请求的帖子感到非常困惑,因为我自己无法解决这个问题。 我试图从一个网页上获取一些信息,然后打开一个新的“a href”,其中存储了我想要的更多信息通过开放url的请求,url,beautifulsoup,python-requests,Url,Beautifulsoup,Python Requests,我对所有关于链接url请求的帖子感到非常困惑,因为我自己无法解决这个问题。 我试图从一个网页上获取一些信息,然后打开一个新的“a href”,其中存储了我想要的更多信息 from bs4 import BeautifulSoup import requests from csv import reader, writer, DictWriter, DictReader source = requests.get("http://www.bda-ieo.it/test/Group.asp
from bs4 import BeautifulSoup
import requests
from csv import reader, writer, DictWriter, DictReader
source = requests.get("http://www.bda-ieo.it/test/Group.aspx?Lan=Ita")
soup = BeautifulSoup(source.text, "html.parser")
titolo_sezione = ""
table_row = ""
with open("genere.txt", "w", newline="") as txt_file:
headers = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]
csv_writer = DictWriter(txt_file, fieldnames=headers, delimiter=';')
csv_writer.writeheader()
for table_row in soup.find("table", id="tblResult").find_all("tr"):
className = ""
if table_row.get("class"):
className = table_row.get("class").pop()
if className == "testobold":
titolo_sezione = table_row.text
if className == "testonormale":
for cds in table_row.find_all("td"):
url = cds.get("a")
urls = requests.get("http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita + url")
dage = BeautifulSoup(urls.text, "html.parser")
alimenti = ""
for alimenti in dage:
id_alimento, destra = alimenti.find_all("td")
codice = id_alimento.text
nome = destra.text
href = destra.a.get("href")
print(f'{titolo_sezione}; {id_alimento.text}; {nome.text}')
变量URL不会打开任何其他页面。有人能帮我弄清楚吗?
我被困在那上面了
多谢各位
Mass您需要重新处理其中的一些逻辑,以及阅读一些有关字符串格式的内容。我记下了我在哪里做的更改,我不确定您到底想要什么作为输出,但这可能会让您继续
from bs4 import BeautifulSoup
import requests
from csv import reader, writer, DictWriter, DictReader
source = requests.get("http://www.bda-ieo.it/test/Group.aspx?Lan=Ita")
soup = BeautifulSoup(source.text, "html.parser")
titolo_sezione = ""
table_row = ""
with open("c:/test/genere.txt", "w", newline="") as txt_file:
headers = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]
csv_writer = DictWriter(txt_file, fieldnames=headers, delimiter=';')
csv_writer.writeheader()
for table_row in soup.find("table", id="tblResult").find_all("tr"):
className = ""
if table_row.get("class"):
className = table_row.get("class").pop()
if className == "testobold":
titolo_sezione = table_row.text
if className == "testonormale":
for cds in table_row.find_all("a", href=True): #<-- the hrefs are in the <a> tags within the <td> tags. So you need to find <a> tags that have href
url = cds['href'] #<--- get the href
urls = requests.get("http://www.bda-ieo.it/test/%s" %url) #<--- use that stored string to put into the new url you'll be using
dage = BeautifulSoup(urls.text, "html.parser") #<-- create BeautifulSoup object with that response
dageTbl = dage.find("table", id="tblResult") #<--- find the table in this html now
if dageTbl: #<--- if there is that table
for alimenti in dageTbl.find_all('tr', {'class':'testonormale'}): #<--- find the rows with the specific class
id_alimento, destra = alimenti.find_all("td")
codice = id_alimento.text
nome = destra.text.strip() #<--- added strip() to remove whitespace
href = destra.a.get("href")
print(f'{titolo_sezione}; {codice}; {nome}') #<--- fixed string formatting here too
我看到的第一个问题是,您使用
硬编码url以执行请求http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita +url“
。你到底想干什么?我可以帮你,谢谢你!你明白了我想表达的意思,并在我脑海中清楚地表明了一些步骤。再次感谢你!!
PATATE; 381; PATATE
PATATE; 50399; PATATE DOLCI
PATATE; 380; PATATE NOVELLE
PATATE; 3002; PATATE, FECOLA
PATATE; 100219; PATATE, POLVERE ISTANTANEA
PATATE; 382; PATATINE IN SACCHETTO
PATATE; 18; TAPIOCA
VEGETALI; 303; ASPARAGI DI BOSCO
VEGETALI; 304; ASPARAGI DI CAMPO
VEGETALI; 305; ASPARAGI DI SERRA
VEGETALI; 700484; ASPARAGI IN SCATOLA
VEGETALI; 8035; GERMOGLI DI ERBA MEDICA
...