Requests through an opened URL (tags: url, beautifulsoup, python-requests)


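I'm really confused by all the posts about chained URL requests, and I haven't been able to solve this on my own. I'm trying to get some information from a web page and then open a new "a href" link, where more of the information I want is stored.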

from bs4 import BeautifulSoup
import requests
from csv import reader, writer, DictWriter, DictReader

source = requests.get("http://www.bda-ieo.it/test/Group.aspx?Lan=Ita")
soup = BeautifulSoup(source.text, "html.parser")


titolo_sezione = ""
table_row = ""
with open("genere.txt", "w", newline="") as txt_file:
    headers = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]
    csv_writer = DictWriter(txt_file, fieldnames=headers, delimiter=';')
    csv_writer.writeheader()

for table_row in soup.find("table", id="tblResult").find_all("tr"):
    className = ""
    if table_row.get("class"):
        className = table_row.get("class").pop()

        if className == "testobold":
            titolo_sezione = table_row.text

        if className == "testonormale":
            for cds in table_row.find_all("td"):
                url = cds.get("a")

                urls = requests.get("http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita + url")
                dage = BeautifulSoup(urls.text, "html.parser")


                alimenti = ""
                for alimenti in dage:
                    id_alimento, destra = alimenti.find_all("td")
                    codice = id_alimento.text
                    nome = destra.text
                    href = destra.a.get("href")

                print(f'{titolo_sezione}; {id_alimento.text}; {nome.text}')

The variable url doesn't open any other page. Can someone help me figure this out? I'm stuck on it.

Thanks, everyone

Mass

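You need to rework some of the logic here, and read up a bit on string formatting. I've marked the places where I made changes. I'm not sure exactly what you want as output, but this should get you going.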

from bs4 import BeautifulSoup
import requests
from csv import reader, writer, DictWriter, DictReader

source = requests.get("http://www.bda-ieo.it/test/Group.aspx?Lan=Ita")
soup = BeautifulSoup(source.text, "html.parser")


titolo_sezione = ""
table_row = ""
with open("c:/test/genere.txt", "w", newline="") as txt_file:
    headers = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]
    csv_writer = DictWriter(txt_file, fieldnames=headers, delimiter=';')
    csv_writer.writeheader()

for table_row in soup.find("table", id="tblResult").find_all("tr"):
    className = ""
    if table_row.get("class"):
        className = table_row.get("class").pop()

        if className == "testobold":
            titolo_sezione = table_row.text

        if className == "testonormale":
            for cds in table_row.find_all("a", href=True): #<-- the hrefs are in the <a> tags within the <td> tags. So you need to find <a> tags that have href
                url = cds['href'] #<--- get the href

                urls = requests.get("http://www.bda-ieo.it/test/%s" %url) #<--- use that stored string to put into the new url you'll be using
                dage = BeautifulSoup(urls.text, "html.parser") #<-- create BeautifulSoup object with that response
                dageTbl = dage.find("table", id="tblResult") #<--- find the table in this html now 
                if dageTbl:   #<--- if there is that table
                    for alimenti in dageTbl.find_all('tr', {'class':'testonormale'}): #<--- find the rows with the specific class
                        id_alimento, destra = alimenti.find_all("td") 
                        codice = id_alimento.text
                        nome = destra.text.strip() #<--- added strip() to remove whitespace
                        href = destra.a.get("href")

                        print(f'{titolo_sezione}; {codice}; {nome}') #<--- fixed string formatting here too
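The URL fix above can be looked at in isolation. The following is a minimal sketch, not part of the original answer: it contrasts the question's string, where `+ url` sits inside the quotes and is therefore sent as literal text, with `urllib.parse.urljoin` from the standard library, which is one alternative to the `%s` formatting used above. The `href` value and its `IDG` parameter are hypothetical stand-ins for whatever `cds['href']` actually returns on the real page.

```python
from urllib.parse import urljoin

# Page that lists the groups (taken from the code above).
BASE_URL = "http://www.bda-ieo.it/test/Group.aspx?Lan=Ita"

# Hypothetical href value, standing in for what cds['href'] returns.
href = "Groupfood.aspx?Lan=Ita&IDG=1"

# The question's version: "+ url" is inside the quotes, so it is plain text
# and the variable is never used.
literal = "http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita + url"

# urljoin resolves the relative href against the page it was found on.
resolved = urljoin(BASE_URL, href)

print(literal)
print(resolved)
```

Resolving against the page URL avoids hard-coding the `/test/` prefix a second time, which may help if the site ever moves its pages.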

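The first problem I see is that you're using a hard-coded URL to perform the request: "http://www.bda-ieo.it/test/Groupfood.aspx?Lan=Ita + url". What exactly are you trying to do? I can help you.

Thank you! You understood what I was trying to say and made some of the steps clear in my mind. Thanks again!!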

PATATE; 381; PATATE
PATATE; 50399; PATATE DOLCI
PATATE; 380; PATATE NOVELLE
PATATE; 3002; PATATE, FECOLA
PATATE; 100219; PATATE, POLVERE ISTANTANEA
PATATE; 382; PATATINE IN SACCHETTO
PATATE; 18; TAPIOCA
VEGETALI; 303; ASPARAGI DI BOSCO
VEGETALI; 304; ASPARAGI DI CAMPO
VEGETALI; 305; ASPARAGI DI SERRA
VEGETALI; 700484; ASPARAGI IN SCATOLA
VEGETALI; 8035; GERMOGLI DI ERBA MEDICA
...
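Both scripts above create a `DictWriter` and write the header, but they only print the scraped rows, and the `with` block has already closed the file by the time the loop runs. If the goal is to get those rows into the semicolon-delimited file, one possible approach is sketched below. The `write_rows` helper is hypothetical, the two sample rows are copied from the output above, and an in-memory buffer stands in for `genere.txt` so the sketch runs without touching the disk or the network.

```python
import io
from csv import DictWriter

# Column names taken from the scripts above.
HEADERS = ["GRUPPO MERCEOLOGICO", "CODICE MERCEOLOGICO", "ALIMENTO"]


def write_rows(file_obj, rows):
    """Write (group, code, name) tuples as semicolon-delimited CSV."""
    csv_writer = DictWriter(
        file_obj, fieldnames=HEADERS, delimiter=";", lineterminator="\n"
    )
    csv_writer.writeheader()
    for group, code, name in rows:
        # Pair each value with its column name before writing the row.
        csv_writer.writerow(dict(zip(HEADERS, (group, code, name))))


# Two rows copied from the printed output above.
rows = [("PATATE", "381", "PATATE"), ("PATATE", "50399", "PATATE DOLCI")]

buffer = io.StringIO()  # stands in for the open text file
write_rows(buffer, rows)
print(buffer.getvalue())
```

In the real script, the scraping loop would need to run inside the `with open(...)` block (or collect its rows into a list first) so that the file is still open when the rows are written.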