In Python 3, how do I use .append to add a string to scraped links?

python python-3.x web-scraping

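Thanks to stackoverflow.com, I was able to write a program that scrapes the web links from any given web page. However, I need it to join the main URL onto any relative links it encounters. (For example, a complete URL is fine, but "/sitemap" on its own is not.)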

In the code below:

from bs4 import BeautifulSoup as mySoup
from urllib.parse import urljoin as myJoin
from urllib.request import urlopen as myRequest

base_url = "https://www.census.gov/programs-surveys/popest.html"

html_page = myRequest(base_url)
raw_html = html_page.read()
page_soup = mySoup(raw_html, "html.parser")
html_page.close()

f = open("census4-3.csv", "w")

all_links = page_soup.find_all('a', href=True)

def clean_links(tags, base_url):
    cleaned_links = set()
    for tag in tags:
        link = tag.get('href')
        if link is None:
            continue
        full_url = myJoin(base_url, link)
        cleaned_links.add(full_url)
    return cleaned_links

cleaned_links = clean_links(all_links, base_url)

for link in cleaned_links:
    f.write(str(link) + '\n')

f.close()
print("The CSV file is saved to your computer.")

how and where would I add the following?

.append("http://www.google.com")

You should store the base URL as

base_url = "https://www.census.gov"

and make the request like this:

html_page = myRequest(base_url + '/programs-surveys/popest.html')

Then, whenever you need a full_url, just do the following:

full_url = base_url + link
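As a rough comparison, the sketch below puts plain concatenation next to urljoin, which the question's code already imports (as myJoin). The helper name compare and the example.com link are made up for illustration; only the census.gov base URL comes from the thread. It suggests that concatenation is fine for root-relative links such as "/sitemap", but that a link which is already absolute needs urljoin (or an explicit check).

```python
from urllib.parse import urljoin


def compare(base_url, link):
    """Return (plain concatenation, urljoin result) for a single href.

    Hypothetical helper, used only to put the two approaches side by side.
    """
    return base_url + link, urljoin(base_url, link)


base_url = "https://www.census.gov"

# Root-relative link: both approaches produce the same URL.
print(compare(base_url, "/sitemap"))

# Already-absolute link (made-up example): concatenation glues two URLs
# together, while urljoin returns the absolute link unchanged.
print(compare(base_url, "https://www.example.com/about"))
```

If every href on the page were guaranteed to start with "/", the one-line concatenation would be enough; urljoin is the more forgiving choice when that is not known in advance.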

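I don't have a Python IDE at hand to test this, which is why I'm posting a comment rather than an answer. For starters, the .append method works on lists, not strings. To combine strings in Python, use +. You can also use +=, the same way you would to increment a numeric variable. For example, to redefine the string a = "Hello" as a = "Hello world", use a += " world".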

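You do need to make sure that both operands are strings. That should not be a problem in the OP's code: link = tag.get('href') will be either a string or None, and the None case is already handled.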
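A minimal sketch of the distinction made in this comment, using made-up sample values: strings are combined with + or +=, while .append is a list method. One further observation, not stated in the thread: clean_links() in the question returns a set, and sets use .add rather than .append.

```python
# Strings are combined with + or +=; they have no .append method.
a = "Hello"
a += " world"
print(a)

# .append is a list method: it adds one item to the end of the list.
links = ["https://www.census.gov/sitemap"]  # sample value for illustration
links.append("http://www.google.com")
print(links)

# The question's clean_links() builds a set, and sets use .add instead.
cleaned_links = {"https://www.census.gov/sitemap"}  # sample value
cleaned_links.add("http://www.google.com")
print(sorted(cleaned_links))
```

Under that reading, the extra URL from the question could be added right after the call to clean_links(), before the loop that writes the file.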