
Python: how do I dump output to a .txt file?


I want to make a simple program that extracts the URLs from a site and then dumps them to a .txt file.

The code below works fine, but when I try to dump the output to a file I get errors.

from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://stackoverflow.com"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data)
cr = r'C:\Users\Admin\Desktop\extracted.txt'

for link in soup.find_all('a'):
    print(link.get('href'))
I tried:

with open(cr, 'w') as f:
  for link in soup.find_all('a'):
    print(link.get('href'))
    f.write(link.get('href'))
It dumps some of the links, but not all of them, and they all end up on a single line (I get TypeError: expected a string or other character buffer object).
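Both symptoms can be reproduced in isolation. Here is a minimal sketch (demo.txt is a made-up file name, just for illustration): write() does not append a newline by itself, and it raises a TypeError when handed the None that link.get('href') returns for an <a> tag with no href attribute. The exact wording of the error message depends on the Python version.

with open('demo.txt', 'w') as f:
    f.write('/questions')   # fine, but no newline is added automatically
    f.write('/teams')       # lands on the same line: "/questions/teams"
    try:
        f.write(None)       # a missing href yields None -> TypeError
    except TypeError as e:
        print(e)            # message wording varies across Python versions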

The result in the .txt file should look like this:

/teams/customers
/teams/use-cases
/questions
/teams
/enterprise
https://www.stackoverflowbusiness.com/talent
https://www.stackoverflowbusiness.com/advertising
https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f
https://stackoverflow.com/users/signup?ssrc=head&returnurl=%2fusers%2fstory%2fcurrent
https://stackoverflow.com
https://stackoverflow.com
https://stackoverflow.com/help
https://chat.stackoverflow.com
https://meta.stackoverflow.com
https://stackoverflow.com/users/signup?ssrc=site_switcher&returnurl=%2fusers%2fstory%2fcurrent
https://stackoverflow.com/users/login?ssrc=site_switcher&returnurl=https%3a%2f%2fstackoverflow.com%2f
https://stackexchange.com/sites
https://stackoverflow.blog
https://stackoverflow.com/legal/cookie-policy
https://stackoverflow.com/legal/privacy-policy
https://stackoverflow.com/legal/terms-of-service/public
Try this:

with open(cr, 'w') as f:
    for link in soup.find_all('a'):
        link_text = link.get('href')
        if link_text is not None:
            print(link_text)
            f.write(link_text + '\n')

So... it works, just as Simon Fink said. But I found another way:

with open(cr, 'w') as f:
    for link in soup.find_all('a'):
        print(link.get('href'))
        try:
            f.write(link.get('href') + '\n')
        except:
            continue

But I think the method Simon Fink suggested is better. Thank you very much!
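As the comments below suggest, if the try/except route is used at all, it is safer to catch only the exception you expect. A sketch of that narrower variant (an illustration, not code from the thread):

with open(cr, 'w') as f:
    for link in soup.find_all('a'):
        print(link.get('href'))
        try:
            f.write(link.get('href') + '\n')
        except TypeError:   # raised when link.get('href') returns None
            continue        # skip <a> tags without an href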

You have f.write, but I don't see where you create f.

write will put them all on one line; you are responsible for the formatting. Just add a '\n' after each call.

My bad, I added it.

I added an answer. You may also want to check whether link.get('href') is None, in case href is undefined, to avoid the TypeError.

Just catching every exception with a bare except is probably not a good idea; it is better to catch only the expected exception (TypeError in this case), so that any other exceptions are raised properly.

I will do as you suggested. Thanks!

This is a nice use case for the new walrus operator in 3.8: if link_text := link.get('href'):
from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://stackoverflow.com"

page = requests.get(url)
data = page.text
soup = BeautifulSoup(data)
cr = r'C:\Users\Admin\Desktop\extracted.txt'
links = []

# collect only the links that actually carry an href attribute
for link in soup.find_all('a'):
    print(link.get('href'))
    if link.get('href'):
        links.append(link.get('href'))

# write the collected links to the file, one per line
with open(cr, 'w') as f:
    for link in links:
        print(link)
        f.write(link + '\n')