
Python: how do I dump output to a .txt file?


I want to make a simple program that extracts the URLs from a site and then dumps them to a .txt file.

The code below works fine, but when I try to dump the output to a file I get errors.

from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://stackoverflow.com"
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data)
cr = r'C:\Users\Admin\Desktop\extracted.txt'

for link in soup.find_all('a'):
    print(link.get('href'))
I tried:

with open(cr, 'w') as f:
  for link in soup.find_all('a'):
    print(link.get('href'))
    f.write(link.get('href'))
It dumps some of the links, but not all of them, and they all end up on a single line (I get TypeError: expected a string or other character buffer object).
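Both symptoms can be reproduced in isolation. Here is a minimal sketch (demo.txt is a made-up file name, just for illustration): write() does not append a newline by itself, and it raises a TypeError when handed the None that link.get('href') returns for an <a> tag with no href attribute. The exact wording of the error message depends on the Python version.

with open('demo.txt', 'w') as f:
    f.write('/questions')   # fine, but no newline is added automatically
    f.write('/teams')       # lands on the same line: "/questions/teams"
    try:
        f.write(None)       # a missing href yields None -> TypeError
    except TypeError as e:
        print(e)            # message wording varies across Python versions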

The result in the .txt file should look like this:

/teams/customers
/teams/use-cases
/questions
/teams
/enterprise
https://www.stackoverflowbusiness.com/talent
https://www.stackoverflowbusiness.com/advertising
https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f
https://stackoverflow.com/users/signup?ssrc=head&returnurl=%2fusers%2fstory%2fcurrent
https://stackoverflow.com
https://stackoverflow.com
https://stackoverflow.com/help
https://chat.stackoverflow.com
https://meta.stackoverflow.com
https://stackoverflow.com/users/signup?ssrc=site_switcher&returnurl=%2fusers%2fstory%2fcurrent
https://stackoverflow.com/users/login?ssrc=site_switcher&returnurl=https%3a%2f%2fstackoverflow.com%2f
https://stackexchange.com/sites
https://stackoverflow.blog
https://stackoverflow.com/legal/cookie-policy
https://stackoverflow.com/legal/privacy-policy
https://stackoverflow.com/legal/terms-of-service/public
Try this:

with open(cr, 'w') as f:
    for link in soup.find_all('a'):
        link_text = link.get('href')
        if link_text is not None:
            print(link_text)
            f.write(link_text + '\n')

So... it works, just as Simon Fink said. But I found another way:

with open(cr, 'w') as f:
    for link in soup.find_all('a'):
        print(link.get('href'))
        try:
            f.write(link.get('href') + '\n')
        except:
            continue

But I think the method Simon Fink suggested is better. Thank you very much!
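As the comments below suggest, if the try/except route is used at all, it is safer to catch only the exception you expect. A sketch of that narrower variant (an illustration, not code from the thread):

with open(cr, 'w') as f:
    for link in soup.find_all('a'):
        print(link.get('href'))
        try:
            f.write(link.get('href') + '\n')
        except TypeError:   # raised when link.get('href') returns None
            continue        # skip <a> tags without an href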

You have f.write, but I don't see where you create f.

write will put them all on one line; you are responsible for the formatting. Just add a '\n' after each call.

My bad, I added it.

I added an answer. You may also want to check whether link.get('href') is None, in case href is undefined, to avoid the TypeError.

Just catching every exception with a bare except is probably not a good idea; it is better to catch only the expected exception (TypeError in this case), so that any other exceptions are raised properly.

I will do as you suggested. Thanks!

This is a nice use case for the new walrus operator in 3.8: if link_text := link.get('href'):
from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://stackoverflow.com"

page = requests.get(url)
data = page.text
soup = BeautifulSoup(data)
cr = r'C:\Users\Admin\Desktop\extracted.txt'
links = []

# collect only the links that actually carry an href attribute
for link in soup.find_all('a'):
    print(link.get('href'))
    if link.get('href'):
        links.append(link.get('href'))

# write the collected links to the file, one per line
with open(cr, 'w') as f:
    for link in links:
        print(link)
        f.write(link + '\n')