
Extracting data from Wikipedia into a txt file using Python


I have implemented the following code to extract data from a Wikipedia page:

import bs4
import requests

res = requests.get('https://en.wikipedia.org/wiki/Agriculture')
res.raise_for_status()
wiki = bs4.BeautifulSoup(res.text, "html.parser")
for i in wiki.select('p'):
    print(i.getText())
This code extracts all the data I need from the page. However, I want to store it in a text file using Python, and I have not been able to.
Ideally the text file's name would be derived from the URL itself, so the same code can be reused across multiple wiki pages.

Try this:

import bs4
import requests

wiki_page = 'Agriculture'
res = requests.get(f'https://en.wikipedia.org/wiki/{wiki_page}')
res.raise_for_status()
wiki = bs4.BeautifulSoup(res.text, "html.parser")

# open a file named after the wiki page, in write mode
with open(wiki_page + ".txt", "w", encoding="utf-8") as f:
    for i in wiki.select('p'):
        # write each paragraph to the file
        f.write(i.getText())
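Since the question asks to reuse this on multiple wiki pages, the same pattern can be wrapped in a small helper. This is a minimal sketch; the function names and the page list are illustrative, not part of the original answer:

```python
import bs4
import requests

def page_filename(wiki_page):
    """Derive the output file name from the wiki page name."""
    return wiki_page + ".txt"

def save_wiki_page(wiki_page):
    """Fetch one Wikipedia article and write its paragraphs to a text file."""
    res = requests.get(f'https://en.wikipedia.org/wiki/{wiki_page}')
    res.raise_for_status()
    wiki = bs4.BeautifulSoup(res.text, "html.parser")
    with open(page_filename(wiki_page), "w", encoding="utf-8") as f:
        for p in wiki.select('p'):
            f.write(p.getText())
```

Calling `save_wiki_page('Agriculture')` then produces `Agriculture.txt` in the working directory, and the same call works for any other page name.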

Try this. This is the reference.

The following is compatible with all Python versions:

import io

import bs4
import requests

url = "https://en.wikipedia.org/wiki/Agriculture"
res = requests.get(url)
res.raise_for_status()
wiki = bs4.BeautifulSoup(res.text, "html.parser")

# name the file after the last path segment of the URL
file_to_write = io.open(url.split('/')[-1] + ".txt", "a", encoding="utf-8")  # append mode
for i in wiki.select('p'):
    text_to_write = i.getText()
    print(text_to_write)
    file_to_write.write(text_to_write)

file_to_write.close()


Hi, trying it gives the following error: UnicodeEncodeError: 'charmap' codec can't encode character '\u016b' in position 74: character maps to <undefined>.
Happy to help.
Hello. Still, why isn't the .txt file being created?
@Sca Updated; Agriculture.txt will now be created in your working directory.
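The UnicodeEncodeError in the comments typically comes from writing to a file opened with the platform's default codec (e.g. cp1252 on Windows), which cannot represent characters such as '\u016b' ('ū'). Passing an explicit encoding to open() avoids it. A minimal sketch; the sample string and file name are hypothetical:

```python
# 'ū' (U+016B) occurs in Wikipedia text but is not representable in cp1252,
# the default file codec on many Windows systems.
text = "Lietuvos \u016bkis"  # hypothetical sample containing 'ū'

# An explicit UTF-8 encoding makes the write independent of platform defaults.
with open("sample.txt", "w", encoding="utf-8") as f:
    f.write(text)

# Reading back with the same encoding round-trips the text intact.
with open("sample.txt", encoding="utf-8") as f:
    assert f.read() == text
```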