Python web scraping: writing to a file fails after scraping
I'm practicing web scraping on my own and trying to use Python to scrape an online novel series from a Chinese online-novel website. After I put the code into a function, it seems to have stopped working. I wrote the following code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.51shucheng.net/zh-tw/wuxia/shediaoyingxiongzhuan')
soup = BeautifulSoup(page.content,'lxml')
page_list = soup.find_all(class_='mulu-list')
pages = page_list[0].find_all('a')
print(pages[0])
for i in range(len(pages)):
    pages[i] = pages[i].get('href')
with open("射雕英雄傳1.txt", "w+") as file_object:
    for i in range(len(pages)):
        file_object.write('\n\n\t{}'.format(i+1))
        page = requests.get(pages[i])
        soup = BeautifulSoup(page.content,'lxml')
        content = soup.find(class_='neirong').text
        print(content[0:20])
        file_object.write(content)
with open('射雕英雄傳1.txt') as oldfile, open('射雕英雄傳.txt', 'w') as newfile:
    for line in oldfile:
        if not ('adsbygoogle' in line):
            newfile.write(line)
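And it works fine. However, I wanted to put it into a function, so I made the changes below, and then it stopped working: the file 射雕英雄傳1.txt is still being created, but it is empty.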
import requests
from bs4 import BeautifulSoup
def scraping_novel(prefix,bookname):
    page = requests.get('https://www.51shucheng.net/zh-tw/wuxia/{}'.format(prefix))
    soup = BeautifulSoup(page.content,'lxml')
    page_list = soup.find_all(class_='mulu-list')
    pages = page_list[0].find_all('a')
    print(pages[0])
    for i in range(len(pages)):
        pages[i] = pages[i].get('href')
    with open("{}1.txt".format(bookname), "w+") as file_object:
        for i in range(len(pages)):
            file_object.write('\n\n\t{}'.format(i+1))
            page = requests.get(pages[i])
            soup = BeautifulSoup(page.content,'lxml')
            content = soup.find(class_='neirong').text
            print(content[0:20])
            file_object.write(content)
    with open("{}1.txt".format(bookname)) as oldfile, open("{}1.txt".format(bookname), 'w') as newfile:
        for line in oldfile:
            if not ('adsbygoogle' in line):
                newfile.write(line)

scraping_novel("shediaoyingxiongzhuan","射雕英雄傳")  # failed
I tried two things:
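I'd be grateful if someone could tell me what's going on, because I still can't figure out what went wrong.
Are you using Python >= 3.6? Do … However, as for the file being overwritten: I don't think you can open the same file for reading and for writing in a single statement.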
with open("1.txt", "w+") as oldfile:
    oldfile.write('test')

differentName = "12.txt"
with open("1.txt", "r") as oldfile, open(differentName, 'w') as newfile:
    assert len(oldfile.readlines())  # passes: "1.txt" still contains 'test'

sameName = "1.txt"
with open(sameName, "r") as oldfile, open(sameName, 'w') as newfile:
    assert len(oldfile.readlines())  # AssertionError: opening with 'w' already emptied the file
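If the goal really is to filter a file in place, one option is to split the read and the write into two consecutive with statements, so the file is only truncated after its contents are already in memory. This is a minimal sketch: the function name filter_in_place and its parameters are hypothetical, and encoding="utf-8" is an assumption (the question's code relies on the platform's default encoding).

```python
def filter_in_place(path, marker="adsbygoogle", encoding="utf-8"):
    """Remove every line containing `marker` from the file at `path`.

    Hypothetical helper: the whole file is read into memory first, and only
    afterwards reopened in 'w' mode, which is the step that truncates it.
    """
    # Step 1: read and filter while the file is still intact.
    with open(path, encoding=encoding) as oldfile:
        kept = [line for line in oldfile if marker not in line]
    # Step 2: only now truncate the file and write the kept lines back.
    with open(path, "w", encoding=encoding) as newfile:
        newfile.writelines(kept)
    return len(kept)  # number of lines that survived the filter
```

This keeps the whole file in memory, which is fine for a novel-sized text file; for very large files, writing to a separate output file (as the original working script does) avoids that.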
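The typo Lydia van Dyke mentioned causes the file to be opened for writing, which truncates it before the read handle has read anything. So the loop over the lines of oldfile runs zero times. By the way, no error message is produced.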
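To see this effect in isolation, the sketch below (a hypothetical demo_truncation helper that works in a temporary directory) writes a line to a file, then opens that same file for reading and for writing in one with statement, as the function in the question does. No exception is raised; the read handle simply finds an empty file.

```python
import os
import tempfile


def demo_truncation():
    """Return (size before, lines read) for the same-file read/write pattern."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "1.txt")
        with open(path, "w") as f:
            f.write("test\n")
        size_before = os.path.getsize(path)
        # Same pattern as the question: a read handle and a write handle on one file.
        # Opening with 'w' truncates the file to 0 bytes immediately,
        # before readlines() (the question's for-loop) gets to read anything.
        with open(path, "r") as oldfile, open(path, "w") as newfile:
            lines_read = oldfile.readlines()
        return size_before, lines_read
```

Calling demo_truncation() gives a non-zero size_before, while lines_read is an empty list.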
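open("{}1.txt".format(bookname), 'w') as newfile
should become open("{}.txt".format(bookname), 'w') as newfile
I think. Looks like a copy-paste error. @LydiavanDyke Even with your change in front of me, I stared at it and couldn't see the difference... for an embarrassingly long time :-) @user9882001 diff -b or diff -w should make it easy to spot. OK, since both answers now agree that the problem was caused by a typo, this question should be closed.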
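With that filename fix applied, the last step of scraping_novel could look roughly like this sketch, written with f-strings (Python >= 3.6). The helper name remove_ad_lines is hypothetical; it assumes the scraping loop has already written {bookname}1.txt to the current directory and, like the question's code, uses the platform's default encoding for both files.

```python
def remove_ad_lines(bookname, marker="adsbygoogle"):
    """Copy {bookname}1.txt to {bookname}.txt, dropping lines that contain `marker`.

    Hypothetical helper mirroring the final step of scraping_novel.
    """
    raw_name = f"{bookname}1.txt"   # intermediate file written by the scraping loop
    clean_name = f"{bookname}.txt"  # final output; must NOT be the same file as raw_name
    with open(raw_name) as oldfile, open(clean_name, "w") as newfile:
        for line in oldfile:
            if marker not in line:
                newfile.write(line)
    return clean_name
```

Because the input and output names now differ, opening clean_name with 'w' no longer empties the file that is being read.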