import-im6.q16: not authorized error `os' @ error/constitute.c/WriteImage/1037 when running a Python web scraper
I wrote a simple web scraper for a comics website. I run it on Ubuntu (Linux-Ubuntu 4.18.0-16-generic #17~18.04.1-Ubuntu), but when I execute the script directly (permissions set with chmod ug+x), I keep getting a series of errors on the imported system libraries, plus a puzzling syntax error:
import-im6.q16: not authorized `time' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `os' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `sys' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `re' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `requests' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/bs4
./poorlywrittenscraper.py: line 15: DEFAULT_DIR_NAME: command not found
./poorlywrittenscraper.py: line 16: syntax error near unexpected token `('
./poorlywrittenscraper.py: line 16: `COMICS_DIRECTORY = os.path.join(os.getcwd(), DEFAULT_DIR_NAME)'
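Interestingly, when I run the same script via python3, it starts, creates the folder, and fetches the images, but... it doesn't save them. o.O
Any idea what I'm missing here or how to fix it?
Here's the full code of the script: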
"""
A simple image downloader for poorlydrawnlines.com/archive
"""
import time
import os
import sys
import re
import concurrent.futures
import requests
from bs4 import BeautifulSoup as bs
DEFAULT_DIR_NAME = "poorly_created_folder"
COMICS_DIRECTORY = os.path.join(os.getcwd(), DEFAULT_DIR_NAME)
LOGO = """
a Python comic(al) scraper for poorlydwarnlines.com
__
.-----.-----.-----.----.| |.--.--.
| _ | _ | _ | _|| || | |
| __|_____|_____|__| |__||___ |
|__| |_____|
__ __ __
.--.--.--.----.|__| |_| |_.-----.-----.
| | | | _|| | _| _| -__| |
|________|__| |__|____|____|_____|__|__|
.-----.----.----.---.-.-----.-----.----.
|__ --| __| _| _ | _ | -__| _|
|_____|____|__| |___._| __|_____|__|
|__|
version: 0.4 | author: baduker | https://github.com/baduker
"""
ARCHIVE_URL = "http://www.poorlydrawnlines.com/archive/"
COMIC_PATTERN = re.compile(r'http://www.poorlydrawnlines.com/comic/.+')


def download_comics_menu(comics_found):
    """
    Main download menu, takes number of available comics for download
    """
    print("\nThe scraper has found {} comics.".format(len(comics_found)))
    print("How many comics do you want to download?")
    print("Type 0 to exit.")
    while True:
        try:
            comics_to_download = int(input(">> "))
        except ValueError:
            print("Error: expected a number. Try again.")
            continue
        if comics_to_download > len(comics_found) or comics_to_download < 0:
            print("Error: incorrect number of comics to download. Try again.")
            continue
        elif comics_to_download == 0:
            sys.exit()
        return comics_to_download


def grab_image_src_url(session, url):
    """
    Fetches urls with the comic image source
    """
    response = session.get(url)
    soup = bs(response.text, 'html.parser')
    for i in soup.find_all('p'):
        for img in i.find_all('img', src=True):
            return img['src']


def download_and_save_comic(session, url):
    """
    Downloads and saves the comic image
    """
    file_name = url.split('/')[-1]
    with open(os.path.join(COMICS_DIRECTORY, file_name), "wb") as file:
        response = session.get(url)
        file.write(response.content)


def fetch_comics_from_archive(session):
    """
    Grabs all urls from the poorlydrawnlines.com/archive and parses for only those that link to published comics
    """
    response = session.get(ARCHIVE_URL)
    soup = bs(response.text, 'html.parser')
    comics = [url.get("href") for url in soup.find_all("a")]
    return [url for url in comics if COMIC_PATTERN.match(url)]


def download_comic(session, url):
    """
    Download progress information
    """
    print("Downloading: {}".format(url))
    url = grab_image_src_url(session, url)
    download_and_save_comic(session, url)


def main():
    """
    Encapsulates and executes all methods in the main function
    """
    print(LOGO)
    session = requests.Session()
    comics = fetch_comics_from_archive(session)
    comics_to_download = download_comics_menu(comics)
    try:
        os.mkdir(DEFAULT_DIR_NAME)
    except OSError as exc:
        sys.exit("Failed to create directory (error_no {})".format(exc.error_no))
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(lambda url: download_comic(session, url), comics[:comics_to_download])
        executor.shutdown()
    end = time.time()
    print("Finished downloading {} comics in {:.2f} sec.".format(comics_to_download, end - start))


if __name__ in "__main__":
    main()
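I'm pretty sure you're missing a shebang at the beginning of your file. Without one, the shell runs the file as a shell script, so each `import` line invokes ImageMagick's `import` command, which is where the import-im6.q16 errors come from. For example, add one of these as the first line: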
#!/usr/bin/env python3
#!/usr/bin/env python2
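To make the placement concrete, here is a minimal sketch of a script that carries the shebang as its very first line and checks whether another file does too. The helper names `starts_with_shebang` and `demo` are made up for this illustration and are not part of the original scraper.

```python
#!/usr/bin/env python3
"""Check whether a script's first line is a shebang."""
import os
import tempfile


def starts_with_shebang(path):
    """Return True if the file at `path` begins with the two bytes '#!'."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"#!"


def demo():
    """Write one script with a shebang and one without, then check both."""
    results = []
    for body in ("#!/usr/bin/env python3\nimport os\n", "import os\n"):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
            tmp.write(body)
        try:
            results.append(starts_with_shebang(tmp.name))
        finally:
            os.remove(tmp.name)  # clean up the temporary script
    return tuple(results)


if __name__ == "__main__":
    print(demo())
```

The shebang only matters when the file is executed directly (./poorlywrittenscraper.py). Running it as python3 poorlywrittenscraper.py bypasses it, which fits the script starting fine that way.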
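You're right about the #!, but it only partially solved the problem. I can now run the script normally, but for some reason it still doesn't write the files to disk, although it downloads them.
I guess this part, with open(os.path.join(COMICS_DIRECTORY, file_name), "wb") as file: response = session.get(url) file.write(response.content), is the part you mention that doesn't work? In that case, I'd try checking everything: whether the file is created and opened correctly, whether the GET request succeeds, and so on. Also, maybe an obvious question, but do you have write permission on the target folder?
Yes, I have permission to execute the script and to write to any folder. I'll check the method and get back to you. Thanks for your feedback!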
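One way to carry out the checks suggested above: `executor.map` only re-raises a worker's exception when its results are iterated, and the script never iterates them, so any error raised inside `download_comic` would pass silently. Whether that is what happens here is an assumption, not something the thread confirms. The sketch below uses a stand-in worker, `save_or_fail` (hypothetical, no network access), and collects each task's exception instead of discarding it.

```python
import concurrent.futures


def save_or_fail(url):
    """Stand-in for download_comic: fails the way a missing image URL would."""
    if url is None:
        # Mimics url.split(...) being called on None when no <img> was found.
        raise AttributeError("no image URL found")
    return url.split("/")[-1]


def run_all(urls):
    """Run save_or_fail on every URL and report successes and failures."""
    results, errors = [], []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {executor.submit(save_or_fail, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            try:
                # result() re-raises whatever the worker raised.
                results.append(future.result())
            except Exception as exc:
                errors.append((futures[future], exc))
    return results, errors


if __name__ == "__main__":
    done, failed = run_all(["http://example.com/comic.png", None])
    print("saved:", done)
    print("failed:", failed)
```

In the real script the same pattern would wrap `download_comic`, and calling `response.raise_for_status()` before writing would additionally flag failed GET requests.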