Python script to scan PDF files using an online scanner

I use this script to scan multiple PDF files contained in a folder with the online scanner "":

import mechanize
import os

def upload_file(uploaded_file):
    url = "https://wepawet.iseclab.org/"
    br = mechanize.Browser()
    br.set_handle_robots(False)  # ignore robots.txt
    br.open(url)
    br.select_form(nr=0)  # select the first form on the page
    f = os.path.join("200", uploaded_file)
    # attach the file to the form's file-upload control
    br.form.add_file(open(f), 'text/plain', f)
    br.form.set_all_readonly(False)
    res = br.submit()
    content = res.read()
    # append the scanner's response for this file to a single HTML report
    with open("200_clean.html", "ab") as out:
        out.write(content)

def main():
    # upload every file found in the "200" folder
    for name in os.listdir("200"):
        upload_file(name)

if __name__ == '__main__':
    main()
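For context on what add_file sends: the file becomes one part of a multipart/form-data request, and the content type passed to add_file becomes that part's own Content-Type header. Below is a minimal Python 3 sketch that builds such a body by hand with the standard library. The helper build_multipart and the field name "file" are hypothetical (the real form's field name is not shown in the question), and mechanize's actual encoding may differ in minor details.

```python
import uuid


def build_multipart(field_name, filename, content_type, data):
    """Build a minimal multipart/form-data body containing one file part.

    Returns (body_bytes, value_for_the_request_Content-Type_header).
    """
    # Random boundary; a collision with the file data is practically impossible.
    boundary = uuid.uuid4().hex
    part_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        # The content_type argument ends up here, as the part's own header.
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = part_header + data + closing
    return body, f"multipart/form-data; boundary={boundary}"
```

For example, build_multipart("file", "sample.pdf", "application/pdf", pdf_bytes) labels the file part as application/pdf, whereas the call in the question labels every upload as text/plain, whatever the file actually contains.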
But when I run the code, I get the following error:

Traceback (most recent call last):
  File "test.py", line 56, in <module>
    main()
  File "test.py", line 50, in main
    upload_file(file)
  File "test.py", line 40, in upload_file
    res = br.submit()
  File "/home/suleiman/Desktop/mechanize/_mechanize.py", line 541, in submit
    return self.open(self.click(*args, **kwds))
  File "/home/suleiman/Desktop/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/home/suleiman/Desktop/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error refresh: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
OK
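The wording "would lead to an infinite loop. The last 30x error message was:" is the redirect-loop message of the standard library's urllib redirect handler, which mechanize's own handler is derived from. The "refresh" in "HTTP Error refresh" suggests mechanize was following a Refresh header on a 200 response as if it were a redirect, which would explain why the "last 30x error message" is just OK. urllib's HTTPRedirectHandler gives up once the same URL has been reached max_repeats = 4 times, or after max_redirections = 10 redirects. The sketch below simulates only that counting logic with the same defaults; follow_redirects, next_url_for and RedirectLoopError are hypothetical names, not mechanize API.

```python
MAX_REPEATS = 4        # same defaults as urllib's HTTPRedirectHandler
MAX_REDIRECTIONS = 10


class RedirectLoopError(Exception):
    """Raised when following redirects/refreshes would never terminate."""


def follow_redirects(start_url, next_url_for):
    """Follow redirects starting at start_url.

    next_url_for(url) returns the URL the server redirects (or refreshes) to,
    or None when the response is final. Returns (final_url, hops).
    """
    visited = {}  # url -> how many times we have been sent there
    url = start_url
    hops = 0
    while True:
        new_url = next_url_for(url)
        if new_url is None:
            return url, hops
        # Same loop check as urllib: too many visits to one URL, or too many
        # distinct redirect targets overall.
        if (visited.get(new_url, 0) >= MAX_REPEATS
                or len(visited) >= MAX_REDIRECTIONS):
            raise RedirectLoopError(
                f"redirect to {new_url} would lead to an infinite loop "
                f"after {hops} hops")
        visited[new_url] = visited.get(new_url, 0) + 1
        url = new_url
        hops += 1
```

A page whose Refresh always points back to itself (next_url_for = lambda u: u) raises after 4 hops, while a finite chain such as a → b → c simply returns ("c", 2).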

Can anyone help me solve this problem?

I think the problem is the MIME type you are setting, text/plain. For a PDF it should be application/pdf. When I uploaded a sample PDF, your code worked for me with this change.

Change the br.form.add_file call to look like this:

br.form.add_file(open(f), 'application/pdf', f)
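Following this answer, the MIME type could be derived from each file's extension instead of being hard-coded, and anything in the folder that is not a PDF could be skipped (os.listdir("200") in the question returns every entry). A minimal sketch using the standard library's mimetypes module; the helper name pdf_files_with_types is hypothetical, and guess_type only looks at the extension, not at the file's contents.

```python
import mimetypes
import os


def pdf_files_with_types(folder):
    """Yield (path, mime_type) for every regular .pdf file in folder."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):  # skip subdirectories and the like
            continue
        mime_type, _encoding = mimetypes.guess_type(path)
        if mime_type == "application/pdf":
            yield path, mime_type
```

main() could then loop over pdf_files_with_types("200") and pass mime_type as the second argument of br.form.add_file.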

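It seems to me that the website's design is causing this problem. How do you think I could fix it?

I'm not sure, sorry. I don't think it's related to your code, but I could be wrong.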
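Whether the site really keeps refreshing to the same page is not established in this thread. One way to check is to switch off automatic refresh handling (mechanize's Browser has a set_handle_refresh method for this) and inspect the Refresh header of the response yourself. The sketch below covers only the inspection step, using the standard library; parse_refresh and refreshes_to_self are hypothetical helpers, and example.org stands in for the real page URL.

```python
import re
from urllib.parse import urljoin

# Matches the optional "url=..." part of a Refresh header, e.g. "5; url=/next"
_URL_PART = re.compile(r"\s*url\s*=\s*(.*)", re.IGNORECASE)


def parse_refresh(header_value, page_url):
    """Split a Refresh header value into (delay_seconds, absolute_target_url).

    The target is None when the header has no URL part, which means the
    page asks to be reloaded itself.
    """
    delay_part, _, rest = header_value.partition(";")
    delay = float(delay_part.strip() or 0)
    match = _URL_PART.match(rest)
    if not match:
        return delay, None
    target = match.group(1).strip().strip("'\"")
    # Resolve relative targets against the page that sent the header.
    return delay, urljoin(page_url, target)


def refreshes_to_self(header_value, page_url):
    """True if following this Refresh header would land on the same page."""
    _delay, target = parse_refresh(header_value, page_url)
    return target is None or target == page_url
```

If refreshes_to_self returns True for the response to the upload, following the refresh automatically can only loop, which would be consistent with the error in the question.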