
dryscrape visit works only once in Python

I want to visit a page repeatedly in a loop.

The code is:

import dryscrape

dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5';
loop = 1
while loop < 100000: 

    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1 
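According to the output, the page is visited only once, and I don't understand why. For iterations 2, 3, and so on there is no output: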

('loop:', 1)
<!DOCTYPE html><html><head>
  <meta charset="utf-8">
  <title>Javascript scraping test</title>
</head>
<body>
  <p id="intro-text">Yay! Supports javascript</p>
  <script>
     document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
  </script> 

</body></html>
('loop:', 2)

('loop:', 3)

('loop:', 4)

('loop:', 5)

('loop:', 6)

('loop:', 7)

Can you help me? Thanks.

After updating dryscrape and its dependencies to the latest versions, it now works correctly.

The versions are: dryscrape-1.0, lxml-4.1.1, webkit-server-1.0, xvfbwrapper-0.2.9
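
To compare your own environment against the versions listed above, something like the following may help. It is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages are installed under the PyPI names shown; `installed_versions` is a hypothetical helper, not part of dryscrape.

```python
# Sketch: report the installed versions of dryscrape and its dependencies.
# Assumes Python 3.8+; the package names are the assumed PyPI distribution names.
from importlib import metadata

PACKAGES = ["dryscrape", "lxml", "webkit-server", "xvfbwrapper"]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

If any of the reported versions is older than the ones above, upgrading that package is the first thing to try.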

The code:

import dryscrape
dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5/jsSupport.html';
loop = 1
while loop < 100000:

    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
Output:

   'loop:' 1
   <!DOCTYPE html><html><head>
     <meta charset="utf-8">
     <title>Javascript scraping test</title>
   </head>
   <body>
     <p id="intro-text">Yay! Supports javascript</p>
     <script>
        document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
     </script> 

   </body></html>
   'loop:' 2
   <!DOCTYPE html><html><head>
     <meta charset="utf-8">
     <title>Javascript scraping test</title>
   </head>
   <body>
     <p id="intro-text">Yay! Supports javascript</p>
     <script>
        document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
     </script> 

   </body></html>
   'loop:' 3
   <!DOCTYPE html><html><head>
     <meta charset="utf-8">
     <title>Javascript scraping test</title>
   </head>
   <body>
     <p id="intro-text">Yay! Supports javascript</p>
     <script>
        document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
     </script> 

   </body></html>

If you cannot update the modules, or do not want to, a quick fix is to visit a different page at the end of each loop iteration:

import dryscrape
dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5/jsSupport.html';
otherurl = "http://192.168.1.5/test"
loop = 1
while loop < 100000:

    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
    sess.visit(otherurl) #Visits the other url, so that when sess.visit(url) is called, it is forced to visit the page again.

I had the same problem and solved it by wrapping the work in a function (def). Try this:

def fb(user, pwd):
    import dryscrape as d
    d.start_xvfb()
    Br = d.Session()
    # A new session is created on every call
    Br.visit('http://fb.com')
    Br.at_xpath('//*[@name = "email"]').set(user)
    Br.at_xpath('//*[@name = "pass"]').set(pwd)
    Br.at_xpath('//*[@name = "login"]').click()
    # ... now do whatever you want ...
Then, once the function is defined, call it like this:

 fb('my@account.com','password')
With this I logged in to my own account automatically, running the call 100 times without errors.
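
The same idea, a fresh session per visit, can be applied to the loop from the question. The following is a minimal sketch, not a confirmed fix: `fetch_with_fresh_sessions` and its `session_factory` parameter are hypothetical names introduced here for illustration, and the default path assumes dryscrape is installed and Xvfb is available.

```python
def fetch_with_fresh_sessions(url, count, session_factory=None):
    """Visit `url` `count` times, creating a new session for each visit.

    `session_factory` is a hypothetical hook (not part of dryscrape) that
    returns a session-like object; it defaults to dryscrape.Session.
    """
    if session_factory is None:
        # Imported lazily so the helper can be exercised without dryscrape.
        import dryscrape
        dryscrape.start_xvfb()
        session_factory = dryscrape.Session

    bodies = []
    for _ in range(count):
        sess = session_factory()  # new session, so no state carries over
        sess.set_attribute('auto_load_images', False)
        sess.visit(url)
        bodies.append(sess.body())
    return bodies
```

Calling `fetch_with_fresh_sessions('http://192.168.1.5/jsSupport.html', 3)` would then return three page bodies. Creating a session per visit is likely slower than reusing one, so this trades speed for isolation.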



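Sounds like dryscrape is smart enough to skip URLs it has already visited.

@JohnGordon Is there any way to clear the history, or something along those lines?

What output exactly do you want to see?

It looks like you are setting up the session with the URL, and dryscrape scrapes a URL only once per session. If you want to scrape the same URL again, you have to set up the session again.

@Sheshnath I need the JavaScript-enabled output every time.

Try printing the status code after each visit:
print(sess.status_code())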
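
The status-code suggestion can be turned into a small diagnostic loop, sketched below. `visit_and_report` and `run_diagnostic` are hypothetical helper names; they only use the `visit`, `status_code`, `body` and `reset` calls that already appear in this thread, and `sess` is assumed to be a dryscrape session created as in the question.

```python
def visit_and_report(sess, url):
    """Visit `url` and return (status code, body length) for that visit."""
    sess.visit(url)
    status = sess.status_code()
    body = sess.body()
    return status, len(body)


def run_diagnostic(sess, url, iterations):
    """Repeat the visit and print what each iteration returned."""
    results = []
    for loop in range(1, iterations + 1):
        status, size = visit_and_report(sess, url)
        print('loop:', loop, 'status:', status, 'body length:', size)
        results.append((status, size))
        sess.reset()
    return results
```

An iteration that reports a body length of zero corresponds to one of the empty outputs shown in the question.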