dryscrape visit only works once in Python
Tags: python, dryscrape
I want to visit a page in a loop. The code is:
import dryscrape

dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5'
loop = 1
while loop < 100000:
    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
According to the output, the page is only visited once, and I don't understand why. For iterations 2, 3, ... there is no page output:
('loop:', 1)
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
('loop:', 2)
('loop:', 3)
('loop:', 4)
('loop:', 5)
('loop:', 6)
('loop:', 7)
Can you help me? Thanks.

After updating dryscrape and its dependencies to the latest versions, it now works correctly.
The versions are: dryscrape-1.0, lxml-4.1.1, webkit-server-1.0, xvfbwrapper-0.2.9
The code:
import dryscrape

dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5/jsSupport.html'
loop = 1
while loop < 100000:
    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
Output:
'loop:' 1
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
'loop:' 2
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
'loop:' 3
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
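
To compare an existing environment against the versions reported in this answer, something like the following sketch may help. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the installed distribution names match the names quoted above; `REPORTED` and `installed_versions` are illustrative names, not part of dryscrape.

```python
from importlib import metadata

# Versions quoted in the answer; the keys are assumed to match
# the names the packages were installed under.
REPORTED = {
    'dryscrape': '1.0',
    'lxml': '4.1.1',
    'webkit-server': '1.0',
    'xvfbwrapper': '0.2.9',
}


def installed_versions(names):
    """Map each name to its installed version string, or None if not installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REPORTED).items():
    print(name, 'installed:', version, '| reported working:', REPORTED[name])
```

A value of `None` means the package was not found in the current environment under that name.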

If you can't update the modules, or don't want to, a quick fix is to visit another page at the end of the loop:
import dryscrape

dryscrape.start_xvfb()
sess = dryscrape.Session()
url = 'http://192.168.1.5/jsSupport.html'
otherurl = "http://192.168.1.5/test"
loop = 1
while loop < 100000:
    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
    # Visit the other URL so that the next sess.visit(url) is forced to load the page again.
    sess.visit(otherurl)

I had the same problem and solved it by wrapping the work in a function. Try this:
import dryscrape as d

def fb(user, pwd):
    d.start_xvfb()
    # A new session is created on every call.
    Br = d.Session()
    Br.visit('http://fb.com')
    Br.at_xpath('//*[@name = "email"]').set(user)
    Br.at_xpath('//*[@name = "pass"]').set(pwd)
    Br.at_xpath('//*[@name = "login"]').click()
    # ...... now do whatever you want ......
Then, once the function is defined, call it like this:
fb('my@account.com', 'password')
This logs in to your account automatically. I ran this command 100 times without any errors.
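
The same "new session per call" idea can be applied to the original question's loop. The sketch below is only an illustration: `fetch_with_fresh_session` and its `session_factory` parameter are hypothetical names introduced here, and the session calls are limited to those already used in this thread (`set_attribute`, `set_timeout`, `visit`, `body`).

```python
def fetch_with_fresh_session(url, session_factory, timeout=30):
    """Visit `url` in a newly created session and return the page body.

    `session_factory` is any zero-argument callable that returns a session
    object, for example dryscrape.Session (an assumption of this sketch).
    """
    sess = session_factory()  # nothing is reused from an earlier visit
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(timeout)
    sess.visit(url)
    return sess.body()


# Possible usage with dryscrape (not executed here):
#
#   import dryscrape
#   dryscrape.start_xvfb()
#   for loop in range(1, 4):
#       print(fetch_with_fresh_session('http://192.168.1.5/jsSupport.html',
#                                      dryscrape.Session))
#       print('loop:', loop)
```

Passing the factory in, rather than importing dryscrape inside the helper, keeps the function easy to exercise with a stand-in session object. Whether creating many sessions is cheap enough for 100000 iterations is not something this thread establishes.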
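
Comments:
Please read and answer my question.
It sounds like dryscrape is smart enough to skip URLs it has already visited.
@JohnGordon Is there any way to clear the history, or something along those lines?
What output exactly do you want to see? It looks like you are setting up the session with the URL, and dryscrape will only scrape a URL once per session. If you want to scrape the same URL again, you have to reset the session.
@Sheshnath I need the JavaScript-enabled output every time.
Try printing the status code after each visit:
print(sess.status_code())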
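
As a rough sketch of that debugging suggestion, the helper below records the status code and body length for each visit, so an empty iteration is easy to spot. `visit_and_report` is a hypothetical name; it assumes a session object that exposes `visit`, `status_code` and `body`, as used in this thread.

```python
def visit_and_report(sess, url):
    """Visit `url` and return (status code, length of the returned body)."""
    sess.visit(url)
    status = sess.status_code()
    body = sess.body()
    return status, len(body)


# Possible usage inside the question's loop (not executed here):
#
#   status, size = visit_and_report(sess, url)
#   print('loop:', loop, 'status:', status, 'body length:', size)
```

A body length of zero on the second and later iterations would point to the page not being loaded again, rather than to a problem with printing.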