IP blocking when scraping data with Python
Tags: python, web, screen-scraping

You could try changing your user agent to a browser's user-agent string. This makes the server think you are a browser and may make it more lenient. Also, try spacing your requests further apart: perhaps 5-10 seconds.

If I email you my code, could you add this to it?
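The two suggestions above (a browser-like User-Agent and a 5-10 second gap between requests) could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the asker's actual setup: the names `BROWSER_UA`, `build_request`, `polite_delay` and `fetch_all` are hypothetical, and the user-agent string is just one plausible browser value.

```python
import random
import time
import urllib.request

# Assumption: an illustrative desktop-browser string; any current browser value would do.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def polite_delay(low=5.0, high=10.0):
    """Pick a random pause in seconds within the suggested 5-10 s range."""
    return random.uniform(low, high)


def fetch_all(urls, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, a random one before each later request.
            sleep(polite_delay())
        with opener(build_request(url)) as response:
            pages.append(response.read())
    return pages
```

Randomising the pause, rather than sleeping for a fixed interval, makes the traffic look a little less mechanical, though neither measure guarantees that a site will stop blocking you.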
I have the code here. How can I set a proxy for it?
I have a proxy server API, and I want every request to go through a proxy obtained from that API.
I have added pauses, but that didn't work.
import os
import time

import xlrd
from xlwt import Workbook
from googlesearch import search  # assumed source of search(); the original imports are not shown

# This is the source folder containing the Excel files
source = "Data"
dir_list = os.listdir(source)


def geturl(searchtext):
    # Return the first search result for the query, or None if there is none
    for url in search(searchtext, tld="co.in", num=1, stop=1, pause=1):
        return url


def writeurl(value, description, url, file, Col):
    # Despite its name, Col is the row index: sheet.write(row, column, value)
    file.write(Col, 0, value)
    file.write(Col, 1, description)
    file.write(Col, 2, url)


for filename in dir_list:
    wbr = Workbook()
    sheet1 = wbr.add_sheet('Sheet 1')
    wb = xlrd.open_workbook(os.path.join(source, filename))
    sheet = wb.sheet_by_index(0)
    count = 0
    for x in range(sheet.nrows):
        if x == 0:
            # Header row: copy it through unchanged
            writeurl(sheet.cell_value(x, 0), sheet.cell_value(x, 1), sheet.cell_value(x, 2), sheet1, x)
        else:
            # Assumption: the line that sets Url is missing from the post;
            # searching on the first column is a guess
            Url = geturl(sheet.cell_value(x, 0))
            # time.sleep(randint(10, 120))
            writeurl(sheet.cell_value(x, 0), sheet.cell_value(x, 1), Url, sheet1, x)
            count = count + 1
            wbr.save('OutPut/' + filename.split('.')[0] + '.xls')
            if count == 45:
                # Pause for 20 minutes after every 45 searches
                count = 0
                time.sleep(1200)
            # CompanyList.append(UrlSearch.CompanyDescription(sheet.cell_value(x, 0), sheet.cell_value(x, 1), Url))
    # wbr.save('OutPut/' + filename.split('.')[0] + '.xls')
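
For the proxy part of the question, one possible pattern is to ask the proxy API for an address before every request and build a fresh opener around it. The sketch below uses only the standard library. `fetch_proxy_address` is a hypothetical stand-in for the asker's proxy API (its real interface is not shown in the post), and the `"host:port"` return format is an assumption.

```python
import urllib.request


def fetch_proxy_address():
    """Hypothetical stand-in for the proxy API mentioned in the question."""
    # Assumption: the real API would return a string such as "host:port".
    raise NotImplementedError("call your proxy API here")


def make_opener(proxy_address):
    """Build an opener that sends HTTP and HTTPS traffic through one proxy."""
    proxy_url = "http://" + proxy_address
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_fresh_proxy(url, get_proxy=fetch_proxy_address, timeout=30):
    """Request a new proxy from the API, then fetch the URL through it."""
    opener = make_opener(get_proxy())
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Whether the `search` helper used in the question goes through such an opener depends on the library and its version, so treat this as a pattern for requests you issue yourself. Calling `fetch_via_fresh_proxy(url, get_proxy=my_api_call)` with your own function in place of the hypothetical `my_api_call` would pick a new proxy on each call.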