Python: element that doesn't exist on the website throws a TimeoutException, and vice versa
I am trying to get the DOI and URL for a large number of references in an Excel sheet from this particular website: "https://git.macropus.org/citation-finder/". My problem is that some references have no DOI and URL, which means that for those references the web element containing the DOI and URL information does not exist on the page. In that case execution should go into the NoSuchElementException handler, but it goes into the TimeoutException handler instead. Here is my code; please help me correct it.

To be clearer: as you can see in Image1, for the reference "woody" the web element with the citation data exists, while for the reference "woodybase" in Image2 there is no such element. In that case it should go into NoSuchElementException, not TimeoutException.

My input Excel file looks like the image below, and this is what the Excel output looks like:

[Image: input Excel file]
[Image: Excel output]
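A note on why the TimeoutException branch fires: `WebDriverWait.until()` calls the condition repeatedly, and `NoSuchElementException` is in its default set of ignored exceptions. `presence_of_element_located` raises `NoSuchElementException` on every poll while the element is absent, the wait swallows it, and once the timeout expires the wait raises `TimeoutException`. For an element that never appears, the `except NoSuchElementException:` handler is therefore never reached. The sketch below models that polling loop in plain Python so it runs without a browser. The exception classes are stand-ins (not the real Selenium ones), and `wait_until`, `find_code_element` and `lookup_status` are hypothetical names.

```python
import time


class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=1.0, poll=0.1,
               ignored=(NoSuchElementException,)):
    """Simplified model of WebDriverWait.until(): poll `condition` until it
    returns a truthy value, swallowing `ignored` exceptions, and raise
    TimeoutException once `timeout` seconds have passed."""
    end = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # a missing element counts as "not yet", not as an error
        if time.monotonic() > end:
            raise TimeoutException("condition not met within %.1fs" % timeout)
        time.sleep(poll)


def find_code_element():
    """Hypothetical lookup on a page that never renders a <code> element."""
    raise NoSuchElementException("no <code> element")


def lookup_status(find):
    """Report which handler a wait on `find` ends up in."""
    try:
        return wait_until(find, timeout=0.3)
    except NoSuchElementException:
        return "NoSuchElementException"
    except TimeoutException:
        return "TimeoutException"


print(lookup_status(find_code_element))     # TimeoutException
print(lookup_status(lambda: "TY  - JOUR"))  # TY  - JOUR
```

In the question's code, then, the "no DOI/URL" case lands in `except TimeoutException:`. A timeout alone cannot tell "element absent" apart from "page slow to load", so one option is to give the wait a generous timeout and treat its expiry as "Doi not available". Note also that `driver.implicitly_wait(10)` makes each poll's `find_element` call wait as well, and Selenium's documentation advises against mixing implicit and explicit waits because the resulting wait times become unpredictable.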
```python
import datetime
import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import (
    NoSuchElementException,
    TimeoutException,
    WebDriverException,
)
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def getting_doi_and_url(References):
    edf = pd.DataFrame([])
    for text in References:
        WINDOW_SIZE = "1920,1080"
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
        chrome_options.add_experimental_option("detach", True)
        # Selenium 4 takes the chromedriver path through a Service object
        driver = webdriver.Chrome(
            service=Service('/home/pbaddam/anaconda3/bin/chromedriver'),
            options=chrome_options,
        )
        driver.implicitly_wait(10)
        try:
            # The website
            driver.get("https://citation-finder.now.sh/")
            # Find the 'textarea' element, the input box into which the reference text is typed or pasted
            driver.find_element(By.TAG_NAME, 'textarea').send_keys(text)
            # Find the "paper-button" element, the search/enter button on the site, and click it
            driver.find_element(By.TAG_NAME, 'button').click()
            # Clicking the button shows a text box with the DOI, URL and other details for the reference;
            # that text box is the "code" element located below.
            endTime = datetime.datetime.now() + datetime.timedelta(seconds=15)
            while True:
                try:
                    copiedText = WebDriverWait(driver, 10).until(
                        EC.presence_of_element_located((By.TAG_NAME, 'code'))).text
                    # copiedText = driver.find_element(By.TAG_NAME, 'code').text
                    x = copiedText.split('\n')
                    list1 = []
                    list2 = []
                    for li in range(len(x)):
                        # print(x[li][0:2], x[li][6:])
                        list1.append(x[li][0:2])
                        list2.append(x[li][6:])
                    dictionary = dict(zip(list1, list2))
                    DOI = dictionary['DO']
                    URL = dictionary['UR']
                    some = pd.DataFrame({'References': text, 'DOI': DOI, 'URL': URL}, index=[0])
                    edf = pd.concat([edf, some])  # DataFrame.append was removed in pandas 2.0
                    time.sleep(2)
                    break  # stop retrying once this reference has been recorded
                except NoSuchElementException:
                    print(text, 'DOI and URL do not exist for this reference')
                    some = pd.DataFrame({'References': text, 'DOI': 'Doi not available', 'URL': 'Url not available'}, index=[0])
                    edf = pd.concat([edf, some])
                    break
                except TimeoutException:
                    print(text, 'Loading took too much time')
                    if datetime.datetime.now() >= endTime:
                        some = pd.DataFrame({'References': text, 'DOI': 'Doi available', 'URL': 'Url available'}, index=[0])
                        edf = pd.concat([edf, some])
                        break
                except WebDriverException:
                    print(text, 'error occurred')
                    some = pd.DataFrame({'References': text, 'DOI': 'Doi available', 'URL': 'Url available'}, index=[0])
                    edf = pd.concat([edf, some])
                    break
                except Exception:
                    print(text, "Page Not Loaded.")
                    if datetime.datetime.now() >= endTime:
                        some = pd.DataFrame({'References': text, 'DOI': 'DOI available', 'URL': 'URL available'}, index=[0])
                        edf = pd.concat([edf, some])
                        break
        finally:
            driver.quit()  # close this reference's browser whether or not the lookup succeeded
    edf.reset_index(inplace=True, drop=True)
    return edf


df = pd.read_excel('/home/pbaddam/Documents/Trydata 5-6 (copy).xlsx', sheet_name='References2')
References = df['References']
x = getting_doi_and_url(References)
x
```
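A related issue sits in the parsing step: `dictionary['DO']` and `dictionary['UR']` raise `KeyError` whenever the `code` text has no DOI or URL line, and that `KeyError` falls through to the catch-all handler. The sketch below tolerates missing tags. It assumes the `code` text uses the RIS-style layout implied by the `x[li][0:2]` / `x[li][6:]` slicing (a two-letter tag, two spaces, `- `, then the value). The function name `parse_doi_url`, its `missing` parameter, and the sample record (which uses the `example.org` domain and the `10.1000` example DOI prefix) are hypothetical.

```python
def parse_doi_url(ris_text, missing="not available"):
    """Return (DOI, URL) from RIS-style lines such as 'DO  - 10.1000/xyz'.

    Assumption: each relevant line starts with a two-letter tag and the
    value begins at column 6. Lines without the '  - ' separator are
    skipped. A later duplicate tag overwrites an earlier one, as dict(zip(...))
    does in the question's code. Missing tags yield `missing`, not KeyError.
    """
    fields = {}
    for line in ris_text.splitlines():
        if len(line) >= 6 and line[2:6] == "  - ":
            fields[line[0:2]] = line[6:]
    return fields.get("DO", missing), fields.get("UR", missing)


# Made-up sample record for illustration only
sample = "\n".join([
    "TY  - JOUR",
    "TI  - An example title",
    "DO  - 10.1000/example",
    "UR  - https://example.org/article",
    "ER  - ",
])
print(parse_doi_url(sample))                # ('10.1000/example', 'https://example.org/article')
print(parse_doi_url("TY  - JOUR\nER  - "))  # ('not available', 'not available')
```

Inside the loop, `DOI, URL = parse_doi_url(copiedText)` could stand in for the `list1`/`list2`/`dictionary` block, so a record without a `DO` or `UR` line is labelled explicitly instead of ending up in the catch-all handler. Collecting one dict per reference in a plain list and calling `pd.DataFrame(rows)` once after the loop is also a common alternative to growing a DataFrame inside the loop.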