Python 3.x: Web scraping a table created by a JavaScript function (tags: python-3.x, selenium, selenium-webdriver, automation, geckodriver) - Fatal编程技术网

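I am trying to scrape the table of studies at the link used in the code below.

The table's content is created dynamically with JavaScript. I tried using Selenium, but I intermittently get a stale element exception (StaleElementReferenceException). Please help me with this.

I want to retrieve all the rows in the table and store them in a local database. Here is what I tried with Selenium: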

import selenium.webdriver as webdriver
from selenium.webdriver.common.by import By

url = 'https://clinicaltrials.gov/ct2/results?cond=COVID&term=&cntry=&state=&city=&dist='
driver = webdriver.Firefox()
#driver.implicitly_wait(30)
driver.get(url)
data = []
# Walk every row of the results table and collect the distinct cell texts
for tr in driver.find_elements(By.XPATH, '//table[@id="theDataTable"]//tbody//tr'):
    tds = tr.find_elements(By.TAG_NAME, 'td')
    if tds:
        for td in tds:
            print(td.text)
            if td.text not in data:
                data.append(td.text)
driver.quit()
print('*********************************************************************')
print(data)
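I will then extract the data from the 'data' variable and store it in the DB.

I am new to Selenium and web scraping. I would also like to click each link in the "Study Title" column and extract the data for each study from its page.

I would appreciate suggestions for avoiding or handling the stale element exception, or alternatives to Selenium WebDriver.
Thanks in advance.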

I tried the following code and all the data was stored correctly. Can you try it?

Code

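import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
driver.get("https://clinicaltrials.gov/ct2/results?cond=COVID&term=&cntry=&state=&city=&dist=")

time.sleep(5)  # crude wait for the JavaScript-built table to load

array = []

flag = True
next_counter = 0

# Show 100 results per page instead of 10 to reduce the number of page loads
select = Select(driver.find_element(By.NAME, 'theDataTable_length'))
select.select_by_value('100')

time.sleep(5)

while flag:
    # Re-locate the table body and its rows on every page instead of reusing old references
    item = driver.find_elements(By.TAG_NAME, "tbody")[1]
    rows = item.find_elements(By.TAG_NAME, 'tr')

    for row in rows:
        cells = row.find_elements(By.TAG_NAME, 'td')
        for i in range(7):
            text = cells[i].text
            array.append(text)
            print(text)

    next_counter = next_counter + 1
    if next_counter == 13:
        # Last page read: 1300 results / 100 per page = 13 pages
        print("Stopped")
        flag = False
    else:
        driver.find_element(By.ID, 'theDataTable_next').click()
        time.sleep(7)  # give the table time to redraw

driver.quit()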
Output

1

Not yet recruiting
NEW
Indirect Endovenous Systemic Ozone for New Coronavirus Disease (COVID19) in Non-intubated Patients
COVID
Other: Systemic indirect endovenous ozone therapy
SEOT
Valencia, Spain
2

Recruiting
NEW
Prediction of Clinical Course in COVID19 Patients
COVID 19
Other: CT-Scan
Chu Saint-Etienne
Saint-Étienne, France
3

Not yet recruiting
NEW
Risks of COVID19 in the Pregnant Population
COVID19
Other: Biospecimen collection
Mayo Clinic in Rochester
Rochester, Minnesota, United States
4

Recruiting
NEW
Saved From COVID-19
COVID
Drug: Chloroquine
Drug: Placebo oral tablet
Columbia University Irving Medical Center/NYP
New York, New York, United States
5

Recruiting
NEW
Efficacy of Convalescent Plasma Therapy in Severely Sick COVID-19 Patients
COVID
Drug: Convalescent Plasma Transfusion
Other: Supportive Care
Drug: Random Donor Plasma
Maulana Azad medical College
New Delhi, Delhi, India
Institute of Liver and Biliary Sciences
New Delhi, Delhi, India
6

Not yet recruiting
NEW
A Real-life Experience on Treatment of Patients With COVID 19
COVID
Drug: Chloroquine
Drug: Favipiravir
Drug: Nitazoxanide
(and 3 more...)
Tanta university hospital
Tanta, Egypt
7

Recruiting
International COVID19 Clinical Evaluation Registry,
COVID 19
Combination Product: Observational (registry)
Hospital Lclinico San Carlos
Madrid, Spain
8

Completed
NEW
AiM COVID for Covid 19 Tracking and Prediction
COVID 19
Other: No Intervention
Aarogyam (UK)
Leicester, United Kingdom
9

Recruiting
NEW
Establishing a COVID-19 Prospective Cohort for Identification of Secondary HLH
COVID

Department of nephrology, Klinikum rechts der Isar
München, Bavaria, Germany
10

Recruiting
NEW
Max Ivermectin- COVID 19 Study Versus Standard of Care Treatment for COVID 19 Cases. A Pilot Study
COVID
Drug: Ivermectin
Max Super Speciality hospital, Saket (A unit of Devki Devi Foundation)
New Delhi, Delhi, India
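My code performs the following logical steps:

  • First, to save time retrieving the data, I select the option to show 100 results per page instead of 10.
  • Second, I read all the results on the page (100), and when I am done I click the next-page button. Then a sleep command waits a few seconds (this could be done in a better way, but I did it like this to give you something quick; you should replace it with an explicit wait such as wait-until-element-is-visible).
  • After clicking the next-page button, I again save the results (100).
  • This keeps running until flag becomes False, which is meant to happen once next_counter passes the maximum of 13. The number 13 is 1300 (results) divided by 100 (the maximum results per page): 1300/100 = 13, so there are 13 pages.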
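The steps above depend on fixed sleeps, and the answer itself suggests an explicit wait instead. The following is only a minimal sketch of that idea, assuming Selenium's WebDriverWait and expected_conditions helpers and the same element ids as in the answer (theDataTable, theDataTable_next). The names page_count and scrape_pages are hypothetical, and the sketch assumes that the table redraw detaches the old rows from the DOM. The Selenium imports sit inside the function so that the page arithmetic can be used on its own.

```python
import math


def page_count(total_results, per_page):
    """Number of result pages, rounding up for a partial last page."""
    return math.ceil(total_results / per_page)


def scrape_pages(driver, pages, timeout=20):
    """Read every row of every page, waiting explicitly instead of sleeping."""
    # Imported here so page_count() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    row_locator = (By.CSS_SELECTOR, "#theDataTable tbody tr")
    results = []
    for page in range(pages):
        # Block until the rows of the current page are present in the DOM.
        rows = wait.until(EC.presence_of_all_elements_located(row_locator))
        for row in rows:
            cells = row.find_elements(By.TAG_NAME, "td")
            results.append([cell.text for cell in cells])
        if page < pages - 1:
            first_row = rows[0]
            next_button = wait.until(
                EC.element_to_be_clickable((By.ID, "theDataTable_next"))
            )
            next_button.click()
            # Assumption: the redraw detaches the old rows, so wait for the
            # old first row to go stale before locating the new rows.
            wait.until(EC.staleness_of(first_row))
    return results


pages = page_count(1300, 100)  # the figures quoted in the answer
```

With a live driver already showing 100 results per page, scrape_pages(driver, pages) would return one list of cell texts per table row. Waiting for the old first row to go stale is one way to avoid reading rows while the table is still being replaced.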

Editing and transferring the data is something you can manage yourself; it needs no knowledge of Selenium or web automation. It is 100% a Python matter.
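As a rough illustration of that pure-Python step, the sketch below groups a flat list of cell texts into rows of 7 (the number of cells the answer's loop reads per table row) and stores them in a local SQLite database, which is one possible choice for the "local database" mentioned in the question. The table name studies, the column names col0 to col6, and the helper names chunk_rows and store_rows are all hypothetical, and the sample data is made up.

```python
import sqlite3

COLUMNS = 7  # the answer's loop reads 7 cells per table row


def chunk_rows(cells, width=COLUMNS):
    """Group a flat list of cell texts into rows of `width` cells each."""
    return [cells[i:i + width] for i in range(0, len(cells), width)]


def store_rows(conn, rows):
    """Create a simple table (hypothetical schema) and insert the rows."""
    cols = ", ".join(f"col{i} TEXT" for i in range(COLUMNS))
    conn.execute(f"CREATE TABLE IF NOT EXISTS studies ({cols})")
    placeholders = ", ".join("?" for _ in range(COLUMNS))
    conn.executemany(f"INSERT INTO studies VALUES ({placeholders})", rows)
    conn.commit()


# Two made-up rows standing in for the scraped cell texts.
flat = [f"r{r}c{c}" for r in range(2) for c in range(COLUMNS)]

conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
store_rows(conn, chunk_rows(flat))
stored = conn.execute("SELECT COUNT(*) FROM studies").fetchone()[0]
```

In practice the flat list would be the array filled by the scraping loop, and the generic column names would be replaced by whatever the table's real columns turn out to be.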

Thanks @dpapadopoulos for the answer! I tried it, but I still get the stale element exception.

@AkshayPhadnis A StaleElementException occurs when the web element is changed in the DOM and the initial reference to that web element is lost. So please try the code again; I have updated it.

Can you check my answer? My code runs and collects all the data (all 1300 studies).
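Since a stale reference only means that the old handle no longer points at a live element, one common way to cope is to locate the element again and retry. The sketch below shows that retry pattern in plain Python. The helper retry_on is a hypothetical name, and FakeStale is a stand-in for selenium.common.exceptions.StaleElementReferenceException so that the example runs without a browser.

```python
def retry_on(exc_type, action, attempts=3):
    """Call action() up to `attempts` times, retrying only when exc_type is raised."""
    for attempt in range(attempts):
        try:
            return action()
        except exc_type:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error


# Stand-in for Selenium's StaleElementReferenceException (assumption for the demo).
class FakeStale(Exception):
    pass


calls = {"n": 0}


def read_cells():
    # Simulates a lookup that hits a stale reference once, then succeeds.
    calls["n"] += 1
    if calls["n"] == 1:
        raise FakeStale()
    return ["1", "Recruiting"]


cells = retry_on(FakeStale, read_cells)
```

With Selenium, the action passed in should find the row again from the driver on every call (for example by its index in the table) rather than reuse an element found earlier; otherwise each retry would hit the same stale reference.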