Selenium Python: getting complex table data

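I am trying to automate job scraping with Selenium, but I have run into the following problem. Link: Google site (worldwide):

  • I need to get the names and hrefs (links) of all locations from the second tag only, skipping the first tag each time.

  • Save all locations to a .json file, like this:

    { 'id': '1', 'title': 'location name', 'href': 'location href' }

  • This should help: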

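    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time
    import json
    
    driver = webdriver.Chrome()
    driver.get('https://www.indeed.com/worldwide')
    
    # Crude fixed wait so the page can finish rendering.
    time.sleep(3)
    
    final = {}
    
    # Selenium 4 locator API; the find_element_by_* helpers were removed in Selenium 4.3.
    countries = driver.find_element(By.CLASS_NAME, 'countries')
    a_tags = countries.find_elements(By.XPATH, './/a')
    idx = 1
    for a in a_tags:
        # Links with no visible text are skipped.
        if a.text != "":
            final.setdefault('id', []).append(idx)
            final.setdefault('title', []).append(a.text)
            final.setdefault('href', []).append(a.get_attribute('href'))
            idx += 1
    print(final)
    # quit() ends the whole browser session, not just the current window.
    driver.quit()
    
    # Write the collected columns to disk.
    with open('D:\\jobs.json', 'w') as f:
        json.dump(final, f)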
    
    Output:

    {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62], 'title': ['Argentina', 'Australia', 'Austria', 'Bahrain', 'Belgium', 'Brazil', 'Canada', 'Chile', 'China', 'Colombia', 'Costa Rica', 'Czech Republic', 'Denmark', 'Ecuador', 'Egypt', 'Finland', 'France', 'Germany', 'Greece', 'Hong Kong', 'Hungary', 'India', 'Indonesia', 'Ireland', 'Israel', 'Italy', 'Japan', 'Kuwait', 'Luxembourg', 'Malaysia', 'Mexico', 'Morocco', 'Netherlands', 'New Zealand', 'Nigeria', 'Norway', 'Oman', 'Pakistan', 'Panama', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Qatar', 'Romania', 'Russia', 'Saudi Arabia', 'Singapore', 'South Africa', 'South Korea', 'Spain', 'Sweden', 'Switzerland', 'Taiwan', 'Thailand', 'Turkey', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'Uruguay', 'Venezuela', 'Vietnam'], 'href': ['https://ar.indeed.com/', 'https://au.indeed.com/', 'https://at.indeed.com/', 'https://bh.indeed.com/', 'https://be.indeed.com/', 'https://www.indeed.com.br/', 'https://ca.indeed.com/', 'https://cl.indeed.com/', 'https://cn.indeed.com/', 'https://co.indeed.com/', 'https://cr.indeed.com/', 'https://cz.indeed.com/', 'https://dk.indeed.com/', 'https://ec.indeed.com/', 'https://eg.indeed.com/', 'https://fi.indeed.com/', 'https://www.indeed.fr/', 'https://de.indeed.com/', 'https://gr.indeed.com/', 'https://hk.indeed.com/', 'https://hu.indeed.com/', 'https://www.indeed.co.in/', 'https://id.indeed.com/', 'https://ie.indeed.com/', 'https://il.indeed.com/', 'https://it.indeed.com/', 'https://jp.indeed.com/', 'https://kw.indeed.com/', 'https://lu.indeed.com/', 'https://malaysia.indeed.com/', 'https://www.indeed.com.mx/', 'https://ma.indeed.com/', 'https://www.indeed.nl/', 'https://nz.indeed.com/', 'https://ng.indeed.com/', 'https://no.indeed.com/', 'https://om.indeed.com/', 
    'https://pk.indeed.com/', 'https://pa.indeed.com/', 'https://pe.indeed.com/', 'https://ph.indeed.com/', 'https://pl.indeed.com/', 'https://pt.indeed.com/', 'https://qa.indeed.com/', 'https://ro.indeed.com/', 'https://ru.indeed.com/', 'https://sa.indeed.com/', 'https://sg.indeed.com/', 'https://za.indeed.com/', 'https://kr.indeed.com/', 'https://es.indeed.com/', 'https://se.indeed.com/', 'https://www.indeed.ch/', 'https://tw.indeed.com/', 'https://th.indeed.com/', 'https://tr.indeed.com/', 'https://ua.indeed.com/', 'https://www.indeed.ae/', 'https://www.indeed.co.uk/', 'https://uy.indeed.com/', 'https://ve.indeed.com/', 'https://vn.indeed.com/']}
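
The question asked for one record per location, while the output above is three parallel lists. A minimal sketch of reshaping it, assuming the dictionary has exactly the shape printed above; `to_records` is a hypothetical helper name, and the two-entry sample is abbreviated from that output:

```python
import json


def to_records(columns):
    """Turn {'id': [...], 'title': [...], 'href': [...]} into one dict per location."""
    keys = ('id', 'title', 'href')
    # zip(*...) walks the three lists in step, yielding one (id, title, href) row at a time.
    return [dict(zip(keys, row)) for row in zip(*(columns[k] for k in keys))]


# Abbreviated sample with the same shape as the printed output.
sample = {
    'id': [1, 2],
    'title': ['Argentina', 'Australia'],
    'href': ['https://ar.indeed.com/', 'https://au.indeed.com/'],
}

records = to_records(sample)
print(json.dumps(records, indent=2))
```

Passing `records` to `json.dump` instead of the column dictionary would write a JSON array of objects. The ids stay integers here; wrap them in `str()` if the file should hold `'1'` as in the question.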
    

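    "Only from the second tag, skipping the first tag each time": what does that mean? What have you tried so far?
    I mean that each element contains two tags; get the name and href from the second one.
    What do you expect to appear in the title? Something like "Assistant Software Engineer"?
    Sorry, title is the key, and the value is 'location name'.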
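
The clarification above (two tags per element, take the name and href from the second) can be illustrated without a browser. This is only a sketch: the markup in `SAMPLE_HTML` is hypothetical and is not taken from the real page, and `SecondLinkParser` is an illustrative name built on the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical markup: two <a> tags per <li>, the second one holding the location.
SAMPLE_HTML = """
<ul class="countries">
  <li><a href="#">flag</a><a href="https://ar.indeed.com/">Argentina</a></li>
  <li><a href="#">flag</a><a href="https://au.indeed.com/">Australia</a></li>
</ul>
"""


class SecondLinkParser(HTMLParser):
    """Collect the text and href of the second <a> inside every <li>."""

    def __init__(self):
        super().__init__()
        self.links = []          # one {'title', 'href'} dict per kept link
        self._seen = 0           # <a> tags seen so far in the current <li>
        self._capturing = False  # True while inside a second <a>
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._seen = 0       # a new element starts, so restart the count
        elif tag == "a":
            self._seen += 1
            if self._seen == 2:
                self._capturing = True
                self._href = dict(attrs).get("href")
                self._text = []

    def handle_data(self, data):
        if self._capturing:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._capturing:
            self.links.append({"title": "".join(self._text).strip(), "href": self._href})
            self._capturing = False


parser = SecondLinkParser()
parser.feed(SAMPLE_HTML)

# Number the kept links from 1, using string ids as in the question's example.
locations = [{"id": str(i), **link} for i, link in enumerate(parser.links, start=1)]
print(locations)
```

With Selenium, the same selection might be written as an XPath such as `.//li/a[2]`, assuming the live page nests its links this way; that has not been checked against the site.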