Python: how can I make these two scripts work together?

python web-scraping google-sheets automation

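I have two scripts that scrape a page which is basically a search engine. The code reads values from a Google Sheet, searches the URL for each one, grabs some information, and then writes it back to the sheet.

The problem is that I am using two scripts, and the second one is the code that writes the information to the Google Sheet.

The first script performs all the searches, and only after every search has finished does the second script start writing the collected information to the Google Sheet.

What I want is to search for one item and then write it, search for the second and then write it, and so on. I have tried different approaches, but this is my first piece of code and my first time programming, so I am struggling with it.

k_bot.py (the scraper)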

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import UnexpectedAlertPresentException

import re
import time


class BOT(object):

    def __init__(self, cpfs):

        # SETUP FOR URL
        self.bot_url = 'http://www.3kplus.net/'
        self.cpfs = cpfs

        self.profile = webdriver.FirefoxProfile()
        self.options = Options()
        self.driver = webdriver.Firefox(firefox_profile=self.profile,
                                        executable_path='C:\\Users\MOISA\Documents\geckodriver.exe',
                                        options=self.options)

        # NAVIGATE TO URL
        self.driver.get(self.bot_url)

        login_box = self.driver.find_element_by_xpath('//*[@id="login"]/div[3]/div[2]/div[2]/input')
        login_box.send_keys('daiane')

        pass_box = self.driver.find_element_by_xpath('//*[@id="login"]/div[3]/div[2]/div[3]/input')
        pass_box.send_keys('789456')

        login_btn = self.driver.find_element_by_xpath('//*[@id="login"]/div[3]/div[2]/button')
        login_btn.click()

    def search_cpfs(self):

        # SEARCH THROUGH THE LIST OF CLIENT CODES (1ST COLUMN OF THE SPREADSHEET), AND OBTAIN THIS INFORMATION
        nomes = []
        idades = []
        beneficios = []
        concessoes = []
        salarios = []
        bancoss = []
        bancoscard = []
        consigs = []
        cards = []

        for cpf in self.cpfs:
            print(f"Procurando {cpf}.")
            self.driver.get(self.bot_url)
            self.delay = 3  # seconds

            # SEARCH CLIENT CODE
            try:
                cpf_input = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[1]/input')
                cpf_input.send_keys(cpf)

                cpf_btn = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[2]/button')
                cpf_btn.click()
                cpf_btn.click()

                time.sleep(2)

                # CLIENT CODE IS VALID
                # CLIENT CODE HAVE NOTIFICATION
                if self.driver.find_element_by_xpath('//*[@id="notification"]').is_displayed():

                    nome = self.driver.find_element_by_xpath(
                        "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/h2").text
                    idade = self.driver.find_element_by_xpath(
                        "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/ul/li[2]").text
                    age = re.search(r'\((.*?)Anos', idade).group(1)
                    beneficio = self.driver.find_element_by_xpath(
                        "/html/body/main[1]/div[1]/div[1]/div[1]/div[3]/div[5]/span/b").text
                    concessao = self.driver.find_element_by_xpath(
                        "/html/body/main[1]/div[1]/div[1]/div[1]/div[3]/div[2]/span").text
                    salario = self.driver.find_element_by_xpath(
                        "/html/body/main[1]/div[1]/div[2]/div/div[3]/div[1]/div[1]/span").text
                    bancos = self.driver.find_element_by_xpath('//*[@id="loans"]').text
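                    bancosw = re.findall(r'(?
[...]

In short: you should move `for cpf in self.cpfs:` from the first script to the second script.

In the first script you should have a function

def search_cpfs(self, cpf):

that searches for only one `cpf`, so you have to remove `for cpf in self.cpfs:` from `search_cpfs()`. Run `BOT()` without `cpfs`, but run `search_cpfs()` with a single `cpf`.

In the second script you should use this `for` loop to run `search_cpfs(cpf)` with different values: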

    bot_url = BOT()

    for cpf in cpfs:
       ...variables... = bot_url.search_cpfs(cpf)

       # UPDATE THE SHEET
       print("Atualizando...")
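
The search-one-then-write-one flow described above can be tried out without a browser or a spreadsheet. In the sketch below, `FakeBot` and `FakeSheet` are hypothetical stand-ins for `BOT` and the gspread worksheet. They only record the order of the calls, which is the point being illustrated.

```python
class FakeBot:
    """Hypothetical stand-in for BOT: 'searches' one cpf at a time."""

    def __init__(self, log):
        self.log = log

    def search_cpf(self, cpf):
        self.log.append(f"search {cpf}")
        # Pretend result: a (nome, idade) pair for this single cpf.
        return f"name-{cpf}", f"age-{cpf}"


class FakeSheet:
    """Hypothetical stand-in for the worksheet: records every cell write."""

    def __init__(self, log):
        self.log = log
        self.cells = {}

    def update_cell(self, row, col, value):
        self.cells[(row, col)] = value
        self.log.append(f"write row {row} col {col}")


def process(cpfs, bot, sheet):
    # Row 1 holds the column headings, so the data starts at row 2.
    for row, cpf in enumerate(cpfs, 2):
        nome, idade = bot.search_cpf(cpf)   # search ONE cpf ...
        sheet.update_cell(row, 2, nome)     # ... then write it immediately
        sheet.update_cell(row, 3, idade)


log = []
sheet = FakeSheet(log)
process(["111", "222"], FakeBot(log), sheet)
print("\n".join(log))
```

The printed log alternates between one search and the writes for that same row, instead of listing all searches first.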

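EDIT:

In `class BOT()` you have to use `__init__(self)` without `cpfs` and without `self.cpfs = cpfs`.

Because `search_cpfs(self, cpf):` searches for only one item, you can use the name `search_cpf` without the `s` (this is not obligatory), and you don't need the lists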

    nomes = []
    idades = []
    beneficios = []
    concessoes = []
    salarios = []
    bancoss = []
    bancoscard = []
    consigs = []
    cards = []

but you can return the results directly

    return nome, idade, beneficio, concessao, salario, bancos, bancocard, consig, card
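
As a minimal sketch of returning the results directly: the function below returns one tuple per call, and returns a tuple of the same shape filled with empty strings when the lookup fails, so the caller can always unpack it. The `lookup` dict and its data are hypothetical and stand in for the Selenium lookups.

```python
def search_one(cpf, lookup):
    """Return (nome, idade, salario) for a single cpf.

    `lookup` is a hypothetical dict standing in for the scraped page.
    """
    try:
        record = lookup[cpf]
        nome = record["nome"]
        idade = record["idade"]
        salario = record["salario"]
    except KeyError:
        # Unknown cpf: return the same number of values, all empty.
        nome = idade = salario = ""
    return nome, idade, salario


lookup = {"111": {"nome": "Ana", "idade": "70", "salario": "1000"}}

nome, idade, salario = search_one("111", lookup)
print(nome, idade, salario)

empty = search_one("999", lookup)
print(empty)
```

Every name in the `return` statement has to be assigned on every path through the function, otherwise the failure path raises `NameError` instead of returning empty values.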

In `process_cpf_list` you have to merge the two `for` loops

for cpf in cpfs:
    # code 1
    nomes = ...

    for cpfs in range(len(nomes)):
        # code 2
        self.sheet.update_cell(cpfs + 2, self.nome_col, nomes[cpfs])

into a single `for` loop

for row, cpf in enumerate(cpfs): 
    # code 1
    nomes, idades, ... = BOT.search_cpfs()

    # code 2
    self.sheet.update_cell(row + 2, self.nome_col, nomes[row])
    self.sheet.update_cell(row + 2, self.age_col, idades[row])

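I will use the name `row` instead of the second `cpfs` to make it more readable. Because `search_cpfs` now gives me a single list of results (not a list of lists), I can use `nome` instead of `nomes[cpfs]`. Then I can use `row = row + 2`.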

for row, cpf in enumerate(cpfs): 
    # code 1
    nome, idade, ... = BOT.search_cpfs(cpf)

    # code 2
    row = row + 2
    self.sheet.update_cell(row, self.nome_col, nome)
    self.sheet.update_cell(row, self.age_col, idade)

I can even use `enumerate(cpfs, 2)` instead of `row = row + 2`.

for row, cpf in enumerate(cpfs, 2):
    # code 1
    nome, idade, ... = BOT.search_cpfs(cpf)

    # code 2
    self.sheet.update_cell(row, self.nome_col, nome)
    self.sheet.update_cell(row, self.age_col, idade)
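
To see what the second argument of `enumerate` does here: it sets the first value of the counter, so the counter can double as the spreadsheet row number (row 1 being the heading row). The cpf values below are made up.

```python
cpfs = ["111", "222", "333"]  # made-up values

# Default start is 0, so the sheet row has to be computed by hand.
rows_manual = [(index + 2, cpf) for index, cpf in enumerate(cpfs)]

# With start=2 the counter already is the sheet row.
rows_direct = list(enumerate(cpfs, 2))

print(rows_direct)
```

Both lists hold the same (row, cpf) pairs. The second form simply avoids reassigning `row` inside the loop.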


Full code (I did not test it)

k_bot.py
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import UnexpectedAlertPresentException

import re
import time


class BOT(object):

    def __init__(self): 

        # SETUP FOR URL
        self.bot_url = 'http://www.3kplus.net/'

        self.profile = webdriver.FirefoxProfile()
        self.options = Options()
        self.driver = webdriver.Firefox(firefox_profile=self.profile,
                                        executable_path=r'C:\Users\MOISA\Documents\geckodriver.exe',
                                        options=self.options)

        # NAVIGATE TO URL
        self.driver.get(self.bot_url)

        login_box = self.driver.find_element_by_xpath('//*[@id="login"]/div[3]/div[2]/div[2]/input')
        login_box.send_keys('daiane')

        pass_box = self.driver.find_element_by_xpath('//*[@id="login"]/div[3]/div[2]/div[3]/input')
        pass_box.send_keys('789456')

        login_btn = self.driver.find_element_by_xpath('//*[@id="login"]/div[3]/div[2]/button')
        login_btn.click()

    def search_cpf(self, cpf):

        print(f"Procurando {cpf}.")
        self.driver.get(self.bot_url)
        self.delay = 3  # seconds

        # SEARCH CLIENT CODE
        try:
            cpf_input = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[1]/input')
            cpf_input.send_keys(cpf)

            cpf_btn = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[2]/button')
            cpf_btn.click()
            cpf_btn.click()

            time.sleep(2)

            # CLIENT CODE IS VALID
            # CLIENT CODE HAVE NOTIFICATION
            if self.driver.find_element_by_xpath('//*[@id="notification"]').is_displayed():

                nome = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/h2").text
                idade = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/ul/li[2]").text
                age = re.search(r'\((.*?)Anos', idade).group(1)
                beneficio = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[3]/div[5]/span/b").text
                concessao = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[3]/div[2]/span").text
                salario = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[2]/div/div[3]/div[1]/div[1]/span").text
                bancos = self.driver.find_element_by_xpath('//*[@id="loans"]').text
                bancosw = re.findall(r'(?<=Banco )(\w+)', bancos)
                bankslist = ', '.join(bancosw)
                bancocard = self.driver.find_element_by_xpath('//*[@id="cards"]').text
                bcardw = re.findall(r'(?<=Banco )(\w+)', bancocard)
                bcardlist = ', '.join(bcardw)
                consig = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[3]/div[2]/span").text
                card = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[3]/div[3]/span").text

                print('CPF Valido')
                print('NOTIFICACAO')
                print(nome, age, beneficio, concessao, salario, bankslist, bcardlist, consig, card)

            # CLIENT CODE DOESN'T HAVE NOTIFICATION
            else:
                nome = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
                idade = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/ul/li[2]").text
                age = re.search(r'\((.*?)Anos', idade).group(1)
                beneficio = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[5]/span/b").text
                concessao = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[2]/span").text
                salario = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[2]/div/div[3]/div[1]/div[1]/span").text
                bancos = self.driver.find_element_by_xpath('//*[@id="loans"]').text
                bancosw = re.findall(r'(?<=Banco )(\w+)', bancos)
                bankslist = ', '.join(bancosw)
                bancocard = self.driver.find_element_by_xpath('//*[@id="cards"]').text
                bcardw = re.findall(r'(?<=Banco )(\w+)', bancocard)
                bcardlist = ', '.join(bcardw)
                consig = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[3]/div[2]/span").text
                card = self.driver.find_element_by_xpath(
                    "/html/body/main[1]/div[1]/div[1]/div[3]/div[3]/span").text

                print('CPF Valido')
                print(nome, age, beneficio, concessao, salario, bankslist, bcardlist, consig, card)

        # IF THE CLIENT CODE IS WRONG
        except (NoSuchElementException, UnexpectedAlertPresentException):
            nome = ''
            idade = ''
            age = ''
            beneficio = ''
            concessao = ''
            salario = ''
            bancos = ''
            bancosw = ''
            bankslist = ''
            bancocard = ''
            bcardw = ''
            bcardlist = ''
            consig = ''
            card = ''
            print('CPF Invalido')

        # every returned name is assigned in all three branches above
        return nome, idade, beneficio, concessao, salario, bancos, bancocard, consig, card
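
The two regular expressions in `search_cpf` can be tried on their own. The sample strings below are made up (the real page text is not shown in the question), so this is only a sketch of what the patterns extract. It also checks for a failed match first, since `re.search` returns `None` when nothing matches, and calling `.group(1)` on that raises `AttributeError`, which the `except` clause above does not catch.

```python
import re

idade_text = "01/01/1950 (70 Anos)"                    # made-up sample
loans_text = "Banco Alfa 1.000,00\nBanco Beta 500,00"  # made-up sample

# Capture whatever sits between "(" and "Anos".
match = re.search(r'\((.*?)Anos', idade_text)
age = match.group(1).strip() if match else ""

# The lookbehind keeps only the word that follows "Banco ".
banks = re.findall(r'(?<=Banco )(\w+)', loans_text)
bankslist = ", ".join(banks)

# A text without the pattern gives None instead of a match object.
no_match = re.search(r'\((.*?)Anos', "sem idade")

print(age, "|", bankslist, "|", no_match)
```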

Second script (the one that writes to the Google Sheet)

from k_bot import BOT
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import time
from gspread.exceptions import APIError


class CpfSearch(object):

    def __init__(self, spreadsheet_name):
        self.cpf_col = 1
        self.nome_col = 2
        self.age_col = 3
        self.beneficio_col = 4
        self.concessao_col = 5
        self.salario_col = 6
        self.bancos_col = 7
        self.bancocard_col = 9
        self.consig_col = 10
        self.card_col = 16

        scope = ['https://www.googleapis.com/auth/spreadsheets',
                 'https://www.googleapis.com/auth/drive.readonly']

        creds = ServiceAccountCredentials.from_json_keyfile_name('CONSULTAS.json', scope)

        client = gspread.authorize(creds)

        self.sheet = client.open(spreadsheet_name).sheet1

    def process_cpf_list(self):

        # SKIP OVER COLUMN HEADING IN THE SPREADSHEET
        cpfs = self.sheet.col_values(self.cpf_col)[1:]

        bot_url = BOT()

        for row, cpf in enumerate(cpfs): # if you use `enumerate(cpfs, 2)` then you don't need `row = row + 2`
            # old version gives many results
            # nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()

            # new version gives only one result (the method is named `search_cpf` in k_bot.py above)
            nome, idade, beneficio, concessoe, salario, bancos, bancocard, consig, card = bot_url.search_cpf(cpf)

            # UPDATE THE SHEET
            print("Atualizando...")

            try:
                row = row + 2
                self.sheet.update_cell(row, self.nome_col, nome)
                self.sheet.update_cell(row, self.age_col, idade)
                self.sheet.update_cell(row, self.beneficio_col, beneficio)
                self.sheet.update_cell(row, self.concessao_col, concessoe)
                self.sheet.update_cell(row, self.salario_col, salario)
                self.sheet.update_cell(row, self.bancos_col, bancos)
                self.sheet.update_cell(row, self.bancocard_col, bancocard)
                self.sheet.update_cell(row, self.consig_col, consig)
                self.sheet.update_cell(row, self.card_col, card)
                print('Cliente atualizado!')
            except APIError:
                print('Esperando para atualizar...')
                time.sleep(100)
                continue


cpf_updater = CpfSearch('TESTE')
cpf_updater.process_cpf_list()
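
One thing to note in the loop above: when `APIError` is raised, the remaining writes for that row are skipped and the loop moves on to the next cpf. A possible alternative, sketched here under assumptions, is to retry the same write a few times. `write_with_retry`, `QuotaError`, and the wait time are hypothetical. In the real script the exception would be `gspread.exceptions.APIError` and the write would be a call to `self.sheet.update_cell`.

```python
import time


class QuotaError(Exception):
    """Hypothetical stand-in for gspread.exceptions.APIError."""


def write_with_retry(write, attempts=3, wait_seconds=0.0):
    """Call write() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except QuotaError:
            if attempt == attempts:
                raise  # give up: let the caller see the error
            time.sleep(wait_seconds)


# Demonstration with a fake write that fails twice, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise QuotaError("quota exceeded")
    return "ok"


result = write_with_retry(flaky_write)
print(result, calls["count"])
```

In the script this might be used as `write_with_retry(lambda: self.sheet.update_cell(row, self.nome_col, nome), wait_seconds=100)`, with `QuotaError` replaced by `APIError` inside the helper.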
