For loop - run one loop to completion, then run the next loop (Python)


This script needs to run all the way through RI_page_urls.csv first, and then run every URL it generates into RI_License_urls.csv and pull the business info.

It pulls all the URLs from RI_page_urls.csv, but it only runs and prints the first of the 100 URLs in RI_License_urls.csv. I need help figuring out how to make it wait for the first part to finish before it runs the second part.

I appreciate any help.

Here is the URL in RI_page_urls.csv:

http://www.crb.state.ri.us/verify_CRB.php
And the code:

from bs4 import BeautifulSoup as soup
import requests as r
import pandas as pd
import re
import csv

#pulls lic# url
with open('RI_page_urls.csv') as f_input:
    csv_input = csv.reader(f_input)

    for url in csv_input:
        data = r.get(url[0])
        page_data = soup(data.text, 'html.parser')
        links = [r'www.crb.state.ri.us/' + link['href']
            for link in page_data.table.tr.find_all('a') if re.search('licensedetail.php', str(link))]

        df = pd.DataFrame(links)
        df.to_csv('RI_License_urls.csv', header=False, index=False, mode = 'a')
#Code Above works!

#need to pull table info from license url    
#this pulls the first record, but doesn't loop through the requests

with open('RI_License_urls.csv') as f_input_2:
    csv_input_2 = csv.reader(f_input_2)

    for url in csv_input_2:
        data = r.get(url[0])
        page_data = soup(data.text, 'html.parser')
        company_info = (' '.join(info.get_text(", ", strip=True).split()) for info in page_data.find_all('h9'))

        df = pd.DataFrame(info, columns=['company_info'])
        df.to_csv('RI_company_info.csv', index=False)

Well, the question is a bit unclear, and the code has a few errors.

data = r.get(url[0])

should be the line below, because requests needs the URL to start with http or https, not www:

data = r.get("http://" + url[0])
In the code below,

info

is not defined, so I assume it should be company_info:

 company_info = (' '.join(info.get_text(", ", strip=True).split()) for info in page_data.find_all('h9'))

        df = pd.DataFrame(info, columns=['company_info'])
So the full code is:

from bs4 import BeautifulSoup as soup
import requests as r
import pandas as pd
import re
import csv

#pulls lic# url
with open('RI_page_urls.csv') as f_input:
    csv_input = csv.reader(f_input)

    for url in csv_input:
        data = r.get(url[0])
        page_data = soup(data.text, 'html.parser')
        links = [r'www.crb.state.ri.us/' + link['href']
            for link in page_data.table.tr.find_all('a') if re.search('licensedetail.php', str(link))]

        df = pd.DataFrame(links)
        df.to_csv('RI_License_urls.csv', header=False, index=False, mode = 'a')
#Code Above works!

#need to pull table info from license url    
#this pulls the first record, but doesn't loop through the requests

with open('RI_License_urls.csv') as f_input_2:
    csv_input_2 = csv.reader(f_input_2)
    with open('RI_company_info.csv','a',buffering=0) as companyinfofiledescriptor:
        for url in csv_input_2:
            data = r.get("http://"+url[0])
            page_data = soup(data.text, 'html.parser')
            company_info = (' '.join(info.get_text(", ", strip=True).split()) for info in page_data.find_all('h9'))

            df = pd.DataFrame(company_info, columns=['company_info'])
            df.to_csv(companyinfofiledescriptor, index=False)
            print(df)

df.to_csv('RI_company_info.csv', index=False)

would overwrite the file contents on every iteration, which is why the file is opened once and the same handle is reused. (Albin)

Comments:

When I run it, it throws this error: ValueError: can't have unbuffered text I/O

How are you running it? Try the line open('RI_company_info.csv', 'a', buffering=0) with buffering=10 instead.

Okay, so I removed the buffering and it prints to powershell. However, it doesn't write to 'RI_company_info.csv'.

It will write once you stop the program ;-)

Well, isn't that fun, it did write after it finished running. So, I got 320 URLs from RI_page_urls.csv, which will generate 32,000 URLs to pull company info from. Will it really print 32,000 rows of company info to powershell without crashing before it writes?
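If it is a problem that rows only reach disk when the buffer fills or the program exits, one alternative is to drop the shared file handle and let pandas append each record itself: to_csv with mode='a' reopens and closes the file on every call, so each row lands on disk immediately. This is just a sketch, not part of the answer above; it reuses the imports from the earlier code (csv, requests as r, pandas as pd, BeautifulSoup as soup) plus os, and it assumes RI_company_info.csv does not already exist when the run starts.

import os

with open('RI_License_urls.csv') as f_input_2:
    csv_input_2 = csv.reader(f_input_2)

    for url in csv_input_2:
        data = r.get("http://" + url[0])
        page_data = soup(data.text, 'html.parser')
        company_info = [' '.join(info.get_text(", ", strip=True).split())
                        for info in page_data.find_all('h9')]

        df = pd.DataFrame(company_info, columns=['company_info'])
        # mode='a' appends instead of overwriting, and the file is closed after
        # every call, so nothing waits in a buffer; write the header only once,
        # the first time the file is created
        df.to_csv('RI_company_info.csv', mode='a', index=False,
                  header=not os.path.exists('RI_company_info.csv'))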