Python 3.x: string text to CSV

Tags: python-3.x, selenium, beautifulsoup

I want to format a string as CSV. I use BeautifulSoup to scrape data from a website and get the result back as one complete string.

Result:

Business Objective\n
464 Wholesale of household goods\n
Main Business Activities\n
46493 Wholesale of stationery, books, magazines and newspapers\n
I have tried many approaches, such as:

  • result = re.findall(r'(?==Business Objective=)(.*)(?=Main Business Activities=)', string)

  • Using join

  • Using string replace

My expected result is:

    Business Objective,Main Business Activities
    464 Wholesale of household goods,"46493 Wholesale of stationery, books, magazines and newspapers"
    "581 Publishing of books, periodicals and other publishing activities","58110 Publishing of books, brochures and other publications(2)"
    

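It would be better to use Selenium's wait functions instead of sleep. That aside, you can pull out the rows, put them into a DataFrame, and then write it out as CSV: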

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from bs4 import BeautifulSoup
    import pandas as pd
    import time

    companyName = "MONUMENT BOOKS CO  LTD"
    SourceAppCode = "-- Any register --"

    # Selenium 4 API: the driver path goes through Service,
    # and elements are located with find_element(By.XPATH, ...)
    browser = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
    browser.get('https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-companies%26service%3DregisterItemSearch&target=cambodia-master')

    # Fill in the search form and submit it
    browser.find_element(By.XPATH, "//input[@name='QueryString']").send_keys(companyName)
    time.sleep(0.5)
    browser.find_element(By.XPATH, "//select[@name='SourceAppCode']").send_keys(SourceAppCode)
    time.sleep(0.5)
    browser.find_element(By.XPATH, "/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/a[3]").click()
    time.sleep(0.5)

    # Open the first search result
    browser.find_element(By.XPATH, "//a[@class='registerItemSearch-results-page-line-ItemBox-resultLeft-viewMenu appMenu appMenuItem appMenuDepth0 noSave appItemSearchResult viewInstanceUpdateStackPush appReadOnly appIndex0']").click()
    time.sleep(0.5)

    # Parse the rendered page; the second 'appRepeaterContent' block holds the activity rows
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    ba = soup.find_all('div', {'class': 'appRepeaterContent'})[1]
    rows = ba.find_all('div', {'class': 'appRecordChildren appBlockChildren'})

    # Collect one [objective, activity] pair per row, then build the DataFrame once
    records = []
    for row in rows:
        bo = row.find('div', {'class': 'appAttrValue'})
        mba = bo.find_next('div', {'class': 'appAttrValue'})
        records.append([bo.text, mba.text])

    results = pd.DataFrame(records, columns=['Business Objective', 'Main Business Activities'])
    results.to_csv('file.csv', index=False)

Output:

    print (results)
                                       Business Objective                           Main Business Activities
    0                    464 Wholesale of household goods  46493 Wholesale of stationery, books, magazine...
    1   581 Publishing of books, periodicals and other...  58110 Publishing of books, brochures and other...
    2   581 Publishing of books, periodicals and other...  58120 Publishing of mailing lists, telephone b...
    3   581 Publishing of books, periodicals and other...  58130 Publishing of newspapers, journals, maga...
    4   581 Publishing of books, periodicals and other...  58190 Publishing of catalogs, photos, engravin...
    5                 469 Non-specialized wholesale trade  46900 Wholesale of a variety of goods without ...
    6                    464 Wholesale of household goods  46431 Wholesale of pharmaceutical and medical ...
    7                         521 Warehousing and storage             52100 Warehousing and storage services
    8              421 Construction of roads and railways  42101 Construction of streets, roads, bridges ...
    9   681 Real estate activities with own or leased ...  68101 Buying, selling, renting and operating o...
    10                                854 Other education                     85499 Other education n.e.c(6)
    11                                    731 Advertising                               73100 Advertising(1)
    12            551 Short term accommodation activities                     55101 Hotels and resort hotels
    13  561 Restaurants and mobile food service activi...   56101 Restaurants and restaurant cum night clubs
    14     791 Travel agency and tour operator activities                  79110 Travel agency activities(1)
    

Wow, that's great. You made my day. Thank you very much for your answer. This is exactly the answer I needed.
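
If only the scraped string is available, as in the question, the standard `csv` module may be enough. The sketch below assumes the text strictly alternates label lines and value lines, and that a repeated label marks the start of the next record. The question shows only one record, so this layout is an assumption, and the helper names `text_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io

# Sample text copied from the question: labels and values alternate, one per line.
scraped = (
    "Business Objective\n"
    " 464 Wholesale of household goods\n"
    " Main Business Activities\n"
    " 46493 Wholesale of stationery, books, magazines and newspapers\n"
)


def text_to_rows(text):
    """Pair alternating label/value lines and group them into row dicts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows, current = [], {}
    for label, value in zip(lines[0::2], lines[1::2]):
        if label in current:  # assumption: a repeated label starts a new record
            rows.append(current)
            current = {}
        current[label] = value
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize row dicts to CSV text; fields containing commas get quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = text_to_rows(scraped)
csv_text = rows_to_csv(rows, ["Business Objective", "Main Business Activities"])
print(csv_text)
```

With the default quoting mode, `csv` wraps only the fields that contain a comma in double quotes, which matches the quoting in the expected result.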