Python 3.x: string text to CSV
I want to format a string as CSV. I use BeautifulSoup to scrape data from a website and get the complete string. Result:
Business Objective\n
464 Wholesale of household goods\n
Main Business Activities\n
46493 Wholesale of stationery, books, magazines and newspapers\n
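I have tried many approaches, for example:
result = re.findall(r'(?==Business Objective=)(.*)(?=Main Business Activities=)', string)
The CSV output I want looks like this: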
Business Objective,Main Business Activities
464 Wholesale of household goods,"46493 Wholesale of stationery, books, magazines and newspapers"
"581 Publishing of books, periodicals and other publishing activities","58110 Publishing of books, brochures and other publications(2)"
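For the string-to-CSV step on its own, a regular expression may not be needed. The following is a minimal sketch that assumes the scraped text contains real newlines and that labels and values strictly alternate (label, value, label, value); the function name `pairs_to_csv` is hypothetical. The standard `csv` module handles quoting of values that contain commas.

```python
import csv
import io


def pairs_to_csv(text: str) -> str:
    """Turn alternating label/value lines into a two-column CSV string.

    Assumption: every record spans four non-empty lines in this order:
    'Business Objective', its value, 'Main Business Activities', its value.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    rows = []
    # Step through the lines four at a time and keep only the two values.
    for i in range(0, len(lines) - 3, 4):
        rows.append([lines[i + 1], lines[i + 3]])

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Business Objective", "Main Business Activities"])
    writer.writerows(rows)
    return buf.getvalue()


sample = (
    "Business Objective\n"
    " 464 Wholesale of household goods\n"
    " Main Business Activities\n"
    " 46493 Wholesale of stationery, books, magazines and newspapers\n"
)
print(pairs_to_csv(sample))
```

If the labels do not alternate this regularly on the real page, the four-line stride would have to be replaced by a check on each label.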
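It would be better to use Selenium's wait functionality rather than sleep. But you can pull out the rows, put them into a DataFrame, and then write that to CSV: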
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
import time

companyName = "MONUMENT BOOKS CO LTD"
SourceAppCode = "-- Any register --"
# Selenium 4 style: the driver path is passed through Service, and elements
# are located with find_elements(By.XPATH, ...) instead of find_elements_by_xpath.
browser = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
browser.get('https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-companies%26service%3DregisterItemSearch&target=cambodia-master')
# Fill in the company name and choose the register
browser.find_elements(By.XPATH, "//input[@name='QueryString']")[0].send_keys(companyName)
time.sleep(0.5)
browser.find_elements(By.XPATH, "//select[@name='SourceAppCode']")[0].send_keys(SourceAppCode)
time.sleep(0.5)
# Submit the search
browser.find_elements(By.XPATH, "/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/a[3]")[0].click()
time.sleep(0.5)
# Open the first search result
browser.find_elements(By.XPATH, "//a[@class='registerItemSearch-results-page-line-ItemBox-resultLeft-viewMenu appMenu appMenuItem appMenuDepth0 noSave appItemSearchResult viewInstanceUpdateStackPush appReadOnly appIndex0']")[0].click()
time.sleep(0.5)
ww = browser.find_elements(By.XPATH, "/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[7]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]")
time.sleep(0.5)
soup = BeautifulSoup(browser.page_source, 'html.parser')
ba = soup.find_all('div',{'class':'appRepeaterContent'})[1]
rows = ba.find_all('div',{'class':'appRecordChildren appBlockChildren'})
# Collect one [objective, activity] pair per row, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
records = []
for row in rows:
    bo = row.find('div', {'class': 'appAttrValue'})
    mba = bo.find_next('div', {'class': 'appAttrValue'})
    records.append([bo.text, mba.text])
results = pd.DataFrame(records, columns=['Business Objective', 'Main Business Activies'])
results.to_csv('file.csv', index=False)
Output:
print (results)
Business Objective Main Business Activies
0 464 Wholesale of household goods 46493 Wholesale of stationery, books, magazine...
1 581 Publishing of books, periodicals and other... 58110 Publishing of books, brochures and other...
2 581 Publishing of books, periodicals and other... 58120 Publishing of mailing lists, telephone b...
3 581 Publishing of books, periodicals and other... 58130 Publishing of newspapers, journals, maga...
4 581 Publishing of books, periodicals and other... 58190 Publishing of catalogs, photos, engravin...
5 469 Non-specialized wholesale trade 46900 Wholesale of a variety of goods without ...
6 464 Wholesale of household goods 46431 Wholesale of pharmaceutical and medical ...
7 521 Warehousing and storage 52100 Warehousing and storage services
8 421 Construction of roads and railways 42101 Construction of streets, roads, bridges ...
9 681 Real estate activities with own or leased ... 68101 Buying, selling, renting and operating o...
10 854 Other education 85499 Other education n.e.c(6)
11 731 Advertising 73100 Advertising(1)
12 551 Short term accommodation activities 55101 Hotels and resort hotels
13 561 Restaurants and mobile food service activi... 56101 Restaurants and restaurant cum night clubs
14 791 Travel agency and tour operator activities 79110 Travel agency activities(1)
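On the remark about waiting instead of sleeping: Selenium provides `WebDriverWait` together with `expected_conditions` for this, which polls a condition until it holds or a timeout expires. The sketch below shows only the underlying idea using the standard library, so it runs without a browser; `wait_until` is a hypothetical helper, not part of Selenium, and the 0.5 s poll interval is an assumption chosen to mirror the sleeps in the answer.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for "the element has appeared": becomes truthy on the third check.
attempts = []


def element_ready():
    attempts.append(1)
    return len(attempts) >= 3 and "element"


found = wait_until(element_ready, timeout=2, poll=0.01)
print(found, len(attempts))
```

Unlike a fixed `time.sleep(0.5)`, this returns as soon as the condition holds and fails loudly if it never does, which is the behaviour an explicit wait gives you in Selenium.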
Wow, that's so cool. You made my day. Thank you very much for your answer. This is exactly the answer I needed.