Python 3.x: How do I convert data generated by a for loop into a DataFrame?

python-3.x pandas selenium selenium-webdriver selenium-chromedriver

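I use a for loop to extract data from a table on a website with Selenium WebDriver. How can I convert that data into a DataFrame and export it to a CSV file? I tried assigning "value" to a pandas DataFrame, but it throws an error.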

from selenium import webdriver

url = "https://www.jambalakadi.info/status/"

driver = webdriver.Chrome(executable_path="chromedriver.exe")

driver.get(url)

row_count = len(driver.find_elements_by_xpath(" //*[@id='main_table_countries_today']/tbody[1]/tr "))
col_count = len(driver.find_elements_by_xpath(" //*[@id='main_table_countries_today']/tbody[1]/tr[1]/td "))

print('Number of row counts:', row_count)
print("Number of column counts:", col_count)


for r in range(2, row_count+1):
    for c in range(1, col_count+1):
        value = driver.find_element_by_xpath(" //*[@id='main_table_countries_today']/tbody[1]/tr["+str(r)+"]/td["+str(c)+"] ").text
        print(value, end=" ")

    print(" ")

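When I run the for loop, the "value" variable prints the data, but I cannot create a DataFrame from it with pandas and export it to a CSV file.

I updated the code. Is this the right approach?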

my_data = []
for r in range(2, row_count+1):
    for c in range(1, col_count+1):
        value = driver.find_element_by_xpath(" //*[@id='main_table_countries_today']/tbody[1]/tr["+str(r)+"]/td["+str(c)+"] ").text
        print(value, end=" ")
        for line in value:
            my_data.append(line[0],line[1],line[2])
        pd.DataFrame.from_records(my_data, columns=column).to_csv('output.csv')

    print(" ")

You need to use the pd.DataFrame.from_records() function.

Example usage:

import pandas as pd

# Read the data
my_data = []
for line in my_database:  # my_database is a placeholder for any iterable of text lines
    # Preprocess the line (say it holds 3 columns: date, customer, price)
    line = line.split(" ")  # line is now a list of values
    # The indices correspond to date, customer and price respectively
    my_data.append([line[0], line[1], line[2]])

pd.DataFrame.from_records(my_data, columns=['date', 'customer', 'price']).to_csv('output.csv')
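Applied to the nested loop in the question, the same idea might look like the sketch below: build one list per table row, append it once per row, and create the DataFrame after the loop finishes rather than inside it. This is only an illustration under assumptions. get_cell, fake_table and the column names are hypothetical stand-ins; in the real script get_cell would wrap the driver.find_element_by_xpath(...).text lookup from the question.

```python
import pandas as pd


def table_to_dataframe(get_cell, row_count, col_count, columns):
    """Collect one list per table row, then build the DataFrame once."""
    rows = []
    # Same 1-based row/column indices as the XPath loop in the question
    for r in range(2, row_count + 1):
        row = [get_cell(r, c) for c in range(1, col_count + 1)]
        rows.append(row)  # a single list per row, not one argument per cell
    return pd.DataFrame.from_records(rows, columns=columns)


# Hypothetical stand-in for the Selenium lookup, so the sketch runs without a browser
fake_table = {
    (2, 1): "USA", (2, 2): "100", (2, 3): "5",
    (3, 1): "Italy", (3, 2): "80", (3, 3): "7",
}

df = table_to_dataframe(
    lambda r, c: fake_table[(r, c)],
    row_count=3,
    col_count=3,
    columns=["country", "cases", "deaths"],  # assumed column names
)
print(df)
```

Because each row is built with a loop over the columns, nothing has to be written out by hand however many rows or columns the table has. Calling df.to_csv('output.csv', index=False) afterwards writes the file.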

Here is code that reads the data into a pandas DataFrame and then exports it to CSV.

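from io import StringIO

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup

# Selenium 3 style; Selenium 4 replaces executable_path with a Service object
driver = webdriver.Chrome(executable_path="chromedriver.exe")
driver.get("https://yourwebsitename.com")
# Wait until the table is visible before reading the page source
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#main_table_countries_today")))
# Naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find('table', attrs={"id": "main_table_countries_today"})
# read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation in newer pandas
df = pd.read_html(StringIO(str(table)))
print(df[0])
df[0].to_csv('output.csv', index=False)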

Updated (a version without BeautifulSoup):

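from io import StringIO

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# Selenium 3 style; Selenium 4 replaces executable_path with a Service object
driver = webdriver.Chrome(executable_path="chromedriver.exe")
driver.get("https://yourwebsitename.com")
# The wait returns the table element once it is visible
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#main_table_countries_today")))
# Ask the browser for the table's HTML directly, so BeautifulSoup is not needed
table = driver.execute_script("return arguments[0].outerHTML;", element)
# read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation in newer pandas
df = pd.read_html(StringIO(table))
print(df[0])
df[0].to_csv('output.csv', index=False)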
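A side note on the index=False argument used in the to_csv call above: without it, pandas also writes the row index as an extra first column. The small sketch below uses made-up sample rows (not data from the site) to show the difference. It relies on to_csv returning the CSV text when no path is given.

```python
import pandas as pd

df = pd.DataFrame.from_records(
    [["2020-03-01", "alice", 10], ["2020-03-02", "bob", 12]],  # made-up sample rows
    columns=["date", "customer", "price"],
)

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header begins with an empty index column
print(without_index.splitlines()[0])  # header holds only the data columns
```

For a scraped table the index is usually just 0, 1, 2, ..., so leaving it out keeps the CSV identical to what the page shows.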

Can you share your URL?

I didn't mention the website URL because I'm scraping the data without the site's permission.

OK. In that case, could you post the HTML of the table structure?

If there are 100 rows, do I have to write line[0], line[1], line[2] by hand, or is there another way to handle it?

No. line[0], line[1] and line[2] stand for the values you get. You convert the whole line you read into an array of values, so line[0] == array_of_values[0], line[1] == array_of_values[1], and so on.

It says the index is out of range.

Should this code be added to my code, or is it a completely separate script? I'm getting this error: "The code that caused this warning is on line 13 of the file /home/user/PycharmProjects/seleniumScrape/testing.py. To get rid of this warning, change code that looks like this: BeautifulSoup(YOUR_MARKUP})"

@Jainmiah: You need to specify a parser to avoid the warning. I have added the parser. I have also posted another solution that does not use BeautifulSoup. Let me know how it goes.

The first one works well. @KunduK Thank you very much. I'm actually a beginner and eager to learn Selenium. How should I study so that I clearly understand how to write code like the above?

There are tabs available, such as "Now" and "Yesterday". Can I use Selenium to extract both tabs automatically?

To get yesterday's values, you need to click the tab with Selenium first, and then you can fetch the values.