Javascript - How to trigger a file download from a website using Python? (Javascript, Python, Web Scraping, Export) - Fatal编程技术网

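I'm trying to build a script that pulls data from a website every day, but I'm struggling to get Python to actually read the table. I'm not a professional programmer. I've tried two approaches:

1) scraping the table (headers, rows, etc.) with Beautiful Soup, and

2) using the site's "Export to Excel" button.

Here is the exact website:

My code so far: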

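# imports
import requests
import urllib.request
import pandas as pd
from lxml import html
import lxml.html as lh
from bs4 import BeautifulSoup

URL = 'https://scgenvoy.sempra.com/index.html#nav=/Public/ViewExternalLowOFO.getLowOFO%3Frand%3D200'
# Create a handle, page, to hold the website's contents
# (note: the part after '#' is a URL fragment and is never sent to the server,
#  so this only fetches index.html, not the table the page loads afterwards)
requests.packages.urllib3.disable_warnings()
page = requests.get(URL, verify=False)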
I think the simplest way would be to use this XPath:

//*[@id="content"]/form/div[2]/div/table/tbody/tr/td[4]/table/tbody/tr/td[1]/a
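Part of why the table never shows up: everything after `#` in this URL is a fragment, which the browser keeps to itself and never sends in the HTTP request. `requests.get(URL)` therefore receives only `index.html`, and the table is presumably loaded afterwards by the page's JavaScript, so neither Beautiful Soup nor an XPath applied to `page.content` can find it. The standard library's `urlsplit` shows which part is actually requested:

```python
from urllib.parse import urlsplit

URL = "https://scgenvoy.sempra.com/index.html#nav=/Public/ViewExternalLowOFO.getLowOFO%3Frand%3D200"

parts = urlsplit(URL)
# Path plus query string is what goes into the HTTP request line; the fragment is not sent
request_target = parts.path + (f"?{parts.query}" if parts.query else "")
print(request_target)   # /index.html
print(parts.fragment)   # nav=/Public/ViewExternalLowOFO.getLowOFO%3Frand%3D200
```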

Any help is greatly appreciated.

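I would try to identify the API behind "Export to Excel" and call that API directly. You can find it in your browser's developer tools; for example, Google Chrome's "Copy as cURL" gives the following:

The API URL is:

The input parameters are: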

FileName: LowOFO05302019Cycle2
Class: com.sempra.krypton.common.saveas.constants.FancyExcelExportType
pageSize: letter
pageOrientation: portrait
HiddenGasFlowDateField: 05/30/2019
HiddenCycleField: 2
gasFlowDate: 05/30/2019
cycle: 2
The request method is POST.
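Only the gas-flow date and the cycle vary between these fields, so for a daily job they can be generated rather than hard-coded. A sketch; `build_export_form` is a hypothetical helper, and the `FileName` pattern ("LowOFO" + MMDDYYYY + "Cycle" + cycle) is inferred from the single example above, so treat it as an assumption:

```python
from datetime import date


def build_export_form(gas_flow_date: date, cycle: int) -> dict:
    """Form fields for the Excel export, mirroring the captured example.

    Assumption: FileName follows the one observed sample; the server
    may not even check it.
    """
    mmddyyyy = gas_flow_date.strftime("%m/%d/%Y")  # e.g. 05/30/2019
    return {
        "FileName": f"LowOFO{gas_flow_date.strftime('%m%d%Y')}Cycle{cycle}",
        "Class": "com.sempra.krypton.common.saveas.constants.FancyExcelExportType",
        "pageSize": "letter",
        "pageOrientation": "portrait",
        "HiddenGasFlowDateField": mmddyyyy,
        "HiddenCycleField": str(cycle),
        "gasFlowDate": mmddyyyy,
        "cycle": str(cycle),
    }


# Reproduces the captured request for 05/30/2019, cycle 2
form = build_export_form(date(2019, 5, 30), 2)
```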

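You can now make this request with the Python requests library, passing the appropriate values for each parameter. (Beautiful Soup only parses HTML; it cannot send the request itself.)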
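A minimal sketch of that POST with requests, assuming the endpoint accepts the form fields exactly as captured. `EXPORT_URL` is a hypothetical placeholder because the API URL itself is missing here, `download_export` is a hypothetical helper name, and the server may additionally require cookies or headers that "Copy as cURL" would reveal:

```python
import requests

# Hypothetical placeholder: replace with the API URL shown by "Copy as cURL"
EXPORT_URL = "https://example.invalid/export-endpoint"

# Form fields exactly as captured from the Export button (05/30/2019, cycle 2)
FORM = {
    "FileName": "LowOFO05302019Cycle2",
    "Class": "com.sempra.krypton.common.saveas.constants.FancyExcelExportType",
    "pageSize": "letter",
    "pageOrientation": "portrait",
    "HiddenGasFlowDateField": "05/30/2019",
    "HiddenCycleField": "2",
    "gasFlowDate": "05/30/2019",
    "cycle": "2",
}


def download_export(url: str, form: dict, out_path: str) -> None:
    """POST the form fields and save the response body (the exported file) to out_path."""
    # Passing a dict as data= makes requests send an application/x-www-form-urlencoded body
    response = requests.post(url, data=form, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    with open(out_path, "wb") as f:
        f.write(response.content)


# Build the request without sending it, to inspect what would go over the wire
prepared = requests.Request("POST", EXPORT_URL, data=FORM).prepare()
print(prepared.body)
```

Once `EXPORT_URL` holds the real endpoint, `download_export(EXPORT_URL, FORM, "LowOFO05302019Cycle2.xls")` would save one day's file; the `.xls` extension is a guess, so check the response's `Content-Type` header first.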

This is meant to give you an idea rather than solve the whole problem for you.

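Your task is to get dynamic table data through the "Export" button, so you basically need the Selenium package to handle the dynamic content. Download the Selenium web driver that matches your browser.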

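For the Chrome browser, install the web driver:

unzip ~/Downloads/chromedriver_linux64.zip -d ~/Downloads
chmod +x ~/Downloads/chromedriver
sudo mv -f ~/Downloads/chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver

Then export the Excel file:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4 takes the chromedriver path via a Service object
driver = webdriver.Chrome(service=Service('/usr/bin/chromedriver'))
driver.get('https://scgenvoy.sempra.com/index.html#nav=/Public/ViewExternalLowOFO.getLowOFO%3Frand%3D200')

# Wait up to 10 s for the JavaScript-rendered export link instead of a fixed sleep
excel_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.XPATH, "//div[@id='content']/form/div[2]/div/table/tbody/tr/td[4]/table/tbody/tr/td[2]/a")))

# click() returns None, so there is nothing to print; Chrome saves the export to its download folder
excel_button.click()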

where '/usr/bin/chromedriver' is the path to the Chrome web driver.
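`click()` returns as soon as the click is dispatched, not when the file has finished downloading, so a script that goes on to read the export needs to wait for it. A sketch of a polling helper, assuming Chrome's convention of naming in-progress downloads `*.crdownload` and a dedicated, initially empty download folder; `wait_for_download` is a hypothetical name:

```python
import os
import time


def wait_for_download(directory: str, timeout: float = 60.0, poll: float = 0.5) -> str:
    """Poll `directory` until a download has finished; return the newest file's path."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Skip hidden files (browser temp files, .DS_Store, ...)
        names = [n for n in os.listdir(directory) if not n.startswith(".")]
        # Chrome keeps an in-progress download as <name>.crdownload
        pending = any(n.endswith(".crdownload") for n in names)
        finished = [n for n in names if not n.endswith(".crdownload")]
        if finished and not pending:
            newest = max(finished, key=lambda n: os.path.getmtime(os.path.join(directory, n)))
            return os.path.join(directory, newest)
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {directory!r} after {timeout} s")
```

Call it right after `excel_button.click()`. To make Chrome save into a known folder, the download directory can be set with `options = webdriver.ChromeOptions()` and `options.add_experimental_option("prefs", {"download.default_directory": "/absolute/path"})`, then passed as `webdriver.Chrome(..., options=options)`.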

Here is the code I used:

import pandas as pd

## Input parameters
start_date = '5/28/19'
end_date = '5/31/19'

#### Loops through date range and pulls data
## Date Range ## (freq='D' gives one timestamp per day)
datelist = pd.date_range(start=start_date, end=end_date, freq='D')
print(datelist)

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
import time

# opens chrome and opens up Gas Envoy
driver = webdriver.Chrome(service=Service('C:/Users/tmrt/Documents/chromedriver_win32/chromedriver.exe'))
driver.get('https://scgenvoy.sempra.com/index.html#nav=/Public/ViewExternalLowOFO.getLowOFO%3Frand%3D200')
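The loop over `datelist` is where each day's export would happen, and the captured export request expects dates as MM/DD/YYYY (for example `05/30/2019`). A standard-library sketch that turns the same M/D/YY inputs into those strings, covering the same days as `pd.date_range(..., freq='D')`; `daily_form_dates` is a hypothetical helper:

```python
from datetime import datetime, timedelta


def daily_form_dates(start: str, end: str) -> list[str]:
    """Every day from start to end (inclusive) as MM/DD/YYYY strings.

    Assumes inputs in the M/D/YY style used by start_date and end_date.
    """
    first = datetime.strptime(start, "%m/%d/%y").date()
    last = datetime.strptime(end, "%m/%d/%y").date()
    return [(first + timedelta(days=i)).strftime("%m/%d/%Y")
            for i in range((last - first).days + 1)]


print(daily_form_dates('5/28/19', '5/31/19'))
# ['05/28/2019', '05/29/2019', '05/30/2019', '05/31/2019']
```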