Parsing an Evernote shared notebook with Python

python html selenium parsing xpath

I'm trying to get data from an Evernote "shared notebook". For example, from this one:

I tried using Beautiful Soup:

import requests
from bs4 import BeautifulSoup

url = 'https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c'
r = requests.get(url)
bs = BeautifulSoup(r.text, 'html.parser')
bs

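The result doesn't contain any of the notebook's text, only some code.

I also saw advice to use selenium and find elements by XPath. For example, I want to find the heading of this note, "Term 3 Week 2". In Google Chrome I found that its XPath is '/html/body/div[1]/div[1]/b/span/u/b'. So I tried this: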

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(url)
t = driver.find_element_by_xpath('/html/body/div[1]/div[1]/b/span/u/b')

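But it didn't work either; the result was "NoSuchElementException: …"

I'm new to Python, and especially to parsing, so I'd be glad of any help. I'm using Python 3.6.2 and Jupyter Notebook.

Thanks in advance.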

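The easiest way to interact with Evernote is to use their API.

Once you have configured your API key and can connect, you can download and reference notes and notebooks.

Evernote notes use their own markup language, called Evernote Markup Language (ENML), which is a subset of HTML. You'll be able to parse ENML with BeautifulSoup4 and extract the elements you're looking for.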
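To make the ENML point concrete, here is a minimal sketch of pulling the text out of a note body. The sample document is invented for illustration, and the sketch uses the standard library's `html.parser` rather than BeautifulSoup4 so that it runs without extra packages; with BeautifulSoup4 the equivalent step would be building a soup from the same string and reading its text.

```python
from html.parser import HTMLParser

# Hypothetical ENML document, made up for illustration only.
SAMPLE_ENML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">'
    '<en-note>'
    '<div><b><u>Term 3 Week 2</u></b></div>'
    '<div>First line of the note</div>'
    '<en-todo checked="true"/>Hand in homework'
    '</en-note>'
)


class NoteTextExtractor(HTMLParser):
    """Collect the non-empty text chunks of a note, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(enml):
    """Return the list of text chunks found in an ENML string."""
    parser = NoteTextExtractor()
    parser.feed(enml)
    parser.close()
    return parser.chunks


chunks = extract_text(SAMPLE_ENML)
print(chunks)
```

Because ENML is tag-for-tag close to HTML, an HTML parser handles the Evernote-specific tags such as `en-note` and `en-todo` as ordinary unknown elements, which is all that text extraction needs.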

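If you're trying to extract information from a local installation rather than their web application, you may also be able to get what you need from the executable. See how to use the local installation to extract data. For this you would need to use a Python 3 module.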

However, if you want to use selenium, this will get you started:

import selenium.webdriver.support.ui as ui
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# your example URL
URL = 'https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c'

# create the browser interface, and a generic "wait" that we can use
#  to intelligently block while the driver looks for elements we expect.
#  10:  maximum wait in seconds
# 0.5:  polling interval in seconds
driver = Chrome()
wait = ui.WebDriverWait(driver, 10, 0.5)

driver.get(URL)

# Note contents are loaded in an iFrame element
find_iframe = By.CSS_SELECTOR, 'iframe.gwt-Frame'
find_html = By.TAG_NAME, 'html'

# .. so we have to wait for the iframe to exist, switch our driver context
#  and then wait for that internal page to load.
wait.until(EC.frame_to_be_available_and_switch_to_it(find_iframe))
wait.until(EC.visibility_of_element_located(find_html))

# since ENML is "just" HTML we can select the top tag and get all the 
#  contents inside it.
doc = driver.find_element(*find_html)  # the find_element_by_* helpers were removed in Selenium 4

print(doc.get_attribute('innerHTML'))  # <-- this is what you want

# cleanup our browser instance
driver.quit()

To add to what @blakev said, you won't be able to get the HTML you want with requests, because the # in the URL means that the part after it is not sent to the server. You are simply sending a request to, and getting the response for, https://www.evernote.com/pub/missrspink/evernoteexamples

For Selenium you need to make sure you have properly installed the webdriver you want to use, otherwise it will fail at the driver = Chrome() step.

@blakev Thank you so much for such a complete answer! The selenium approach works perfectly. The Evernote approach has one drawback: there is no official package for Python 3, so using it may be a bit more complicated. Thanks for the help!
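The point about the URL fragment can be checked without making any network request. The following is a small sketch using the standard library's `urllib.parse.urldefrag`, which splits a URL into the part that is sent to the server and the fragment that stays in the browser; the variable names are only illustrative.

```python
from urllib.parse import urldefrag

URL = ('https://www.evernote.com/pub/missrspink/evernoteexamples'
       '#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c')

# Everything after '#' is the fragment. HTTP clients such as requests
# do not send it, so the server cannot tell which note was asked for.
request_url, fragment = urldefrag(URL)

print(request_url)  # the address the server actually sees
print(fragment)     # the note selector, handled only by in-page JavaScript
```

This is why a real browser session, as in the selenium answer, is needed: the page's own script reads the fragment and loads the note afterwards.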