Web scraping with Python: incomplete information, hidden by a multi-section frame
Tags: javascript, python, web-scraping, sparql


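Thanks in advance, everyone. I'm new to web scraping and to Stack Overflow. I'm trying to extract some biological data from a web page.

The link I want to scrape comes from a table.

Its outerHTML is

<a href="http://identifiers.org/pubmed/7503987" target="_blank">7503987</a>

The first method I tried returns a list of links that does not include the one I'm looking for.

Method 2: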

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://glytoucan.org/Structures/Glycans/G00055MO")
elem = driver.find_element_by_xpath("//*[@id='literature']/togostanza-literature//main/ul/li/ul/li[1]")
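This method cannot find the XPath I entered.

Can anyone help me find another way to get the data? I would really appreciate it.

Thanks,
Bokan

--Closed--
Thank you all for helping me reformat this question. This is my first post on Stack Overflow.

I tried the second method with PhantomJS and with the Firefox driver. In the end, the Firefox web driver works.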

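The page's JavaScript appears to call an internal API. The input parameter is a URL-encoded query, which decodes to the following: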

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>
PREFIX glytoucan: <http://www.glytoucan.org/glyco/owl/glytoucan#>

SELECT DISTINCT ?from ?partner_url ?description ?pubmed_id ?pubmed_url
WHERE{
    VALUES ?accNum {"G00055MO"}
    ?saccharide  glytoucan:has_primary_id ?accNum .

    GRAPH ?graph {
        ?saccharide dcterms:references ?article .
        ?article a bibo:Article .
        ?article dcterms:identifier ?pubmed_id .
        ?article rdfs:seeAlso ?pubmed_url .
    }
    ?graph rdfs:label ?from .
    OPTIONAL {?graph rdfs:seeAlso ?partner_url.}
    ?graph dcterms:description ?description.
} ORDER by ?from
The following code will get your links:

import requests

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>
PREFIX glytoucan: <http://www.glytoucan.org/glyco/owl/glytoucan#>

SELECT DISTINCT ?from ?partner_url ?description ?pubmed_id ?pubmed_url
WHERE{
    VALUES ?accNum {"G00055MO"}
    ?saccharide  glytoucan:has_primary_id ?accNum .

    GRAPH ?graph {
        ?saccharide dcterms:references ?article .
        ?article a bibo:Article .
        ?article dcterms:identifier ?pubmed_id .
        ?article rdfs:seeAlso ?pubmed_url .
    }
    ?graph rdfs:label ?from .
    OPTIONAL {?graph rdfs:seeAlso ?partner_url.}
    ?graph dcterms:description ?description.
} ORDER by ?from
"""

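# Ask the endpoint to return the results as SPARQL JSON
headers = {'Accept': 'application/sparql-results+json'}
# requests URL-encodes the query text when it is passed via params
payload = {'query': query}

r = requests.get('https://ts.glytoucan.org/sparql', params=payload, headers=headers)

print(r.status_code)
data = r.json()
# Each binding maps a variable name to a dict that holds its "value"
links = [t["pubmed_url"]["value"] for t in data["results"]["bindings"]]
print(links)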
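The list comprehension above keeps only `pubmed_url`. As a minimal sketch of reading the other selected variables too, the helper below flattens a response in the standard SPARQL JSON results layout into plain dictionaries. The function name `extract_rows` and the sample data are hypothetical, written by hand for illustration rather than taken from the endpoint. Since `?partner_url` sits inside an `OPTIONAL` clause, it may be absent from a row, so the lookup falls back to `None` instead of raising `KeyError`.

```python
def extract_rows(data, variables):
    """Flatten SPARQL JSON results into one plain dict per result row."""
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable bound only inside OPTIONAL may be missing from a row,
        # so use .get() with an empty dict as the fallback.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Hand-written sample in the SPARQL JSON results layout (not real endpoint output).
sample = {
    "head": {"vars": ["from", "partner_url", "pubmed_id", "pubmed_url"]},
    "results": {
        "bindings": [
            {
                "from": {"type": "literal", "value": "Example source"},
                "pubmed_id": {"type": "literal", "value": "7503987"},
                "pubmed_url": {
                    "type": "uri",
                    "value": "http://identifiers.org/pubmed/7503987",
                },
            }
        ]
    },
}

rows = extract_rows(sample, sample["head"]["vars"])
print(rows)
```

With a real response, `extract_rows(r.json(), ["from", "pubmed_id", "pubmed_url"])` should work the same way, assuming the endpoint honours the `Accept` header and returns this layout.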

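You're a real pro! Could you tell me how you obtained the URL-encoded query shown above? I could only find the query information there. I have never learned this way of querying data, so any background information would help.

@BokanBao The URL-encoded query is already linked in the post above, under "internal API"; it was too long for a comment. Note that you can view the complete query in the Chrome developer tools: open the console, go to the Network tab, refresh the page if needed, and add the filter sparql to see all of these internal API calls.

Thanks for your help! I changed the driver to Firefox and that solved the problem. However, I'm very interested in how to query a database with the SPARQL language!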
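To make the "URL-encoded query" from the comments more concrete, here is a small sketch that recovers the readable query text from a request URL of the kind you would copy out of the Network tab. It uses only the standard library. The helper name `decode_sparql_query`, the shortened query and the assumption that the parameter is called `query` (as in the answer's `payload`) are illustrative; the real page may send its request differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def decode_sparql_query(request_url):
    """Return the decoded 'query' parameter of a request URL, or None."""
    params = parse_qs(urlsplit(request_url).query)
    values = params.get("query")
    return values[0] if values else None


# Stand-in for a URL copied from the Network tab. It is built here from a
# shortened, made-up query so that the example runs offline.
short_query = 'SELECT ?s WHERE { ?s ?p "G00055MO" }'
captured_url = "https://ts.glytoucan.org/sparql?" + urlencode({"query": short_query})

print(captured_url)                       # percent-encoded form
print(decode_sparql_query(captured_url))  # readable SPARQL text again
```

This is the reverse of what `requests` does in the answer when the query is passed through `params`: there the library encodes the text, here `parse_qs` decodes it.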