Parsing HTML with YQL multi-query & XPath: how to escape nested quotes?
The title is more complicated than it needs to be. Here is the query in question:
SELECT *
FROM query.multi
WHERE queries="
SELECT *
FROM html
WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com'
AND xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span';
SELECT *
FROM xml
WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
SELECT *
FROM xml
WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
Specifically, this line,
xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'
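is the problem. Because of the quoting, I have to nest quotes three levels deep, and I have run out of quote characters. I tried the following variations, without success: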
//no attribute quoting
xpath='//li[@class=listLi]/div[@class=views]/a/span'
//try to quote attribute w/ backslash & single quote
xpath='//li[@class=\'listLi\']/div[@class=\'views\']/a/span'
//try to quote attribute w/ backslash & double quote
xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'
//try to quote attribute with double single quotes, like SQL
xpath='//li[@class=''listLi'']/div[@class=''views'']/a/span'
//try to quote attribute with double double quotes, like SQL
xpath='//li[@class=""listLi""]/div[@class=""views""]/a/span'
//try to quote attribute with quote entities
xpath='//li[@class=&quot;listLi&quot;]/div[@class=&quot;views&quot;]/a/span'
//try to surround XPath with backslash & double quote
xpath=\"//li[@class='listLi']/div[@class='views']/a/span\"
//try to surround XPath with double double quote
xpath=""//li[@class='listLi']/div[@class='views']/a/span""
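None of them worked.
I haven't found much written about escaping XPath strings, and everything I did find seems to be a variation on either using concat (for strings where neither " nor ' alone can be used as the delimiter) or HTML entities. Leaving the attribute values unquoted doesn't raise an error, but it fails because it is not the XPath string I actually need.
I didn't see anything in the YQL documentation about how to handle escaping. I know this is an edge case, but I'd hoped they would provide some kind of escaping guide.

I came up with a solution that doesn't really answer my original question but does solve the problem. The data.html.cssselect table takes a CSS selector & parses it into XPath, which avoids the nasty escaping issue: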
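For reference, the concat approach mentioned above can be sketched as follows. This is a generic XPath 1.0 technique for building a string literal whose value contains both kinds of quote; it does not by itself solve the extra nesting level that YQL adds. The function name `xpath_literal` is made up for this illustration.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 string expression whose value is `text`."""
    # XPath 1.0 has no escape character inside string literals, so pick
    # whichever delimiter does not occur in the text.
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote kinds occur: split on single quotes and glue the pieces
    # back together with concat(), writing each lone ' as "'".
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


if __name__ == "__main__":
    print("//li[@class=" + xpath_literal("listLi") + "]")
    print(xpath_literal('say "hi" to O\'Brien'))
```

The class names in this question contain no quotes at all, so the helper would simply return `'listLi'`; the difficulty here comes from the surrounding YQL string delimiters rather than from the XPath values themselves.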
SELECT *
FROM query.multi
WHERE queries="
SELECT *
FROM data.html.cssselect
WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com'
AND css='li.listLi div.views a span';
SELECT *
FROM xml
WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
SELECT *
FROM xml
WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
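To give a rough idea of what a CSS-to-XPath translation involves, here is a minimal sketch. It is not the implementation behind data.html.cssselect (which is not shown in this thread); the function name `simple_css_to_xpath` is hypothetical, and it only handles tag names, a single `.class` per step, and the descendant combinator (whitespace).

```python
def simple_css_to_xpath(selector: str) -> str:
    """Translate selectors like 'li.listLi div.views a span' to XPath."""
    steps = []
    for token in selector.split():
        # "li.listLi" -> tag "li", class "listLi"; "a" -> tag "a", no class
        tag, _, css_class = token.partition(".")
        step = tag or "*"
        if css_class:
            # Match the class as one word within a space-separated list.
            step += (
                "[contains(concat(' ', normalize-space(@class), ' '), "
                "' " + css_class + " ')]"
            )
        steps.append(step)
    # Whitespace in CSS means "descendant", which is // in XPath.
    return "//" + "//".join(steps)


if __name__ == "__main__":
    print(simple_css_to_xpath("li.listLi div.views a span"))
```

Note that the generated XPath still contains quotes. The CSS route helps only because the selector sent inside the nested query has none, and the translation happens after the query string has already been parsed.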
You need to escape any quote characters inside the XPath query with a double backslash. In other words:
SELECT * FROM query.multi
WHERE queries="
SELECT *
FROM html
WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com'
AND xpath='//li[@class=\\'listLi\\']/div[@class=\\'views\\']/a/span';
SELECT *
FROM xml
WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
SELECT *
FROM xml
WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
SELECT *
FROM json
WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
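By default, JS was eating the \\ when I tried to use this in my page. I had to do this nonsense to get it to work.
Ah, got it. That's crude, though.
"SELECT * FROM html WHERE url='stumbleupon.com/url/%url%' AND xpath='//li[@class=\\" + "\\'listLi\\\\" + "\']/div[@class=\\\" + "\\\'views\\\\\\\+']/a/span"
Oddly, it looks like data.html.cssselect is faster than selecting from html with xpath, even though data.html.cssselect just converts to a select from html with xpath. Weird.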
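The double-backslash rule and the "eaten" backslashes can be combined in a small sketch. It assumes the escaping rule from the answer above is what YQL expects (not verified here), and it uses Python rather than JavaScript purely for illustration; both languages consume one level of backslashes in ordinary string literals, so four must be typed in source to get two in the resulting string. The helper names are hypothetical.

```python
def escape_quotes_for_nested_yql(xpath: str) -> str:
    """Prefix every single quote with two literal backslashes."""
    # "\\\\'" in Python source is the three characters: \ \ '
    return xpath.replace("'", "\\\\'")


def build_multi_query(page_url: str, xpath: str) -> str:
    """Wrap one html sub-query in a query.multi statement."""
    inner = "SELECT * FROM html WHERE url='{}' AND xpath='{}'".format(
        page_url, escape_quotes_for_nested_yql(xpath)
    )
    return 'SELECT * FROM query.multi WHERE queries="{}"'.format(inner)


if __name__ == "__main__":
    print(build_multi_query(
        "http://www.stumbleupon.com/url/http://www.guildwars2.com",
        "//li[@class='listLi']/div[@class='views']/a/span",
    ))
```

Writing the XPath with plain quotes and escaping it in one place avoids hand-counting backslashes in concatenated literals, which is where the workaround quoted in the comment becomes hard to read.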