Xpath: Scrape Instagram web hashtag posts


xpath google-apps-script web-scraping google-sheets instagram


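I am trying to get the number of posts for a given hashtag (#castles) and use ImportXML to fill a Google Sheets cell with it.

I tried copying the XPath from Chrome and pasting it into the cell as the ImportXML argument, like this: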

=ImportXML("https://www.instagram.com/explore/tags/castels/", "//*[@id="react-root"]/section/main/header/div[2]/div/div[2]/span/span")

I realized the nested double quotes were a problem, so I also tried:

=ImportXML("https://www.instagram.com/explore/tags/castels/", "//*[@id='react-root']/section/main/header/div[2]/div/div[2]/span/span")

However, both return an error.

What am I doing wrong?

Also, I know the XPath for the description meta tag,

"//meta[@name='description']/@content"

but I want to scrape the exact number of posts, not the abbreviated number.

Try this:

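function hashCount() {
  // Use the full URL, including the scheme.
  var url = 'https://www.instagram.com/explore/tags/cats/';
  // muteHttpExceptions returns the response instead of throwing on HTTP errors.
  var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();
  // Group 2 captures the digits between the "count" key and "page_info".
  var regex = /(edge_hashtag_to_media":{"count":)(\d+)(,"page_info":)/gm;
  // exec() returns null when the pattern is absent, so guard before indexing.
  var match = regex.exec(response);
  var count = match ? match[2] : null;
  Logger.log(count);
}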
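Since the goal is to fill a cell rather than write to the log, the same idea could be wrapped in a custom function. This is only a minimal sketch: `extractHashtagCount` and `HASHTAG_COUNT` are hypothetical names, and it assumes the page source still contains the `edge_hashtag_to_media":{"count":` marker that the regex above relies on.

```javascript
// Pure helper: returns the post count found in the page source, or null.
function extractHashtagCount(html) {
  // Same marker as in the regex above; assumed to be present in the response.
  var match = /edge_hashtag_to_media":\{"count":(\d+)/.exec(html);
  return match ? Number(match[1]) : null;
}

/**
 * Hypothetical custom function, called from a cell as =HASHTAG_COUNT("castles").
 * Only runs inside Apps Script, where UrlFetchApp is available.
 */
function HASHTAG_COUNT(tag) {
  var url = 'https://www.instagram.com/explore/tags/' + encodeURIComponent(tag) + '/';
  var html = UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText();
  var count = extractHashtagCount(html);
  if (count === null) {
    // Surfaces as an error in the cell instead of a silent blank.
    throw new Error('Post count not found in the response for #' + tag);
  }
  return count;
}
```

Keeping the parsing in its own function means it can be checked against a saved copy of the page source without making a request.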


I added muteHttpExceptions: true, which was not in my comment above. Hope this helps.

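Would an Apps Script-based solution be acceptable, or do you want to do this using only the =IMPORTXML function?

I worked out the formula, but it does not work because of a "result too large" warning: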

=REGEXEXTRACT(ImportXML("https://www.instagram.com/explore/tags/cats/","//body/script[1]"),"edge_hashtag_to_media[[:punct:]][[:punct:]][[:punct:]]count[[:punct:]][[:punct:]](\d+)\,[[:punct:]]page_info[[:punct:]]")

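Edit: that does not work.

I'm curious... how would the Apps Script version work?

Here you go. This is of course just a sample implementation: function hashCount() { var url = ''; var response = UrlFetchApp.fetch(url).getContentText(); var regex = /(edge_hashtag_to_media":{"count":)(\d+)(,"page_info":)/gm; var count = regex.exec(response)[2]; Logger.log(count); }

It returns a null value :-(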
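One way to narrow down a null result is to check separately whether the request succeeded and whether the marker text is in the response at all. The following is a rough sketch, not a definitive diagnosis: `classifyResponse` and `debugHashCount` are hypothetical helpers, and the labels they return are made up for illustration.

```javascript
// Hypothetical helper: says which step of the extraction fails.
function classifyResponse(statusCode, html) {
  if (statusCode !== 200) return 'http-error';           // request did not succeed
  if (html.indexOf('edge_hashtag_to_media') === -1) {
    return 'marker-missing';                             // a different page came back
  }
  var match = /edge_hashtag_to_media":\{"count":(\d+)/.exec(html);
  return match ? 'ok' : 'format-changed';                // marker present, layout differs
}

// Only runs inside Apps Script, where UrlFetchApp and Logger are available.
function debugHashCount() {
  var url = 'https://www.instagram.com/explore/tags/cats/';
  var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  Logger.log(classifyResponse(response.getResponseCode(), response.getContentText()));
}
```

If the log shows anything other than 'ok', the regex is not the problem by itself; the fetched content is not what the pattern expects.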