JavaScript: dynamic array to JSON, or table to JSON


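For several weeks I have been searching and trying to get this to work, and I am still failing. I have tried several approaches, including regular expressions. I have a dynamic table that I created with Puppeteer, and I am trying to output its data as JSON. The problem is that headings such as "2nd flr (Rm 226)" and "Location - Room 115" may or may not appear, and each of these rooms may contain one or more events. How can I convert dynamic data like this and make sure everything is listed?

I would like to get something like this JSON: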

data: [
  {
    "location": "2nd flr (Rm 226)",
    "time": "10:00 AM",
    "description": "Social Security Administration Commissioner",
    "document": "18",
    "type": "Social Security Hearing",
    "blank": " ",
    "order": "Hearing"
  },
  {
    "location": "2nd flr (Rm 226)",
    "time": "01:00 PM",
    "description": "Social Security Administration Commissioner",
    "document": "18",
    "type": "Order Setting Social Security Hearing",
    "blank": " ",
    "order": "Hearing"
  },
  {
    "location": "3rd flr (100)",
    "time": "01:00 PM",
    "description": "Social Security Administration Commissioner",
    "document": "18",
    "type": "Order Setting Social Security Hearing",
    "blank": " ",
    "order": "Hearing"
  }
]


const data = Array.from(
  document.querySelectorAll('#content > table > tbody > tr'),
  row => Array.from(row.querySelectorAll('td'), cell => cell.innerText)
)
This is the output I get:

{
  "data": [
    [
      "2nd flr (Rm 226)"
    ],
    [
      "10:00 AM",
      "Social Security Administration Commissioner",
      "18",
      "Social Security Hearing",
      " ",
      "Hearing"
    ],
    [
      "01:00 PM",
      "Social Security Administration Commissioner",
      "18",
      "Order Setting Social Security Hearing",
      " ",
      "Hearing"
    ],
    [
      "3rd flr (100)"
    ],
    [
      "09:30 AM",
      "TERMINATED on 03/23/2015",
      "34",
      "Resetting Hearings",
      " ",
      "Hearing"
    ],
    [
      " ",
      "Reserved for case",
      "23",
      "Motion Hearing",
      " ",
      "Hearing"
    ],
    [
      "01:00 PM",
      "Case Information",
      "19",
      "Order Setting",
      " ",
      "Hearing"
    ],
    [
      "01:30 PM",
      "Case information",
      "31",
      "Order Setting",
      " ",
      "Hearing"
    ],
    [
      " ",
      "TERMINATED on 06/14/2019",
      "16",
      "Order Setting/Resetting Hearings",
      " ",
      "Hearing"
    ],
    [
      "3rd flr (Rm 310)"
    ],
    [
      "01:30 PM",
      "Insurance Company",
      "122",
      "Order Setting/Resetting Hearings",
      " ",
      "Hearing"
    ]
  ]
}

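I don't think you have the right title. Getting JSON is easy: just use JSON.stringify. Your problem is getting the object you want to convert into a consistent shape, or at least into the shape you want. It looks like the code generates an array of arrays rather than an array of objects.

So I think you have to do more work when parsing the HTML. I would console.log the objects built from the HTML so you can inspect them before converting to JSON.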


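So, instead of reading the cells in a loop, you can read each td explicitly and assign its value, or a default value, to the object.

You need to look for further clues in the HTML to arrive at the structure you want. In this example I look at the class of the first td in each tr.

[[I know you are only reading the HTML, but it contains errors: the same id (room) is assigned to more than one element.]]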

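// Define a shortcut function qsa: querySelectorAll, returning a proper array.
// An HTML context "el" can be supplied as an optional second argument.
function qsa(s, el) {
  return Array.prototype.map.call((el ? Element : Document).prototype
    .querySelectorAll.call((el || document), s), function (e) { return e; });
}
var data = [];
qsa('tr').forEach(function (tr, i, arr) {
  var tds = qsa('td', tr);
  if (tds.length === 0) return; // skip rows without any td
  if (tds[0].className == 'room')
    arr.room = tds[0].innerText; // "remember" the current room...
  else if (tds[0].className == 'case-0')
    // output the room followed by the row data
    data.push([arr.room].concat(tds.map(function (e) { return e.innerText; })));
});
console.log(data);
// And of course, the JSON is produced by
var JSONdata = JSON.stringify(data);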

Daily Calendar Report of 09/23/2019
2nd flr (Rm 226) 10:00 AM 18 Security Hearing   Hearing 2nd flr (Rm 406) 1:30 PM 18 Security Hearing   Hearing
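It looks like innerText does not always give you what you need. It would help if you could provide the HTML you are trying to read from.
If I don't know how many TDs there will be on a given day, how can I read each TD explicitly?
Have you logged all of the TDs and TRs? That is your best debugging clue. Your example shows a consistent set of TDs in the HTML.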
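If innerText turns out to be unreliable, one option is to read textContent and normalise the whitespace yourself. The helper below is a hypothetical sketch (the name cellText is not from the original posts). It relies on two things: textContent returns the raw text of a node and its descendants regardless of styling, and \s in a JavaScript regular expression also matches the non-breaking space produced by &nbsp;.

```javascript
// Hypothetical helper: read a cell's text without depending on layout.
function cellText(cell) {
  return (cell.textContent || '')
    .replace(/\s+/g, ' ') // collapse whitespace runs, including non-breaking spaces
    .trim();
}

// In the browser it could replace the inline callback from the question:
//   Array.from(row.querySelectorAll('td'), cellText)
```

Note that a cell holding only &nbsp; becomes an empty string with this helper, so decide whether that is what you want for the "blank" column.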
<center><Table border=1 width=98%>
<TR><TD id='report' class='report' align=center><B><FONT SIZE=+2>Daily Calendar Report of 09/23/2019</font></B><BR><CENTER></table></center>
<Table border=1 width=98%   >

<TR><TD class='room' id='room' ALIGN=CENTER COLSPAN=6><STRONG>2nd flr (Rm 226)</STRONG></TD></TR>
<TR id='casedata' class='casedata'>
<TD class=case-0 id=case-0 VALIGN=top NOWRAP>10:00 AM</TD>
<TD class=case-1 id=case-1 VALIGN=top><A HREF=/Reportpt.pl?55244>Social Security Administration</A><B></B></TD>
<TD class=case-2 id=case-2 VALIGN=top>18</TD>
<TD class=case-3 id=case-3 VALIGN=top>Security Hearing</TD>
<TD class=case-4 id=case-4 VALIGN=top>&nbsp</TD>
<TD class=case-5 id=case-5 VALIGN=top NOWRAP><I>Hearing</I></TD>
</TR>

<TR><TD class='room' id='room' ALIGN=CENTER COLSPAN=6><STRONG>2nd flr (Rm 406)</STRONG></TD></TR>
<TR id='casedata' class='casedata'>
<TD class=case-0 id=case-0 VALIGN=top NOWRAP>1:30 PM</TD>
<TD class=case-1 id=case-1 VALIGN=top><A HREF=/Reportpt.pl?55244>Social Security Administration</A><B></B></TD>
<TD class=case-2 id=case-2 VALIGN=top>18</TD>
<TD class=case-3 id=case-3 VALIGN=top>Security Hearing</TD>
<TD class=case-4 id=case-4 VALIGN=top>&nbsp</TD>
<TD class=case-5 id=case-5 VALIGN=top NOWRAP><I>Hearing</I></TD>
</TR>
</table>
const tds = Array.from(document.querySelectorAll('#Content > table > tbody > tr > td'));
const trs = Array.from(document.querySelectorAll('#Content > table > tbody > tr'))

const data = Array.from(
      document.querySelectorAll('#Content > table > tbody > tr'),
      row => Array.from(row.querySelectorAll('td'), cell => cell.innerText),
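      // Note: the third argument of Array.from is thisArg, not a second callback, so this function is never called
      data =>{ return ( [data] ) }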
    )
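For completeness, here is a sketch of how the whole thing might look in Puppeteer, producing objects keyed the way the question asks. It is not a definitive implementation: the function names extractEvents and scrape and the url parameter are made up for illustration, the class names room and case-0 are taken from the HTML posted above, and the key names come from the desired JSON in the question.

```javascript
// Runs inside the page. It has to be self-contained because Puppeteer
// serialises the function and evaluates it in the browser context.
function extractEvents() {
  const keys = ['time', 'description', 'document', 'type', 'blank', 'order'];
  const events = [];
  let location = null; // most recent room heading seen so far
  for (const tr of document.querySelectorAll('tr')) {
    const tds = Array.from(tr.querySelectorAll('td'));
    if (tds.length === 0) continue;
    if (tds[0].className === 'room') {
      location = tds[0].innerText;
    } else if (tds[0].className === 'case-0') {
      const event = { location };
      // Read each td by position; use '' when a cell is missing.
      keys.forEach((key, i) => {
        event[key] = tds[i] ? tds[i].innerText : '';
      });
      events.push(event);
    }
  }
  return events;
}

// Assumptions: `url` points at the calendar page and puppeteer is installed.
async function scrape(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    const events = await page.evaluate(extractEvents);
    return JSON.stringify({ data: events }, null, 2);
  } finally {
    await browser.close();
  }
}
```

Rows whose first td is neither a room heading nor a case-0 cell, such as the report title, are simply skipped.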