Web scraping: How can I get data from this website?

Tags: web-scraping, windows-10, read-data, screen-grab, data-capture

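There is a site () where each field of the table (marked with yellow boxes) shows information about a specific date. What I need to do is read only the حجم row of each field (I mean what I marked with red squares in the following photo; you should go to the tab I mentioned in the first photo to see the second photo):

and write them to a text file (stored on my computer) like this: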

6.832 M (14%) , 40.475 M (85%), 248,000 (0%), 47.059 M (99%)
605,000 (3%), 15.277 M (96%), 478,714 (3%), 15.404 M (96%)
8.102 M (42%), 10.751 M (57%), 9.599 M (50%), 9.253 M (49%)
215,937 (2%), 9.417 M (97%), 1.115 M (11%), 8.518 M (88%)
3.351 M (15%), 18.284 M (84%), 5.987 M (27%), 15.647 M (72%)
But I don't know whether this is possible. If it is, what is the simplest way to do it? (I am using Windows 10.)

Edit: I completed step 3 successfully and ran the “node extract.js” command in step 4. I got this result:

with a non-zero exit code

but there is no store.txt file.

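  • Download and install Node.js and npm from here -

  • Create a folder anywhere on your computer, create an extract.js file in it and paste the following code

  • Then open a command prompt in that folder and run “npm install puppeteer” (it may take a few minutes to complete)

  • Then run “node extract.js”

  • After a successful run, you will have a “store.txt” file containing the expected result in the same folder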
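
The steps above do not say how the scraped rows are shaped before they are written. As a minimal sketch, assuming extract.js ends up with an array of strings like 'حجم, 47.059 M (99%), 248,000 (0%), 40.475 M (85%), 6.832 M (14%)', the helpers below drop the leading label, optionally strip the percentages, and reverse the cells so they match the order requested in the question. The names parseRow, formatRows and chunk are hypothetical and are not taken from the original extract.js.

```javascript
// Hypothetical helpers for shaping scraped rows; not part of the original extract.js.

// Split one scraped row into its value cells.
// Cells are separated by ", " (comma + space). Thousands separators such as
// "248,000" have no space after the comma, so they are left intact.
function parseRow(row, { keepPercent = true, reverse = true } = {}) {
  let cells = row.split(', ').slice(1); // slice(1) drops the leading label cell
  if (!keepPercent) {
    // Remove a trailing " (NN%)" from each cell.
    cells = cells.map((cell) => cell.replace(/\s*\(\d+%\)/, ''));
  }
  if (reverse) {
    cells = cells.reverse(); // scraped order is the mirror of the desired order
  }
  return cells;
}

// Join every row into one line of text, one row per line.
function formatRows(rows, options) {
  return rows.map((row) => parseRow(row, options).join(', ')).join('\n');
}

// Group a flat list into sub-lists of `size` items (the last one may be shorter).
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

const sample = ['حجم, 47.059 M (99%), 248,000 (0%), 40.475 M (85%), 6.832 M (14%)'];
console.log(formatRows(sample));
// → 6.832 M (14%), 40.475 M (85%), 248,000 (0%), 47.059 M (99%)
console.log(formatRows(sample, { keepPercent: false }));
// → 6.832 M, 40.475 M, 248,000, 47.059 M
```

If the values arrive as one flat list instead of one string per row, chunk(values, 4).map((group) => group.join(', ')).join('\n') would put four values on each line.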


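  • The link you provided does not match what is shown in the screenshots. If you provide the correct link, I can help. It is really simple with Puppeteer.

  • @SaurabhNarhe Thank you very much for your help. I edited my question and added a new photo.

  • Please retry the same command with an active internet connection.

  • Thank you very much, Saurabh! Since I am completely new to this: can I use Puppeteer on Windows 10? How should I install it, and where do I have to use the code above?

  • I would also like to know how to remove the word حجم. I don't need that word in the scraped data. And where should I find the store.txt file?

  • Is it possible to remove the parentheses with the percentage numbers? I mean, I would have lines like 6.832m,40.475m,248000,47.059m … without parentheses and percentages.

  • You are a great man! Just one question: the content of the store.txt file looks like 47.059 M, 248000, 40.475 M, 6.832 M, 15.404 M, 478714, 15.277 M, 605000, 9.253 M, 2.34 M, 0. How can I put every 4 numbers on one line?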
    [ 'حجم, 47.059 M (99%), 248,000 (0%), 40.475 M (85%), 6.832 M (14%)',
      'حجم, 15.404 M (96%), 478,714 (3%), 15.277 M (96%), 605,000 (3%)',
      'حجم, 9.253 M (49%), 9.599 M (50%), 10.751 M (57%), 8.102 M (42%)',
      'حجم, 8.518 M (88%), 1.115 M (11%), 9.417 M (97%), 215,937 (2%)',
      'حجم, 15.647 M (72%), 5.987 M (27%), 18.284 M (84%), 3.351 M (15%)',
      'حجم, 21.848 M (93%), 1.501 M (6%), 21.648 M (92%), 1.701 M (7%)',
      'حجم, 30.845 M (95%), 1.3 M (4%), 30.663 M (95%), 1.482 M (4%)',
      'حجم, 9.914 M (64%), 5.474 M (35%), 9.938 M (64%), 5.45 M (35%)',
      'حجم, 10.775 M (97%), 250,000 (2%), 10.995 M (99%), 30,000 (0%)',
      'حجم, 21.328 M (91%), 2.027 M (8%), 22.315 M (95%), 1.04 M (4%)',
      'حجم, 19.588 M (92%), 1.54 M (7%), 21.048 M (99%), 80,000 (0%)',
      'حجم, 12.554 M (96%), 418,000 (3%), 11.504 M (88%), 1.468 M (11%)',
      'حجم, 14.98 M (92%), 1.299 M (7%), 16.135 M (99%), 144,008 (0%)',
      'حجم, 10.878 M (95%), 502,040 (4%), 11.378 M (99%), 2,040 (0%)',
      'حجم, 10.012 M (97%), 275,000 (2%), 10.287 M (100%), 0 (0%)',
      'حجم, 11.992 M (95%), 500,000 (4%), 11.707 M (93%), 785,244 (6%)',
      'حجم, 16.492 M (95%), 820,000 (4%), 17.056 M (98%), 256,241 (1%)',
      'حجم, 19.639 M (98%), 378,384 (1%), 20.017 M (100%), 0 (0%)',
      'حجم, 13.781 M (95%), 639,609 (4%), 14.161 M (98%), 260,000 (1%)',
      'حجم, 31.797 M (99%), 300,507 (0%), 26.089 M (81%), 6.009 M (18%)',
      'حجم, 18.159 M (99%), 30,391 (0%), 15.914 M (87%), 2.275 M (12%)',
      'حجم, 21.271 M (95%), 1.01 M (4%), 21.501 M (96%), 780,000 (3%)',
      'حجم, 17.322 M (62%), 10.615 M (37%), 19.437 M (69%), 8.5 M (30%)',
      'حجم, 37.817 M (97%), 1.03 M (2%), 34.125 M (87%), 4.722 M (12%)',
      'حجم, 55.396 M (99%), 211,000 (0%), 52.507 M (94%), 3.1 M (5%)',
      'حجم, 23.141 M (98%), 420,000 (1%), 23.461 M (99%), 100,000 (0%)',
      'حجم, 46.215 M (82%), 9.919 M (17%), 49.764 M (88%), 6.371 M (11%)',
      'حجم, 1.26 M (100%), 0 (0%), 1.26 M (100%), 0 (0%)',
      'حجم, 35.89 M (99%), 251,000 (0%), 35.921 M (99%), 220,000 (0%)',
      'حجم, 48.509 M (88%), 6.349 M (11%), 54.052 M (98%), 806,362 (1%)',
      'حجم, 41.018 M (91%), 4.006 M (8%), 41.564 M (92%), 3.46 M (7%)',
      'حجم, 40.02 M (99%), 100,000 (0%), 39.22 M (97%), 900,000 (2%)',
      'حجم, 36.974 M (99%), 30,000 (0%), 36.549 M (98%), 455,500 (1%)',
      'حجم, 35.739 M (99%), 230,000 (0%), 33.104 M (92%), 2.866 M (7%)',
      'حجم, 19.627 M (100%), 0 (0%), 18.877 M (96%), 750,000 (3%)',
      'حجم, 19.603 M (81%), 4.379 M (18%), 23.982 M (100%), 0 (0%)',
      'حجم, 10.186 M (97%), 250,000 (2%), 10.436 M (100%), 0 (0%)',
      'حجم, 15.414 M (98%), 250,500 (1%), 15.465 M (98%), 200,000 (1%)',
      'حجم, 21.571 M (97%), 665,000 (2%), 22.236 M (100%), 0 (0%)',
      'حجم, 15.537 M (98%), 250,000 (1%), 15.787 M (100%), 0 (0%)',
      'حجم, 21.422 M (98%), 221,004 (1%), 21.243 M (98%), 400,000 (1%)',
      'حجم, 30.662 M (92%), 2.375 M (7%), 33.036 M (100%), 0 (0%)',
      'حجم, 39.287 M (98%), 455,000 (1%), 39.742 M (100%), 0 (0%)',
      'حجم, 53.141 M (89%), 6.11 M (10%), 59.131 M (99%), 120,000 (0%)',
      'حجم, 23.587 M (98%), 255,000 (1%), 23.842 M (100%), 0 (0%)',
      'حجم, 17.043 M (98%), 255,000 (1%), 17.298 M (100%), 0 (0%)',
      'حجم, 33.51 M (96%), 1.25 M (3%), 34.75 M (99%), 10,000 (0%)',
      'حجم, 36.408 M (99%), 15,000 (0%), 28.248 M (77%), 8.175 M (22%)',
      'حجم, 32.367 M (98%), 480,000 (1%), 31.535 M (96%), 1.312 M (3%)',
      'حجم, 54.773 M (95%), 2.68 M (4%), 43.936 M (76%), 13.517 M (23%)',
      'حجم, 58.955 M (95%), 2.54 M (4%), 41.234 M (67%), 20.262 M (32%)',
      'حجم, 45.222 M (99%), 15,000 (0%), 40.215 M (88%), 5.023 M (11%)',
      'حجم, 43.487 M (97%), 1.225 M (2%), 43.902 M (98%), 810,008 (1%)',
      'حجم, 35.46 M (91%), 3.18 M (8%), 38.33 M (99%), 310,000 (0%)',
      'حجم, 39.42 M (99%), 90,927 (0%), 36.722 M (92%), 2.789 M (7%)',
      'حجم, 41.024 M (99%), 312,000 (0%), 35.814 M (86%), 5.522 M (13%)',
      'حجم, 32.718 M (99%), 277,978 (0%), 30.995 M (93%), 2.001 M (6%)',
      'حجم, 1.12 M (100%), 0 (0%), 1.12 M (100%), 0 (0%)',
      'حجم, 2.015 M (86%), 325,000 (13%), 2.34 M (100%), 0 (0%)',
      'حجم, 40.402 M (95%), 2.109 M (4%), 42.511 M (100%), 0 (0%)',
      'حجم, 41.726 M (86%), 6.372 M (13%), 48.098 M (100%), 0 (0%)',
      'حجم, 39.444 M (97%), 1.14 M (2%), 39.551 M (97%), 1.033 M (2%)',
      'حجم, 4.14 M (100%), 0 (0%), 3.14 M (75%), 1,000,000 (24%)',
      'حجم, 43.447 M (96%), 1.743 M (3%), 44.292 M (98%), 898,000 (1%)',
      'حجم, 56.023 M (98%), 864,338 (1%), 52.627 M (92%), 4.26 M (7%)',
      'حجم, 14.062 M (99%), 8,008 (0%), 12.055 M (85%), 2.015 M (14%)',
      'حجم, 56.557 M (84%), 10.413 M (15%), 66.47 M (99%), 500,000 (0%)',
      'حجم, 7.971 M (69%), 3.481 M (30%), 11.452 M (100%), 0 (0%)',
      'حجم, 38.85 M (86%), 5.864 M (13%), 44.494 M (99%), 220,000 (0%)',
      'حجم, 53.151 M (99%), 105,000 (0%), 51.039 M (95%), 2.217 M (4%)',
      'حجم, 51.861 M (79%), 13.352 M (20%), 64.603 M (99%), 610,000 (0%)',
      'حجم, 2.025 M (80%), 500,000 (19%), 2.525 M (100%), 0 (0%)',
      'حجم, 67.428 M (95%), 3.294 M (4%), 68.538 M (96%), 2.184 M (3%)',
      'حجم, 52.373 M (87%), 7.211 M (12%), 58.408 M (98%), 1.176 M (1%)',
      'حجم, 12.073 M (80%), 3.01 M (19%), 14.583 M (96%), 500,000 (3%)',
      'حجم, 47.369 M (99%), 424,000 (0%), 30.168 M (63%), 17.626 M (36%)',
      'حجم, 3.401 M (100%), 0 (0%), 1.039 M (30%), 2.363 M (69%)',
      'حجم, 52.213 M (99%), 247,000 (0%), 41.872 M (79%), 10.588 M (20%)',
      'حجم, 73.585 M (98%), 1.356 M (1%), 38.911 M (51%), 36.029 M (48%)',
      'حجم, 67.943 M (97%), 1.622 M (2%), 35.571 M (51%), 33.995 M (48%)',
      'حجم, 2.653 M (100%), 0 (0%), 2.003 M (75%), 650,000 (24%)',
      'حجم, 32.055 M (99%), 18,408 (0%), 24.301 M (75%), 7.772 M (24%)',
      'حجم, 16.989 M (98%), 209,000 (1%), 9.598 M (55%), 7.6 M (44%)',
      'حجم, 34.906 M (95%), 1.64 M (4%), 21.129 M (57%), 15.417 M (42%)',
      'حجم, 14.669 M (98%), 150,000 (1%), 7.852 M (52%), 6.967 M (47%)',
      'حجم, 23.542 M (98%), 289,600 (1%), 23.102 M (96%), 729,782 (3%)',
      'حجم, 27.87 M (98%), 450,000 (1%), 21.461 M (75%), 6.859 M (24%)',
      'حجم, 48.785 M (98%), 500,000 (1%), 30.683 M (62%), 18.603 M (37%)',
      'حجم, 22.839 M (93%), 1.518 M (6%), 16.242 M (66%), 8.115 M (33%)',
      'حجم, 15.683 M (96%), 631,500 (3%), 13.316 M (81%), 2.999 M (18%)',
      'حجم, 15.715 M (96%), 630,000 (3%), 15.436 M (94%), 908,399 (5%)',
      'حجم, 11.776 M (90%), 1.305 M (9%), 13.081 M (100%), 0 (0%)',
      'حجم, 12.492 M (85%), 2.057 M (14%), 14.149 M (97%), 400,000 (2%)',
      'حجم, 11.909 M (100%), 0 (0%), 11.818 M (99%), 91,008 (0%)',
      'حجم, 21.404 M (99%), 140,000 (0%), 17.8 M (82%), 3.744 M (17%)',
      'حجم, 22.115 M (89%), 2.718 M (10%), 21.969 M (88%), 2.864 M (11%)',
      'حجم, 23.146 M (97%), 637,396 (2%), 21.881 M (92%), 1.902 M (7%)',
      'حجم, 35.986 M (94%), 1.92 M (5%), 25.749 M (67%), 12.156 M (32%)',
      'حجم, 16.064 M (93%), 1.179 M (6%), 17.104 M (99%), 139,467 (0%)',
      'حجم, 19.314 M (85%), 3.284 M (14%), 22.408 M (99%), 189,500 (0%)',
      ... 84 more items ]
    (node:13916) UnhandledPromiseRejectionWarning: TypeError [ERR_INVALID_CALLBACK]: Callback must be a function
        at maybeCallback (fs.js:129:9)
        at Object.writeFile (fs.js:1159:14)
        at C:\Users\m\Desktop\GetData\extract.js:21:14
        at process._tickCallback (internal/process/next_tick.js:68:7)
    (node:13916) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
    (node:13916) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process
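
The stack trace above points at an fs.writeFile call (extract.js line 21) and reports "Callback must be a function". The callback-style fs.writeFile needs a function as its last argument, and on the Node.js version in the trace a call without one throws before anything is written, which would explain why store.txt was never created. The original extract.js is not shown, so what line 21 actually contains is an assumption. The sketch below shows two ways to write the file that avoid this error; saveLines, saveLinesWithCallback and the demo file name are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Option 1: synchronous write. Takes no callback and throws on failure,
// so a surrounding try/catch (or async function) sees the error directly.
function saveLines(filePath, lines) {
  fs.writeFileSync(filePath, lines.join('\n') + '\n', 'utf8');
}

// Option 2: callback-style write. The last argument must be a function;
// omitting it is what raises ERR_INVALID_CALLBACK.
function saveLinesWithCallback(filePath, lines, done) {
  fs.writeFile(filePath, lines.join('\n') + '\n', 'utf8', done);
}

// Demo: write two rows to a temporary file and read them back.
// In extract.js the target would be 'store.txt' instead (assumption).
const demoPath = path.join(os.tmpdir(), 'store-demo.txt');
saveLines(demoPath, [
  '6.832 M (14%), 40.475 M (85%), 248,000 (0%), 47.059 M (99%)',
  '605,000 (3%), 15.277 M (96%), 478,714 (3%), 15.404 M (96%)',
]);
console.log(fs.readFileSync(demoPath, 'utf8'));
```

Inside an async function, the synchronous variant is usually the simplest choice for a one-off script, because a failed write then surfaces as an ordinary exception instead of an unhandled promise rejection.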