How to create an array of objects inside an object in Node.js Puppeteer

I want to create an array of product-size objects inside each object of a product-info array.

This is the HTML tree I am working with:

<div class="product-thumbShim"></div><a target="_blank" href="tshirts/herenow/herenow-men-black-printed-round-neck-t-shirt/4318138/buy" style="display: block;"><div class="product-imageSliderContainer"><div class="product-sliderContainer" style="display: block;"><div style="background: rgb(244, 255, 249);"><div style="height: 280px; width: 100%;"><picture class="img-responsive" style="width: 100%; height: 100%; display: block;"><source srcset="
    https://assets.myntassets.com/f_webp,dpr_1.0,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg ,
    https://assets.myntassets.com/f_webp,dpr_1.5,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 1.5x,
    https://assets.myntassets.com/f_webp,dpr_1.8,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 1.8x,
    https://assets.myntassets.com/f_webp,dpr_2.0,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.0x,
    https://assets.myntassets.com/f_webp,dpr_2.2,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.2x,
    https://assets.myntassets.com/f_webp,dpr_2.4,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.4x,
    https://assets.myntassets.com/f_webp,dpr_2.6,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.6x,
    https://assets.myntassets.com/f_webp,dpr_2.8,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg 2.8x" type="image/webp"><img src="https://assets.myntassets.com/dpr_2,q_60,w_210,c_limit,fl_progressive/assets/images/4318138/2018/5/4/11525433792765-HERENOW-Men-Black-Printed-Round-Neck-T-shirt-2881525433792598-1.jpg" class="img-responsive" alt="HERE&amp;NOW Men Black Printed Round Neck T-shirt" title="HERE&amp;NOW Men Black Printed Round Neck T-shirt" style="width: 100%; display: block;"></picture></div></div></div></div><div class="product-productMetaInfo"><h3 class="product-brand">HERE&amp;NOW</h3><h4 class="product-product">Men Printed Round Neck T-shirt</h4><h4 class="product-sizes"><!-- react-text: 396 -->Sizes: <!-- /react-text --><span class="product-sizeInventoryPresent">S, </span><span class="product-sizeInventoryPresent">M, </span><span class="product-sizeInventoryPresent">L, </span><span class="product-sizeInventoryPresent">XL, </span><span class="product-sizeInventoryPresent">XXL</span></h4><div class="product-price"><span><span class="product-discountedPrice"><!-- react-text: 405 -->Rs. <!-- /react-text --><!-- react-text: 406 -->374<!-- /react-text --></span><span class="product-strike"><!-- react-text: 408 -->Rs. 
<!-- /react-text --><!-- react-text: 409 -->749<!-- /react-text --></span></span><span class="product-discountPercentage">(50% OFF)</span></div></div></a><div class="image-grid-similarColorsCta product-similarItemCta"><span class="myntraweb-sprite image-grid-similarColorsIcon sprites-similarProductsIcon"></span><span class="image-grid-iconText">VIEW SIMILAR</span></div><div class="product-actions "><span class="product-actionsButton product-wishlist " style="width: 100%; text-align: center;"><!-- react-text: 416 -->wishlist<!-- /react-text --></span></div><div class="product-sizeDisplayDiv"><div class="product-sizeDisplayHeader"><span>Select a size</span><span class="myntraweb-sprite product-sizeDisplayRemoveMark sprites-remove"></span></div><div class="product-sizeButtonsContaier"><button class="product-sizeButton">S</button><button class="product-sizeButton">M</button><button class="product-sizeButton">L</button><button class="product-sizeButton">XL</button><button class="product-sizeButton">XXL</button></div></div>"
My expected output is:

[
 {
    brandName: 'max',
    productName: 'Colourblocked Round Neck T-shirt',
    productSizes: [
             { Size: 'S' },
             { Size: 'M' },
             { Size: 'L' },
    ]
  }
]
Current code:

const res = await page.$$eval(".product-base", (productInfo) =>
    productInfo.map((product) => {
        return {
            brandName: product.querySelector(".product-brand").innerText,
            productName: product.querySelector(".product-product").innerText,
            productSizes: product.querySelector(".product-sizes").innerText,
        };
    }),
);

Also, how often can I scrape the site without getting my IP blocked?

You can do it like this:

const res = await page.$$eval(".product-base", (productInfo) =>
    productInfo.map((product) => {
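        // The text looks like "Sizes: S, M, L, XL, XXL"
        const productSizeText = product.querySelector(".product-sizes").innerText;
        // Drop the "Sizes:" label, split on commas, and trim the whitespace around each size
        const productSizeArr = productSizeText
            .replace('Sizes:', '')
            .split(',')
            .map((size) => size.trim())
            .filter(Boolean);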
        return {
            brandName: product.querySelector(".product-brand").innerText,
            productName: product.querySelector(".product-product").innerText,
            productSizes: productSizeArr,
        };
    }),
);
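The expected output in the question uses one object per size rather than a plain string. A minimal sketch of that conversion is below, assuming the size text always has the form "Sizes: S, M, L". The helper name `parseSizes` is hypothetical and is not part of Puppeteer.

```javascript
// Hypothetical helper: turns text like "Sizes: S, M, L" into [{ Size: 'S' }, ...].
function parseSizes(sizeText) {
    return sizeText
        .replace(/^\s*Sizes:\s*/i, '')      // drop the leading "Sizes:" label
        .split(',')                          // one entry per size
        .map((size) => size.trim())          // remove the whitespace around each entry
        .filter((size) => size.length > 0)   // skip empty entries, e.g. from a trailing comma
        .map((size) => ({ Size: size }));    // wrap each size in an object
}

console.log(parseSizes('Sizes: S, M, L, XL, XXL'));
```

The callback passed to `page.$$eval` runs in the browser page, so a helper defined in the Node.js script is not visible there. To use this logic, define the function (or inline its body) inside the callback.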

Answer to the second part, about the image URLs in the HTML: with Puppeteer you can read the URLs from the source tag as shown below:

const imageSrcset = await page.evaluate(() => {
    // Gets the first <source> element in the DOM. Change the index 0 if the page has several <source> elements and the one you need is not the first.
    const sourceTag = document.getElementsByTagName('source')[0];
    // Return null explicitly when no <source> element exists
    if (!sourceTag) {
        return null;
    }
    // The srcset attribute holds all of the image URLs of the <source> tag as one string
    return sourceTag.getAttribute('srcset');
});

console.log(imageSrcset);
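The `srcset` attribute comes back as a single string, and the URLs in the HTML above contain commas themselves, so a plain `split(',')` would cut them apart. The sketch below splits on whitespace instead and keeps only the tokens that look like URLs. It assumes every URL is absolute (starts with `http://` or `https://`). The name `parseSrcset` and the `example.com` URLs are hypothetical.

```javascript
// Hypothetical helper: extracts the URLs from a srcset string.
// Assumes absolute http(s) URLs, so descriptors such as "1.5x," are easy to tell apart.
function parseSrcset(srcset) {
    return srcset
        .split(/\s+/)                                  // URLs never contain whitespace
        .filter((token) => /^https?:\/\//.test(token)) // keep URL tokens, drop descriptors and lone commas
        .map((token) => token.replace(/,+$/, ''));     // a trailing comma is a separator, not part of the URL
}

const sampleSrcset = `
    https://example.com/f_webp,dpr_1.0,w_210/image.jpg ,
    https://example.com/f_webp,dpr_1.5,w_210/image.jpg 1.5x,
    https://example.com/f_webp,dpr_2.0,w_210/image.jpg 2.0x`;

console.log(parseSrcset(sampleSrcset));
```

Because the function only works on a string, it can run in the Node.js script on the value returned by `page.evaluate`, after checking that the value is not null.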

Can you help with one more thing?
@SagarChavan What is it?
I edited the question. I can't get the image URL: it shows up in the browser console, but in the code it outputs null.
@SagarChavan The Stack Overflow standard is one question at a time. Could you post a new question with the details, since it isn't clear here? The HTML you shared also needs formatting. I'll help you on the new question.
I need the imageUrl in the object above, along with the brand name, sizes, and everything else. Is that possible?
@SagarChavan Please post a new question.
My new question