How to filter a large XML file down to the nodes with matching attributes in Node.js

node.js xml xslt

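I have large XML files, between 500 MB and 1 GB each, and I am trying to filter them so that only the nodes with certain attribute values are kept, in this case Prod_id. I need to filter on about 10k Prod_id values, and the XML currently contains about 60k items.

At the moment I am using XSL from node.js, but it is very slow: I have never seen a run finish within 30-40 minutes.

Is there a way to speed this up? XSL is not a requirement; I can use anything.

XML example: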

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<products>
    <Product Quality="approved" Name="WL6A6" Title="BeBikes comfort WL6A6" Prod_id="BBKBECOMFORTWL6A6">
        <CategoryFeatureGroup ID="10030">
            <FeatureGroup>
                <Name Value="Dettagli tecnici" langid="5"/>
            </FeatureGroup>
        </CategoryFeatureGroup>
        <Gallery />
    </Product>
    ...
    <Product Quality="approved" Name="WL6A6" Title="BeBikes comfort WL6A6" Prod_id="LAL733">
        <CategoryFeatureGroup ID="10030">
            <FeatureGroup>
                <Name Value="Dettagli tecnici" langid="5"/>
            </FeatureGroup>
        </CategoryFeatureGroup>
        <Gallery />
    </Product>
</products>

The XSL I am using:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>  
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="
         products/Product
         [not(@Prod_id='CEESPPRIVAIPHONE4')]
         ...
         [not(@Prod_id='LAL733')]"
   />
</xsl:stylesheet>

Thanks

I solved the problem with an approach similar to this answer:

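import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Set;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

// The class name is only a wrapper so that the method compiles on its own.
public class ProductFilter {
    static void filter(InputStream fileInputStream, final Set<String> prodIdToExclude)
            throws SAXException, TransformerException, IOException {
        XMLReader xr = new XMLFilterImpl(XMLReaderFactory.createXMLReader()) {
            // True while we are inside a Product that must be dropped.
            private boolean skip;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (qName.equals("Product")) {
                    // Decide once per Product; the flag then covers all of its descendants.
                    String prodId = atts.getValue("Prod_id");
                    skip = prodId != null && prodIdToExclude.contains(prodId);
                }
                if (!skip) {
                    super.startElement(uri, localName, qName, atts);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (!skip) {
                    super.endElement(uri, localName, qName);
                } else if (qName.equals("Product")) {
                    // The excluded Product is closed: resume copying, otherwise the
                    // closing </products> tag would be lost when the last Product is excluded.
                    skip = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                if (!skip) {
                    super.characters(ch, start, length);
                }
            }
        };
        // The identity transformer serializes whatever events the filter lets through.
        try (OutputStream out = new FileOutputStream("output.xml")) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new SAXSource(xr, new InputSource(fileInputStream)), new StreamResult(out));
        }
    }
}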
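
The filter above drops the Prod_id values it is given, whereas the question asks to keep only a listed set. A minimal sketch of that inverted variant follows, assuming the element and attribute names from the XML sample. The class name KeepOnlyProducts, the idsToKeep parameter and the skipDepth counter are illustrative and not part of the original answer.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper: streams the input and keeps only the listed products.
public class KeepOnlyProducts {

    public static void filter(InputStream in, OutputStream out, Set<String> idsToKeep)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();

        XMLReader filter = new XMLFilterImpl(parser) {
            // 0 while copying; otherwise the nesting depth inside a dropped Product.
            private int skipDepth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (skipDepth > 0) {
                    skipDepth++;                       // descendant of a dropped Product
                    return;
                }
                if ("Product".equals(localName)) {
                    String id = atts.getValue("", "Prod_id");
                    if (id == null || !idsToKeep.contains(id)) {
                        skipDepth = 1;                 // start dropping this Product
                        return;
                    }
                }
                super.startElement(uri, localName, qName, atts);
            }

            @Override
            public void endElement(String uri, String localName, String qName)
                    throws SAXException {
                if (skipDepth > 0) {
                    skipDepth--;                       // reaches 0 on the dropped Product's end tag
                } else {
                    super.endElement(uri, localName, qName);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                if (skipDepth == 0) {
                    super.characters(ch, start, length);
                }
            }
        };

        // Identity transform: serializes the events that pass the filter.
        TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(filter, new InputSource(in)), new StreamResult(out));
    }
}
```

With the roughly 10k identifiers held in a HashSet, each Product costs one hash lookup instead of a chain of thousands of predicate comparisons. The depth counter means the filter resumes copying after every dropped Product, so the closing tag of the root element is always written. Comments inside dropped products are not handled by this sketch.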

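Do you want to do this with node.js, or with any tool/programming language/platform?

Any free tool/language/platform is fine.

Since you know the structure and only need to identify, in a single forward pass, the Product elements you want to keep or remove, code based on XmlReader or SAX should help; a similar question has been answered for Python. XSLT can do this as well, but forward-only (rather than tree-based) processing is only available in XSLT 3 with streaming, which requires Saxon EE; a trial license is available. With plain XSLT 1 or 2 and a free processor, you could try whether a key speeds things up, although the processor you have chosen does not seem to support keys.

Saxon for node.js is not available yet, but hopefully will be within a few weeks. It will not offer streaming, so a document of this size will still need a lot of memory. If you need a streaming XSLT processor, you will have to call out to Java, for example via an HTTP request. In your case the SAX approach suggested by @MartinHonnen is the better option.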
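
The forward-only, reader-based approach mentioned in the comments can also be sketched with Java's StAX event API, which pulls events one at a time instead of receiving SAX callbacks. This is only an illustration under assumptions: the class name StaxProductFilter and the idsToKeep parameter are invented here, and the element and attribute names are taken from the XML sample in the question.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: copies the document event by event, dropping unwanted products.
public class StaxProductFilter {
    private static final QName PROD_ID = new QName("Prod_id");

    public static void filter(InputStream in, OutputStream out, Set<String> idsToKeep)
            throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");

        int skipDepth = 0; // > 0 while inside a Product that is being dropped
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            if (skipDepth > 0) {
                // Track nesting so we know when the dropped Product ends.
                if (event.isStartElement()) {
                    skipDepth++;
                } else if (event.isEndElement()) {
                    skipDepth--;
                }
                continue; // nothing inside a dropped Product is written
            }

            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                if ("Product".equals(start.getName().getLocalPart())) {
                    Attribute id = start.getAttributeByName(PROD_ID);
                    if (id == null || !idsToKeep.contains(id.getValue())) {
                        skipDepth = 1;
                        continue;
                    }
                }
            }
            writer.add(event); // everything else is copied unchanged
        }
        writer.close();
        reader.close();
    }
}
```

Only the current event is held in memory, so the working set should stay small regardless of file size; the set of identifiers is the only structure that grows with the input list.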