Lazily parsing elements in a large XML file


We are processing these files. In short, they are XML documents (OTDS) that contain a lot of data, sometimes more than 15 GB.

To process these files efficiently, we chose the Scales Xml library.

Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<Otds UpdateMode="Merge"
xmlns="http://otds-group.org/otds"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Version="1.9.1" xsi:schemaLocation="http://otds-group.org/otds ../xsd/otds.xsd">
 <Brands>
     ...
 </Brands>
 <Accommodations>
  <Accommodation Key="A">
   ...
   <SellingAccom>
    ...
    <PriceItems Key="1">...</PriceItems>
    ...
   </SellingAccom>
   ...
  </Accommodation>

...  <!-- A lot of <Accommodation> tags -->

  <Accommodation Key="Z">
  ...
  </Accommodation>
  <PriceItems Key="Global1"></PriceItems>   <!-- Collect all of these     -->
  <PriceItems Key="Global2"></PriceItems>
 </Accommodations>
</Otds>

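This is where we ran into a problem. The XML contains a large number of heavy <Accommodation> elements, and we need to extract all <PriceItems> elements that are direct children of the <Accommodations> element.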

Here is a heavily simplified version of the file:

<?xml version="1.0" encoding="UTF-8"?>
<Otds UpdateMode="Merge"
xmlns="http://otds-group.org/otds"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Version="1.9.1" xsi:schemaLocation="http://otds-group.org/otds ../xsd/otds.xsd">
 <Brands>
  <Brand>EBIWA</Brand>
 </Brands>
 <Accommodations>
  <Accommodation Key="ATH432">
   <SellingAccom>
    <PriceItems Key="1"></PriceItems>
   </SellingAccom>
  </Accommodation>
  <Accommodation Key="ATH433">
   <SellingAccom>
    <PriceItems Key="2"></PriceItems>
   </SellingAccom>
  </Accommodation>
  <PriceItems Key="Global"></PriceItems>
 </Accommodations>
</Otds>
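To pin down what "direct child of <Accommodations>" means on this simplified file, a small DOM-based check can serve as a test oracle for whatever streaming code is used. This is a sketch using only the JDK's `javax.xml.parsers` DOM API (not Scales Xml); it loads the whole document into memory, so it is only suitable for small samples like the one above, never for the 15 GB originals. The names `childElems`, `expectedGlobalKeys` and `simplifiedXml` are hypothetical.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Element, Node}
import org.xml.sax.InputSource

val OtdsNamespace = "http://otds-group.org/otds"

// Element children of `parent` with the given OTDS local name (child axis only, no deeper descendants).
def childElems(parent: Node, localName: String): List[Element] = {
  val kids = parent.getChildNodes
  (0 until kids.getLength).toList.map(i => kids.item(i)).collect {
    case e: Element if e.getNamespaceURI == OtdsNamespace && e.getLocalName == localName => e
  }
}

// Hypothetical oracle: Key attributes of <PriceItems> elements that are direct children
// of <Accommodations> under the root. Builds a full DOM, so use it on small test files only.
def expectedGlobalKeys(xml: String): List[String] = {
  val factory = DocumentBuilderFactory.newInstance()
  factory.setNamespaceAware(true) // without this, getNamespaceURI/getLocalName return null
  val root = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement
  for {
    accommodations <- childElems(root, "Accommodations")
    item <- childElems(accommodations, "PriceItems")
  } yield item.getAttribute("Key")
}

// A cut-down copy of the simplified file above
val simplifiedXml = """<?xml version="1.0" encoding="UTF-8"?>
<Otds xmlns="http://otds-group.org/otds">
 <Accommodations>
  <Accommodation Key="ATH432"><SellingAccom><PriceItems Key="1"/></SellingAccom></Accommodation>
  <Accommodation Key="ATH433"><SellingAccom><PriceItems Key="2"/></SellingAccom></Accommodation>
  <PriceItems Key="Global"/>
 </Accommodations>
</Otds>"""

val oracleKeys = expectedGlobalKeys(simplifiedXml) // List(Global)
```

The nested PriceItems with Key 1 and 2 are excluded because `childElems` only inspects immediate children, which is exactly the behaviour the streaming code should reproduce.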

My current approach:

  • It returns an Iterator[PriceItems] over all PriceItems elements, not only the expected last one (the direct child of <Accommodations>).

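    import scales.utils._
    import ScalesUtils._
    import scales.xml._
    import ScalesXml._

    // All elements live in the OTDS default namespace
    val ns = Namespace("http://otds-group.org/otds")
    val Otds = ns("Otds")
    val Accommodations = ns("Accommodations")
    val PriceItems = ns("PriceItems")
    val Accommodation = ns("Accommodation")

    // Intended to match only <PriceItems> that are direct children of <Accommodations>
    val priceItemsPath = List(Otds, Accommodations, PriceItems)

    // inputstream is the OTDS file being processed
    val xml = pullXml(inputstream, optimisationStrategy = QNameElemTreeOptimisation)

    val itr = iterate(priceItemsPath, xml)

    // parseXml and the PriceItems case class are defined elsewhere
    val results = for {
      priceItems <- itr
    } yield {
      val parsedJson = parseXml(priceItems)
      parsedJson.children.head.extract[PriceItems]
    }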
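For comparison, and not as a Scales Xml fix, the same selection can be sketched with the JDK's StAX cursor API (`javax.xml.stream`). The idea is to keep a stack of the currently open element names and emit a match only when that stack is exactly Otds/Accommodations/PriceItems. Memory then stays bounded by document depth, because the heavy <Accommodation> subtrees are tokenized and skipped rather than built. The function name `directPriceItemKeys` and the choice to yield only the `Key` attribute are assumptions for illustration; a real version would still need to materialise each matching PriceItems subtree before handing it to the JSON extraction.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ArrayBuffer

// Namespace URI taken from the sample OTDS document.
val OtdsNs = "http://otds-group.org/otds"

// Hypothetical helper: lazily yields the Key attribute of each <PriceItems>
// whose ancestors are exactly <Otds>/<Accommodations>. PriceItems nested deeper
// (e.g. under <SellingAccom>) never match, and no subtree is kept in memory.
def directPriceItemKeys(in: InputStream): Iterator[String] = {
  val reader = XMLInputFactory.newInstance().createXMLStreamReader(in)
  val wanted = Seq("Otds", "Accommodations", "PriceItems")
  // Local names of the currently open elements; "?" marks an element from another namespace.
  val openPath = ArrayBuffer.empty[String]

  // Advances the cursor to the next matching element; None once the input is exhausted.
  def nextMatch(): Option[String] = {
    while (reader.hasNext) {
      reader.next() match {
        case XMLStreamConstants.START_ELEMENT =>
          openPath += (if (reader.getNamespaceURI == OtdsNs) reader.getLocalName else "?")
          if (openPath == wanted)
            return Some(Option(reader.getAttributeValue(null, "Key")).getOrElse(""))
        case XMLStreamConstants.END_ELEMENT =>
          openPath.remove(openPath.length - 1)
        case _ => () // text, comments and other events are ignored
      }
    }
    reader.close() // only reached when the whole input has been consumed
    None
  }

  Iterator.continually(nextMatch()).takeWhile(_.isDefined).flatten
}

// Usage on a cut-down version of the simplified file from the question
val sampleXml = """<?xml version="1.0" encoding="UTF-8"?>
<Otds xmlns="http://otds-group.org/otds">
 <Accommodations>
  <Accommodation Key="ATH432"><SellingAccom><PriceItems Key="1"/></SellingAccom></Accommodation>
  <Accommodation Key="ATH433"><SellingAccom><PriceItems Key="2"/></SellingAccom></Accommodation>
  <PriceItems Key="Global"/>
 </Accommodations>
</Otds>"""

val keys = directPriceItemKeys(new ByteArrayInputStream(sampleXml.getBytes(StandardCharsets.UTF_8))).toList
// keys == List("Global")
```

Comparing the full open-element path, rather than only the current element's name, is what distinguishes the global PriceItems from the nested ones under <SellingAccom>.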
    
    val ns=命名空间(“http://otds-group.org/otds")
    val Otds=ns(“Otds”)
    val住宿=ns(“住宿”)
    val PriceItems=ns(“PriceItems”)
    val住宿=ns(“住宿”)
    val priceItemsPath=列表(OTD、住宿、PriceItems)
    val xml=pullXml(inputstream,OptimizationStrategy=QNameElemTreeOptimization)
    val itr=iterate(priceItemsPath,xml)
    为了{
    
We did not find a solution with the current implementation, so we prepared the changes ourselves. When we have spare time, we will open a PR or publish a fork.