XSLT to process XML using a very loose standard (EAD)


I've spent a week trying to write XSLT code that can process XML documents conforming to the (very loose) EAD standard.

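Useful information in an EAD document is difficult to locate precisely. Different EAD documents can place the same piece of information in completely different parts of the data tree. In addition, within a single EAD document, the same tag can be used several times in different locations for different information. For an example of this, see this. This makes it difficult to design a single XSLT file that processes all of these files correctly.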
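
To make the duplication concrete, here is a minimal, hypothetical EAD fragment. The element names follow EAD as used in the code in this thread, but every title and id is invented for illustration and the fragment has not been validated against the EAD DTD. Note how unittitle and unitid appear in the collection-level did and again in the did of each component.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Hypothetical EAD fragment: all titles and ids are invented for illustration -->
<ead>
  <eadheader>
    <filedesc>
      <titlestmt>
        <titleproper>Finding aid for the Example Papers</titleproper>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Example Papers</unittitle>
      <unitid>EX-001</unitid>
    </did>
    <dsc>
      <c01 level="series">
        <did>
          <unittitle>Correspondence</unittitle>
          <!-- no unitid here: EAD does not require one -->
        </did>
        <c02 level="file">
          <did>
            <unittitle>Letters, 1901</unittitle>
            <unitid>EX-001-1-1</unitid>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
```

A naive //unittitle would return three nodes here, only one of which belongs to the collection-level record.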

Speaking generally, the problem can be described as:

  • How to select a specific EAD node that resides at an unknown location
  • Without mistakenly selecting unwanted nodes that have the same
    name()
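
As a rough illustration of the two halves of that problem, the sketch below contrasts an unscoped search with one that excludes a subtree. It is not the approach taken in the code further down. It assumes an EAD file without namespaces in which all component records sit under dsc, and it only counts nodes rather than transforming them.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- too broad: counts every unittitle, including those of child records -->
    <xsl:text>all unittitle nodes: </xsl:text>
    <xsl:value-of select="count(//unittitle)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- scoped: only unittitle nodes that are NOT inside a dsc subtree,
         wherever else in the tree they happen to sit -->
    <xsl:text>unittitle nodes outside dsc: </xsl:text>
    <xsl:value-of select="count(//unittitle[not(ancestor::dsc)])"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

The ancestor test handles the "unknown location" part without hard-coding a path, but it has to be repeated for every element of interest, which is why a template-driven traversal scales better.
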
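I finally put together the XSLT I needed, and I figured it would be best to drop a generic version of the code here so that others can benefit from it or improve on it.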


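I would love to tag this question with "EAD", but I don't have enough rep. If someone with enough rep thinks the tag would be useful, please add it.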

A quick description of the solution first, then the code.

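  1. Check whether this EAD document contains component (child) records (designated with
     <c> or <cXX> tags). If it does not, we don't have to worry about duplicate EAD tags. Tags can still be buried under arbitrary wrappers; to find them, see step 3.
  2. If there are child records, take care not to process the
     <dsc> tag until the other tags have been processed. To find the other tags, see step 3; then go to step 4 to process the child records.
  3. Recurse through the various wrappers using a template that matches them and calls
     apply-templates on any element nodes further down the tree.
  4. We are now processing a child record. Repeat step 2 (carefully process all other tags before processing this child record's children), then repeat step 4.

Here is the XSLT code I came up with (generic version):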

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>
    
    <xsl:template match="/ead">
    <records>
        <xsl:if test="//dsc">
            <!-- if there are <cXX> nodes, we'll handle the main record differently.
                 <cXX> nodes are always found in the 'dsc' node, which contains nothing else -->
            <xsl:call-template name="carefully_process"/>
        </xsl:if>
        <xsl:if test="not(//dsc)">
            <record>
                <!-- Just process the existing nodes -->
                <xsl:apply-templates select="*"/>
            </record>
        </xsl:if>
    </records>
    </xsl:template>
    
    <xsl:template name="carefully_process">
        <!-- first we'll process all the nodes for the main
             record. Then we'll call the child records -->
        <record>
            <!-- have to be careful not to process //archdesc/dsc yet -->
            <xsl:apply-templates select="*[not(self::archdesc)]"/>
            <xsl:apply-templates select="archdesc/*[not(self::dsc)]"/>
    
        <!-- Now we can close off the master record, -->
        </record>
        <!-- and process the child records -->
        <xsl:apply-templates select="/ead/archdesc/dsc"/>
    </xsl:template>
    
    <xsl:template match="dsc">
        <!-- Start processing the child records (we use for-each to get a good position()) -->
        <xsl:for-each select="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']">
            <xsl:apply-templates select=".">
                <!-- we pass the unittitle and unitid of the master record, so that child
                     records can be linked to it. We pass the position of the child so that
                     a unitid can be created if it doesn't exist -->
                <xsl:with-param name="partitle" select="normalize-space(/ead/archdesc/did/unittitle)"/>
                <xsl:with-param name="parid" select="normalize-space(/ead/archdesc/did/unitid)"/>
                <xsl:with-param name="pos" select="position()"/>
            </xsl:apply-templates>
        </xsl:for-each>
    </xsl:template>
    
    <!-- process child nodes -->
    <xsl:template match="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']" >
    <xsl:param name="partitle"/>
    <xsl:param name="parid"/>
    <xsl:param name="pos"/>
        <!-- start this child record -->
        <record>
    
            <!-- EAD does not require a unitid, but my code does.
                 If it doesn't exist, create it -->
            <xsl:if test="not(./did/unitid)">
                <atom name="unitid">
                    <xsl:value-of select="$parid"/><xsl:text>-</xsl:text><xsl:value-of select="$pos"/>
                </atom>
            </xsl:if>
    
            <!-- get the level of this component -->
            <atom name="eadlevel">
                <xsl:value-of select="concat(translate(substring(@level,1,1),'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ'),substring(@level,2))"/>
            </atom>
    
            <!-- Do *something* to attach this record to its parent.
                 Probably involves $partitle and $parid. For example: -->
            <ref>
                <atom name="unittitle"><xsl:value-of select="$partitle"/></atom>
                <atom name="unitid"><xsl:value-of select="$parid"/></atom>
            </ref>
    
            <!-- now process all the other nodes -->
            <xsl:apply-templates select="*[not(starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c')]"/>
    
        <!-- finish this child record -->
        </record>
    
        <!-- prep the variables we'll need for attaching any child records (<cXX+1>) to this record -->
        <xsl:variable name="this_title">
            <xsl:value-of select="normalize-space(./did/unittitle)"/>
        </xsl:variable> 
        <xsl:variable name="this_id">
            <xsl:if test="./did/unitid">
                <xsl:value-of select="./did/unitid"/>
            </xsl:if>
            <xsl:if test="not(./did/unitid)">
                <xsl:value-of select="$parid"/><xsl:text>-</xsl:text><xsl:value-of select="$pos"/>
            </xsl:if>
        </xsl:variable>
    
        <!-- now process the children of this node -->
        <xsl:for-each select="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']">
            <xsl:apply-templates select=".">
                <xsl:with-param name="partitle" select="$this_title"/>
                <xsl:with-param name="parid" select="$this_id"/>
                <xsl:with-param name="pos" select="position()"/>
            </xsl:apply-templates>
        </xsl:for-each>
    </xsl:template>
    
    <!-- these are usually just wrappers. Go one level deeper -->
    <xsl:template match="descgrp|eadheader|revisiondesc|filedesc|titlestmt|profiledesc|archdesc|archdescgrp|daogrp|langusage|did|frontmatter">
        <xsl:apply-templates select="*"/>
    </xsl:template>
    
    <!-- below this point, add templates for processing specific EAD units
         of information. For example, the template might look like
    
    <xsl:template match="titleproper">
        <atom name="titleproper">
            <xsl:value-of select="normalize-space(.)"/>
        </atom>
    </xsl:template>
    -->
    
    <!-- instead of having a template for each EAD information unit, consider
         a generic template that handles them all the same way. For example:
    -->
    <xsl:template match="*">
        <atom>
            <xsl:attribute name="name"><xsl:value-of select="name()"/></xsl:attribute>
            <xsl:value-of select="normalize-space(.)"/>
        </atom>
    </xsl:template>
    
    </xsl:stylesheet>
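
Because the stylesheet declares version="2.0", a processor that really implements XSLT 2.0 could express the repeated component test once. The following is a sketch under that assumption, not part of the code above: the f: namespace URI, the function name f:is-component and the id attribute on record are made up for this example, and the regular expression assumes components are named c or c01 through c12. It only shows the id bookkeeping, not the processing of the other tags.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch only: requires an XSLT 2.0 processor (matches() and xsl:function
     do not exist in XSLT 1.0). Names in the f: namespace are hypothetical. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:ead-helpers"
    exclude-result-prefixes="xs f">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

<!-- true for <c> and <c01> .. <c12>, false for every other element -->
<xsl:function name="f:is-component" as="xs:boolean">
    <xsl:param name="node" as="element()"/>
    <xsl:sequence select="matches(name($node), '^c(0[1-9]|1[0-2])?$')"/>
</xsl:function>

<xsl:template match="/">
    <records>
        <xsl:apply-templates select="//dsc"/>
    </records>
</xsl:template>

<xsl:template match="dsc">
    <!-- for-each is kept so that position() counts components only -->
    <xsl:for-each select="*[f:is-component(.)]">
        <xsl:apply-templates select=".">
            <xsl:with-param name="parid" select="normalize-space(/ead/archdesc/did/unitid[1])"/>
            <xsl:with-param name="pos" select="position()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<xsl:template match="*[f:is-component(.)]">
    <xsl:param name="parid"/>
    <xsl:param name="pos"/>
    <!-- a single xsl:choose instead of two complementary xsl:if tests -->
    <xsl:variable name="this_id">
        <xsl:choose>
            <xsl:when test="did/unitid">
                <xsl:value-of select="normalize-space(did/unitid[1])"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="concat($parid, '-', $pos)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:variable>
    <record id="{$this_id}"/>
    <!-- nested components inherit this record's id as their parent id -->
    <xsl:for-each select="*[f:is-component(.)]">
        <xsl:apply-templates select=".">
            <xsl:with-param name="parid" select="$this_id"/>
            <xsl:with-param name="pos" select="position()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

The unitid[1] step is there because a did may carry more than one unitid, and normalize-space() in XSLT 2.0 raises an error when handed a sequence of several nodes. With an XSLT 1.0 processor, the original starts-with() tests remain the way to go.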
    
    