Python 3.x: How to modify XML with Beautiful Soup?


I am trying to modify the LookupData element in an XML file. A snippet of the XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Options>
        <SampleRate>1000</SampleRate>
        <MaxStateSize>1</MaxStateSize>
        <MaxOutputSize>1</MaxOutputSize>
    </Options>

    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <Object class="LookupObj2dWithState">
                <RowState cacheref="Soc"/>
                <ColState cacheref="ThermalState"/>
                <LookupData>
                    0.02597518381655694900, 0.02513715386193249600, 0.02394715132636577100, 0.02325996676357371800, 0.02317075771456176400, 0.02277814077034603900, 0.02267913709322775700, 0.02258569292134297900, 0.02235026503875497600, 0.02222478423822949300, 0.02207606555239715500, 0.02198493491067361700, 0.02188144525929673300, 0.02167985791309091600, 0.02145797158835977700, 0.02137484908165417400, 0.02126561803424023600, 0.02124462299304301700, 0.02123310358079429400, 0.02126287857906075300, 0.02094998489960795500, 0.02073326148328196600, 0.02062489977511897100, 0.02038933084432985300;
                </LookupData>
                <MeasurementPointsRow desc="StateOfCharge">
                -5, 0, 7.100000e+00, 1.120000e+01, 16, 2.080000e+01, 2.560000e+01, 3.040000e+01, 3.520000e+01, 4.010000e+01, 4.490000e+01, 4.970000e+01, 5.450000e+01, 5.930000e+01, 6.420000e+01, 69, 7.380000e+01, 7.860000e+01, 8.350000e+01, 8.830000e+01, 9.310000e+01, 9.770000e+01, 100, 105
                </MeasurementPointsRow>
                <MeasurementPointsColumn desc="ThermalState">
                25
                </MeasurementPointsColumn>
            </Object>
        </MyRser>
However, when I make the change with BeautifulSoup, the specific modification is applied, but the XML structure is altered:

 <?xml version="1.0" encoding="UTF-8"?><html><body><configuration>
<options>
<samplerate>1000</samplerate>
<maxstatesize>1</maxstatesize>
<maxoutputsize>1</maxoutputsize>
</options>
<customdefinitions>
<myrser class="OhmicResistance">
<object class="LookupObj2dWithState">
<rowstate cacheref="Soc"></rowstate>
<colstate cacheref="ThermalState"></colstate>
0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339, 0.02217779408499339
<measurementpointsrow desc="StateOfCharge">
                -5, 0, 7.100000e+00, 1.120000e+01, 16, 2.080000e+01, 2.560000e+01, 3.040000e+01, 3.520000e+01, 4.010000e+01, 4.490000e+01, 4.970000e+01, 5.450000e+01, 5.930000e+01, 6.420000e+01, 69, 7.380000e+01, 7.860000e+01, 8.350000e+01, 8.830000e+01, 9.310000e+01, 9.770000e+01, 100, 105
                </measurementpointsrow>
<measurementpointscolumn desc="ThermalState">
                25
                </measurementpointscolumn>
</object>
</myrser>

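I want to keep the structure and only modify the data. I know this can be done with ElementTree, but given how my code needs to work, BeautifulSoup is easier to use. So, sticking with BeautifulSoup only, how can I edit the XML and save a copy of it without losing its original structure?
Any help would be greatly appreciated.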

To preserve the XML structure, use the following when writing to the file:

file.write(soup.prettify())
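A minimal sketch of the BeautifulSoup route, assuming bs4 and lxml are both installed: passing "xml" as the parser name selects lxml's XML parser, which keeps tag names in their original case and does not wrap the document in <html><body>. SOURCE is a trimmed copy of the question's XML, and the function name replace_lookup_data and the output file name config_modified.xml are hypothetical.

```python
from bs4 import BeautifulSoup  # the "xml" parser below also requires lxml

# Trimmed copy of the question's XML (illustration only)
SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Options>
        <SampleRate>1000</SampleRate>
    </Options>
    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <Object class="LookupObj2dWithState">
                <RowState cacheref="Soc"/>
                <LookupData>
                    0.025, 0.024;
                </LookupData>
            </Object>
        </MyRser>
    </CustomDefinitions>
</Configuration>
"""


def replace_lookup_data(xml_text: str, new_values: str) -> str:
    # "xml" selects an XML parser: tag case is kept and no <html><body> is added
    soup = BeautifulSoup(xml_text, "xml")
    # With the XML parser, tag names are matched case-sensitively
    lookup = soup.find("LookupData")
    # Replace only the text content of the element
    lookup.string = new_values
    return str(soup)


if __name__ == "__main__":
    updated = replace_lookup_data(SOURCE, "1,2,3,4")
    # The context manager closes the file when the block exits
    with open("config_modified.xml", "w", encoding="utf-8") as f:
        f.write(updated)
```

str(soup) keeps the original whitespace; soup.prettify() re-indents the whole document instead, which also adds whitespace around text content.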

Notes:

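  • BeautifulSoup converts XML tag names to lowercase when it parses the document with an HTML parser (such as "lxml" or "html.parser")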

  • Since you open the file with a context manager, there is no need to call file.close(); the file is closed automatically when the indented block exits


  • With lxml, you can do the following:

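    from lxml import etree

    # Paste the XML from the question here, corrected so that it is well-formed
    config = """[your xml above, corrected - it's not well formed]"""
    new_values = "1,2,3,4"
    # etree.XML() needs bytes when the text carries an encoding declaration
    doc = etree.XML(config.encode())
    # xpath() returns a list of matching elements; take the first one
    target = doc.xpath('//LookupData')[0]
    target.text = new_values
    print(etree.tostring(doc).decode())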
    
    Output:

    <Configuration>
        <Options>
            <SampleRate>1000</SampleRate>
            <MaxStateSize>1</MaxStateSize>
            <MaxOutputSize>1</MaxOutputSize>
        </Options>
    
        <CustomDefinitions>
            <MyRser class="OhmicResistance">
                <Object class="LookupObj2dWithState">
                    <RowState cacheref="Soc"/>
                    <ColState cacheref="ThermalState"/>
                    <LookupData>1,2,3,4</LookupData>
                    <MeasurementPointsRow desc="StateOfCharge">
                    -5, 0, 7.100000e+00, 1.120000e+01, 16, 2.080000e+01, 2.560000e+01, 3.040000e+01, 3.520000e+01, 4.010000e+01, 4.490000e+01, 4.970000e+01, 5.450000e+01, 5.930000e+01, 6.420000e+01, 69, 7.380000e+01, 7.860000e+01, 8.350000e+01, 8.830000e+01, 9.310000e+01, 9.770000e+01, 100, 105
                    </MeasurementPointsRow>
                    <MeasurementPointsColumn desc="ThermalState">
                    25
                    </MeasurementPointsColumn>
                </Object>
            </MyRser>
            </CustomDefinitions>
        </Configuration>
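The answer above prints the result; to save a modified copy instead, one option (a sketch, assuming the XML is available as bytes; the helper name modified_copy and the path config_modified.xml are hypothetical) is to serialize with an XML declaration and write the bytes out.

```python
from lxml import etree

# Trimmed copy of the question's XML (illustration only), given as bytes so
# that the parser accepts the encoding declaration
CONFIG = b"""<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <CustomDefinitions>
        <MyRser class="OhmicResistance">
            <Object class="LookupObj2dWithState">
                <LookupData>
                    0.025, 0.024;
                </LookupData>
            </Object>
        </MyRser>
    </CustomDefinitions>
</Configuration>"""


def modified_copy(source: bytes, new_values: str) -> bytes:
    doc = etree.fromstring(source)
    # xpath() returns a list of matches; [0] is the single LookupData element
    doc.xpath("//LookupData")[0].text = new_values
    # Serialize with an XML declaration so the saved file starts like the original
    return etree.tostring(doc, xml_declaration=True, encoding="UTF-8")


if __name__ == "__main__":
    # Binary mode, because tostring() with an encoding returns bytes
    with open("config_modified.xml", "wb") as f:
        f.write(modified_copy(CONFIG, "1,2,3,4"))
```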
    
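Thanks for the reply. The problem is that modifying the file through BeautifulSoup adds an html tag to the XML, which causes an error in my later code that uses and runs the modified XML. Regarding the first point you mentioned: is there a way to turn off the conversion of all tags to lowercase?

@user14447985 I'm not sure you can change BeautifulSoup so that it doesn't convert tags to lowercase. See whether changing the parser to html.parser instead of lxml avoids adding the extra tags, or whether that solves the problem.

What does the index [0] in doc.xpath mean? Second, for the other elements in the XML: if I want to automate the XPath with something like soc = doc.xpath('//CustomDefinitions//'+elem1+'[@class="'+elem2+']]///MeasurementPointsRow[@desc="StateOfCharge"]/text()), it gives me an error at elem2 because it can't be read. How can I fix this? Here elem1 and elem2 are lists that I loop over.

To the first part: doc.xpath('//LookupData') returns a list (of len() == 1 in this case, although that doesn't affect the result here); you have to use the [0] index to access the target node in that list so that you can modify its text attribute. As for the second part, I don't really understand it; in any case, you should probably post it as a separate question with all the necessary details.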