How do I loop over a file and remove parts of it in Python?


I have the following data structure:

<?xml version='1.0' encoding='UTF-8'?>
<corpus name="corpus">
  <recording audio="audio.wav" name="first audio">
    <segment name="1" start="0" end="2">
        <orth>some text 1</orth>
    </segment>
    <segment name="2" start="2" end="4">
        <orth>some text 2</orth>
    </segment>
    <segment name="3" start="4" end="6">
        <orth>some text 3</orth>
    </segment>
  </recording>
</corpus>
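I want to loop over this file and remove the segments whose names are listed in a second file, one name per line. For example, if 1 and 3 are given, the segments named 1 and 3 are removed: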

<?xml version='1.0' encoding='UTF-8'?>
<corpus name="corpus">
  <recording audio="audio.wav" name="first audio">
    <segment name="2" start="2" end="4">
        <orth>some text 2</orth>
    </segment>
  </recording>
</corpus>

The code I have so far:

with open('file.txt', 'r') as inputFile:
    w_file = inputFile.readlines()

w_file = w_file.strip('\n')

with open('to_delete_nums.txt', 'r') as File:
    d_file = deleteFile.readlines()

d_file = d_file.strip('\n')

for line in w_file:
    if line.contains("<segment name"):
        for d in d_file:
            //if segment name is equal to d then delete that segment.

Method 1 (with a module):

As the comments say, an XML parsing/manipulation library makes this simple and quick.

Try this with the lxml module:

from lxml import etree

with open("xml.txt", "r") as xml_file:
    xml_data = xml_file.read()

with open('nums.txt', 'r') as file:
    list_of_names = file.read().split("\n")

new_xml = xml_data
for each_name in list_of_names:
    tree = etree.XML(new_xml.encode())
    find_segments = tree.xpath("*//segment[@name='{}']".format(each_name))
    for each_segment in find_segments:
        each_segment.getparent().remove(each_segment)
    new_xml = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8").decode("utf-8")

print(new_xml)

A much shorter version:

from lxml import etree

with open("xml.txt", "r") as xml_file:
    tree = etree.XML(xml_file.read().encode())

with open('nums.txt', 'r') as file:
    list_of_names = list(set(file.read().split("\n")))

xpath = "*//segment[{}]".format(" or ".join(["@name='{}'".format(each_name) for each_name in list_of_names]))

print(xpath)
for each_segment in tree.xpath(xpath):
    each_segment.getparent().remove(each_segment)
new_xml = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8").decode("utf-8")

print(new_xml)
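
For readers who cannot install lxml, the same idea can be sketched with the standard library's xml.etree.ElementTree. This is an alternative technique, not the answer's method: ElementTree elements have no getparent(), so the sketch walks every possible parent instead. The helper names load_names and remove_segments and the inline SAMPLE_XML are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's sample data (stands in for the input file).
SAMPLE_XML = """<?xml version='1.0' encoding='UTF-8'?>
<corpus name="corpus">
  <recording audio="audio.wav" name="first audio">
    <segment name="1" start="0" end="2">
        <orth>some text 1</orth>
    </segment>
    <segment name="2" start="2" end="4">
        <orth>some text 2</orth>
    </segment>
    <segment name="3" start="4" end="6">
        <orth>some text 3</orth>
    </segment>
  </recording>
</corpus>
"""


def load_names(lines):
    """Hypothetical helper: collect stripped, non-empty names from lines."""
    return {line.strip() for line in lines if line.strip()}


def remove_segments(xml_text, names):
    """Hypothetical helper: drop every <segment> whose name is in `names`."""
    # Parse from bytes so the encoding declaration is handled by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        # Copy the matching children before removing any of them.
        for segment in list(parent.findall("segment")):
            if segment.get("name") in names:
                parent.remove(segment)
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# A file object can be passed to load_names directly; a list stands in here.
names = load_names(["1\n", "3\n", "\n"])
result = remove_segments(SAMPLE_XML, names)
print(result)
```

Passing an open file to load_names works because a file object iterates over its lines, and blank lines (such as a trailing newline) are skipped rather than turned into an empty name.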
Why not use an XML parsing/manipulation library? What exact output do you want?

Use lxml or BeautifulSoup to parse the XML, and work with the elements in the tree.

Thanks for your comment! @name='1' or @name='2' seems to need a lot of manual input. Is there a way to read these names from a file automatically? In the question I said there is already a file containing one name per line.

@JosephKars: Yes, hold on, I'll write it.

@JosephKars: Check my update and let me know.

@JosephKars: Updated again, please check. This one is much shorter than the previous code.

With the previous code I get TypeError: str() takes at most 1 argument (2 given)
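
The TypeError in the last comment is what Python 2 raises for a two-argument call such as str(data, encoding="utf-8"), because that form of str() exists only in Python 3. A minimal sketch of a conversion that avoids the two-argument call is to decode the bytes directly. The helper name to_text and the sample bytes are assumptions for illustration; the bytes stand in for what etree.tostring(...) returns.

```python
def to_text(data, encoding="utf-8"):
    """Hypothetical helper: decode bytes to text, pass text through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Stand-in for the bytes returned by a serializer such as etree.tostring(...).
serialized = b"<corpus name='corpus'/>"
print(to_text(serialized))
```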