Python writes only one row to the CSV file

python excel csv

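Apologies for asking this again, but the problem is still unresolved.

It is not a very complicated problem, and I am sure the cause is fairly straightforward, but I just cannot see it.

My code parses an XML file, and it opens and reads the file in the format I want. The print statement in the final for loop proves this.

For example, it outputs the following: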

Swivel support handle D0584129 20090106 US

Hinge D0584130 20090106 US

Latch pin turntable D0584131 20090106 US

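This is exactly how I want the data written to the CSV file. However, when I try to write these as rows to the CSV itself, it writes only one row, the last one from the XML file, and in this form:

Flashlight assembly,D0584138,20090106,US

Here is my entire code, since it may help in understanding the whole process. The area of interest starts at for xml_string in separated_xml: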

from bs4 import BeautifulSoup
import csv
import unicodecsv as csv

infile = "C:\\Users\\Grisha\\Documents\\Inventor\\2009_Data\\Jan\\ipg090106.xml"

# The first line of code defines a function "separated_xml" that will allow us to separate, read, and then finally parse the data of interest with

def separated_xml(infile):  # Defining the data reading function for each xml section - This breaks apart the xml from the start (root element <?xml...) to the next iteration of the root element 
    file = open(infile, "r")   # Used to open the xml file
    buffer = [file.readline()] # Used to read each line and placing inside vector
    
# The first for-loop is used to slice every section of the USPTO XML file to be read and parsed individually
# It is necessary because Python wishes to read only one instance of a root element but this element is found many times in each file which causes reading errors
    
    for line in file:       # Running for-loop for the opened file and searches for root elements
        if line.startswith("<?xml "):
            yield "".join(buffer)  # 1) Using "yield" allows to generate one instance per run of a root element and 2) .join takes the list (vector) "buffer" and connects an empty string to it
            buffer = []     # Creates a blank list to store the beginning of a new 'set' of data in beginning with the root element
        buffer.append(line) # Passes lines into list
    yield "".join(buffer)   # Outputs
    file.close()

# The second nested set of for-loops are used to parse the newly reformatted data into a new list

for xml_string in separated_xml(infile): # Calls the output of the separated and read file to parse the data
    soup = BeautifulSoup(xml_string, "lxml")     # BeautifulSoup parses the data strings where the XML is converted to Unicode
    pub_ref = soup.findAll("publication-reference") # Beginning parsing at every instance of a publication
    lst = []  # Creating empty list to append into
    

    with open('./output.csv', 'wb') as f:
        writer = csv.writer(f, dialect = 'excel')
    
        for info in pub_ref:  # Looping over all instances of publication

            # The final loop finds every instance of invention name, patent number, date, and country to print and append into
            for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
                print(inv_name.text, pat_num.text, date_num.text, country.text)
                lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
                writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
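
The splitting step described in the code comments can be tried in isolation, without the USPTO file. This is a small sketch, assuming Python 3. The name `split_documents` and the sample text are hypothetical, and an in-memory `io.StringIO` stands in for the real input file.

```python
import io


def split_documents(lines):
    """Yield one string per XML document from an iterable of lines.

    A new document starts at every line beginning with '<?xml '.
    """
    buffer = []
    for line in lines:
        # Emit the previous document when the next XML declaration appears.
        if line.startswith("<?xml ") and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    # Emit whatever is left after the last line.
    if buffer:
        yield "".join(buffer)


# Hypothetical sample: two concatenated XML documents in one stream.
sample = io.StringIO(
    '<?xml version="1.0"?>\n<a>1</a>\n'
    '<?xml version="1.0"?>\n<a>2</a>\n'
)
documents = list(split_documents(sample))
print(len(documents))
```

Each yielded string holds a single root element, so it can be handed to a parser one document at a time.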

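I believe (as a first working theory, at any rate) that the root of the problem is that your with open statement sits inside the for loop and uses the "wb" mode. This means that every time the for loop runs, the file is reopened and whatever was in it before is overwritten, so only the rows from the final pass are left once the loop finishes.

I can see two ways for you to handle this. The more correct way is to move the file open statement outside the outermost for loop. I know you mentioned that you have already tried this, but the devil is in the details. That would make your updated code look something like this: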
    with open('./output.csv', 'wb') as f:
      writer = csv.writer(f, dialect='excel')

      for xml_string in separated_xml(infile):
        soup = BeautifulSoup(xml_string, "lxml")
        pub_ref = soup.findAll("publication-reference")
        lst = []

        for info in pub_ref:

          for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
            print(inv_name.text, pat_num.text, date_num.text, country.text)
            lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
            writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
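
The difference between opening the file inside the loop and opening it once can be reproduced without the XML input. The following is a minimal sketch, assuming Python 3 and the standard-library csv module in text mode (the question uses unicodecsv with the binary mode "wb"). The `batches` data, the file names, and the helper functions are hypothetical stand-ins for the parsed patent records.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the rows parsed from each XML document.
batches = [
    [("Hinge", "D0584130", "20090106", "US")],
    [("Flashlight assembly", "D0584138", "20090106", "US")],
]


def write_reopening_each_time(path, batches):
    """Buggy pattern: mode 'w' truncates the file on every iteration."""
    for rows in batches:
        with open(path, "w", newline="") as f:
            csv.writer(f, dialect="excel").writerows(rows)


def write_opening_once(path, batches):
    """Fixed pattern: open the file once, then loop inside the with block."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, dialect="excel")
        for rows in batches:
            writer.writerows(rows)


def count_rows(path):
    """Count the CSV rows currently stored in the file."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


tmp_dir = tempfile.mkdtemp()
buggy_path = os.path.join(tmp_dir, "buggy.csv")
fixed_path = os.path.join(tmp_dir, "fixed.csv")

write_reopening_each_time(buggy_path, batches)
write_opening_once(fixed_path, batches)

print(count_rows(buggy_path))  # only the last batch survives
print(count_rows(fixed_path))  # every batch is kept
```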

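The quick-and-dirty alternative is to change the mode in the open call to "ab" (append, binary) instead of "wb" (write, binary, which overwrites any existing data). This approach is much less efficient, because the file is still reopened on every pass through the for loop, but it will probably work.

I hope this helps.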
with open('./output.csv', 'wb') as f:

Just change "wb" -> "ab" so that the file is not overwritten.
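
One caveat with append mode is that the file is never truncated, so rows written by an earlier run of the script stay in it. Here is a minimal sketch of that behaviour, assuming Python 3 and the standard-library csv module in text mode ("a" rather than the "ab" used with unicodecsv). The function `append_rows` and the sample data are hypothetical.

```python
import csv
import os
import tempfile


def append_rows(path, batches):
    """Reopen the file in append mode for each batch, as in the quick fix."""
    for rows in batches:
        with open(path, "a", newline="") as f:
            csv.writer(f, dialect="excel").writerows(rows)


def read_rows(path):
    """Return every CSV row currently stored in the file."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample data: two batches with one row each.
batches = [[("Hinge", "D0584130")], [("Flashlight assembly", "D0584138")]]
path = os.path.join(tempfile.mkdtemp(), "output.csv")

append_rows(path, batches)
rows_after_first_run = len(read_rows(path))

# Running the same export again adds to the old rows instead of replacing them.
append_rows(path, batches)
rows_after_second_run = len(read_rows(path))

# Removing the file before exporting gives a clean result again.
os.remove(path)
append_rows(path, batches)
rows_after_clean_run = len(read_rows(path))

print(rows_after_first_run, rows_after_second_run, rows_after_clean_run)
```

If the append approach is used, deleting or truncating the output file once before the loop starts keeps repeated runs from accumulating duplicate rows.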
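Moving the open call before the first loop did not work, but moving it to just before the last two loops did. Thanks to everyone who helped.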
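Without a sample of the input file it is not possible to test-run this, but opening the output file smack in the middle of a for loop will only give you the data from the last pass of the loop. Put print statements in your code. That should help you debug it.

If you use "wb", the contents of the file are wiped on every iteration. Use "ab" instead...

Thank you both. @RolfofSaxony, I used print statements in the final for loop to check this. Every iteration I wanted came through, so that part looks fine. The problem is when it actually writes to the file: it overwrites each row and only 1 row is left. Also, @jlaur, I tried "ab", "a", and "ab+" when I first ran into the bug, and none of them worked. For some reason it worked this time, and I need to sit down and figure out why. Thanks.

It worked. For whatever reason, I had tried "ab" before and it did not work. It felt like I had been at this for a long time, but it was about time!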