How can I save string data extracted from multiple Excel worksheets to a new workbook using openpyxl/pandas or any Python?


python excel string


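This is my second question to the Stack Overflow community, and I'm still not very good at this.

I'm trying to write some code that will:

1. Open a series of Excel documents and find the "Moderated" worksheet
2. Extract values from multiple cells
3. Rearrange the data into a new Excel worksheet, with each individual spreadsheet represented as one new row of cells

I think I have achieved 1 and 2 in the list above, although the values are returned as strings, which seems to cause problems when saving to Excel. The sloppy imports section reflects the options I have explored so far.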

import sys
import os
import openpyxl
import pandas as pd
import numpy as np
import glob
from openpyxl.workbook import workbook
from openpyxl import load_workbook

path = r'C:\Users\longr\Desktop\pfile\sandbox'  # working directory
filenames = glob.glob(path + "/*.xlsx")  # lists all excel files

for file in filenames:

    wb1 = load_workbook(file, data_only=True)  # works
    ws1 = wb1['Moderated']  # works

    for row in ws1.iter_rows(min_row=3, max_row=7, min_col=5, max_col=5):
        for cell in row:
            a = (cell.value)
            print(a)  # works

    for row in ws1.iter_rows(min_row=3, max_row=7, min_col=7, max_col=7):
        for cell in row:
            b = (cell.value)
            print(b)

print(type(a))

writer = pd.ExcelWriter(r'C:\users\longr\Desktop\pfile\sandbox\Out\Out.xlsx', engine='openpyxl')
df.to_excel(writer, index=True)  # df is never defined above, so this line fails

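Output so far:
Sheet 1 text 1 (e2)
Sheet 1 text 2 (e4)
Sheet 1 text 3 (e5)
None
Sheet 1 text 4 (e7)
Sheet 1 text 5 (g3)
Sheet 1 text 6 (g4)
Sheet 1 text 7 (g5)
Sheet 1 text 8 (g6)
Sheet 1 text 9 (g7)
Sheet 2 text 1 (e2)
Sheet 2 text 2 (e4)
Sheet 2 text 3 (e5)
None
Sheet 2 text 4 (e7)
Sheet 2 text 5 (g3)
Sheet 2 text 6 (g4)
Sheet 2 text 7 (g5)
Sheet 2 text 8 (g6)
Sheet 2 text 9 (g7)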

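What I ultimately want is...

Any help would be greatly appreciated, especially help pitched at a novice programmer.

Thanks to JONAS for recommending the code below. The output is now as follows: 5 columns instead of the 9 I want. I also want the headers to be different, so H1/HA/Header A is just a placeholder; I wasn't clear about that when I first asked the question.

Jonas, your code is far better than mine [more elegant!]

Using the suggested code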

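Error: NameError: name 'wb' is not defined

You could try saving the cell values into a list, and then appending that list to a larger list that holds one entry per Excel file. The larger list then becomes the new DataFrame: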
import glob
import string

import pandas as pd
from openpyxl import load_workbook

path = r'C:\Users\longr\Desktop\pfile\sandbox'  # working directory
filenames = glob.glob(path + "/*.xlsx")  # lists all Excel files

new_df = []  # one list of values per Excel file; these become the rows of the result

for file in filenames:
    wb1 = load_workbook(file, data_only=True)
    ws1 = wb1['Moderated']

    a = []  # values from column 5
    b = []  # values from column 7

    # Walk columns 5 to 7 of rows 3 to 7 and keep only the first and last cell of each row.
    for row in ws1.iter_rows(min_row=3, max_row=7, min_col=5, max_col=7):
        for i, cell in enumerate(row):
            if i == 0:
                a.append(cell.value)  # column 5
            if i == 2:
                b.append(cell.value)  # column 7

    new_df.append(a + b)  # one combined row per Excel file

alphabet = string.ascii_uppercase  # letters for the column names (header A, header B, ...)

df = pd.DataFrame(new_df, columns=['header ' + alphabet[i] for i in range(len(new_df[0]))])

# Raw string: without the r prefix, the \u in this path raises a unicode escape error.
with pd.ExcelWriter(r'C:\users\longr\Desktop\pfile\sandbox\Out\Out.xlsx') as writer:
    df.to_excel(writer)
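
The list-of-lists idea can be tried without any Excel files. The sketch below is only an illustration: the `rows` values and the `headers` names are made-up placeholders, not data from the question, and the real header names would have to be substituted.

```python
import pandas as pd

# Hypothetical stand-in for the values read from two workbooks:
# five cells from column E followed by five cells from column G.
rows = [
    ["s1 e3", "s1 e4", "s1 e5", None, "s1 e7",
     "s1 g3", "s1 g4", "s1 g5", "s1 g6", "s1 g7"],
    ["s2 e3", "s2 e4", "s2 e5", None, "s2 e7",
     "s2 g3", "s2 g4", "s2 g5", "s2 g6", "s2 g7"],
]

# Hypothetical header names, one per extracted cell; replace with the real ones.
headers = [f"E{r}" for r in range(3, 8)] + [f"G{r}" for r in range(3, 8)]

# Each inner list becomes one row, so each workbook ends up on its own row.
df = pd.DataFrame(rows, columns=headers)
print(df.shape)
```

Passing an explicit `columns` list is one way to get headers other than "header A", "header B", and so on. The list must have as many names as each row has values.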

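This looks much better, but the output is spread over 5 columns instead of the 9 I want.

You can change the row and column ranges in the inner for loop. At the moment you are getting the values of columns 5 and 7 from min_row=3 to max_row=7. If you want more rows, change these values (min_row/max_row) to whatever you like.

I may not have been clear... The loop works fine and picks up all the right cells. The Excel output is 5 columns, with twice as many rows as there are spreadsheets. I want one row per spreadsheet, but I can't seem to get that. #frustrated

In that case, you can change "new_df.append(a)" and "new_df.append(b)" to "new_df.append(a+b)" (I edited it in my post).

After some fiddling I got a new unicode error, so I added "r" to the filename (r'C:\user…) and it works!! Thanks! Everything should be marked as "answered", useful, etc. Please let me know if I can buy you a beer! :D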
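
The difference between the two `append` variants discussed in the comments can be seen with plain lists. This is a minimal sketch with placeholder values standing in for the cells of a single workbook; the names `two_rows` and `one_row` are made up for the illustration.

```python
a = ["e3", "e4", "e5", "e6", "e7"]  # placeholder values from column 5
b = ["g3", "g4", "g5", "g6", "g7"]  # placeholder values from column 7

# Appending a and b separately adds two rows of five cells for this workbook.
two_rows = []
two_rows.append(a)
two_rows.append(b)

# Appending the concatenation adds a single row of ten cells for this workbook.
one_row = []
one_row.append(a + b)

print(len(two_rows), len(two_rows[0]))
print(len(one_row), len(one_row[0]))
```

Each call to `append` adds exactly one row to the outer list, which is why the first variant produces twice as many rows as there are spreadsheets.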
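
The unicode error mentioned in the comments most likely comes from the `\u` in `'C:\users\...'`: in a normal Python 3 string literal, `\u` starts a Unicode escape and must be followed by four hex digits. The sketch below compiles such a literal from source so that the failure can be observed without stopping the script, then shows two equivalent spellings that work. The variable names are made up for the illustration.

```python
# Source text containing the un-prefixed literal, held in a raw string
# so that this script itself stays valid.
bad_source = r"path = 'C:\users\longr'"

try:
    compile(bad_source, "<example>", "exec")
    literal_is_valid = True
except SyntaxError:
    # Python rejects \users as a truncated \uXXXX escape.
    literal_is_valid = False

# Two spellings of the same path that Python accepts.
raw_path = r'C:\users\longr\Desktop\pfile\sandbox\Out\Out.xlsx'
escaped_path = 'C:\\users\\longr\\Desktop\\pfile\\sandbox\\Out\\Out.xlsx'

print(literal_is_valid)
print(raw_path == escaped_path)
```

The `r` prefix tells Python to leave backslashes alone, which is usually the least error-prone way to write Windows paths.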