Python: extract text from a batch of files and write it into an Excel file
(Environment: Python 2.7.6 IDLE shell + BeautifulSoup 4.3.2) I want to extract some text from a batch of files (about 50 files) and put it neatly into an Excel file, either row by row or column by column. A sample of the text in each file looks like this:
<tr>
<td width=25%>
Arnold Ed
</td>
<td width=15%>
18 Feb 1959
</td>
</tr>
<tr>
<td width=15%>
Male
</td>
<td width=15%>
02 March 2002
</td>
</tr>
<tr>
<td width=15%>
Guangxi
</td>
</tr>
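As a rough illustration of what pulling the td texts out of markup like this involves, here is a minimal sketch assuming Python 3. It uses the standard library's html.parser in place of BeautifulSoup (a swapped-in technique, not the code used in this thread), and TdTextParser and td_texts are hypothetical names. The sample string is a condensed copy of the markup above.

```python
from html.parser import HTMLParser


class TdTextParser(HTMLParser):
    """Collect the stripped text of each <td> element in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buf = None  # None while outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            # Strip the newlines and spaces that surround each value
            self.texts.append("".join(self._buf).strip())
            self._buf = None


def td_texts(html):
    """Return the stripped text of every <td> in an HTML string."""
    parser = TdTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


sample = """<tr><td width=25%>
Arnold Ed
</td><td width=15%>
18 Feb 1959
</td></tr>
<tr><td width=15%>
Male
</td><td width=15%>
02 March 2002
</td></tr>
<tr><td width=15%>
Guangxi
</td></tr>"""

print(td_texts(sample))
# ['Arnold Ed', '18 Feb 1959', 'Male', '02 March 2002', 'Guangxi']
```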
Actually, it only writes the last piece of text into the Excel file. How can I do this correctly?
Edit: with laike9m's help, the final version is:
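from bs4 import BeautifulSoup
import xlwt

# Read the list of HTML file paths, one per line (blank lines are skipped,
# so a trailing newline in the list does not lead to open(""))
list_open = open("c:\\file list.txt")
read_list = list_open.read()
list_open.close()
line_in_list = [line for line in read_list.split("\n") if line.strip()]

# Create the workbook and the sheet once, before looping over the files
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
sheet = book.add_sheet('namelist', cell_overwrite_ok=True)
for i, each_file in enumerate(line_in_list):      # file i -> row i
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    page.close()
    all_texts = soup.find_all("td")
    for j, a_t in enumerate(all_texts):           # <td> j -> column j
        a = a_t.renderContents().strip()          # drop surrounding newlines
        sheet.write(i, j, a)
book.save("C:\\details.xls")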
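For readers without xlwt, or on Python 3, the same row-per-file layout can be sketched with the standard library alone. This is an assumption-laden variant, not the thread's method: it writes a CSV file (which Excel can open) instead of .xls, uses html.parser instead of BeautifulSoup, assumes the list file and the HTML files are UTF-8, and files_to_csv and TdTextParser are hypothetical names.

```python
import csv
from html.parser import HTMLParser


class TdTextParser(HTMLParser):
    """Collect the stripped text of each <td> element in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buf = None  # None while outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.texts.append("".join(self._buf).strip())
            self._buf = None


def files_to_csv(list_path, out_path):
    """Write one CSV row per listed file: row i <- file i, column j <- <td> j.

    Returns the number of files processed.
    """
    with open(list_path, encoding="utf-8") as f:
        # Skip blank lines, such as a trailing newline at the end of the list
        paths = [line.strip() for line in f if line.strip()]
    # "utf-8-sig" writes a byte-order mark, which helps Excel recognise UTF-8
    with open(out_path, "w", newline="", encoding="utf-8-sig") as out:
        writer = csv.writer(out)
        for path in paths:
            parser = TdTextParser()
            with open(path, encoding="utf-8") as page:
                parser.feed(page.read())
            parser.close()
            writer.writerow(parser.texts)
    return len(paths)
```

Usage would look like files_to_csv("file list.txt", "details.csv"). The workbook-equivalent (the CSV writer) is opened once outside the loop, mirroring the fix discussed in this thread.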
You didn't put the last four lines inside the for loop. I think that's why it only writes the last piece of text into the Excel file.
from bs4 import BeautifulSoup
import xlwt

list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")

for each_file in line_in_list:
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    all_texts = soup.find_all("td")

    for a_t in all_texts:
        a = a_t.renderContents()
        #"print a" here works ok

        # The last four lines, now inside the loop. Each value is still
        # written to cell (0, 0) of a newly created workbook, so the saved
        # file only holds the most recent value.
        book = xlwt.Workbook(encoding='utf-8', style_compression=0)
        sheet = book.add_sheet('namelist', cell_overwrite_ok=True)
        sheet.write(0, 0, a)
        book.save("C:\\details.xls")
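The reasoning in this answer comes down to how Python loops rebind names: a variable assigned inside a loop keeps only its final value once the loop ends, so a single write placed after the loops can only see that one value. A minimal illustration, assuming Python 3, with made-up cell values standing in for the parsed td texts:

```python
# Hypothetical stand-ins for the <td> texts of two files
files = [["Arnold Ed", "18 Feb 1959"], ["Guangxi"]]

for cells in files:
    for a in cells:
        pass  # like the question's loop: `a` is only rebound here

# After the loops finish, `a` holds just the value from the last iteration
print(a)  # Guangxi

# Doing the work inside the loop (here: collecting) keeps every value
collected = []
for cells in files:
    for a in cells:
        collected.append(a)
print(collected)  # ['Arnold Ed', '18 Feb 1959', 'Guangxi']
```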
Edit
# Create the workbook and the sheet once, before looping over the files
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
sheet = book.add_sheet('namelist', cell_overwrite_ok=True)
for i, each_file in enumerate(line_in_list):      # file i -> row i
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    all_texts = soup.find_all("td")
    for j, a_t in enumerate(all_texts):           # <td> j -> column j
        a = a_t.renderContents()
        sheet.write(i, j, a)
book.save("C:\\details.xls")
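The edited code places file i in row i and its j-th td in column j. Since the question asked for either a row-by-row or a column-by-column layout, swapping the two indices (sheet.write(j, i, a), as xlwt's write takes the row first and the column second) should give one column per file instead. A minimal pure-Python sketch of both mappings, assuming Python 3; cell_positions is a hypothetical helper, not part of xlwt:

```python
def cell_positions(rows_of_texts, by_column=False):
    """Yield (row, col, text) for every extracted cell.

    by_column=False: file i -> row i, <td> j -> column j (like sheet.write(i, j, a))
    by_column=True:  file i -> column i, <td> j -> row j (like sheet.write(j, i, a))
    """
    for i, texts in enumerate(rows_of_texts):
        for j, text in enumerate(texts):
            yield (j, i, text) if by_column else (i, j, text)


data = [["Arnold Ed", "18 Feb 1959"], ["Guangxi"]]
print(list(cell_positions(data)))
# [(0, 0, 'Arnold Ed'), (0, 1, '18 Feb 1959'), (1, 0, 'Guangxi')]
print(list(cell_positions(data, by_column=True)))
# [(0, 0, 'Arnold Ed'), (1, 0, '18 Feb 1959'), (0, 1, 'Guangxi')]
```

With xlwt, each triple would then be passed on as sheet.write(r, c, text).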
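@laike9m Thanks for your reply. You're right, the last 4 lines should be included in the for loop. However, it still always writes into cell (0,0). Any help? Thanks again.
Chinese is one of my working languages.
I ran the modified code, but it gives "Exception: duplicate worksheet name u'namelist'".
@MarkK Wait a moment, I'll test it.
laike9m, you've been very helpful! I pulled out the whole line and moved it before the start of the for loop. It works!! Now let me digest it. Thanks again. By the way, are you Japanese?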