Python: Extract text from a batch of files and write it to an Excel file


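(Environment: Python 2.7.6 IDLE shell + BeautifulSoup 4.3.2+)

I want to extract some text from a batch of files (about 50 files) and put it neatly into an Excel file, either row by row or column by column.

A sample of the text in each file looks like this: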

<tr> 
    <td width=25%>
        Arnold Ed   
    </td>
    <td width=15%>
        18 Feb 1959     
    </td>
</tr>
<tr> 
    <td width=15%>
        Male
    </td>   
    <td width=15%>
        02 March 2002   
    </td>
</tr>
<tr>
    <td width=15%>
        Guangxi         
    </td>   
</tr>
In practice, my code only writes the last piece of text to the Excel file. How can I do this correctly?


With laike9m's help, the final version is:

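from bs4 import BeautifulSoup
import xlwt

# Read the list of input files, one path per line.
list_open = open("c:\\file list.txt")
read_list = list_open.read()
list_open.close()
# Skip blank lines (e.g. from a trailing newline) so open() never gets an empty path.
line_in_list = [line for line in read_list.split("\n") if line.strip()]

# Create the workbook and the sheet once, before the loop.
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)

for i,each_file in enumerate(line_in_list):
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    page.close()

    all_texts = soup.find_all("td")

    # Row i holds file i; column j holds the j-th <td> of that file.
    for j,a_t in enumerate(all_texts):
        a = a_t.renderContents()
        sheet.write (i, j, a)

# Save once, after all files have been processed.
book.save("C:\\details.xls")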
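
One thing the version above leaves as-is: the text inside each <td> in the sample carries the surrounding newlines and indentation, and that whitespace ends up in the cell. Below is a minimal sketch of one way to tidy it, assuming the cell text is available as a text string (for example from BeautifulSoup's get_text() rather than renderContents()). clean_cell is a hypothetical helper name, not part of either library.

```python
def clean_cell(text):
    """Collapse runs of whitespace and trim both ends of a cell's text."""
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so re-joining with a single
    # space normalizes the text.
    return " ".join(text.split())


# Raw text as it appears inside the first <td> of the sample in the question.
raw = "\n        Arnold Ed   \n    "
print(repr(clean_cell(raw)))
```

In the loop this could be applied as sheet.write(i, j, clean_cell(a_t.get_text())).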

You didn't put the last four lines inside the for loop. I think that's why it only writes the last piece of text to the Excel file.
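
To make the difference concrete, here is a minimal sketch that needs neither xlwt nor BeautifulSoup. It assumes a plain dict keyed by (row, column) as a stand-in for the sheet. The function names and the grouping of the sample values into three "files" are made up for illustration.

```python
# Already-extracted <td> texts, grouped per file (illustrative data only).
files = [
    ["Arnold Ed", "18 Feb 1959"],
    ["Male", "02 March 2002"],
    ["Guangxi"],
]


def write_last_only(files):
    """Mimics the problem: a single write to cell (0, 0) after the inner loop."""
    sheet = {}
    for texts in files:
        for text in texts:
            a = text           # overwritten on every pass of the inner loop
        sheet = {}             # a fresh "workbook" per file discards earlier data
        sheet[(0, 0)] = a      # only the last text seen survives
    return sheet


def write_all(files):
    """Writes inside the inner loop: row i per file, column j per <td>."""
    sheet = {}                 # created once, before the loops
    for i, texts in enumerate(files):
        for j, text in enumerate(texts):
            sheet[(i, j)] = text
    return sheet


broken = write_last_only(files)
fixed = write_all(files)
print(broken)       # {(0, 0): 'Guangxi'}
print(len(fixed))   # 5
```

The second function has the same shape as a fix for the real code: create the container once, write inside the inner loop with a distinct (row, column) per value, and save after the outer loop ends.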

The code from the question, for reference:

from bs4 import BeautifulSoup
import xlwt

list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")


for each_file in line_in_list:
    page = open(each_file)
    soup = BeautifulSoup(page.read())

    all_texts = soup.find_all("td")

    for a_t in all_texts:
        a = a_t.renderContents()

        #"print a" here works ok

    book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
    sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
    sheet.write (0, 0, a)
    book.save("C:\\details.xls")

Edit:

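# Create the workbook and the sheet once, before the loop.
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)

# line_in_list is the list of file paths built in the question's code.
for i, each_file in enumerate(line_in_list):
    page = open(each_file)
    soup = BeautifulSoup(page.read())

    all_texts = soup.find_all("td")

    # Row i holds file i; column j holds the j-th <td> of that file.
    for j, a_t in enumerate(all_texts):
        a = a_t.renderContents()
        sheet.write(i, j, a)

# Save once, after all files have been processed.
book.save("C:\\details.xls")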

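Thanks for your reply. You are right, the last 4 lines should be inside the for loop. However, it can't always write to the (0, 0) cell. Any help? Thanks again.

Chinese is one of my working languages. I ran the modified code, but it gave "Exception: duplicate worksheet name u'namelist'".

@MarkK Wait a moment, I'll test it.

Laike9m, you are very helpful! I pulled that whole line out and moved it before the for loop starts. It works!! Now let me digest it. Thanks again. By the way, are you Japanese?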