Html 以pythonic方式访问文件结构中的数据_Html_Performance_Python

Html 以pythonic方式访问文件结构中的数据

html performance python

Html 以pythonic方式访问文件结构中的数据,html,performance,python,Html,Performance,Python,我想以最有效的方式访问目录20中存储的.txt文件~1000中的每个值~10000。当数据被抓取时，我想把它们放在一个HTML字符串中。我这样做是为了显示一个HTML页面，其中包含每个文件的表。伪： fh=open('MyHtmlFile.html','w') fh.write('''<head>Lots of tables</head><body>''') for eachDirectory in rootFolder:

我想以最有效的方式访问目录20中存储的.txt文件~1000中的每个值~10000。当数据被抓取时，我想把它们放在一个HTML字符串中。我这样做是为了显示一个HTML页面，其中包含每个文件的表。伪：

    fh=open('MyHtmlFile.html','w')
    fh.write('''<head>Lots of tables</head><body>''')
    for eachDirectory in rootFolder:

        for eachFile in eachDirectory:
            concat=''

            for eachData in eachFile:
               concat=concat+<tr><td>eachData</tr></td>
            table='''
                  <table>%s</table>
                  '''%(concat)
        fh.write(table)
    fh.write('''</body>''')
    fh.close()

一定有更好的办法，我想这需要永远！我已经检查了集合并阅读了一些关于哈希表的内容，但在挖掘漏洞之前，我更愿意询问专家

谢谢你抽出时间！

/卡尔

你为什么认为这需要永远？您正在读取文件，然后将其打印出来—这几乎是您唯一的需求—这就是您所做的全部工作。您可以通过以下几种方式调整脚本：读取块而不是行、调整缓冲区、打印而不是串联，等等，但是如果您不知道现在要花多少时间，您如何知道什么更好/更糟

首先分析脚本，然后找出脚本是否太慢，然后找一个速度慢的地方，然后再优化或询问优化问题。

为什么你认为这会花很长时间？您正在读取文件，然后将其打印出来—这几乎是您唯一的需求—这就是您所做的全部工作。

import os, os.path
# If you're on Python 2.5 or newer, use 'with'
# needs 'from __future__ import with_statement' on 2.5
fh=open('MyHtmlFile.html','w')
fh.write('<html>\r\n<head><title>Lots of tables</title></head>\r\n<body>\r\n')
# this will recursively descend the tree
for dirpath, dirname, filenames in os.walk(rootFolder):
    for filename in filenames:
        # again, use 'with' on Python 2.5 or newer
        infile = open(os.path.join(dirpath, filename))
        # this will format the lines and join them, then format them into the table
        # If you're on Python 2.6 or newer you could use 'str.format' instead
        fh.write('<table>\r\n%s\r\n</table>' % 
                     '\r\n'.join('<tr><td>%s</tr></td>' % line for line in infile))
        infile.close()
fh.write('\r\n</body></html>')
fh.close()

您可以通过以下几种方式调整脚本：读取块而不是行、调整缓冲区、打印而不是串联，等等，但是如果您不知道现在要花多少时间，您如何知道什么更好/更糟

首先分析脚本，然后找出脚本是否太慢，然后找到脚本速度慢的地方，然后再优化或询问优化问题。

只是一个提示：对于大量字符串，绝对不鼓励使用+=连接字符串。@jellybean如何提供连接字符串的替代方法？将它们全部添加到列表mylist中，然后，把它们列出来afterwards@Kberg你试过我的方法吗？它应该比字符串连接更快，并使用os.walk，这是遍历目录树的Puthonic方法。只是一个提示：对于大量字符串，绝对不建议使用+=连接字符串。@jellybean提供连接字符串的替代方法如何？将它们全部附加到列表mylist，然后，把它们列出来afterwards@Kberg你试过我的方法吗？它应该比字符串连接更快，并使用os.walk，这是遍历目录树的Puthonic方法。我不是在寻找代码的优化，我在寻找一种不同的更高效或优雅的解决方案。我现在已经实现了伪代码，所用的时间是215秒。答案是，在开始分析之前，只有一个解决方案：您希望从所有文件中读取所有数据，唯一的方法是从所有文件中读取所有数据。如果不做后者，就不可能做前者。我不是在寻找代码的优化，我在寻找一个不同的更有效或优雅的解决方案。我现在已经实现了伪代码，所用的时间是215秒。答案是，在开始分析之前，只有一个解决方案：您希望从所有文件中读取所有数据，唯一的方法是从所有文件中读取所有数据。如果不做后者，就不可能做前者。

import os, os.path
# If you're on Python 2.5 or newer, use 'with'
# needs 'from __future__ import with_statement' on 2.5
fh=open('MyHtmlFile.html','w')
fh.write('<html>\r\n<head><title>Lots of tables</title></head>\r\n<body>\r\n')
# this will recursively descend the tree
for dirpath, dirname, filenames in os.walk(rootFolder):
    for filename in filenames:
        # again, use 'with' on Python 2.5 or newer
        infile = open(os.path.join(dirpath, filename))
        # this will format the lines and join them, then format them into the table
        # If you're on Python 2.6 or newer you could use 'str.format' instead
        fh.write('<table>\r\n%s\r\n</table>' % 
                     '\r\n'.join('<tr><td>%s</tr></td>' % line for line in infile))
        infile.close()
fh.write('\r\n</body></html>')
fh.close()