Performance I/O coding: prioritize speed or memory?

performance memory coding-style io

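I'm currently writing a simple I/O parsing routine, and I'm in a dilemma about how to code it.

The setting is a web application, where multiple users can call this particular parsing function several times within one second.

Assume the file size is greater than 2 MB and the hardware I/O latency is 5 ms per call.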

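Case 1: Memory. The first option is to code for memory at the expense of speed. The function takes in a small portion of the file at a time and parses it portion by portion, which uses more iterations but less memory.

Pseudocode: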

function parser() {
    Open file and put into handle variable fHandle
    while (file position not past EOF) {
        read 1024 bytes from file using fHandle into variable data
        process(data)
    }
    Close file using handle fHandle
}
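
For comparison, here is a minimal Python sketch of the chunked version above. The names `read_chunks`, `parse_chunked` and `CHUNK_SIZE` are my own, and `process` stands in for whatever the real parser does with each piece; none of this is from the original question.

```python
from typing import Callable, Iterator

CHUNK_SIZE = 1024  # bytes per read, matching the pseudocode above


def read_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file's contents in pieces of at most chunk_size bytes."""
    with open(path, "rb") as handle:  # closed automatically, even on error
        while True:
            data = handle.read(chunk_size)
            if not data:  # an empty bytes object means end of file
                break
            yield data


def parse_chunked(path: str, process: Callable[[bytes], None]) -> None:
    """Feed the file to process() one chunk at a time."""
    for data in read_chunks(path):
        process(data)
```

Only one chunk is held in memory at a time. One thing to watch for: a token can straddle a chunk boundary, so `process` has to carry over any incomplete tail between calls.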


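Case 2: Speed. The second option is to code for speed at the expense of memory usage. The function loads the entire file contents into memory and parses them directly.

Pseudocode: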

function parser() {
    read entire file and store into variable data
    declare parsing position variable and set to 0
    while (parsing position not past data length) {
        get position of next token and store into variable pos
        process( substring from current position to pos of data )
    }
}


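Note: when reading the entire file, we use a function available directly from the library to read the whole file. No loop is used on the developer's side to read the file.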
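
A rough Python equivalent of the whole-file version might look like this. The function name `parse_whole` and the newline `delimiter` are assumptions for illustration; the question does not say what separates tokens.

```python
from typing import Callable


def parse_whole(path: str, process: Callable[[bytes], None],
                delimiter: bytes = b"\n") -> None:
    """Read the entire file with one library call, then walk it token by token."""
    with open(path, "rb") as handle:
        data = handle.read()  # single call, no developer-side read loop

    position = 0
    while position < len(data):
        end = data.find(delimiter, position)  # index of the next delimiter
        if end == -1:  # no delimiter left: the rest is the final token
            end = len(data)
        process(data[position:end])
        position = end + len(delimiter)  # skip past the delimiter
```

The whole file stays in memory for the duration of the call, so peak usage is roughly the file size per concurrent request.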

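Case 3: End-user choice. It was then suggested to write both versions and have the function detect, each time it runs, whether memory is plentiful. If a lot of free memory is available, the function uses the memory-intensive version.

Pseudocode: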

function parser() {
    if (available memory is too low) {
        Open file and put into handle variable fHandle
        while (file position not past EOF) {
            read 1024 bytes from file using fHandle into variable data
            process(data)
        }
        Close file using handle fHandle
    } else {
        read entire file and store into variable data
        declare parsing position variable and set to 0
        while (parsing position not past data length) {
            get position of next token and store into variable pos
            process( substring from current position to pos of data )
        }
    }
}
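
A sketch of the switching logic in Python, with one substitution stated plainly: instead of querying free memory (which, as far as I know, the standard library cannot do portably), it compares the file size against a configured budget. `MEMORY_BUDGET`, `parse_adaptive` and the returned strategy label are all hypothetical, and the whole-file branch hands the full buffer to `process` rather than tokenizing it, to keep the example short.

```python
import os
from typing import Callable

CHUNK_SIZE = 1024
MEMORY_BUDGET = 8 * 1024 * 1024  # hypothetical per-call limit, in bytes


def parse_adaptive(path: str, process: Callable[[bytes], None],
                   budget: int = MEMORY_BUDGET) -> str:
    """Pick a strategy by file size; return "chunked" or "whole"."""
    if os.path.getsize(path) > budget:
        # File is larger than the budget: stream it in small pieces.
        with open(path, "rb") as handle:
            for data in iter(lambda: handle.read(CHUNK_SIZE), b""):
                process(data)
        return "chunked"
    # File fits within the budget: load it in one call.
    with open(path, "rb") as handle:
        process(handle.read())
    return "whole"
```

In a multi-user web application the budget would have to account for how many requests run at once, not just for a single file.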

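Use asynchronous I/O (or a second thread) to process one chunk of data while the drive is busy fetching the next. That gives you the best of both worlds.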
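
To make the second-thread variant concrete, here is a minimal sketch using Python's `threading` and `queue` modules. The names `parse_overlapped`, `depth` and `_DONE` are made up for this example; a reader thread fills a small bounded queue while the calling thread processes chunks.

```python
import queue
import threading
from typing import Callable

CHUNK_SIZE = 1024
_DONE = object()  # sentinel the reader puts on the queue at end of file


def parse_overlapped(path: str, process: Callable[[bytes], None],
                     chunk_size: int = CHUNK_SIZE, depth: int = 2) -> None:
    """Process chunk N in this thread while a reader thread fetches the next."""
    chunks: queue.Queue = queue.Queue(maxsize=depth)  # bounded, so memory stays small

    def reader() -> None:
        try:
            with open(path, "rb") as handle:
                while True:
                    data = handle.read(chunk_size)
                    if not data:
                        break
                    chunks.put(data)  # blocks while the queue is full
        finally:
            chunks.put(_DONE)  # always signal completion, even on error

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    while True:
        item = chunks.get()
        if item is _DONE:
            break
        process(item)
    thread.join()
```

With `depth=2` this is essentially double buffering: at most about two chunks wait in the queue while one more is being processed. A sketch like this does not handle an exception raised inside `process`; real code would need to stop the reader in that case.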

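If you need to read the complete file either way, and the file fits into memory without a problem, then read it from memory. Is it the same file every time, or a small set of files? Cache them in memory.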

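If the input being parsed comes from I/O, as it usually does, then any good parsing technique, such as recursive descent, will be I/O bound. In other words, the average time to get a character from I/O should exceed the average time to process it by a healthy factor. So it really doesn't matter much.

The only difference is in how much working storage you take up, which is usually not a big deal.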
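
Whether a given parser really is I/O bound can be checked rather than assumed. Here is a small sketch that times reads and processing separately with `time.perf_counter`; the function name `time_read_and_process` is hypothetical, and the numbers depend entirely on the machine it runs on.

```python
import time
from typing import Callable, Tuple


def time_read_and_process(path: str, process: Callable[[bytes], None],
                          chunk_size: int = 1024) -> Tuple[float, float]:
    """Return (seconds spent reading, seconds spent in process())."""
    read_seconds = 0.0
    process_seconds = 0.0
    with open(path, "rb") as handle:
        while True:
            start = time.perf_counter()
            data = handle.read(chunk_size)
            read_seconds += time.perf_counter() - start
            if not data:  # end of file
                break
            start = time.perf_counter()
            process(data)
            process_seconds += time.perf_counter() - start
    return read_seconds, process_seconds
```

Keep in mind that the operating system caches recently read files, so a second run over the same file will usually show much lower read times than the first.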

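How big are your files on average?

I mentioned that the minimum file size is assumed to be 2 MB. If you need an exact estimate, then 2 MB per file.

I'd load it into RAM, since 2 MB is nothing. But that's just me.

2 MB is trivial for one user. 2 MB from 1,000 users at the same time is a different story.

Ah, missed the multi-user part. Sorry.

But how would the developer know the average times and compare them, since that is a variable that depends on the system the application is deployed to?

Here's what I would do. First, I test on a slow machine, so if it isn't a problem there, it will be even less of a problem on a fast machine. If I'm at all concerned, I squeeze out any unnecessary cycles. Then, if I still think it's a problem, I might consider double buffering, as Chao suggested; the object is to keep the I/O hardware as busy as possible.

If the platform supports it, a single thread can use asynchronous I/O. If it isn't supported, or is supported poorly, a second thread will effectively achieve the same goal, namely processing and reading concurrently as much as possible.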