Improving gzip decompression in Java

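Scenario: I have almost 15 million records in an Oracle database, and each record has one compressed column. The task is to export the same table, but with that column's values decompressed. My approach is as follows: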

- Read a chunk of data using jdbcTemplate (returns a List).
- For each record above, decompress the column value and build an updated list.
- Insert the updated list into another table (this runs on a separate thread).
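The three steps form a producer/consumer pipeline. A minimal sketch using only JDK classes might look like the following. It assumes a hypothetical readChunk (standing in for the jdbcTemplate query, returning an empty list when the table is exhausted) and a hypothetical writeBatch (standing in for the INSERT). Note that this arrangement only overlaps writing with reading and decompression; it does nothing to reduce the per-batch decompression cost itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.IntFunction;

final class ChunkPipeline {

    // Sentinel object (compared by identity) telling the writer that reading is finished.
    private static final List<String> END = new ArrayList<>();

    /**
     * readChunk(offset): hypothetical stand-in for the jdbcTemplate query; empty list at the end.
     * decompress:        per-value decompression (e.g. decompressMessage).
     * writeBatch:        hypothetical stand-in for the batch INSERT into the target table.
     */
    public static void run(IntFunction<List<String>> readChunk,
                           Function<String, String> decompress,
                           Consumer<List<String>> writeBatch) {
        // Bounded queue: the reader blocks if it gets more than two batches ahead of the writer.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(2);

        // Writer thread: consumes decompressed batches until it sees the sentinel.
        Thread writer = new Thread(() -> {
            try {
                for (List<String> batch = queue.take(); batch != END; batch = queue.take()) {
                    writeBatch.accept(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            int offset = 0;
            List<String> chunk;
            // Reader/decompressor loop runs on the calling thread.
            while (!(chunk = readChunk.apply(offset)).isEmpty()) {
                List<String> decompressed = new ArrayList<>(chunk.size());
                for (String value : chunk) {
                    decompressed.add(decompress.apply(value));
                }
                queue.put(decompressed); // hand the batch to the writer thread
                offset += chunk.size();
            }
            queue.put(END);
            writer.join(); // wait until every batch has been written
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }
}
```

Error handling is deliberately omitted: if writeBatch throws, the writer thread stops and the reader can block on the full queue.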

For a batch of 48,842 records, here is the breakdown:

- Reading takes around 9 seconds
- Writing takes around 47 seconds
- Decompression takes around 135 seconds

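Based on these numbers, processing 15 million records will take 16-17 hours. Is there any way to improve this? I am mainly looking for significant improvements in the decompression step; in my case, even a small improvement there would make a huge difference. Any help would be appreciated.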
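The 16-17 hour figure can be reproduced from the per-batch timings, assuming the three phases simply add up for each batch (that is, writing does not overlap with the next read):

```latex
\[
\frac{15\,000\,000}{48\,842} \approx 307\ \text{batches},\qquad
307 \times (9 + 47 + 135)\ \text{s} = 307 \times 191\ \text{s} \approx 58\,600\ \text{s} \approx 16.3\ \text{h}
\]
\[
\frac{135}{191} \approx 71\%\ \text{of each batch},\qquad
\frac{135\ \text{s}}{48\,842} \approx 2.8\ \text{ms per record}
\]
```

Decompression clearly dominates: even if writing overlapped perfectly with reading and decompressing, the run would still take about 307 × 144 s ≈ 12.3 h.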

Below is the decompression method I am using:

public String decompressMessage(String message)
    throws Exception
    {
        ByteArrayInputStream byteArrayIPStream = null;
        GZIPInputStream gZipIPStream = null;
        BufferedReader bufferedReader = null;
        String decompressedMessage = "";
        String line="";
        byte[] compressByteArray = null;
        try{
            if(message==null || "".equals(message))
            {
                logger.error("Decompress is not possible as the string is empty");
                return "";
            }
            compressByteArray = Base64.decode(message);
            byteArrayIPStream = new ByteArrayInputStream(compressByteArray);
            gZipIPStream = new GZIPInputStream(byteArrayIPStream);
            bufferedReader = new BufferedReader(new InputStreamReader(gZipIPStream, "UTF-8"));
            while ((line = bufferedReader.readLine()) != null) {                
                decompressedMessage = decompressedMessage + line;               
              }
            return decompressedMessage;
        }
        catch(Exception e)
        {
            logger.error("Exception while decompressing the message with details {}",e);
            return "";
        }
        finally{
            line = null;
            compressByteArray = null;
            if(byteArrayIPStream!=null)
                byteArrayIPStream.close();
            if(gZipIPStream!=null)
                gZipIPStream.close();
            if(bufferedReader!=null)
                bufferedReader.close();
        }
    }

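The biggest problem, of course, is concatenating strings in a loop. Strings are immutable, which means you are imposing O(n²) time complexity on what is essentially an O(n) job.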
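To see where the quadratic cost comes from, assume for simplicity that the decompressed text has m lines of equal length ℓ, so n = mℓ characters in total. Each decompressedMessage + line creates a new String and copies everything accumulated so far, so iteration k copies about kℓ characters:

```latex
\[
\sum_{k=1}^{m} k\ell \;=\; \ell\,\frac{m(m+1)}{2} \;\approx\; \frac{\ell m^{2}}{2} \;=\; \frac{n^{2}}{2\ell} \;=\; O(n^{2}) \quad \text{for fixed } \ell .
\]
```

A growable buffer such as StringBuilder or StringWriter roughly doubles its capacity when full, so each character is copied only a constant number of times on average and the total work stays O(n). The shorter the lines, the worse the concatenation version gets.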

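Replace the String with a StringWriter and remove the BufferedReader from the input side. Use Reader#read(char[]) and then StringWriter#write(char[], int, int), passing the number of characters actually read, to accumulate the data in the StringWriter, and finally get the string with StringWriter.toString().

Let the Oracle database do it. For example: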

-- NOTE: This example would be simpler if compressed_data were a RAW type...
create table matt1 ( compressed_data VARCHAR2(4000) );

-- Put 100,000 rows of compressed data in there
insert into matt1 (compressed_data)
select utl_raw.cast_to_varchar2(utl_compress.lz_compress(src => utl_raw.cast_to_raw(dbms_random.string('a',30) || 'UNCOMPRESSED_DATA' || lpad(rownum,10,'0') || dbms_random.string('a',30))))
from dual
connect by rownum <= 100000;

-- Create the uncompressed version of the table to export
create table matt1_uncompressed as
select utl_raw.cast_to_varchar2(utl_compress.lz_uncompress(src => utl_raw.cast_to_raw(compressed_data))) uncompressed_data
from matt1
where rownum <= 100000;

--- execution time was 3.448 seconds

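For starters: use a StringBuffer for decompressedMessage :)

Thanks, wero. If I had control over the compression technique, your approach would look good. The problem is that the data has already been compressed with gzip, and not by me. Does Oracle itself have a gzip decompression facility?

According to the Oracle documentation, UTL_COMPRESS is compatible with gzip. See:

@Matthew McPeak - It is more or less compatible: UTL_COMPRESS does not store the checksum at the end. Also, up to and including 11gR1, this package had a bug where the compression ratio never went below 50%. Regarding the checksum, this link might be useful:

Without a sample of his compressed data, it's hard to say for sure. My point is that Oracle does have decompression routines that it claims are gzip-compatible, and using them appears to be much faster than what he has implemented in Java so far. Given that the source data is in Oracle and the target must be in Oracle, decompressing inside Oracle is worth trying.

Matthew, I see a disconnect here. In my case, a sample of the compressed text is "H4sIAAAAAAAAAAAAAA=". How would you decompress that directly in Oracle? That is how the data is stored in Oracle. Basically, my question is: if the same text is gzip-compressed once in Oracle with UTL_COMPRESS and once in Java, will the results be identical? What I have observed contradicts what you are saying.

I took the advice of wero (who commented on the question) and used a StringBuffer, and it improved things dramatically. Is StringWriter appropriate here? Could you update your method and post it? I couldn't use a Reader the way you described.

(Misuse of StringBuffer and BufferedReader is the underlying problem here!) The best way to move data is through a fixed-size buffer; readLine has to needlessly scan the input character by character looking for line breaks.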
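In response to the request above, a minimal sketch of the StringWriter approach from the first answer might look like this. It assumes Java 8+ and uses java.util.Base64 in place of the Base64 class in the question (whose library is not stated), and it propagates IOException rather than logging and returning an empty string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

final class GzipText {

    /** Decodes Base64, gunzips, and returns the contents as UTF-8 text. */
    public static String decompressMessage(String message) throws IOException {
        if (message == null || message.isEmpty()) {
            return "";
        }
        byte[] compressed = Base64.getDecoder().decode(message);
        // Closing the Reader also closes the wrapped GZIPInputStream and ByteArrayInputStream.
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8)) {
            StringWriter out = new StringWriter();
            char[] buffer = new char[8192];
            int n;
            // Copy through a fixed-size buffer: no line scanning, no repeated String copies.
            while ((n = reader.read(buffer)) != -1) {
                out.write(buffer, 0, n); // write only the n characters actually read
            }
            return out.toString();
        }
    }
}
```

Unlike the readLine loop, this keeps the line terminators that readLine strips, so its output differs from the original method's whenever the text contains line breaks. If the stored Base64 itself contains line breaks, Base64.getMimeDecoder() would be needed instead of getDecoder().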
Decompressing the Base64-encoded column directly in Oracle:

SELECT utl_compress.lz_uncompress(src =>
         utl_encode.base64_decode(utl_raw.cast_to_raw(your_table.compressed_column)))
FROM your_table;
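The sample value quoted in the comments starts with H4sI, which is just the Base64 encoding of the first three gzip header bytes (1f 8b 08: the magic number and the deflate method). That is why the query decodes Base64 before calling lz_uncompress. The JDK-only sketch below produces data of that shape and checks the 8-byte gzip trailer (CRC-32 and uncompressed size, little-endian, per RFC 1952), which is the part one comment says UTL_COMPRESS does not write. It does not test Oracle's own output, and the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

final class GzipLayout {

    /** Gzip-compresses the UTF-8 bytes of text. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // complete stream: header, deflate data, trailer
    }

    /** True if the trailer's CRC-32 and ISIZE fields match the original bytes. */
    public static boolean trailerMatches(byte[] gz, byte[] original) {
        // Last 8 bytes: CRC-32, then input size mod 2^32, both little-endian.
        ByteBuffer trailer = ByteBuffer.wrap(gz, gz.length - 8, 8).order(ByteOrder.LITTLE_ENDIAN);
        long storedCrc = trailer.getInt() & 0xFFFFFFFFL;
        long storedSize = trailer.getInt() & 0xFFFFFFFFL;
        CRC32 crc = new CRC32();
        crc.update(original);
        return storedCrc == crc.getValue() && storedSize == (original.length & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        byte[] original = "UNCOMPRESSED_DATA".getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip("UNCOMPRESSED_DATA");
        String base64 = Base64.getEncoder().encodeToString(gz);
        System.out.println(base64.substring(0, 4));      // H4sI
        System.out.println(trailerMatches(gz, original)); // true
    }
}
```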