C#: Compress a rowset using CLR and GZIP

Tags: c#, .net, tsql, sql-server-2012, sqlclr

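I want to compress some large tables containing historical data that is rarely or never read. I first tried the built-in compression options (row, page, columnstore, columnstore archive), but none of them can compress off-row values (varchar(max), nvarchar(max)), so I finally turned to a CLR solution.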

The solution uses a user-defined CLR type to compress the entire rowset returned by a given query.

For example:

CREATE TABLE Archive
(
     [Date] DATETIME2 DEFAULT(GETUTCDATE())
    ,[Data] [dbo].[CompressedRowset]
)

INSERT INTO Archive([Data])
SELECT [dbo].[CompressQueryResults]('SELECT * FROM [dbo].[A]')
It works, but I have run into the following issues:

  • When I try to compress a larger result rowset, I get the following error:

    Msg 0, Level 11, State 0, Line 0
    A severe error occurred on the current command. The results, if any, should be discarded.

    Also, the following statement works:

    SELECT [dbo].[CompressQueryResults] ('SELECT * FROM [dbo].[LargeA]')
    
    but these do not:

    INSERT INTO Archive
    SELECT [dbo].[CompressQueryResults] ('SELECT * FROM [dbo].[LargeA]')
    
    DECLARE @A [dbo].[CompressedRowset]
    SELECT @A = [dbo].[CompressQueryResults] ('SELECT * FROM [dbo].[LargeA]')
    
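  • To compress a rowset, the T-SQL types must be mapped to .NET types. Unfortunately, not every SQL type has such a mapping. I have already extended the following function to handle more types, but how should types such as geography be handled?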

    static SqlDbType ToSqlType(Type t){
        if (t == typeof(int)){
            return SqlDbType.Int;
        }
    
        ...
    
        if (t == typeof(Byte[])){
            return SqlDbType.VarBinary;
        } else {
            throw new NotImplementedException("CLR Type " + t.Name + " Not supported for conversion");
        }
    }
    
Here is the whole .NET code:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO.Compression;
using System.Xml.Serialization;
using System.Xml;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType
    (
        Format.UserDefined
        ,IsByteOrdered = false
        ,IsFixedLength = false
        ,MaxByteSize = -1
    )
]
public struct CompressedRowset : INullable, IBinarySerialize, IXmlSerializable
{
    DataTable rowset;

    public DataTable Data
    {
        get { return this.rowset; }
        set { this.rowset = value; }
    }

    public override string ToString()
    {
        using (var sw = new StringWriter())
        using (var xw = new XmlTextWriter(sw))
        {
            WriteXml(xw);
            xw.Flush();
            sw.Flush();
            return sw.ToString();
        }
    }

    public bool IsNull
    {
        get { return (this.rowset == null);}
    }

    public static CompressedRowset Null
    {
        get
        {
            CompressedRowset h = new CompressedRowset();
            return h;
        }
    }

    public static CompressedRowset Parse(SqlString s)
    {
        using (var sr = new StringReader(s.Value))
        using (var xr = new XmlTextReader(sr))
        {
            var c = new CompressedRowset();
            c.ReadXml(xr);
            return c;
        }
    }


    #region "Stream Wrappers"
    abstract class WrapperStream : Stream
    {
        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return false; }
        }

        public override void Flush()
        {

        }

        public override long Length
        {
            get { throw new NotImplementedException(); }
        }

        public override long Position
        {
            get
            {
                throw new NotImplementedException();
            }
            set
            {
                throw new NotImplementedException();
            }
        }


        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotImplementedException();
        }

        public override void SetLength(long value)
        {
            throw new NotImplementedException();
        }


    }

    class BinaryWriterStream : WrapperStream
    {
        BinaryWriter br;
        public BinaryWriterStream(BinaryWriter br)
        {
            this.br = br;
        }
        public override bool CanRead
        {
            get { return false; }
        }
        public override bool CanWrite
        {
            get { return true; }
        }
        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }
        public override void Write(byte[] buffer, int offset, int count)
        {
            br.Write(buffer, offset, count);
        }
    }

    class BinaryReaderStream : WrapperStream
    {
        BinaryReader br;
        public BinaryReaderStream(BinaryReader br)
        {
            this.br = br;
        }
        public override bool CanRead
        {
            get { return true; }
        }
        public override bool CanWrite
        {
            get { return false; }
        }
        public override int Read(byte[] buffer, int offset, int count)
        {
            return br.Read(buffer, offset, count);
        }
        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotImplementedException();
        }
    }
    #endregion

    #region "IBinarySerialize"
    public void Read(System.IO.BinaryReader r)
    {
        using (var rs = new BinaryReaderStream(r))
        using (var cs = new GZipStream(rs, CompressionMode.Decompress))
        {
            var ser = new BinaryFormatter();
            this.rowset = (DataTable)ser.Deserialize(cs);
        }
    }
    public void Write(System.IO.BinaryWriter w)
    {
        if (this.IsNull)
            return;

        rowset.RemotingFormat = SerializationFormat.Binary;
        var ser = new BinaryFormatter();
        using (var binaryWriterStream = new BinaryWriterStream(w))
        using (var compressionStream = new GZipStream(binaryWriterStream, CompressionMode.Compress))
        {
            ser.Serialize(compressionStream, rowset);
        }

    }

    #endregion

    /// <summary>
    /// This function takes an arbitrary query, runs it and returns the results as a CompressedRowset,
    /// which is GZIP-compressed when the value is serialized.
    /// If the query has a large result set, then this function will use a large amount of memory to buffer the results in 
    /// a DataTable, and more to copy it into a compressed buffer to return. 
    /// </summary>
    /// <param name="query">The T-SQL query whose result set is captured.</param>
    //[Microsoft.SqlServer.Server.SqlProcedure]
    [SqlFunction(DataAccess = DataAccessKind.Read, SystemDataAccess = SystemDataAccessKind.Read, IsDeterministic = false, IsPrecise = false)]
    public static CompressedRowset CompressQueryResults(string query)
    {
        //open a context connection
        using (var con = new SqlConnection("Context Connection=true"))
        {
            con.Open();
            var cmd = new SqlCommand(query, con);
            var dt = new DataTable();
            using (var rdr = cmd.ExecuteReader())
            {
                dt.Load(rdr);
            }
            //configure the DataTable for binary serialization
            dt.RemotingFormat = SerializationFormat.Binary;

            var cdt = new CompressedRowset();
            cdt.rowset = dt;
            return cdt;


        }
    }

    /// <summary>
    /// partial Type mapping between SQL and .NET
    /// </summary>
    /// <param name="t"></param>
    /// <returns></returns>
    static SqlDbType ToSqlType(Type t)
    {
        if (t == typeof(int))
        {
            return SqlDbType.Int;
        }
        if (t == typeof(string))
        {
            return SqlDbType.NVarChar;
        }
        if (t == typeof(Boolean))
        {
            return SqlDbType.Bit;
        }
        if (t == typeof(decimal))
        {
            return SqlDbType.Decimal;
        }
        if (t == typeof(float))
        {
            return SqlDbType.Real;
        }
        if (t == typeof(double))
        {
            return SqlDbType.Float;
        }
        if (t == typeof(DateTime))
        {
            return SqlDbType.DateTime;
        }
        if (t == typeof(Int64))
        {
            return SqlDbType.BigInt;
        }
        if (t == typeof(Int16))
        {
            return SqlDbType.SmallInt;
        }
        if (t == typeof(byte))
        {
            return SqlDbType.TinyInt;
        }
        if ( t == typeof(Guid))
        {
            return SqlDbType.UniqueIdentifier;
        }
        //!!!!!!!!!!!!!!!!!!!
        if (t == typeof(Byte[]))
        {
            return SqlDbType.VarBinary;
        }   
        else
        {
            throw new NotImplementedException("CLR Type " + t.Name + " Not supported for conversion");
        }

    }

    /// <summary>
    /// This stored procedure takes a compressed DataTable and returns it as a resultset to the client
    /// or into a table using exec .... into ...
    /// </summary>
    /// <param name="results"></param>
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void UnCompressRowset(CompressedRowset results)
    {
        if (results.IsNull)
            return;

        DataTable dt = results.rowset;
        var fields = new SqlMetaData[dt.Columns.Count];
        for (int i = 0; i < dt.Columns.Count; i++)
        {
            var col = dt.Columns[i];
            var sqlType = ToSqlType(col.DataType);
            var colName = col.ColumnName;
            if (sqlType == SqlDbType.NVarChar || sqlType == SqlDbType.VarBinary)
            {
                fields[i] = new SqlMetaData(colName, sqlType, col.MaxLength);
            }
            else
            {
                fields[i] = new SqlMetaData(colName, sqlType);
            }
        }
        var record = new SqlDataRecord(fields);

        SqlContext.Pipe.SendResultsStart(record);
        foreach (DataRow row in dt.Rows)
        {
            record.SetValues(row.ItemArray);
            SqlContext.Pipe.SendResultsRow(record);
        }
        SqlContext.Pipe.SendResultsEnd();

    }

    public System.Xml.Schema.XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(System.Xml.XmlReader reader)
    {
        if (rowset != null)
        {
            throw new InvalidOperationException("rowset already read");
        }
        var ser = new XmlSerializer(typeof(DataTable));
        rowset = (DataTable)ser.Deserialize(reader);
    }

    public void WriteXml(System.Xml.XmlWriter writer)
    {
        if (String.IsNullOrEmpty(rowset.TableName))
            rowset.TableName = "Rows";

        var ser = new XmlSerializer(typeof(DataTable));
        ser.Serialize(writer, rowset);
    }
}

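Have you considered creating a new "archive" database (perhaps set to the simple recovery model) where you dump all your old data? It can easily be accessed in queries, so there is no pain there, e.g.: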

SELECT * FROM archive..olddata
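When you create the database, place it on another disk and handle it differently in your backup procedure. Perhaps you run the archiving process once a week; then it only needs to be backed up after that, and only after you have squashed it down to almost nothing with 7zip/rar.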


Don't try to compress the database with NTFS compression, though; SQL Server doesn't support it.

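This is probably too late for the original question, but it may be worth considering for others: SQL Server 2016 has COMPRESS and DECOMPRESS functions, which may be useful here if the data you are trying to archive contains large values in [N]VARCHAR and VARBINARY columns.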

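You would need to bake this into your business logic layer, or set up an arrangement in SQL Server in which the uncompressed table is replaced by a view over a backing table (where the compressed values live). The view derives the uncompressed data via DECOMPRESS and uses INSTEAD OF triggers to update the backing table, so that apart from performance differences it behaves like the original table for select/insert/update/delete.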

For older SQL Server versions, you could probably write a CLR function to do the same job.


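This approach is obviously not suitable for data sets made up of small fields. This style of compression achieves nothing on small values (in fact, it makes them larger).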
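For the CLR route on older versions, a minimal sketch of the two helpers might look like the following. The class name `GzipFunctions` and its method names are hypothetical and do not come from the original post. In a real SQLCLR assembly they would presumably be wrapped in `[SqlFunction]` methods that take and return `SqlBytes`; that wrapper is omitted here so the snippet compiles without the SQL Server assemblies.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class: plain GZIP compress/decompress over byte arrays.
public static class GzipFunctions
{
    // Returns the GZIP-compressed form of input, or null for null input.
    public static byte[] Compress(byte[] input)
    {
        if (input == null) return null;
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so that
            // the final block and the GZIP trailer are flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // MemoryStream.ToArray still works after the stream is closed.
            return output.ToArray();
        }
    }

    // Reverses Compress: returns the original bytes, or null for null input.
    public static byte[] Decompress(byte[] input)
    {
        if (input == null) return null;
        using (var source = new MemoryStream(input))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Because GZIP adds a fixed header and trailer (at least 18 bytes), calling `Compress` on a three-byte value returns more bytes than it was given, which is the small-value caveat described above. Long, repetitive values, by contrast, should shrink considerably.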

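Serialization produces fairly large output, and the built-in GZipStream compresses poorly. No wonder the results are not great. I would try serializing the rows as object[] to at least get rid of the DataTable.

I guess the idea in this example is to compress the whole rowset so that there is more information to compress (and hence a better compression ratio), right? Also, I could not find any GZip alternative; can you suggest one?

I can't close this question because it has an open bounty, but I would close it as too broad: "There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs." So please ask more specific questions. In this huge, disorganized code sample there are several topics on which I have answers, opinions, and concerns... please improve it.

Hard disks are cheap. Buy a bigger one.

@Magnus That is not a solution at all: backups get bigger, and more and more data is in use all the time.

It is really annoying that SQL Server has no built-in compression for large objects.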