C#: How do I write a "filter" stream wrapper for XML?


c# xml filter stream


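I have some large XML feed files that contain illegal characters (0x1, etc.). The files are third-party, and I cannot change the process that writes them.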

I would like to use XmlReader to process these files, but it blows up on these illegal characters.
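For reference, the failure is easy to reproduce. This sketch (the class and method names are hypothetical) parses a string with default XmlReaderSettings, where CheckCharacters is true, and reports whether an XmlException was thrown:

```csharp
using System.IO;
using System.Xml;

// Hypothetical repro helper, not part of the original question.
public static class InvalidCharRepro
{
    // Returns true if XmlReader (default settings, CheckCharacters = true) rejects the input.
    public static bool ThrowsOnParse(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    // Touch each node's value so any text content is fully parsed.
                    _ = reader.Value;
                }
            }
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

With "<feed>\u0001</feed>" this returns true (the exception message reads along the lines of "hexadecimal value 0x01, is an invalid character"); with "<feed>ok</feed>" it returns false.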

I could read the file, filter out the bad characters, save the result, and then process that... but that is a lot of I/O and seems unnecessary.

What I would like to do is something like this:

using(var origStream = File.OpenRead(fileName))
using(var cleanStream = new CleansedXmlStream(origStream))
using(var streamReader = new StreamReader(cleanStream))
using(var xmlReader = XmlReader.Create(streamReader))
{
    //do stuff with reader
}

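I tried inheriting from Stream, but when I started implementing Read(byte[] buffer, int offset, int count), I lost some confidence. After all, I am planning to remove characters, so the count would seem to be off, and I would have to convert each byte to a char, which seems expensive (especially on large files). I am also not clear how this would work with Unicode encodings, and the answers to these questions are not intuitive to me.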
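The Unicode concern is well founded: whether filtering at the byte level is safe depends on the encoding. In UTF-8, every byte of a multi-byte sequence is 0x80 or above, so a byte in 0x00-0x1F can only be that control character itself; in UTF-16 the same byte values occur inside ordinary characters. This small sketch (the helper name is hypothetical) counts control-range bytes to show the difference:

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingByteCheck
{
    // Counts bytes in the range 0x00-0x1F (where the illegal XML control characters live)
    // in the encoded form of text. Encoding.GetBytes does not emit a BOM.
    public static int CountControlRangeBytes(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Count(b => b < 0x20);
    }
}
```

For "\u00E9\u20AC\U0001F600" (é, €, and an emoji) the UTF-8 count is 0, whereas Encoding.Unicode (UTF-16LE) produces three 0x00 bytes for plain "abc". A byte-level filter is therefore reasonable for UTF-8 or ASCII input, but would corrupt UTF-16 input.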

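Googling "c# stream wrapper" or "c# filter stream" did not turn up satisfying results. It is possible I am using the wrong terms or describing the wrong concept, so I am hoping the SO community can set me straight.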

Using the example above, what would CleansedXmlStream look like?

Here is my first attempt:

public class CleansedXmlStream : Stream
{
    private readonly Stream _baseStream;

    public CleansedXmlStream(Stream stream)
    {
        this._baseStream = stream;
    }

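    protected override void Dispose(bool disposing)
    {
        // Override Dispose(bool) instead of hiding Dispose() with "new": a using block disposes
        // through IDisposable, which would skip a hidden Dispose() entirely.
        if (disposing && this._baseStream != null)
        {
            this._baseStream.Dispose();
        }
        base.Dispose(disposing);
    }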

    public override bool CanRead
    {
        get { return this._baseStream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return this._baseStream.CanSeek; }
    }

    public override bool CanWrite
    {
        // Write throws NotSupportedException, so the stream must not report itself as writable.
        get { return false; }
    }

    public override long Length
    {
        get { return this._baseStream.Length; }
    }

    public override long Position
    {
        get { return this._baseStream.Position; }
        set { this._baseStream.Position = value; }
    }

    public override void Flush()
    {
        this._baseStream.Flush();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        //what does this look like?

        throw new NotImplementedException();
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return this._baseStream.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        this._baseStream.SetLength(value);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
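One possible shape for the missing Read, offered as a sketch rather than a definitive answer: replace bad bytes with a space (0x20) instead of removing them, so the count returned by the inner stream stays correct and no byte-to-char conversion is needed. This relies on an assumption: the feed is UTF-8 (or plain ASCII), where a byte below 0x20 can never be part of a multi-byte character. It only handles the C0 control characters, not other invalid code points such as U+FFFE. The class name is hypothetical, and seeking and length are reported as unsupported because the wrapper is treated as forward-only here.

```csharp
using System;
using System.IO;

// Hypothetical read-only filter stream. ASSUMPTION: the underlying bytes are UTF-8 (or ASCII),
// where a byte below 0x20 can only be a C0 control character, never part of a multi-byte char.
public sealed class Utf8ControlCharFilterStream : Stream
{
    private readonly Stream _inner;

    public Utf8ControlCharFilterStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        for (int i = offset; i < offset + read; i++)
        {
            byte b = buffer[i];
            // XML 1.0 allows only tab (0x09), LF (0x0A) and CR (0x0D) below 0x20.
            if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D)
            {
                buffer[i] = 0x20; // replace rather than remove, so the returned count stays valid
            }
        }
        return read;
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In the question's pipeline it would sit where CleansedXmlStream appears, e.g. new StreamReader(new Utf8ControlCharFilterStream(origStream)). Because each byte is replaced in place, the wrapper never has to buffer or shift data between calls.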

Inspired by @CharlesMager's comment, I ended up making a StreamReader instead of a Stream, like this:

public class CleanTextReader : StreamReader
{
    private readonly ILog _logger;

    public CleanTextReader(Stream stream, ILog logger) : base(stream)
    {
        this._logger = logger;
    }

    public CleanTextReader(Stream stream) : this(stream, LogManager.GetLogger<CleanTextReader>())
    {
        //nothing to do here.
    }

    /// <summary>
    ///     Reads a specified maximum of characters from the current stream into a buffer, beginning at the specified index.
    /// </summary>
    /// <returns>
    ///     The number of characters that have been read, or 0 if at the end of the stream and no data was read. The number
    ///     will be less than or equal to the <paramref name="count" /> parameter, depending on whether the data is available
    ///     within the stream.
    /// </returns>
    /// <param name="buffer">
    ///     When this method returns, contains the specified character array with the values between
    ///     <paramref name="index" /> and (<paramref name="index" /> + <paramref name="count" /> - 1) replaced by the characters read from the
    ///     current source.
    /// </param>
    /// <param name="index">The index of <paramref name="buffer" /> at which to begin writing. </param>
    /// <param name="count">The maximum number of characters to read. </param>
    /// <exception cref="T:System.ArgumentException">
    ///     The buffer length minus <paramref name="index" /> is less than
    ///     <paramref name="count" />.
    /// </exception>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="buffer" /> is null. </exception>
    /// <exception cref="T:System.ArgumentOutOfRangeException">
    ///     <paramref name="index" /> or <paramref name="count" /> is
    ///     negative.
    /// </exception>
    /// <exception cref="T:System.IO.IOException">An I/O error occurs, such as the stream is closed. </exception>
    public override int Read(char[] buffer, int index, int count)
    {
        try
        {
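            var rVal = base.Read(buffer, index, count);
            // Filter in place, only over the rVal characters actually read, starting at index.
            // (Buffer.BlockCopy counts bytes, not chars, so copying "count" only covered half.)
            for (var i = index; i < index + rVal; i++)
            {
                if (!XmlConvert.IsXmlChar(buffer[i]))
                {
                    buffer[i] = ' ';
                }
            }
            return rVal;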
        }
        catch (Exception ex)
        {
            this._logger.Error("Read(char[], int, int)", ex);
            throw;
        }
    }

    /// <summary>
    ///     Reads a maximum of <paramref name="count" /> characters from the current stream, and writes the data to
    ///     <paramref name="buffer" />, beginning at <paramref name="index" />.
    /// </summary>
    /// <returns>
    ///     The position of the underlying stream is advanced by the number of characters that were read into
    ///     <paramref name="buffer" />. The number of characters that have been read. The number will be less than or equal to
    ///     <paramref name="count" />, depending on whether all input characters have been read.
    /// </returns>
    /// <param name="buffer">
    ///     When this method returns, this parameter contains the specified character array with the values
    ///     between <paramref name="index" /> and (<paramref name="index" /> + <paramref name="count" /> -1) replaced by the
    ///     characters read from the current source.
    /// </param>
    /// <param name="index">The position in <paramref name="buffer" /> at which to begin writing.</param>
    /// <param name="count">The maximum number of characters to read. </param>
    /// <exception cref="T:System.ArgumentNullException"><paramref name="buffer" /> is null. </exception>
    /// <exception cref="T:System.ArgumentException">
    ///     The buffer length minus <paramref name="index" /> is less than
    ///     <paramref name="count" />.
    /// </exception>
    /// <exception cref="T:System.ArgumentOutOfRangeException">
    ///     <paramref name="index" /> or <paramref name="count" /> is
    ///     negative.
    /// </exception>
    /// <exception cref="T:System.ObjectDisposedException">The <see cref="T:System.IO.TextReader" /> is closed. </exception>
    /// <exception cref="T:System.IO.IOException">An I/O error occurs. </exception>
    public override int ReadBlock(char[] buffer, int index, int count)
    {
        try
        {
            var rVal = base.ReadBlock(buffer, index, count);
            // Filter in place, only over the rVal characters actually read, starting at index.
            for (var i = index; i < index + rVal; i++)
            {
                if (!XmlConvert.IsXmlChar(buffer[i]))
                {
                    buffer[i] = ' ';
                }
            }
            return rVal;
        }
        catch (Exception ex)
        {
            this._logger.Error("ReadBlock(char[], int, int)", ex);
            throw;
        }
    }

    /// <summary>
    ///     Reads the stream from the current position to the end of the stream.
    /// </summary>
    /// <returns>
    ///     The rest of the stream as a string, from the current position to the end. If the current position is at the end of
    ///     the stream, returns an empty string ("").
    /// </returns>
    /// <exception cref="T:System.OutOfMemoryException">
    ///     There is insufficient memory to allocate a buffer for the returned
    ///     string.
    /// </exception>
    /// <exception cref="T:System.IO.IOException">An I/O error occurs. </exception>
    public override string ReadToEnd()
    {
        var chars = new char[4096];
        int len;
        var sb = new StringBuilder(4096);
        while ((len = Read(chars, 0, chars.Length)) != 0)
        {
            sb.Append(chars, 0, len);
        }
        return sb.ToString();
    }
}
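A pitfall worth knowing when copying a filtered char[] back with Buffer.BlockCopy: its offsets and count are measured in bytes, not array elements. Since a char is two bytes, passing a character count copies only half of the characters, so bad characters in the second half of a buffer would slip through. A small demonstration (the helper names are hypothetical):

```csharp
using System;

// Hypothetical demo helpers, not part of the original code.
public static class BlockCopyDemo
{
    // Passes "count" straight through as a byte count: copies only count / 2 chars.
    public static string CopyWithByteCount(string source, int count)
    {
        char[] src = source.ToCharArray();
        char[] dst = new string('_', src.Length).ToCharArray();
        Buffer.BlockCopy(src, 0, dst, 0, count);
        return new string(dst);
    }

    // Converts a char count to bytes first, so all "count" chars are copied.
    public static string CopyWithCharCount(string source, int count)
    {
        char[] src = source.ToCharArray();
        char[] dst = new string('_', src.Length).ToCharArray();
        Buffer.BlockCopy(src, 0, dst, 0, count * sizeof(char));
        return new string(dst);
    }
}
```

CopyWithByteCount("abcd", 4) yields "ab__", while CopyWithCharCount("abcd", 4) yields "abcd". Filtering in place with an index loop over buffer[index .. index + charsRead) avoids both the byte/char mismatch and the extra allocation of a filtered copy.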

...and the usage looks like this:

[TestMethod]
public void CleanTextReaderCleans()
{
    //arrange
    var originalString = "The quick brown fox jumped over the lazy dog.";
    var badChars = new string(new[] {(char) 0x1});
    var concatenated = string.Concat(badChars, originalString);

    //act
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(concatenated)))
    {
        using (var reader = new CleanTextReader(stream))
        {
            var newString = reader.ReadToEnd().Trim();
            //assert
            Assert.IsTrue(originalString.Equals(newString));
        }
    }
}

using(var origStream = File.OpenRead(fileName))
using(var streamReader = new CleanTextReader(origStream))
using(var xmlReader = XmlReader.Create(streamReader))
{
    //do stuff with reader
}

If anyone has suggestions for improvement, I would be glad to hear them.

I tried @JeremyHolovacs's stream implementation, but it still was not sufficient for my use case:

using (var fstream = File.OpenRead(dlpath))
{
    using (var zstream = new GZipStream(fstream, CompressionMode.Decompress))
    {
        using (var xstream = new CleanTextReader(zstream))
        {
            var ser = new XmlSerializer(typeof(MyType));
            prods = ser.Deserialize(XmlReader.Create(xstream, new XmlReaderSettings() { CheckCharacters = false })) as MyType;
        }
    }
}

Somehow, not all of the relevant overloads were implemented. I adjusted the class as follows, and it works as expected:

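public class CleanTextReader : StreamReader
{
    public CleanTextReader(Stream stream) : base(stream)
    {
    }
    public override int Read()
    {
        var val = base.Read();
        // -1 signals end of stream; pass it through instead of turning it into a space.
        if (val < 0) return val;
        return XmlConvert.IsXmlChar((char)val) ? val : ' ';
    }
    public override int Read(char[] buffer, int index, int count)
    {
        var ret = base.Read(buffer, index, count);
        // Replace invalid XML characters among the ret characters actually read.
        for (int i = index; i < index + ret; i++)
        {
            if (!XmlConvert.IsXmlChar(buffer[i])) buffer[i] = ' ';
        }
        return ret;
    }
}

0x01 is SOH. Stream classes default to ASCII encoding. I would set your stream class to UTF8. Try the following: StreamReader stream = new StreamReader(filename, Encoding.UTF8);

@jdweng new StreamReader(stream) defaults to UTF8, so that makes no difference.

Maybe you need to work at a higher level of abstraction. Stream is binary data, and the invalid characters are the result of decoding that binary data. Perhaps you need a decorating TextReader rather than a decorating Stream?

@CharlesMager Perhaps; I will look into that idea.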
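Following the decorating-TextReader suggestion, here is a sketch that wraps any TextReader by composition instead of inheriting from StreamReader (the class name is hypothetical). It assumes that overriding Peek(), Read() and Read(char[], int, int) is enough, since the base TextReader implementations of ReadToEnd, ReadBlock and ReadLine are built on those members. XmlConvert.IsXmlChar rejects each surrogate half on its own, so a plain IsXmlChar filter would blank out characters such as emoji; this sketch passes surrogates through instead, with the simplification that a lone surrogate is passed through too and will still be rejected by XmlReader.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical decorator: forwards to an inner TextReader and replaces characters that
// are not legal in XML 1.0 with a space. Surrogate halves are left untouched.
public sealed class XmlCharFilteringReader : TextReader
{
    private readonly TextReader _inner;

    public XmlCharFilteringReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Keep legal XML chars and any surrogate half; everything else becomes a space.
    private static char Clean(char c) =>
        (XmlConvert.IsXmlChar(c) || char.IsSurrogate(c)) ? c : ' ';

    // -1 signals end of input and must be passed through unchanged.
    public override int Peek()
    {
        int c = _inner.Peek();
        return c < 0 ? c : Clean((char)c);
    }

    public override int Read()
    {
        int c = _inner.Read();
        return c < 0 ? c : Clean((char)c);
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int read = _inner.Read(buffer, index, count);
        // Only the characters actually read, starting at index, are touched.
        for (int i = index; i < index + read; i++)
        {
            buffer[i] = Clean(buffer[i]);
        }
        return read;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Usage mirrors the question's pipeline, e.g. XmlReader.Create(new XmlCharFilteringReader(new StreamReader(origStream))). Wrapping a TextReader rather than deriving from StreamReader means StreamReader's own ReadToEnd, which reads its internal buffer directly, can no longer bypass the filter.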