C# XmlReader seems to ignore changes made to the buffer?

Tags: c#, streamreader, xmlreader

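Maybe my understanding of what should happen is wrong, so hopefully someone can correct my thinking here.

I'm trying to process a number of large XML files that are continually sent to us with bad characters (0x1A) embedded in the text. Unfortunately, a client is sending the files, so no matter how nicely we ask them to make the files actually valid XML, they consider it our problem. In the end I wrote a subclass of StreamReader, as follows: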

public class CleanTextReader : StreamReader
{
    private readonly ILog _logger;

    public CleanTextReader(Stream stream, ILog logger) : base(stream)
    {
        this._logger = logger;
    }

    public CleanTextReader(Stream stream) : this(stream, LogManager.GetLogger<CleanTextReader>())
    {
        //nothing to do here.
    }
    public override int Read(char[] buffer, int index, int count)
    {
        try
        {
            var rVal = base.Read(buffer, index, count);

            var filteredBuffer = buffer.Select(x => XmlConvert.IsXmlChar(x) ? x : ' ').ToArray();

            Buffer.BlockCopy(filteredBuffer, 0, buffer, 0, count);
            return rVal;
        }
        catch (Exception ex)
        {
            this._logger.Error("Read(char[], int, int)", ex);
            throw;
        }
    }

    public override int ReadBlock(char[] buffer, int index, int count)
    {
        try
        {
            var rVal = base.ReadBlock(buffer, index, count);
            var filteredBuffer = buffer.Select(x => XmlConvert.IsXmlChar(x) ? x : ' ').ToArray();
            Buffer.BlockCopy(filteredBuffer, 0, buffer, 0, count);
            return rVal;
        }
        catch (Exception ex)
        {
            this._logger.Error("ReadBlock(char[], int, int)", ex);
            throw;
        }
    }

    public override string ReadToEnd()
    {
        var chars = new char[4096];
        int len;
        var sb = new StringBuilder(4096);
        while ((len = Read(chars, 0, chars.Length)) != 0)
        {
            sb.Append(chars, 0, len);
        }
        return sb.ToString();
    }
}

It is used like this:

using (var theCleanser = new CleanTextReader(myStreamedInput))
using (var theReader = XmlReader.Create(theCleanser))
{
    ...
    // do stuff with theReader
}

I have a unit test like this:

    [TestMethod]
    public void CleanTextReaderCleans0X1A()
    {
        //arrange
        var originalString = "The quick brown fox jumped over the lazy dog.";
        var badChars = new string(new[] {(char) 0x1a});
        var concatenated = originalString.Replace("jumped", badChars + "jumped" + badChars);

        //act
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(concatenated)))
        {
            using (var reader = new CleanTextReader(stream))
            {
                var newString = reader.ReadToEnd().Trim().Replace("  ", " ");
                //assert
                Assert.IsTrue(originalString.Equals(newString));
            }
        }
    }

...and it passes.

But when I try to parse an XML file that contains the 0x1A character, I still get a System.Xml.XmlException: '', hexadecimal value 0x1A, is an invalid character. Line XX, position XX.

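Digging into CleanTextReader, I inspected the Read(char[], int, int) method, since that appears to be the one XmlReader hits. The original buffer contains the illegal character but filteredBuffer does not, and after Buffer.BlockCopy() runs, neither buffer nor filteredBuffer contains the special character.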

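Also worth noting: the line number and position reported in the exception point not to the first instance of the invalid character but to the second. So the reader sees the first one and corrects it, but only that once.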
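
One thing that may be worth checking here, offered as a hypothesis rather than a confirmed diagnosis: the offsets and count passed to Buffer.BlockCopy are measured in bytes, not array elements. A char is two bytes, so passing a character count copies only half that many characters. The small sketch below (BlockCopyDemo and CopyWithCharCount are hypothetical names used only for illustration) shows the effect in isolation:

```csharp
using System;

// Hypothetical demo class; not part of the code in the question.
public static class BlockCopyDemo
{
    // Copies "source.Length" BYTES, because Buffer.BlockCopy counts bytes.
    // With 2-byte chars this copies only the first half of the characters.
    public static char[] CopyWithCharCount(char[] source)
    {
        var destination = new char[source.Length];
        Buffer.BlockCopy(source, 0, destination, 0, source.Length);
        return destination;
    }

    public static void Main()
    {
        var copied = CopyWithCharCount("abcdefgh".ToCharArray());

        // Count how many characters were actually written.
        int written = 0;
        foreach (var c in copied)
        {
            if (c != '\0') written++;
        }
        Console.WriteLine(written); // prints 4, not 8
    }
}
```

If that is what happens inside Read, only the first part of each requested range would be overwritten with the filtered characters, which would be consistent with an early bad character being cleaned and a later one slipping through.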

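So I'm scratching my head here. How is XmlReader getting hold of the special character? Is it reading from the buffer before control returns from the method? How do I fix this?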
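
As a possible direction, here is a minimal sketch of a reader that cleans the buffer in place instead of copying a filtered array back over it. It only touches the range that was actually filled, buffer[index .. index + read), so it respects a non-zero index and avoids byte-versus-char counting altogether. The class names InPlaceCleanTextReader and InPlaceCleanTextReaderDemo are hypothetical, logging is omitted, and leaving surrogate characters untouched is a simplifying assumption made here so that characters outside the Basic Multilingual Plane are not blanked out.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical name; a sketch, not the CleanTextReader from the question.
public class InPlaceCleanTextReader : StreamReader
{
    public InPlaceCleanTextReader(Stream stream) : base(stream) { }

    public override int Read(char[] buffer, int index, int count)
    {
        int read = base.Read(buffer, index, count);
        Clean(buffer, index, read);
        return read;
    }

    public override int ReadBlock(char[] buffer, int index, int count)
    {
        int read = base.ReadBlock(buffer, index, count);
        Clean(buffer, index, read);
        return read;
    }

    // Replaces invalid XML characters with a space, only inside the
    // range that the base reader actually filled.
    private static void Clean(char[] buffer, int index, int length)
    {
        for (int i = index; i < index + length; i++)
        {
            char c = buffer[i];
            // Assumption: surrogates are left alone rather than validated as pairs.
            if (!char.IsSurrogate(c) && !XmlConvert.IsXmlChar(c))
            {
                buffer[i] = ' ';
            }
        }
    }
}

public static class InPlaceCleanTextReaderDemo
{
    public static string ReadAll(string input)
    {
        var sb = new StringBuilder();
        var chars = new char[16];
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
        using (var reader = new InPlaceCleanTextReader(stream))
        {
            int len;
            // Read into the middle of the array to exercise a non-zero index.
            while ((len = reader.Read(chars, 4, 8)) > 0)
            {
                sb.Append(chars, 4, len);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Each 0x1A in the input comes back as a space.
        Console.WriteLine(ReadAll("fox \u001Ajumped\u001A over"));
    }
}
```

Note that the single-character Read() and Peek() overloads are not overridden in this sketch; a caller that uses them would still see the raw characters.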

Update

As requested, here is the stack trace I get:

"System.Xml.XmlException: '', hexadecimal value 0x1A, is an invalid character. Line 84, position 38.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XElement.ReadElementFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XNode.ReadFrom(XmlReader reader)
   at MyCompany.Importers.GroupEligibilityModel.Loader.<GetGroupEligibilityElements>d__2b.MoveNext() in c:\\Projects\\MyCompanyHealth\\MyCompany.Importers\\MyCompany.Importers.GroupEligibilityModel\\MyCompany.Importers.GroupEligibilityModel\\Loader.cs:line 138
   at MyCompany.Importers.GroupEligibilityModel.Loader.<GetGroupEligibilities>d__18.MoveNext() in c:\\Projects\\MyCompanyHealth\\MyCompany.Importers\\MyCompany.Importers.GroupEligibilityModel\\MyCompany.Importers.GroupEligibilityModel\\Loader.cs:line 71
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at MyCompany.Importers.GroupEligibilityModel.Test.LoadingTests.GroupEligibilityFileWithBadCharactersProperlyCleansed() in c:\\Projects\\MyCompanyHealth\\MyCompany.Importers\\MyCompany.Importers.GroupEligibilityModel\\MyCompany.Importers.GroupEligibilityModel.Test\\LoadingTests.cs:line 118"   string

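Comments:

Could you include the call stack of the System.Xml.XmlException? Also, I can't reproduce your problem; please provide a simple XML sample. Specify exactly which characters/bytes are in the file, e.g. by posting a snapshot from a hex editor.

@helb I've added the stack trace, but the XML file is more of a concern because it is client data. I'll see whether I can produce an XML file containing the bad characters that fails.

I'm having trouble correlating the exception call stack with the code you posted. Which lines exactly throw?

I believe the error is thrown when the node is consumed via var node = XNode.ReadFrom(theReader) as XElement. I don't think that's very relevant, though, because if the filter were working properly it wouldn't blow up.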