.net System.StackOverflowException occurring with LinqToHtml

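I inherited a WebSpider application, complete with all of its source code. For ordinary brochure-style sites (say, under 15 pages) the software seems to run perfectly well.

For other sites (more than 20-ish pages), the software throws a StackOverflowException on the line marked in the code below.

It doesn't appear to use recursion, and unfortunately there is no support for the LinqToHtml (SuperStarCoders) library it uses.

Here is the code that is running when the exception occurs: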

   Private Function ExportXml(Optional ByVal _Worker As ComponentModel.BackgroundWorker = Nothing) As Boolean
    Dim _L = PopulateSEOList(_Worker)
    Try
        Dim _TmpStr As New Text.StringBuilder
        Dim _X As New XDocument, _ct As Long = 0, _Elements As Typing.SEO.Elements = Nothing
        ReportProgress(0, _Worker)
        With _TmpStr
            .Append("<?xml version=""1.0"" encoding=""UTF-8""?>")
            .Append("<o7th.Web.Design.Web.Spider>")
            For i As Long = 0 To _L.Count - 1
                _ct += 1
                .Append("   <Page>")
                .Append("       <Link>" & XmlEscape(_L(i).Link) & "</Link>")
                .Append("       <Title>" & XmlEscape(_L(i).Title) & "</Title>")
                .Append("       <Keywords>" & XmlEscape(_L(i).Keywords) & "</Keywords>")
                .Append("       <Description>" & XmlEscape(_L(i).Description) & "</Description>")
                .Append("       <Elements>")
                _Elements = _L(i).ContentElements
                If _Elements IsNot Nothing Then
                    If _Elements.H1 IsNot Nothing Then
                        .Append(<H1>
                                    <%= (From n In _Elements.H1.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </H1>)
                    End If
                    If _Elements.H2 IsNot Nothing Then
                        .Append(<H2>
                                    <%= (From n In _Elements.H2.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </H2>)
                    End If
                    If _Elements.H3 IsNot Nothing Then
                        .Append(<H3>
                                    <%= (From n In _Elements.H3.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </H3>)
                    End If
                    If _Elements.H4 IsNot Nothing Then
                        .Append(<H4>
                                    <%= (From n In _Elements.H4.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </H4>)
                    End If
                    If _Elements.H5 IsNot Nothing Then
                        .Append(<H5>
                                    <%= (From n In _Elements.H5.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </H5>)
                    End If
                    If _Elements.H6 IsNot Nothing Then
                        .Append(<H6>
                                    <%= (From n In _Elements.H6.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </H6>)
                    End If
                    If _Elements.UL IsNot Nothing Then
                        .Append(<UL>
                                    <%= (From n In _Elements.UL.AsParallel()
                                        Select
                                        <Content><%= ConvertToCDATA(n) %></Content>).ToList() %>
                                </UL>)
                    End If
                    If _Elements.OL IsNot Nothing Then
                        .Append(<OL>
                                    <%= (From n In _Elements.OL.AsParallel()
                                        Select
                                        <Content><%= ConvertToCDATA(n) %></Content>).ToList() %>
                                </OL>)
                    End If
                    If _Elements.STRONG IsNot Nothing Then
                        .Append(<STRONG>
                                    <%= (From n In _Elements.STRONG.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </STRONG>)
                    End If
                    If _Elements.EM IsNot Nothing Then
                        .Append(<EM>
                                    <%= (From n In _Elements.EM.AsParallel()
                                        Select
                                        <Content><%= XmlEscape(n) %></Content>).ToList() %>
                                </EM>)
                    End If
                    If _Elements.BLOCKQUOTE IsNot Nothing Then
                        .Append(<BLOCKQUOTE>
                                    <%= (From n In _Elements.BLOCKQUOTE.AsParallel()
                                        Select
                                        <Content><%= ConvertToCDATA(n) %></Content>).ToList() %>
                                </BLOCKQUOTE>)
                    End If
                    If _Elements.A IsNot Nothing Then
                        .Append(<LINKS>
                                    <%= (From n In _Elements.A.AsParallel()
                                        Select
                                        <Content>
                                            <HREF><%= XmlEscape(n.Href) %></HREF>
                                            <REL><%= XmlEscape(n.Rel) %></REL>
                                            <TITLE><%= XmlEscape(n.Title) %></TITLE>
                                            <TARGET><%= XmlEscape(n.Target) %></TARGET>
                                            <CONTENT><%= XmlEscape(n.Content) %></CONTENT>
                                        </Content>).ToList() %>
                                </LINKS>)
                    End If
                    If _Elements.IMG IsNot Nothing Then
                        .Append(<IMAGES>
                                    <%= (From n In _Elements.IMG.AsParallel()
                                        Select
                                        <Content>
                                            <SRC><%= XmlEscape(n.Source) %></SRC>
                                            <ALT><%= XmlEscape(n.Alt) %></ALT>
                                            <TITLE><%= XmlEscape(n.Title) %></TITLE>
                                        </Content>).ToList() %>
                                </IMAGES>)
                    End If
                End If
                .Append("       </Elements>")
                .Append("       <Content><![CDATA[" & _L(i).Content.ToString() & "]]></Content>")
                .Append("   </Page>")
                ReportProgress((_ct / _L.Count) * 100, _Worker)
            Next
            .Append("</o7th.Web.Design.Web.Spider>")
        End With
        Dim _xStr As String = _TmpStr.ToString()
        _X = XDocument.Parse(_xStr)
        _X.Save(ExportPath & "site.xml")
        _X = Nothing
        ReportProgress(100, _Worker)
        Return True
    Catch ex As Exception
        'Put logging in here
        Message = ex.Message & ":::Export.ExportXml"
        Return False
    End Try
End Function
The other two listings are below:

Imports Superstar.Html.Linq

Public Class Typing

Partial Public Class SEO

    Public Property Link As String
    Public Property Title As String
    Public Property Description As String
    Public Property Keywords As String
    Public Property Content As HElement
    Public Property ContentElements As Elements

    Partial Public Class Elements

        Public Property H1 As List(Of String)
        Public Property H2 As List(Of String)
        Public Property H3 As List(Of String)
        Public Property H4 As List(Of String)
        Public Property H5 As List(Of String)
        Public Property H6 As List(Of String)
        Public Property UL As List(Of String)
        Public Property OL As List(Of String)
        Public Property STRONG As List(Of String)
        Public Property BLOCKQUOTE As List(Of String)
        Public Property EM As List(Of String)
        Public Property A As List(Of Links)
        Public Property IMG As List(Of Images)

        Partial Public Class Images
            Public Property Source As String
            Public Property Alt As String
            Public Property Title As String
        End Class

        Partial Public Class Links
            Public Property Href As String
            Public Property Rel As String
            Public Property Title As String
            Public Property Target As String
            Public Property Content As String
        End Class

    End Class

End Class

End Class
ReportProgress only reports to the BackgroundWorker, which in this particular case updates a progress bar in the XAML window:

Public Sub ReportProgress(ByVal ct As Integer, _Worker As ComponentModel.BackgroundWorker)
    If _Worker IsNot Nothing Then
        _Worker.ReportProgress(ct)
        Threading.Thread.Sleep(500)
    End If
End Sub
And the Downloader class is:

Imports System.Reflection
Imports System.Net
Imports Superstar.Html.Linq

Public Class Downloader
Implements IDisposable

''' <summary>
''' Get the returned downloaded string
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public ReadOnly Property ReturnString As String
    Get
        Return _StrReturn
    End Get
End Property
Private Property _StrReturn As String

''' <summary>
''' Get the returned downloaded byte array
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public ReadOnly Property ReturnBytes As Byte()
    Get
        Return _FSReturn
    End Get
End Property
Private Property _FSReturn As Byte()

Private Property _UserAgent As String = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13"
Private Property DataReceived As Boolean = False

''' <summary>
''' Download a string, but do not block the calling thread
''' </summary>
''' <param name="_Path"></param>
''' <remarks></remarks>
Public Sub DownloadString(ByVal _Path As String, Optional ByVal _Worker As ComponentModel.BackgroundWorker = Nothing)
    SetAllowUnsafeHeaderParsing20()
    Using wc As New Net.WebClient()
        With wc
            Dim _ct As Long = 0
            DataReceived = False
            .Headers.Add("user-agent", _UserAgent)
            .DownloadStringAsync(New System.Uri(_Path))
            AddHandler .DownloadStringCompleted, AddressOf StringDownloaded
            Do While Not DataReceived
                If _Worker IsNot Nothing Then
                    _ct += 1
                    ReportProgress(_ct, _Worker)
                End If
            Loop
        End With
    End Using
End Sub

''' <summary>
''' Download a file, but do not block the calling thread
''' </summary>
''' <param name="_Path"></param>
''' <remarks></remarks>
Public Sub DownloadFile(ByVal _Path As String, Optional ByVal _Worker As ComponentModel.BackgroundWorker = Nothing)
    SetAllowUnsafeHeaderParsing20()
    Using wc As New Net.WebClient()
        With wc
            Dim _ct As Long = 0
            DataReceived = False
            .Headers.Add("user-agent", _UserAgent)
            .DownloadDataAsync(New System.Uri(_Path))
            AddHandler .DownloadDataCompleted, AddressOf FileStreamDownload
            Do While Not DataReceived
                If _Worker IsNot Nothing Then
                    _ct += 1
                    ReportProgress(_ct, _Worker)
                End If
            Loop
        End With
    End Using
End Sub

''' <summary>
''' Download a parsable HDocument, for using HtmlToLinq
''' </summary>
''' <param name="_Path"></param>
''' <returns></returns>
''' <remarks></remarks>
Public Function DownloadHDoc(ByVal _Path As String, Optional ByVal _Worker As ComponentModel.BackgroundWorker = Nothing) As HDocument
    Try
        'StackOverFlowException Occurring Here!
        DownloadString(_Path, _Worker)
        Return HDocument.Parse(_StrReturn)
    Catch soex As StackOverflowException
        'put some logging in here, with the path attempted
        Return Nothing
    Catch ex As Exception
        SetAllowUnsafeHeaderParsing20()
        Return HDocument.Load(_Path)
    End Try
End Function

#Region "Internals"

Private Sub SetAllowUnsafeHeaderParsing20()
    Dim a As New System.Net.Configuration.SettingsSection
    Dim aNetAssembly As System.Reflection.Assembly = Assembly.GetAssembly(a.GetType)
    Dim aSettingsType As Type = aNetAssembly.GetType("System.Net.Configuration.SettingsSectionInternal")
    Dim args As Object() = Nothing
    Dim anInstance As Object = aSettingsType.InvokeMember("Section", BindingFlags.Static Or BindingFlags.GetProperty Or BindingFlags.NonPublic, Nothing, Nothing, args)
    Dim aUseUnsafeHeaderParsing As FieldInfo = aSettingsType.GetField("useUnsafeHeaderParsing", BindingFlags.NonPublic Or BindingFlags.Instance)
    aUseUnsafeHeaderParsing.SetValue(anInstance, True)
End Sub

Private Sub FileStreamDownload(ByVal sender As Object, ByVal e As DownloadDataCompletedEventArgs)
    If e.Cancelled = False AndAlso e.Error Is Nothing Then
        DataReceived = True
        _FSReturn = DirectCast(e.Result, Byte())
    Else
        _FSReturn = Nothing
    End If
End Sub

Private Sub StringDownloaded(ByVal sender As Object, ByVal e As DownloadStringCompletedEventArgs)
    If e.Cancelled = False AndAlso e.Error Is Nothing Then
        DataReceived = True
        _StrReturn = DirectCast(e.Result, String)
    Else
        _StrReturn = String.Empty
    End If
End Sub

#End Region

#Region "IDisposable Support"
Private disposedValue As Boolean ' To detect redundant calls

' IDisposable
Protected Overridable Sub Dispose(disposing As Boolean)
    If Not Me.disposedValue Then
        If disposing Then
        End If
        _StrReturn = Nothing
        _FSReturn = Nothing
    End If
    Me.disposedValue = True
End Sub

Public Sub Dispose() Implements IDisposable.Dispose
    Dispose(True)
    GC.SuppressFinalize(Me)
End Sub

#End Region

End Class

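The Locals window shows me the page I mentioned above; it is just over 500k in size.

I would start by removing all of the parallelism. It's probably overkill anyway: the overhead of creating multiple threads outweighs the performance gain.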

Once that's done, just debug the code and wait for the exception. You can then inspect the call stack and all of the collections.

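A stack overflow usually happens when the same method is called recursively and, for some reason, the termination condition never takes effect. You will see that clearly in the call stack.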
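One fact worth adding here: since .NET Framework 2.0 a StackOverflowException cannot be caught with Try/Catch and the process is terminated by default, so the Catch soex As StackOverflowException block in DownloadHDoc will never run. The VB.NET sketch below uses a hypothetical Node class (not LinqToHtml's HElement) to show that even a recursion with a correct termination condition uses one stack frame per nesting level, while an explicit Stack(Of T) keeps the pending work on the heap. Whether deep nesting in the 500k page is what overflows inside the LinqToHtml parser is an assumption the thread does not establish.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical tree node standing in for a parsed HTML element.
Public Class Node
    Public ReadOnly Children As New List(Of Node)()
End Class

Public Module DepthDemo

    ' Recursive walk: it terminates at leaves, but every nesting level
    ' costs one stack frame, so very deep input can still overflow.
    Public Function MaxDepthRecursive(n As Node) As Integer
        Dim deepestChild As Integer = 0
        For Each child As Node In n.Children
            deepestChild = Math.Max(deepestChild, MaxDepthRecursive(child))
        Next
        Return deepestChild + 1
    End Function

    ' Iterative walk: an explicit Stack on the heap replaces the call stack,
    ' so depth is limited by memory rather than by the thread's stack size.
    Public Function MaxDepthIterative(root As Node) As Integer
        Dim deepest As Integer = 0
        Dim pending As New Stack(Of KeyValuePair(Of Node, Integer))()
        pending.Push(New KeyValuePair(Of Node, Integer)(root, 1))
        While pending.Count > 0
            Dim item As KeyValuePair(Of Node, Integer) = pending.Pop()
            deepest = Math.Max(deepest, item.Value)
            For Each child As Node In item.Key.Children
                pending.Push(New KeyValuePair(Of Node, Integer)(child, item.Value + 1))
            Next
        End While
        Return deepest
    End Function

    ' Builds a single chain of nodes (each with one child), depth levels deep.
    Public Function BuildChain(depth As Integer) As Node
        Dim root As New Node()
        Dim current As Node = root
        For i As Integer = 2 To depth
            Dim child As New Node()
            current.Children.Add(child)
            current = child
        Next
        Return root
    End Function

End Module
```

Both functions give the same answer on shallow trees. Only the iterative one is safe on a chain hundreds of thousands of levels deep.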

(I needed more room than a comment allows, otherwise I would have added this as a comment on @Jakub Konecki's post.)

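I've built several spiders over the years, and the only big performance gain from parallelism was in the actual downloading of URLs. HTML parsing of a large document can take a few hundred milliseconds, but the gain isn't worth the debugging cost. So make your life easier and remove the parallelism.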
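Removing the parallelism from ExportXml is mostly mechanical: each of the repeated If ... AsParallel() blocks can collapse into one sequential call. The sketch below assumes a hypothetical helper, BuildTextSection, which is not part of the original code. It covers only the List(Of String) sections (H1 to H6, STRONG, EM), not the CDATA sections or the LINKS/IMAGES sections.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Public Module SequentialExport

    ' Hypothetical helper replacing the repeated If/AsParallel blocks in
    ' ExportXml for the List(Of String) sections (H1..H6, STRONG, EM).
    ' Builds <name><Content>...</Content>...</name> on the calling thread.
    Public Function BuildTextSection(name As String, items As List(Of String)) As XElement
        ' Mirrors the original "If ... IsNot Nothing" guard.
        If items Is Nothing Then Return Nothing
        ' XElement escapes text when serialized, so no separate escaping call here.
        Return New XElement(name, items.Select(Function(s) New XElement("Content", s)))
    End Function

End Module
```

Inside the With _TmpStr block, each section could then become .Append(BuildTextSection("H1", _Elements.H1)). StringBuilder.Append(Object) appends nothing when passed Nothing, so the null check lives in one place.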

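You also have a strange async-blocking problem. In DownloadHDoc you call DownloadString synchronously, but inside DownloadString you start an asynchronous method and then block on a bit flag, which defeats the purpose of going async. Worse, you block in a do-while loop that spins at a million miles an hour and calls ReportProgress on every pass. I suspect that is what is really giving you the SOE (StackOverflowException). Putting a Thread.Sleep(100) in there might help, for starters.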
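DownloadHDoc needs the result before it can continue, so the asynchronous call buys nothing here. A minimal sketch under that assumption simply uses WebClient's synchronous DownloadString, which removes the DataReceived flag and the spin loop. There is a second reason to drop the loop: the original StringDownloaded only sets DataReceived on success, so a failed or cancelled download leaves the loop spinning forever. The synchronous call raises a WebException instead. DownloadStringSync is a hypothetical name, and the trade-off is that nothing reports progress while the request is in flight.

```vbnet
Imports System
Imports System.Net

Public Module SyncDownload

    ' Hypothetical synchronous replacement for Downloader.DownloadString.
    ' The caller (DownloadHDoc) waits for the result anyway, so a blocking
    ' call removes the Async/flag/spin-loop combination entirely.
    ' Returns String.Empty on failure, matching what StringDownloaded stores.
    Public Function DownloadStringSync(path As String, Optional userAgent As String = Nothing) As String
        Using wc As New WebClient()
            If userAgent IsNot Nothing Then
                wc.Headers.Add("user-agent", userAgent)
            End If
            Try
                Return wc.DownloadString(New Uri(path))
            Catch ex As WebException
                Return String.Empty
            End Try
        End Using
    End Function

End Module
```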

[Edit]

This is the code that blocks on the bit flag:

        .DownloadStringAsync(New System.Uri(_Path))
        AddHandler .DownloadStringCompleted, AddressOf StringDownloaded
        Do While Not DataReceived
            If _Worker IsNot Nothing Then
                _ct += 1
                ReportProgress(_ct, _Worker)
            End If
        Loop
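Line 1 starts an asynchronous method and line 2 adds a completion handler; both return immediately. Line 3 then checks the DataReceived flag over and over, waiting for the handler to set it. That happens hundreds or thousands of times per second (or more). The spinning alone isn't optimal, but calling ReportProgress on every pass is really bad: the larger the document, the longer the download takes and the more often ReportProgress is called. You really only need to update the UI every 100 ms at most; I usually set mine to every 250 or 500 ms.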
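The 100 to 500 ms guideline can be enforced in one place instead of by sleeping. The sketch below uses a hypothetical ThrottledReporter class, which is not part of the original code or of BackgroundWorker. It forwards a value to a callback only if at least the given interval has passed since the last forwarded value, timing this with System.Diagnostics.Stopwatch.

```vbnet
Imports System
Imports System.Diagnostics

' Hypothetical helper: forwards progress values to a callback at most once
' per interval, so a tight loop cannot flood the UI thread with updates.
Public Class ThrottledReporter
    Private ReadOnly _report As Action(Of Integer)
    Private ReadOnly _interval As TimeSpan
    Private ReadOnly _clock As Stopwatch = Stopwatch.StartNew()
    Private _hasReported As Boolean = False
    Private _lastReport As TimeSpan

    Public Sub New(report As Action(Of Integer), interval As TimeSpan)
        _report = report
        _interval = interval
    End Sub

    ' Invokes the callback on the first call, or when at least the interval
    ' has elapsed since the last forwarded value. Returns True if it did.
    Public Function TryReport(value As Integer) As Boolean
        Dim elapsed As TimeSpan = _clock.Elapsed
        If _hasReported AndAlso elapsed - _lastReport < _interval Then
            Return False
        End If
        _hasReported = True
        _lastReport = elapsed
        _report(value)
        Return True
    End Function
End Class
```

In the download loop the callback could wrap the worker, for example New ThrottledReporter(Sub(p) _Worker.ReportProgress(p), TimeSpan.FromMilliseconds(250)). Throttling does not stop the loop itself from spinning, so it complements the Sleep rather than replacing it.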

[Edit 2]

If that turns out to be the problem, you should be able to change it to:

    .DownloadStringAsync(New System.Uri(_Path))
    AddHandler .DownloadStringCompleted, AddressOf StringDownloaded
    Do While Not DataReceived
        If _Worker IsNot Nothing Then
            _ct += 1
            ReportProgress(_ct, _Worker)
        End If
        Threading.Thread.Sleep(250) ' Sleep inside the loop
    Loop

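P.S. You may see a few Try blocks in here, probably added to clean up the initial list (on the theory that it was part of the problem).

Another note: it isn't the number of pages that triggers this. After going through the traces several times, I found that it always happens on one particular page, and that page is just over 500k in size.

I still think the code is flawed. Besides using too much parallelism (IMHO), several tasks/threads iterate over _doc.Descendants, which is quite possibly not thread-safe. Could that cause the StackOverflow in this case? Could you break on the exception, capture a call stack, and post it here?

@sll - that won't necessarily help, because the stack will only show the last few calls. A better approach is to step into DownloadHDoc, then DownloadString, and see why it ends up calling itself.

Please see my question; I've edited it to show where, and on which page, this happens. Once I remove that page from the list to be parsed, the exception does not occur. I'll post in a minute... running another test.

It's currently too big to post here; I'll post it in the question.

Please read all of the comments above and you'll see that I did remove the parallelism :) Also, could you explain what you mean by blocking on a bit flag? I'm not sure what you mean. Also, ReportProgress originally had that exact sleep value. I took it out because I thought it might be unnecessary. I'll add it back in, try it, and see what happens.

I tried adding the Thread.Sleep(100) back in and played with the numbers a bit, but I get the same SOE, the same call stack, the same Locals. So would adding Thread.Sleep(100) above the loop make this all better...?

Hi Kevin, I don't want to be a pain, but could you post your updated code? It's hard to read 300+ lines of code and then guess at the edits you made. For starters, try commenting out ReportProgress(_ct, _Worker).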