Parsing tags with TagSoup in Haskell

html regex haskell

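I have been trying to learn how to extract data from an HTML file in Haskell, and I have run into trouble. I have no experience with Haskell at all; my background is in Python (with BeautifulSoup for HTML parsing).

I am using TagSoup to go through my HTML (it seems to be the recommended choice) and have a basic understanding of how it works. Below is the essential part of my code (self-contained, and it prints test information):

I tried writing: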

let tags = dropWhile (~/= TagOpen "div" [("id", "scores-[0-9]+"), ("class", "scoreBox( spanCol[0-9]?)+( lastCol)?")]) (parseTags http)

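But then it tries to find the literal text [0-9]+. I have not found a way to make this work with the Text.Regex.Posix module, and escaping the characters does not help either. What is the solution here?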

~== does not do regular expressions; you have to write a matcher yourself, something like:

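import Data.Maybe (isJust)
import Text.HTML.TagSoup
import Text.Regex (mkRegex, matchRegex)  -- from the regex-compat package

-- True for an opening <div> tag whose id attribute matches the regex.
-- TagOpen is a constructor, not a type, so the argument type is Tag String.
goodTag :: Tag String -> Bool
goodTag tag = tag ~== TagOpen "div" []
    && fromAttrib "id" tag `matches` "scores-[0-9]+"

-- Just a wrapper around Text.Regex.matchRegex
matches :: String -> String -> Bool
matches string regex = isJust $ mkRegex regex `matchRegex` string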
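
To connect the matcher back to the dropWhile call from the question, here is a minimal sketch of how the predicate might be used. It assumes the tagsoup and regex-compat packages are installed; sampleHtml is a made-up stand-in for the downloaded page, not data from the question.

```haskell
import Data.Maybe (isJust)
import Text.HTML.TagSoup
import Text.Regex (mkRegex, matchRegex)

-- Same idea as goodTag above: an opening <div> whose id matches the regex.
-- The (&&) short-circuits, so fromAttrib is only called on opening div tags
-- (fromAttrib raises an error when given a tag that is not a TagOpen).
goodTag :: Tag String -> Bool
goodTag tag = tag ~== TagOpen "div" []
    && isJust (mkRegex "scores-[0-9]+" `matchRegex` fromAttrib "id" tag)

-- Hypothetical sample input, standing in for the real page.
sampleHtml :: String
sampleHtml = "<p>intro</p><div id=\"scores-12\" class=\"scoreBox\">3:1</div>"

main :: IO ()
main = do
    -- Skip everything before the first matching <div>, as in the question.
    let tags = dropWhile (not . goodTag) (parseTags sampleHtml)
    print (take 2 tags)
```

Replacing the (~/= ...) section with (not . goodTag) keeps the rest of the pipeline unchanged; filter goodTag would collect every matching tag instead of only skipping to the first one.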

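How about fromAttrib "id" tag =~ "scores-[0-9]+"?

Thanks, guys! Both approaches work. I am not sure which one is "better", but since I want to write as much code as possible (for learning purposes, don't worry), I will go with Koterpillar's approach for now. Thanks a lot!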
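
For comparison, the =~ suggestion from the comments could look roughly like the sketch below. It assumes the regex-posix package for Text.Regex.Posix; the name scoreDiv and the sample HTML are invented for illustration, and anchoring the patterns with ^ and $ is an assumption about the intent (the patterns in the question are unanchored).

```haskell
import Text.HTML.TagSoup
import Text.Regex.Posix ((=~))

-- An opening <div> whose id and class both match the regexes from the
-- question. isTagOpenName is checked first so that fromAttrib is never
-- applied to a tag that is not a TagOpen.
scoreDiv :: Tag String -> Bool
scoreDiv tag = isTagOpenName "div" tag
    && (fromAttrib "id" tag =~ "^scores-[0-9]+$" :: Bool)
    && (fromAttrib "class" tag =~ "^scoreBox( spanCol[0-9]?)+( lastCol)?$" :: Bool)

main :: IO ()
main = do
    -- Hypothetical sample input, not taken from the question.
    let html = "<div id=\"scores-7\" class=\"scoreBox spanCol2 lastCol\">2:0</div>"
    print (filter scoreDiv (parseTags html))
```

The :: Bool annotations are needed because =~ is polymorphic in its result type; without them the compiler cannot tell which kind of match result is wanted.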
