How to use Parsec greedily?


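At work I come across a lot of gnarly SQL, and I had the bright idea of writing a program to parse the SQL and print it out neatly. I got most of it done quickly, but I ran into a problem I don't know how to solve.

So, let's say the SQL is "select foo from bar where 1". My thought was that there is always a keyword followed by its data, so all I have to do is parse a keyword, then capture all the gibberish before the next keyword and store it for later cleanup, if it is worthwhile. Here's the code: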

import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Data.Text (strip)

newtype Statement = Statement [Atom]
data Atom = Branch String [Atom] | Leaf String deriving Show

trim str = reverse $ trim' (reverse $ trim' str)
  where
    trim' (' ':xs) = trim' xs
    trim' str = str

printStatement atoms = mapM_ printAtom atoms
printAtom atom = loop 0 atom 
  where
    loop depth (Leaf str) = putStrLn $ (replicate depth ' ') ++ str
    loop depth (Branch str atoms) = do 
      putStrLn $ (replicate depth ' ') ++ str
      mapM_ (loop (depth + 2)) atoms

keywords :: [String]
keywords = [
  "select",
  "update",
  "delete",
  "from",
  "where"]

keywordparser :: Parsec String u String
keywordparser = try ((choice $ map string keywords) <?> "keywordparser")

stuffparser :: Parsec String u String
stuffparser = manyTill anyChar (eof <|> (lookAhead keywordparser >> return ()))

statementparser = do
  key <- keywordparser
  stuff <- stuffparser
  return $ Branch key [Leaf (trim stuff)]
  <?> "statementparser"

tp = parse (many statementparser) ""
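But I can't figure out any way to rewrite it so that it works. I tried using manyTill on the parenthesized part, but I ended up having trouble getting it to typecheck when I have both string-producing parens and single chars as alternatives. Does anyone have any suggestions?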

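Yeah, this might not do exactly what you want. For your use case, I'd follow hammar's advice and grab an off-the-shelf SQL parser. (Personal opinion: or, try not to use SQL unless you really have to; the idea of using strings for database queries was a historical mistake.)

Note: I add an operator called <++> which concatenates the results of two parsers, whether they are strings or characters. (Code at the bottom.)

First, for the task of parsing parentheses: the top level parses some stuff between the relevant characters, which is exactly what the code says: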
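As a small illustration of how that operator reads in use, here is a minimal sketch. The class and operator are the ones from the answer; the explicit type signature, the `FlexibleInstances` pragma, and the `parenWord` parser are assumptions added for this example only.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same class and operator as in the answer; only the signature is added here.
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id

infixl 4 <++>
(<++>) :: (CharOrStr a, CharOrStr b) => Parser a -> Parser b -> Parser String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

-- Hypothetical example: a parenthesised word, returned together with its parens.
-- char yields a Char and many1 letter yields a String; <++> accepts both.
parenWord :: Parser String
parenWord = char '(' <++> many1 letter <++> char ')'

main :: IO ()
main = print (parse parenWord "" "(abc)")   -- Right "(abc)"
```

Without the overloading, each `Char` result would have to be wrapped by hand (for example with `(:[]) <$> char '('`) before it could be concatenated, which is the typechecking friction the question describes.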

parseParen = char '(' <++> inner <++> char ')'
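I'll assume that, for the rest of the solution, what you want is to split things up by top-level SQL keywords (i.e. ignoring the ones inside parentheses). Namely, we'll have a parser that behaves like so: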

Main> parseTest parseSqlToplevel "select asdf(select m( 2) fr(o)m w where n) from b where delete 4"
[(Select," asdf(select m( 2) fr(o)m w where n) "),(From," b "),(Where," "),(Delete," 4")]
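Suppose we have a parseKw parser that recognizes the likes of select. After we consume a keyword, we need to read until the next [top-level] keyword. The last trick in my solution is using the lookAhead combinator to determine whether the next word is a keyword, and to put it back if so. If it's not, we consume a parenthesized group or some other characters, and then recurse on the rest: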

-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
    (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
     option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
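The "put it back" behaviour of `lookAhead (try ...)` can be checked in isolation. The following is a minimal sketch, not part of the answer's code: `keyword`, `peekKeyword`, `peekThenRest`, and `wordOrKeyword` are hypothetical names, and the keyword list is shortened for brevity.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the answer's parseKw: recognises one keyword.
keyword :: Parser String
keyword = choice (map (try . string) ["select", "from", "where"])

-- Succeeds without consuming input when a keyword comes next;
-- fails without consuming input otherwise (thanks to try).
peekKeyword :: Parser String
peekKeyword = lookAhead (try keyword)

-- Peek at the keyword, then return the remaining input to show it is untouched.
peekThenRest :: Parser (String, String)
peekThenRest = (,) <$> peekKeyword <*> getInput

-- Same shape as the first alternative in parseOther: stop with "" at a keyword,
-- otherwise fall through to an ordinary word.
wordOrKeyword :: Parser String
wordOrKeyword = ("" <$ peekKeyword) <|> many1 letter

main :: IO ()
main = do
  print (parse peekThenRest "" "from b")    -- Right ("from","from b")
  print (parse wordOrKeyword "" "frob")     -- Right "frob"
  print (parse wordOrKeyword "" "from b")   -- Right ""
```

Because the failed peek on "frob" consumes nothing, `<|>` is free to try the second alternative, which is what lets parseOther keep reading ordinary words until a real keyword shows up.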
My entire solution is as follows:

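{-# LANGUAGE FlexibleInstances #-}
import Control.Applicative hiding (many, (<|>))
import Text.Parsec

-- overloaded operator to concatenate string results from parsers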
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)

parseKw =
    (Select <$ string "select") <|>
    (Update <$ string "update") <|>
    (Delete <$ string "delete") <|>
    (From <$ string "from") <|>
    (Where <$ string "where") <?>
    "keyword (select, update, delete, from, where)"

-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
    (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
     option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))

parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof

parseParen = char '(' <++> inner <++> char ')' where
    inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
Edit: version with support for quotes

You can use the same approach as with the parens to support quotes:

{-# LANGUAGE FlexibleInstances #-}
import Control.Applicative hiding (many, (<|>))
import Text.Parsec
import Text.Parsec.Combinator

-- overloaded operator to concatenate string results from parsers
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id
infixl 4 <++>
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)

parseKw =
    (Select <$ string "select") <|>
    (Update <$ string "update") <|>
    (Delete <$ string "delete") <|>
    (From <$ string "from") <|>
    (Where <$ string "where") <?>
    "keyword (select, update, delete, from, where)"

-- consume spaces, then eat a word or parenthesis
parseOther = many space <++>
    (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
     option "" ((parseParen <|> parseQuote <|> many1 (noneOf "'() \t")) <++> parseOther))

parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof

parseQuote = char '\'' <++> inner <++> char '\'' where
    inner = many (noneOf "'\\") <++>
        option "" (char '\\' <++> anyChar <++> inner)

parseParen = char '(' <++> inner <++> char ')' where
    inner = many (noneOf "'()") <++>
        (parseQuote <++> inner <|> option "" (parseParen <++> inner))

I tried it with parseTest parseSqlToplevel "select 'asdf' b". Cheers.
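To see why quote handling matters for paren matching, here is a minimal sketch of the same idea written in do-notation. It is a reformulation using `many` rather than the answer's explicit recursion, and the names `quoted` and `parens` are assumptions for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A single-quoted literal, returned with its quotes.
-- A backslash escapes the character that follows it.
quoted :: Parser String
quoted = do
  _ <- char '\''
  body <- many (escaped <|> plain)
  _ <- char '\''
  return ("'" ++ concat body ++ "'")
  where
    escaped = do { _ <- char '\\'; c <- anyChar; return ['\\', c] }
    plain   = (: []) <$> noneOf "'\\"

-- Balanced parentheses whose contents may include quoted literals,
-- so a paren inside a string does not affect the nesting.
parens :: Parser String
parens = do
  _ <- char '('
  body <- many (quoted <|> parens <|> ((: []) <$> noneOf "'()"))
  _ <- char ')'
  return ("(" ++ concat body ++ ")")

main :: IO ()
main = do
  print (parse quoted "" "'a :(' rest")       -- Right "'a :('"
  print (parse parens "" "(f(':(') x) tail")  -- Right "(f(':(') x)"
```

The unbalanced "(" inside the literal is swallowed by `quoted` as ordinary text, so `parens` still finds the matching close paren.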

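Bite the bullet and write a full SQL parser. Depending on the flavor and complexity of your SQL, you may be able to use an existing parser.

I tried doing that several years ago, when I was still naive. Did you know that the BNF for MySQL has more productions than there are atoms in the universe? I've looked at the yacc file, and it's far too much to pick apart.

Can you use regular expressions to pull out what you want?

It sounds like what you want to do is island parsing; a search should bring up some references to TXL and the ASF+SDF/SGLR tools. Although island parsing superficially seems easy, since you need a smaller grammar than a proper parser does, trying to do it with Parsec is quite hard. The start symbols in particular are a problem: you need a lot of lookahead to know when to jump out of the so-called water.

I really want it to parse SQL. I've done some more work and I'm really close. Complications with many and eof are preventing total success, but I'm not ready to give up. I still think it's possible.

I ran your code and it works. Unfortunately I don't understand the applicative style very well, so I can't judge it. Still, over the next few days I'm going to dissect it to see how it works. Thanks, everyone!

Sure; my use of the applicative style is very basic. The <*> operator [for monads] applies a monadic-valued function to a monadic-valued argument; this is the same as mf <*> mx = do { f ...

There's one problem: "select 'im a string with a frown :(', id from table" fails. But yours has stuff that mine doesn't, so I'll keep fudging it until it works for me. Thanks.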
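For readers who, like the asker, are less familiar with the applicative style mentioned in the comments, the following sketch places an applicative parser next to a do-notation equivalent. The parsers `pairA` and `pairM` are hypothetical examples, not taken from the answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Applicative style: run both parsers in order and combine their results.
pairA :: Parser (Char, Char)
pairA = (,) <$> letter <*> digit

-- The same parser in do-notation, spelling out what <*> does for a monad:
-- run the parser that yields a function, then the one that yields its
-- argument, and apply one to the other.
pairM :: Parser (Char, Char)
pairM = do
  f <- (,) <$> letter
  x <- digit
  return (f x)

main :: IO ()
main = do
  print (parse pairA "" "a1")   -- Right ('a','1')
  print (parse pairM "" "a1")   -- Right ('a','1')
```

Read this way, a line such as `(,) <$> parseKw <*> (space <++> parseOther)` in the answer just means "parse a keyword, then the text after it, and pair the two results".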