Clojure: creating an infinite stream from smaller streams
I have a simple HTML site containing table rows that I want to parse. The tables are spread over an unbounded (or finite) number of pages: the page http://example.com/?page=1 has a table, http://example.com/?page=2 has the next table, and so on.
I already have the following basic functions:
(defn next-page [link] ...) ; given http://example.com/?page=2 returns http://example.com/?page=3
(defn parse [link] ...) ; return list of rows from table parsed from HTML
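The bodies of these functions are not shown. As an illustration only, here is a hypothetical sketch of next-page that satisfies the stated contract, assuming the page number always appears as a page=<digits> query parameter:

```clojure
(require '[clojure.string :as str])

;; Hypothetical next-page: finds the first "page=<digits>" in the URL
;; and replaces it with the incremented number.
;; Returns nil when the URL has no page parameter.
(defn next-page [link]
  (when-let [[_ n] (re-find #"page=(\d+)" link)]
    (str/replace-first link #"page=\d+" (str "page=" (inc (Long/parseLong n))))))

(next-page "http://example.com/?page=2")
;; => "http://example.com/?page=3"
```

Returning nil for a URL without a page number gives callers a natural way to stop iterating.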
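Now I want to write a function that takes a starting link and creates an infinite stream of all rows: first the rows from the given link, then the rows from the next link, and so on.
For example: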
table on site: http://example.com/?page=2
|--------------------|
| table 2 |
|--------------------|
| row1: value21 |
| row2: value22 |
| row3: value23 |
|--------------------|
(deftest should-parse
(is (=
'(value21 value22 value23)
(parse "http://example.com/?page=2"))))
table on site: http://example.com/?page=3
|--------------------|
| table 3 |
|--------------------|
| row1: value31 |
| row2: value32 |
|--------------------|
Then this should hold:
(deftest should-return-stream-with-rows
(is (=
'(value21 value22 value23 value31 value32)
(take 5 (row-stream "http://example.com/?page=2")))))
If I understand correctly, you probably need mapcat + iterate.
Let's make functions like yours (I guess):
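A minimal sketch of such stand-ins, assuming pages are identified by integer ids and that page 9 is a hypothetical last page. The row counts are arbitrary, so the exact rows will differ from the REPL output below:

```clojure
;; Hypothetical next-page: page ids run 0..9, so it returns nil after page 9.
(defn next-page [page-id]
  (when (< page-id 9)
    (inc page-id)))

;; Hypothetical parse: page n yields (inc (mod n 3)) rows,
;; named "link-<page>-<row>" in the same format as the REPL output.
(defn parse [page-id]
  (for [row (range (inc (mod page-id 3)))]
    (str "link-" page-id "-" row)))
```

With these stubs the stream is finite: the pages contribute 1, 2, 3, 1, 2, 3, 1, 2, 3, 1 rows, so the whole sequence has 19 elements.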
Then you can model the desired sequence like this:
(defn all-links [starting-page-id]
(mapcat parse
(take-while some? (iterate next-page starting-page-id))))
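It iterates from the first page through all following pages, maps each page to its parse results, and then concatenates all the results. Note that if next-page returns nil, the sequence ends there (thanks to take-while).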
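The same stream can also be written with explicit lazy-seq recursion instead of mapcat + iterate. The sketch below is self-contained: parse and next-page are hypothetical in-memory stand-ins built from the question's example tables, not real HTTP code:

```clojure
;; Hypothetical stand-ins using the example tables from the question.
(def fake-rows
  {"http://example.com/?page=2" '(value21 value22 value23)
   "http://example.com/?page=3" '(value31 value32)})

(def fake-next
  {"http://example.com/?page=2" "http://example.com/?page=3"})

(defn parse [link] (get fake-rows link))
(defn next-page [link] (get fake-next link))

;; Lazy seq of all rows, page by page, starting at `link`.
;; A page is parsed only when the stream reaches it;
;; the stream ends when next-page returns nil.
(defn row-stream [link]
  (lazy-seq
    (when link
      (concat (parse link) (row-stream (next-page link))))))

(take 5 (row-stream "http://example.com/?page=2"))
;; => (value21 value22 value23 value31 value32)
```

One reason to prefer this form for scraping: mapcat is built on apply concat, which can realize the first few inner sequences eagerly, so a few pages may be fetched before any row is consumed. Here parse is called only when the stream actually reaches that page.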
Output in the REPL:
user> (take 20 (all-links 0))
("link-0-0" "link-0-1" "link-0-2" "link-0-3" "link-0-4"
"link-0-5" "link-1-0" "link-1-1" "link-2-0" "link-2-1"
"link-2-2" "link-3-0" "link-3-1" "link-3-2" "link-3-3"
"link-3-4" "link-3-5" "link-3-6" "link-3-7" "link-4-0")