clojure jdbc -> async channel -> csv file... why am I not lazy?


I'm trying to get a better understanding of core.async and channels, etc.

The task at hand is to issue a JDBC select statement against a database and stream the results onto an async channel.

I'd like a separate thread to take from this channel and write the rows to a csv file using clojure.data.csv.

When running the program below, it does not appear to happen lazily... I get no output to the terminal, then everything appears at once, and my csv file has 50 rows. I'm hoping someone can help me understand why.

Thanks in advance,

(ns db-async-test.core-test
  (:require [clojure.java.jdbc  :as j]
            [clojure.java.io    :as io]
            [clojure.data.csv   :as csv]
            [clojure.core.async :as async :refer [>! <! >!! <!!  chan thread]]
            [clojure.string     :as str]
            [while-let.core     :refer [while-let]]))
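The while-let used below comes from the small while-let.core library: it loops while the bound value is truthy, which is what makes <!! terminate the loop once the channel is closed and drained (a closed, empty channel yields nil). A minimal sketch of such a macro, for illustration only (the real while-let.core implementation may differ):

```clojure
;; Hypothetical re-implementation of while-let: bind sym to expr on each
;; pass and keep executing body while the bound value is truthy.
(defmacro while-let
  [[sym expr] & body]
  `(loop []
     (when-let [~sym ~expr]
       ~@body
       (recur))))

;; Example: drain a countdown; the loop stops when the expression
;; evaluates to nil.
(def counter (atom 3))
(def seen (atom []))
(while-let [v (when (pos? @counter) @counter)]
  (swap! seen conj v)
  (swap! counter dec))
;; @seen => [3 2 1]
```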


(defn db->chan
  "Given input channel ch, sql select, and db-spec connection info, put db
  hash-maps onto ch in a separate thread. Through back pressure I'm hoping to
  populate the channel lazily as a consumer does downstream processing."
  [ch {:keys [sql db-spec]}]
  (println "starting fetch...")
  (let [
        row-count           (atom 0)  ; For state on rows
        db-connection       (j/get-connection db-spec)
        statement (j/prepare-statement
                   db-connection
                   sql {
                        :result-type :forward-only  ;; you need this to be lazy
                        :fetch-size 3               ;; also this
                        :max-rows   0
                        :concurrency :read-only})
        row-fn (fn [d]
                 (>!! ch d)
                 ;; everything below is just for printing to stdout and
                 ;; trying to understand where my non-lazy bottleneck is.
                 (swap! row-count inc)
                 (when (zero? (mod @row-count 5))
                   #_(Thread/sleep 2000)
                   (println "\tFetched " @row-count " rows.")
                   (flush)))]
    (thread
      (j/query db-connection [statement]
               {:as-arrays?    false
                :result-set-fn vec
                :row-fn        row-fn})
      ;; as producer we finished populating the chan; now close it in this
      ;; same thread.
      (println "producer closing channel... (hopefully you have written rows by now...)")
      (async/close! ch))))


(defn chan->csv
  "With input channel ch and output file csv-file, read values off ch and write
  to csv file in a separate thread."
  [ch csv-file]
  (thread
    (println "starting csv write...")
    (let [row-count (atom 0)]  ; local atom; avoids a global def inside the fn
      (with-open [^java.io.Writer writer (io/writer csv-file :append false :encoding "UTF-8")]
        (while-let [data (<!! ch)]
          (swap! row-count inc)
          (csv/write-csv writer [data] :quote? (fn [x] false))
          (when (zero? (mod @row-count 2))
            #_(Thread/sleep 2000)
            (println "Wrote " @row-count " rows.")
            (.flush writer)
            (flush)))))))

(def config {:db-spec {:classname "org.postgresql.Driver"
                       :subprotocol "postgres"
                       :subname "//my-database-host:5432/mydb"
                       :user "me"
                       :password "****"}
             :sql "select row_id, giant_xml_column::text as xml_column from public.big_toasty_table limit 50"})

;; main sorta thing
(do
  (def ch (chan 1))
  (db->chan ch config)
  ;; could pipeline with transducers/etc at some point.
  (chan->csv ch "./test.csv"))
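The back pressure being relied on above can be seen in isolation: on a bounded channel, >!! blocks the producing thread whenever the buffer is full, until a consumer takes a value, so production is paced by consumption. A minimal core.async sketch with no JDBC involved:

```clojure
;; Back-pressure demo: a channel with buffer size 1 forces the producer
;; to wait for the consumer after each put.
(require '[clojure.core.async :as a :refer [chan thread >!! <!! close!]])

(def demo-ch (chan 1))

(thread
  (doseq [i (range 5)]
    (>!! demo-ch i))   ; blocks this thread while the buffer is full
  (close! demo-ch))

;; Drain on the calling thread; <!! returns nil once demo-ch is closed
;; and empty, which terminates the loop.
(def result
  (loop [acc []]
    (if-some [v (<!! demo-ch)]
      (recur (conj acc v))
      acc)))
;; result => [0 1 2 3 4]
```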

OK, I think I have something that works for me.

My main fix was to swap org.clojure/java.jdbc out of my project.clj and replace it with funcool/clojure.jdbc.

What funcool/clojure.jdbc gives me is access to fetch-lazy / cursor support, used below.

The new ns:

(ns db-async-test.core-test
  (:require [jdbc.core :as j]
            [while-let.core :refer [while-let]]
            [clojure.java.io :as io]
            [clojure.data.csv :as csv]
            [clojure.core.async :as a :refer [>!! <!! chan thread]]
            [clojure.string :as str]))
The functions for the reader and writer threads:

(defn db->chan
  "Put db hash-maps onto ch."
  [ch {:keys [sql db-spec]}]
  (println "starting reader thread...")
  (let [row-count (atom 0)  ; For state on rows
        row-fn (fn [r]
                 (>!! ch r)
                 ;; everything below is just for printing to stdout
                 (swap! row-count inc)
                 (when (zero? (mod @row-count 100))
                   (println "Fetched " @row-count " rows.")))]
    (with-open [conn (j/connection db-spec)]
      (j/atomic conn
                (with-open [cursor (j/fetch-lazy conn sql)]
                  (doseq [row (j/cursor->lazyseq cursor)]
                    (row-fn row)))))
    (a/close! ch)))
(defn chan->csv
  "Read values off ch and write to csv file."
  [ch csv-file]
  (println "starting writer thread...")
  (let [row-count (atom 0)]  ; local atom rather than a top-level def
    (with-open [^java.io.Writer writer (io/writer csv-file
                                                  :append false :encoding "UTF-8")]
      (while-let [data (<!! ch)]
        (swap! row-count inc)
        (csv/write-csv writer [data] :quote? (fn [x] false))
        (when (zero? (mod @row-count 100))
          (println "Wrote " @row-count " rows."))))))

Output below. It looks like both threads work simultaneously, streaming data onto the channel and popping it off into the csv. (The mangled lines are just the two threads' println output interleaving on stdout.) Also, even with the giant_xml_column, my system never used a huge amount of memory.

starting fetch...
starting csv write...
Fetched  100 Wrote  rows.
100
  rows.
Fetched  200Wrote    rows.200

 rows.
Fetched  300  rows.
Wrote
...clip....
6000  rows.
Fetched  6100  rows.
Wrote  6100  rows.
Fetched  6200  rows.
Wrote  6200  rows.
Fetched  6300  rows.
Wrote
 6300  rows.
Fetched  6400Wrote    rows.6400

 rows.
Fetched  6500  rows.Wrote

6500  rows.

Comment: Please share the code from your project.clj: it could be something particular between jdbc and the db/driver. There have been issues in the past where clojure.java.jdbc would not behave as expected due to differences between databases, and you have to drop down a level to configure jdbc properly. See the related question regarding PostgreSQL.

Reply: Thanks for the response. I think, based on the link you posted, the only solution would be to put all of the functionality into the row-fn... since I follow all the other rules... and vec (my :result-set-fn) should be lazy. While tinkering with this I found what seems to be exactly what I need... if I get it working I'll post a solution. I need to understand the pros and cons of funcool/clojure.jdbc vs clojure.java.jdbc...
(def config {:db-spec {:subprotocol "postgresql"
                       :subname "//mydbhost:5432/mydb"
                       :user "me"
                       :password "*****"}
             :sql "select row_id, giant_xml_value::text from some_table"})

(do
  (def ch (chan 1))
  (thread (db->chan ch config))
  (thread (chan->csv ch "./test.csv")))