While reading Programming Clojure I read that lazy sequence can break recursion and prevent stack overflow. This wasn’t completely intuitive to me. How could changing the type of a sequence (eager or lazy) change the way the stack is used? Let’s dig into it.
Let’s start first with the example given in the book:
(declare replace-symbol replace-symbol-expression)
(defn replace-symbol [coll oldsym newsym]
(if (empty? coll)
()
(cons (replace-symbol-expression
(first coll) oldsym newsym)
(replace-symbol
(rest coll) oldsym newsym))))
(defn replace-symbol-expression [symbol-expr oldsym newsym]
(if (symbol? symbol-expr)
(if (= symbol-expr oldsym)
newsym
symbol-expr)
(replace-symbol symbol-expr oldsym newsym)))
(defn deeply-nested [n]
(loop [n n
result '(bottom)]
(if (= n 0)
result
(recur (dec n)(list result)))))
The function deeply-nested
produces a structure like this
((((((((((bottom))))))))))
And the function replace-symbol
can be used to change bottom to something else
user=> (replace-symbol (deeply-nested 10) ‘bottom ‘other)
(((((((((other))))))))))
And we indeed get a stack overflow if the nesting is too big
user=> (replace-symbol (deeply-nested 1000) ‘bottom ‘other)
#<CompilerException java.lang.StackOverflowError (NO_SOURCE_FILE:0)>
And if I slightly change the replace-symbol
function to use a lazy sequence:
(defn replace-symbol [coll oldsym newsym]
(lazy-seq (if (empty? coll)
()
(cons (replace-symbol-expression
(first coll) oldsym newsym)
(replace-symbol
(rest coll) oldsym newsym)))))
I indeed prevent the stack overflow from happening.
What happens actually is that lazy sequences are just like futures. The sequence is realized only when the sequence is used. If the consumer of the loop consumes the value sequentially, this indeed breaks the recursion. Let’s consider a sequence which is recursively defined as ( cons 1 (cons 2 ( cons 3 (list 4) ) ) )
, and a function count defined with
(defn count-1 [col]
(loop [n 0 c col]
(if (= (first c) nil)
n
(recur (inc n)(rest c)))))
An attempt to count the number of items by traversing the sequence will produce the following call stack:
count-1
count-1, cons
count-1, cons, cons
count-1, cons, cons, cons
count-1, cons, cons, cons, list
(Count-1
itself is not recursive because it uses the recur
special form to perform tail call optimization, which is not possible at the JVM level)
If the sequence is lazily defined, then the call stack will be as follows:
count-1
count-1, cons
count-1, cons
count-1, cons
count-1, list
The lazy sequence are realized one after the other, but each realization returns in the count-1
function. This sequence is actually much closer to a generator, producing one item at a time.
This is the heart of infinite sequences also, as for instance in:
user=> (take 15 (cycle [1 2 3 4]))
(1 2 3 4 1 2 3 4 1 2 3 4 1 2 3)?
The special form lazy-seq in Clojure is actually a macro.
user=> (macroexpand-1 '(lazy-seq (list 1 2)))
(new clojure.lang.LazySeq (fn* [] (list 1 2)))
We clearly see here the parallel between future and lazy sequence. Both simply postpone the execution of a function at the time it is really needed.