What?
An iterator encapsulates the simple idea of a sequence. The problem is that many interesting sequences are not finite. In fact, mathematically, a sequence is a function from the natural numbers to some other set (usually the reals), and that is difficult to emulate directly on a computer.
The solution is fairly simple: instead of building the whole list, we can represent the sequence by its function. But this leaves an open question: what if the function is expensive to compute from scratch, as is the case with the primes or the Fibonacci sequence?
The answer is to store enough of the state of the listing process that the sequence builder remembers where it was and can keep stepping through the list. Alternatively, you can think of it as a linked list in which the pointer to the next node is replaced by a function that returns the next node.
This, of course, is not limited to infinite lists - it can also be used for finite ones, where it can save memory and even make code easier to understand.
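To make that concrete, here is a minimal Python sketch of a linked list whose "next pointer" is a function; the names LazyNode and ones are mine, purely for illustration:

class LazyNode:
    """A node holding a value and a zero-argument function that builds the next node."""
    def __init__(self, value, rest_fn):
        self.value = value
        self.rest_fn = rest_fn   # nothing beyond this node exists until rest_fn is called

def ones():
    # An "infinite list" of 1s: each node knows how to build its successor on demand.
    return LazyNode(1, ones)

node = ones()
for _ in range(3):
    print(node.value)            # prints 1 three times; we could keep stepping forever
    node = node.rest_fn()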
There are many implementations of this. Here are a few of the names and where they arise:
- Streams (SICP's chapter 3, Java 8)
- Lazy Lists (or Lazy Seqs) (Clojure)
- Iterators (Java and C++)
- Enumerables (or IEnumerable, in C#)
- Generator (Python, more for functions)
- Iterable (Python, more for objects)
- Also referred to as "Sequences" or "Infinite Sequences"
All follow the same pattern and serve the same use case. If a language does not include them, they are not too difficult to implement. Unfortunately, having experienced a gamut of languages and customs, I will use the above terms interchangeably (though subtle differences do exist).
All the Implementations
Here are some implementations of iterators in various languages. In general, each will show how to make an iterator over the set of subsets of a list. The following languages will be used (in this pretty much random order): Scheme, Clojure, Haskell, OCaml, Python, C#, JavaScript, and Rust.
The syntax of each won't be discussed in great depth. Furthermore, the implementations will be as canonical as possible - the code will be workable and aims to show the use of iterators in each language, not the implementation details. Interesting points will be explained, and that is that.
All implementations will implement the following pseudo-code:
Procedure Subsets(list):
    End_val = 2 ^ length of list
    For i from 0 to End_val (exclusive):
        Current_subset = the empty list
        For each bit b in i, indexed by j:
            If b is 1:
                Add the jth element of list into Current_subset.
        Pause execution, providing Current_subset
Note that the "pause execution" step is what makes this an iterator: at that point a value is handed out, but there are still more values left to give.
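To make "pause execution" concrete, here is a minimal sketch of that pseudo-code as a Python generator (the Python section below does the same job with the full iterator protocol); yield is the pause:

def subsets(lst):
    end_val = 2 ** len(lst)
    for i in range(end_val):         # i runs over 0 .. 2^n - 1
        current_subset = []
        bits, j = i, 0
        while bits > 0:
            if bits % 2 == 1:        # bit j of i is set, so take the j-th element
                current_subset.append(lst[j])
            bits //= 2
            j += 1
        yield current_subset         # "pause execution, providing Current_subset"

for s in subsets(['a', 'b']):
    print(s)                         # [], ['a'], ['b'], ['a', 'b']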
Furthermore, a few of the languages will also include the following, where recursion is elegant, canonical, or possible inside an iterator:
Procedure powerset(list):
    If list is empty:
        Pause execution, providing the empty list
    Otherwise:
        Let v = the first value in the list
        And s = the iterator from powerset(rest of the list)
        For each value ss in s:
            Pause execution, providing v appended to ss.
            Pause execution, providing ss.
Note that in most cases the second loop will actually be divided into two separate loops - one yielding the subsets that include v and another yielding those that do not (see the sketch below).
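Here is that split as a minimal Python sketch, purely for illustration (the name powerset_split is mine):

def powerset_split(s):
    if len(s) == 0:
        yield []
    else:
        v, rest = s[0], s[1:]
        for ss in powerset_split(rest):   # first pass: the subsets that contain v
            yield [v] + ss
        for ss in powerset_split(rest):   # second pass: the subsets that do not
            yield ss

Re-running the recursive call in the second loop stands in for sharing the stream; the stream-based versions below simply reuse ss instead.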
Scheme
Scheme is a nice starting point because it exposes how iterators are actually implemented. Streams are introduced at the end of chapter 3 of SICP.
Here is the basic idea: by wrapping values in functions, we can delay the evaluation of things that would otherwise be infinite.
Here is the (commented - a semicolon comments out the rest of a line) code:
;; delay wraps an expression in a thunk; force calls the thunk to get the value.
;; (Unlike the built-in delay/force, these simplified versions do not memoize.)
(define-syntax delay (syntax-rules ()
  ((delay val) (lambda () val))))
(define-syntax force (syntax-rules ()
  ((force val) (val))))
;; lazy-cons must be a macro, not a procedure, so that the tail expression y
;; is not evaluated until someone actually asks for it.
(define-syntax lazy-cons (syntax-rules ()
  ((lazy-cons x y) (cons x (lambda () y)))))
;; The head is stored as-is; the tail is a thunk that is forced on demand.
(define (lazy-car seq) (car seq))
(define (lazy-cdr seq) (force (cdr seq)))
;; any: does pred hold for at least one element of an ordinary (strict) list?
(define (any pred ls)
  (if (null? ls) #f
      (or (pred (car ls)) (any pred (cdr ls)))))
;; lazy-map maps fn over one or more streams; the recursive call is hidden
;; inside a lazy-cons, so it only happens when the tail is demanded.
(define (lazy-map fn . seqs)
  (if (any null? seqs) '()
      (lazy-cons (apply fn (map lazy-car seqs))
                 (apply lazy-map (cons fn (map lazy-cdr seqs))))))
;; list-of-stream forces an entire (finite!) stream into an ordinary list.
(define (list-of-stream st)
  (if (null? st) '()
      (cons (lazy-car st) (list-of-stream (lazy-cdr st)))))
;; take-while keeps the prefix of a stream whose elements satisfy pred.
(define (take-while pred seq)
  (if (pred (lazy-car seq))
      (lazy-cons (lazy-car seq) (take-while pred (lazy-cdr seq)))
      '()))
;; Self-referential streams: each tail is delayed, so it may mention the
;; stream being defined.
(define ones (lazy-cons 1 ones))                          ; 1, 1, 1, ...
(define numbers (lazy-cons 1 (lazy-map + ones numbers)))  ; 1, 2, 3, ...
(define fibonacci (lazy-cons 0 (lazy-cons 1               ; 0, 1, 1, 2, 3, 5, ...
                                 (lazy-map + fibonacci
                                           (lazy-cdr fibonacci)))))
;; Build the subset of lis selected by the bits of int (bit j set => keep the jth element).
(define (subsets-by-int lis int)
  (define (loop acc ind l)
    (if (= 0 ind) acc
        (loop (if (= 1 (remainder ind 2))
                  (cons (car l) acc)
                  acc)
              (floor (/ ind 2)) (cdr l))))
  (if (number? int) (loop '() int lis) '()))
;; All subsets of lis, one per index from 0 to 2^(length lis) - 1.
;; (The index stream starts at 0 so that the empty subset is included.)
(define (subsets lis)
  (let ((ln (expt 2 (length lis))))
    (lazy-map (lambda (i) (subsets-by-int lis i))
              (lazy-cons 0 (take-while (lambda (x) (< x ln)) numbers)))))
;; Lazily append any number of streams end to end.
(define (concat-streams . seqs)
  (cond ((null? seqs) '())
        ((null? (cdr seqs)) (car seqs))
        ((null? (car seqs)) (apply concat-streams (cdr seqs)))
        (else (lazy-cons (lazy-car (car seqs))
                         (apply concat-streams
                                (cons (lazy-cdr (car seqs))
                                      (cdr seqs)))))))
;; The recursive power set: the subsets containing the head, then those without it.
(define (powerset ls)
  (if (null? ls) (lazy-cons '() '())
      (let ((h (car ls)) (ss (powerset (cdr ls))))
        (concat-streams (lazy-map (lambda (s) (cons h s)) ss) ss))))
Clojure
Clojure has more built in, with semicolons as comments here too. The strings right after the function names are Clojure's built-in documentation feature (docstrings). Thanks to the built-ins, not everything has to be implemented by hand.
(ns subset.core)

(defn subset-by-int
  "Builds the subset of lis selected by the bits of i (bit j set => keep the j-th element)."
  [lis i]
  (loop [acc '() v i l lis]
    (if (= v 0) acc
        (recur (if (= 1 (mod v 2)) (conj acc (first l)) acc)
               (int (/ v 2))
               (rest l)))))

(defn subsets
  "A lazy seq of every subset of s, one for each integer from 0 to 2^(count s) - 1."
  [s]
  (map #(subset-by-int s %) (->> s count (Math/pow 2) range)))

(defn powerset
  "The recursive version: the subsets containing (first s), followed by those without it."
  [s]
  (if (empty? s) '(())
      (let [ss (-> s rest powerset)]
        (concat (map #(conj % (first s)) ss) ss))))
This works because map is inherently lazy (it returns a lazy seq), and concat preserves that laziness.
Haskell
Haskell is lazy by default: it evaluates expressions only when forced to. This means that every list is effectively already an iterator. Here is a power-set maker:
subset_of_int :: [a] -> Integer -> [a]
subset_of_int v 0 = []
subset_of_int (x:xs) i
| i `mod` 2 == 0 = subset_of_int xs (i `quot` 2)
| otherwise = x : (subset_of_int xs (i `quot` 2))
subsets :: [a] -> [[a]]
subsets xs = map (subset_of_int xs) [0.. (2^(length xs)) - 1]
powerset :: [a] -> [[a]]
powerset [] = [[]]
powerset (x:xs) = (map (x:) ps) ++ ps
where ps = powerset xs
OCaml
OCaml is, unfortunately, normal. It does not evaluate lazily, nor does it include as many built-ins. We defer to a Scheme-like implementation.
open List;;
type 'a stream = Nil | LazyList of 'a * (unit -> 'a stream)
let rec range st en = if st == en then Nil
else LazyList (st, fun () ->
range (st + 1) en);;
let rec map_str fn seq =
match seq with
| Nil -> Nil
| LazyList (hd, tl) -> LazyList ((fn hd), fun () ->
map_str fn (tl ()))
;;
let rec list_of_stream s =
match s with
| Nil -> []
| LazyList (hd, tl) -> (hd :: list_of_stream (tl ()))
;;
let rec subsets_of_int arr i =
match i with
| 0 -> []
| x when x mod 2 == 0 -> subsets_of_int (tl arr) (x / 2)
| y -> (hd arr) :: (subsets_of_int (tl arr) (y / 2))
;;
let subsets arr =
  (* 2^(length arr); range excludes its upper bound, so i runs over 0 .. 2^n - 1 *)
  let en = truncate (2. ** float (length arr)) in
  map_str (subsets_of_int arr) (range 0 en)
;;
let rec concat_str st1 st2 =
match st1 with
| Nil -> st2
| LazyList (h, t) -> LazyList (h, fun () -> concat_str (t ()) st2)
;;
let rec powerset arr =
match arr with
| [] -> LazyList ([], fun () -> Nil)
| (h :: t) -> let sm = powerset t in
concat_str (map_str (fun (s) -> h::s) sm) sm
;;
That's all for the functional languages, unfortunately.
Python
This will be in Python 3. (To translate to Python 2, rename __next__ to next and replace the starred unpacking h, *t = s with slicing, e.g. h, t = s[0], s[1:].) There are two syntaxes for the same design in Python: I'll show the bit-based algorithm using the object-oriented iterator protocol and the recursive builder using the generator feature.
class SubsetIter:
    def __init__(self, data):
        self.data = data
        self.it = 0                      # the counter whose bits pick the elements
        self.mx = 2 ** len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.mx == self.it:
            raise StopIteration()        # no subsets left to hand out
        else:
            i = self.it
            j = 0
            acc = []
            while i > 0:
                if i % 2 == 1:           # bit j is set: include the j-th element
                    acc.append(self.data[j])
                i //= 2
                j += 1
            self.it += 1
            return acc


def powerset(s):
    if len(s) == 0:
        yield []
    else:
        h, *t = s
        for i in powerset(t):            # for each subset of the tail...
            yield i                      # ...once without the head
            yield i + [h]                # ...and once with it
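A quick usage example for both:

for s in SubsetIter(['a', 'b']):
    print(s)                     # [], ['a'], ['b'], ['a', 'b']

print(list(powerset(['a', 'b'])))
# [[], ['a'], ['b'], ['b', 'a']]  (the order follows the recursion)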
C#
C# is cool too. Perhaps the best C-based language after Rust. And it has a weird liking for uppercase.
using System;
using System.Collections.Generic;
using System.Linq;
class MainClass {
    public static IEnumerable<List<T>> Subsets<T>(IList<T> vals){
        var acc = new List<T>();
        // One subset per i in [0, 2^n): the bits of i pick which elements go in.
        foreach(int i in Enumerable.Range(0, (int) Math.Pow(2, vals.Count()))){
            for(int j = i, k = 0; j > 0; j /= 2, k++){
                if(j % 2 == 1)
                    acc.Add(vals[k]);
            }
            yield return acc;
            acc = new List<T>();
        }
    }

    public static IEnumerable<List<T>> Powerset<T>(IEnumerable<T> vals){
        if(!vals.Any())
            yield return new List<T>();
        else{
            T tmp = vals.ElementAt(0);
            foreach(var t in Powerset(vals.Skip(1))){
                yield return t;
                var l = t.ToList();
                l.Add(tmp);
                yield return l;
            }
        }
    }
}
JavaScript
JavaScript has difficulty with iterators. And objects. ES6 would be a different story, but here it is in good old JavaScript (5). This is very much like the OCaml version, minus a nice syntax. Or type system.
var LazyList = function(head, rest){
this.head = head;
this.rest = rest;
};
var ones = new LazyList(1, function(){return ones;});
var map = function(fn, lists){
if(lists.some(function(l){return l === null}))
return null;
var hd = function(l){return l.head;};
var recurse = function(){
return map(fn, lists.map(function(v){
return v.rest();
}));
};
return new LazyList(fn.apply(null, lists.map(hd)), recurse);
};
var toArray = function(lis){
var acc = [];
while(lis !== null && lis !== undefined){
acc.push(lis.head);
lis = lis.rest();
}
return acc;
}
var ints = new LazyList(1, function(){
return map(function(r, l){
return r + l;
}, [ones, ints]);
});
var takeN = function(lis, n){
if(n === 0) return null;
return new LazyList(lis.head, function(){
return takeN(lis.rest(), n - 1);
});
};
var getSubset = function(val, lis){
var j = 0;
var acc = [];
for(var i = val; i > 0; i = Math.floor(i / 2)){
if(i % 2 === 1) acc.push(lis[j]);
j++;
}
return acc;
};
var subsets = function(l){
    // ints starts at 1, so prepend a 0 to count over 0 .. 2^length - 1
    // (otherwise the empty subset is skipped and one index runs off the end).
    var counters = takeN(new LazyList(0, function(){return ints;}),
                         Math.pow(2, l.length));
    return map(function(v){
        return getSubset(v, l);
    }, [counters]);
};
var concat = function(lists){
if(lists.length === 0)
return null;
else if(lists.length === 1)
return lists[0];
else if(lists[0] === null || lists[0] === undefined)
return concat(lists.slice(1));
else{
var t = lists[0].head;
lists[0] = lists[0].rest();
return new LazyList(t, function(){
return concat(lists);
});
}
};
var powerset = function(l){
if(l.length === 0){
return new LazyList([], function(){
return null;
});
}else{
var v = l[0];
var r = powerset(l.slice(1));
return concat([r, map(function(vv){
return vv.concat([v]);
}, [r])]);
}
};
Rust
Saving a good one for last. I love how much this looks like Python, but with everything fixed.
struct Subsets<A>{
counter: usize,
data: Vec<A>
}
impl<A> Subsets<A>{
fn new(data: Vec<A>) -> Subsets<A> {
Subsets{
counter: 0,
data: data
}
}
}
impl<A> Iterator for Subsets<A>
where A: Clone{
type Item = Vec<A>;
fn next(&mut self) -> Option<Vec<A>>{
if self.counter >= 2_usize.pow(self.data.len() as u32) {
Option::None
}else{
let mut i = self.counter;
let mut rv = Vec::new();
let mut j = 0;
while i > 0 {
if i % 2 == 1 {
rv.push(self.data[j].clone());
}
i = i / 2;
j += 1;
}
self.counter += 1;
Option::Some(rv)
}
}
}
You can also use a Vec<Rc<A>> to avoid cloning, if you like your references.

A more OCaml-like implementation is possible, but it brings up some interesting issues. To implement the precise record type, the function pointer you need is quite painful to work with. The trouble starts with the impl Fn(A) -> B that is the mapping function passed to map: it needs a lifetime in order to be moved into the recursive call to map, and if you follow that path, you find that the lifetimes you need are hellish (example).

However, after reading up on what 'static means in a trait bound, the following turned out to make sense:
use std::rc::Rc;
enum Stream<T> {
Data {
value: T,
next: Rc<dyn Fn() -> Stream<T>>,
},
Nil,
}
impl<T: Clone> Clone for Stream<T> {
fn clone(&self) -> Self {
match self {
Stream::Nil => Stream::Nil,
Stream::Data { value, next } => Stream::Data {
value: value.clone(),
next: next.clone(),
},
}
}
}
fn map<A: 'static, B: 'static>(stream: Stream<A>, func: Rc<dyn Fn(A) -> B>) -> Stream<B> {
match stream {
Stream::Nil => Stream::Nil,
Stream::Data { value, next } => Stream::Data {
value: func(value),
next: Rc::new(move || map(next(), func.clone())),
},
}
}
fn concat<A: 'static + Clone>(stream1: Stream<A>, stream2: Stream<A>) -> Stream<A> {
match stream1 {
Stream::Nil => stream2,
Stream::Data { value, next } => Stream::Data {
value: value.clone(),
next: Rc::new(move || concat(next(), stream2.clone())),
},
}
}
fn singleton_stream<A>(a: A) -> Stream<A> {
Stream::Data {
value: a,
next: Rc::new(|| Stream::Nil),
}
}
fn subsets<A: 'static + Clone>(stream: Stream<A>) -> Stream<Stream<A>> {
match stream {
Stream::Nil => singleton_stream(Stream::Nil),
Stream::Data { value, next } => {
let others = subsets(next());
concat(
others.clone(),
map(others, Rc::new(move |st| concat(singleton_stream(value.clone()), st))),
)
}
}
}
I don't really like the assumptions about Clone, but it's a reasonable compromise, since a single stream element is actually rather cheap to clone. A full proof-of-concept is on my GitHub.
Conclusion
That's about it. I tried a C++ implementation, but it got boring - probably because I decided to implement a std::random_access_iterator_tag. The half-baked code is here, and an attempt to do it in C is here.
C was definitely worse: lambdas are an extension, so closures are painful. The lack of deterministic destruction makes it worse: iterating down such a list multiple times needs deletion, and that needs some notion of a root.

I have a few ideas on better memory semantics, so I'll keep y'all posted... it's just that there's no really good user interface for it, TBH.
Addendum: More Infinity and Laziness
This is an exercise in craziness in Haskell, so I won't propose it as "another implementation" for the lazy lists above. (It could be one, but it's sort of painful outside Haskell.)

This implementation is a lot like the counting one, where we run an integer from 0 to 2^(length) - 1 and use its bits to pick which elements end up in each subset (so 0 gives the empty set, 1 picks the first element, 2 the second, 3 the first two, and so on), but here the focus is on the potential infiniteness of the set we're picking subsets from. Without further ado:
nextSubset :: [Bool] -> [Bool]
nextSubset [] = [True]
nextSubset (True:xs) = False:(nextSubset xs)
nextSubset (False:xs) = True:xs
subsets :: [a] -> [[a]]
subsets xs = map (map fst . filter snd
. zip xs)
$ iterate nextSubset []
finiteSubsets :: [a] -> [[a]]
finiteSubsets xs = take (2 ^ (length xs)) $ subsets xs
Actually, the integer-based implementation is essentially identical, since Integer is a bignum and so would never overflow as the counter grows.
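As an illustration of that point (in Python rather than Haskell, with the infinite source set reduced to an indexing function of my own invention), counting with an arbitrary-precision integer enumerates finite subsets of an infinite sequence without ever overflowing:

from itertools import count, islice

def subsets_of_infinite(element_at):
    # element_at(j) returns the j-th element of a (possibly infinite) sequence.
    for i in count():                # Python ints, like Haskell's Integer, never overflow
        subset, bits, j = [], i, 0
        while bits > 0:
            if bits % 2 == 1:
                subset.append(element_at(j))
            bits //= 2
            j += 1
        yield subset

print(list(islice(subsets_of_infinite(lambda j: j), 5)))
# [[], [0], [1], [0, 1], [2]]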
Added Addendum: Much Infinite Rust
The algorithm in the addendum actually wasn't so painful in Rust. So, here goes nothing. First, Rust needs a few small preliminaries:
fn filter_map<A: 'static, B: 'static>(
stream: Stream<A>,
func: Rc<dyn Fn(A) -> Option<B>>,
) -> Stream<B> {
match stream {
Stream::Nil => Stream::Nil,
Stream::Data { value, next } => match func(value) {
Option::Some(val) => Stream::Data {
value: val,
next: Rc::new(move || filter_map(next(), func.clone())),
},
Option::None => filter_map(next(), func.clone()),
},
}
}
fn zip_streams<A: 'static, B: 'static>(left: Stream<A>, right: Stream<B>) -> Stream<(A, B)> {
match left {
Stream::Nil => Stream::Nil,
Stream::Data {
value: lv,
next: ln,
} => match right {
Stream::Nil => Stream::Nil,
Stream::Data {
value: rv,
next: rn,
} => Stream::Data {
value: (lv, rv),
next: Rc::new(move || zip_streams(ln(), rn())),
},
},
}
}
fn iterate_fn_stream<T: 'static + Clone>(initial: T, update: Rc<dyn Fn(T) -> T>) -> Stream<T> {
Stream::Data {
value: initial.clone(),
next: Rc::new(move || iterate_fn_stream(update(initial.clone()), update.clone())),
}
}
After that, the Haskell code above translates more or less directly. I do think that having filter_map is more idiomatic than composing a filter inside some map. I'm also not really sure whether the better way to write zip_streams would be a nested if let that just returns Stream::Nil at the bottom. But that's a trifle. Here's the rest of the code:
fn increment_infinite_binary_num(prev: Stream<bool>) -> Stream<bool> {
match prev {
Stream::Nil => singleton_stream(true),
Stream::Data { value: true, next } => concat(
singleton_stream(false),
increment_infinite_binary_num(next()),
),
Stream::Data { value: false, next } => concat(singleton_stream(true), next()),
}
}
fn large_finite_subsets<A: 'static + Clone>(set: Stream<A>) -> Stream<Stream<A>> {
let nats = iterate_fn_stream(Stream::Nil, Rc::new(increment_infinite_binary_num));
map(
nats,
Rc::new(move |nat| {
filter_map(
zip_streams(set.clone(), nat),
Rc::new(|(x, n)| if n { Option::Some(x) } else { Option::None }),
)
}),
)
}
I enjoy how much increment_infinite_binary_num looks like the Haskell function nextSubset, but I do wonder whether I should have constructed the Stream::Data in place instead of concatenating a singleton onto the front of the list. What I wrote is definitely pithy; I just hope that the inliner sees me, at least a little bit - I have no hope that concat can (or should) be inlined.