Ruby Enumerator.new(size)

written by Max Chernyak on 07 Aug, 22

Every Ruby enumerator supports count. It’s a method that iterates over every item and returns the total.

irb> enum = Enumerator.new { |yielder|
  (1..100).each do |i|
    puts "counting item: #{i}"
    yielder << i
  end
}

irb> enum.count
counting item: 1
counting item: 2
…
counting item: 100
=> 100

However, Enumerator also has size. Except, for an enumerator built like the one above, it’s just nil by default.

irb> enum.size
=> nil

A little-known feature in Ruby is that you can pass an argument to Enumerator.new to give it a “shortcut answer” to the size question.

irb> enum = Enumerator.new(100) { |yielder|
  (1..100).each do |i|
    puts "counting item: #{i}"
    yielder << i
  end
}

irb> enum.size
=> 100
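To see that size really skips the block, here is a small script version of the same idea. It is only a sketch: the yields counter is something I added for illustration, standing in for the puts in the irb session.

```ruby
# Count how many times the enumerator block actually yields an item.
yields = 0

enum = Enumerator.new(100) do |yielder|
  (1..100).each do |i|
    yields += 1
    yielder << i
  end
end

size_result = enum.size        # answered from the argument, block is not run
yields_after_size = yields     # still 0

count_result = enum.count      # walks every item
yields_after_count = yields    # now 100

puts "size:  #{size_result} (items yielded so far: #{yields_after_size})"
puts "count: #{count_result} (items yielded so far: #{yields_after_count})"
```

Note that the number is returned as given. Nothing checks it against what the block actually yields, so it is on you to keep the two in agreement.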

No more iterating to get the count. However, there’s an even lesser-known feature: you can pass a lambda that determines the size lazily, which is still faster than iterating. Let’s say that you’re enumerating over products in some kind of ecommerce API.

irb> api = EcommerceApi.new('connection config')
irb> enum = Enumerator.new { |yielder|
  api.products.each.with_index do |product, index|
    puts "fetching product: #{index}"
    yielder << product
  end
}
irb> enum.count
fetching product: 0
fetching product: 1
…
fetching product: 235
=> 236

Let’s say our API has a more efficient way of obtaining the count: a total_count endpoint.

irb> api = EcommerceApi.new('connection config')
irb> enum = Enumerator.new(api.products.total_count) { |yielder|
  api.products.each.with_index do |product, index|
    puts "fetching product: #{index}"
    yielder << product
  end
}
irb> enum.size
=> 236

We no longer have to iterate over products to get the total count, but notice a new problem: we now always run total_count when building the enum, even if the user of our enum never calls size. Seems like a waste. Moreover, if products are later added to the API, our size will not change. A lambda lets us run the API call only when size is requested, and always get a fresh count.

irb> api = EcommerceApi.new('connection config')
irb> enum = Enumerator.new(-> { api.products.total_count }) { |yielder|
  api.products.each.with_index do |product, index|
    puts "fetching product: #{index}"
    yielder << product
  end
}
irb> enum.size # Calls -> { api.products.total_count } lambda.
=> 236
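Since EcommerceApi is made up, here is a self-contained sketch you can actually run. FakeProducts is a hypothetical stand-in for api.products, with a call counter added so we can see when total_count gets hit.

```ruby
# Hypothetical stand-in for the products resource of the article's API.
class FakeProducts
  attr_reader :count_calls

  def initialize(names)
    @names = names
    @count_calls = 0
  end

  # Pretend this is the cheap total_count endpoint.
  def total_count
    @count_calls += 1
    @names.size
  end

  def each(&block)
    @names.each(&block)
  end

  def add(name)
    @names << name
  end
end

products = FakeProducts.new(%w[hat scarf])

enum = Enumerator.new(-> { products.total_count }) do |yielder|
  products.each { |product| yielder << product }
end

calls_before_size = products.count_calls  # 0: nobody asked for the size yet
first_size = enum.size                    # lambda runs now => 2
products.add("gloves")
second_size = enum.size                   # lambda runs again => 3

puts "calls before size: #{calls_before_size}"
puts "sizes: #{first_size}, then #{second_size}"
puts "total_count calls: #{products.count_calls}"
```

The lambda is not memoized: every call to size invokes it again. That is what keeps the count fresh, and it also means repeated size calls cost one API request each.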

This feature also exists when using enum_for/to_enum to create the enumerator. Here, the size is whatever you return from the block passed into enum_for. The block arguments are any additional arguments passed to enum_for.

irb> def each_number(max = 100)
  return enum_for(__method__, max) { |max| max } unless block_given?
  (1..max).each { |n| yield n }
end
irb> each_number(200).size
=> 200
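The size block is an ordinary closure, so inside an instance method it can also read the object’s own state instead of relying on arguments. A minimal sketch, assuming a hypothetical Playlist class that exists only for this example:

```ruby
# Hypothetical collection class, used only for illustration.
class Playlist
  def initialize(tracks)
    @tracks = tracks
  end

  def each_track
    # The block closes over self, so it can read @tracks.
    # It runs only when size is called on the returned enumerator.
    return enum_for(__method__) { @tracks.size } unless block_given?

    @tracks.each { |track| yield track }
  end
end

playlist = Playlist.new(%w[intro verse chorus])
enum = playlist.each_track

enum_size = enum.size   # => 3, computed by the block without iterating
enum_list = enum.to_a   # => ["intro", "verse", "chorus"]

puts "size: #{enum_size}"
puts "tracks: #{enum_list.join(', ')}"
```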

P.S. I often forget how this Ruby feature works, and searching never brings up quick examples, so hopefully this article will help when in need of a quick reminder.