Ruby Numbers and Memory: How I Got Confused
I was working on a simple task: generating a bunch of random integers. There were no strict requirements, so I had to pick an upper bound myself. My gut told me to choose a number that could represent a wide range of values at a reasonable cost. By cost, I mean memory and performance.
My first attempt was 2⁶³-1, the maximum of a signed 8-byte integer (1 bit for the sign and 63 bits for the number itself). Then I checked `.size` to make sure it takes only 8 bytes.
(2**63-1).size
=> 8
You may object to using the `.size` method on integers here. We will come back to this later.
For a reason I don't remember, I decided to find out when Ruby would start returning 9 bytes, and, surprisingly to me, it wasn't at 2⁶³.
(2**63).size
=> 8
The first power of two that does is 2⁶⁴.
(2**64).size
=> 9
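Instead of probing by hand, the search can be sketched as a loop over exponents. This assumes a 64-bit MRI build, where the jump from 8 to 9 bytes happens at 2⁶⁴:

```ruby
# Assumes 64-bit MRI: print Integer#size around the boundary, then find
# the smallest power of two whose size exceeds 8 bytes.
(60..65).each { |k| puts "2**#{k}: #{(2**k).size} bytes" }

first_power = (1..128).find { |k| (2**k).size > 8 }
puts "first power of two with size > 8: 2**#{first_power}"
```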
For a second, I thought: okay, maybe it's stored as an unsigned 8-byte integer (all 8 bytes for the number itself). But checking how many bytes a negative number takes killed that idea.
(-1*(2**64-1)).size
=> 8
How can Ruby fit 8 bytes of data and a sign into 8 bytes?
The short answer: it can’t.
Now for the long answer.
First of all, the result of the `size` method depends on the machine: it is the number of bytes in the machine representation of the integer.
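One quick way to see the machine dependence, assuming MRI, where `Integer#size` for a Fixnum returns `sizeof(long)` of the platform Ruby was built for: compare it with the byte size of a packed native `long` (the `'l!'` directive of `Array#pack`).

```ruby
# 'l!' packs one native C long, so its byte size is sizeof(long) on this machine.
native_long_bytes = [0].pack('l!').bytesize

# For a Fixnum, MRI's Integer#size reports the same value.
puts "sizeof(long): #{native_long_bytes}, 1.size: #{1.size}"
```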
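But that's not all. As we know, everything in Ruby is an object, including numbers. Still, some objects behave differently from others because of the internal implementation. For example, Symbols and Fixnums (since Ruby 2.4, Fixnum and Bignum have been merged into the single Integer class) are special: they can be stored as immediate values instead of heap objects.
With this in mind, `Integer#size` doesn't return the full size of the object, only the size of its numeric part: `sizeof(long)` for a Fixnum, and the number of bytes needed for the absolute value for a Bignum. The sign of a Bignum is kept separately in the object's header, which is why -(2⁶⁴-1) still reports 8 bytes.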
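The rule can be written down as a small prediction. This sketch assumes MRI on a 64-bit build, where a Fixnum reports 8 bytes and a Bignum reports the bytes needed for its absolute value; `predicted_size` is a hypothetical helper, not part of Ruby:

```ruby
# Hypothetical helper modelling Integer#size on 64-bit MRI.
# Bignums are always at least 2**62 in magnitude (63+ bits, so 8+ bytes),
# so the floor of 8 only matters for Fixnums, which report sizeof(long).
def predicted_size(n)
  magnitude_bytes = (n.abs.bit_length + 7) / 8 # the sign is not counted
  [magnitude_bytes, 8].max
end

samples = [0, 1, -1, 2**62 - 1, 2**63, 2**64 - 1, -(2**64 - 1), 2**64, -(2**64), 256**10 - 1]
samples.each do |n|
  puts "#{n}: size=#{n.size} predicted=#{predicted_size(n)}"
end
```

Using `n.abs` matters here: `Integer#bit_length` on a negative number measures its two's complement form, while the Bignum stores the magnitude and the sign separately.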
To get the actual size of the object, we should use `ObjectSpace.memsize_of` (from the `objspace` standard library) instead.
require 'objspace'
ObjectSpace.memsize_of(2**63)
=> 40
Now the size is 40 bytes, which equals the size of the `RVALUE` struct that Ruby allocates for each object on a 64-bit build.
You can read more about the memory internals of Ruby here.
What would be the size of 2⁶³-1 in Ruby’s object space? Let’s find out.
ObjectSpace.memsize_of(2**63-1)
=> 40
Still 40 bytes, so my initial gut feeling didn't lead anywhere. Decreasing the exponent by one gives a different result.
ObjectSpace.memsize_of(2**62-1)
=> 0
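2⁶²-1 takes 0 bytes in the object space!
That means the Ruby VM doesn't allocate a heap object for any integer from -2⁶² to 2⁶²-1. Instead, it stores the number directly in the reference itself (the pointer-sized `VALUE`), which takes only 8 bytes. One bit of that word is reserved as a tag that distinguishes such immediate integers from real pointers, which leaves 63 bits for a signed number.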
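The encoding can be modelled in plain Ruby. This sketch assumes a 64-bit MRI build, where a Fixnum n is stored as the word (n << 1) | 1; `tag_fixnum` and `untag_fixnum` are hypothetical helpers that mirror the scheme, not a Ruby API. The last lines check with `ObjectSpace.memsize_of` that heap allocation starts one step past either bound:

```ruby
require 'objspace'

# Illustrative model only (assumes 64-bit MRI with an 8-byte VALUE):
# the lowest bit tags the word as an immediate integer, and the other
# 63 bits hold n in two's complement.
WORD_BITS  = 64
FIXNUM_MAX = 2**(WORD_BITS - 2) - 1
FIXNUM_MIN = -(2**(WORD_BITS - 2))
SIGNED_WORD_RANGE = -(2**(WORD_BITS - 1))..(2**(WORD_BITS - 1) - 1)

# Hypothetical helper: encode n the way MRI encodes a Fixnum.
def tag_fixnum(n)
  unless n.between?(FIXNUM_MIN, FIXNUM_MAX)
    raise RangeError, "#{n} is outside the Fixnum range"
  end
  (n << 1) | 1
end

# Hypothetical helper: arithmetic shift drops the tag bit and keeps the sign.
def untag_fixnum(word)
  word >> 1
end

[FIXNUM_MIN, -1, 0, 1, FIXNUM_MAX].each do |n|
  word = tag_fixnum(n)
  puts format('%d -> word %d (fits in 64 bits: %s, round-trip: %s, memsize: %d)',
              n, word, SIGNED_WORD_RANGE.cover?(word), untag_fixnum(word) == n,
              ObjectSpace.memsize_of(n))
end

# One step past either bound, MRI has to allocate a heap object (a Bignum).
puts ObjectSpace.memsize_of(FIXNUM_MAX + 1)
puts ObjectSpace.memsize_of(FIXNUM_MIN - 1)
```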
Why is this important to know?
There are two main reasons: performance and memory consumption.
Let's start with performance.
Operations on Bignums (behind the scenes, such Integer numbers are still Bignums) are slower.
We can check this with the following simple benchmark.
# Requires the benchmark-ips gem: gem install benchmark-ips
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('Fixnum') { 2**60 + 2**60 }
  x.report('Bignum') { 2**64 + 2**64 }
end
And the results.
Warming up --------------------------------------
Fixnum 1.160M i/100ms
Bignum 323.084k i/100ms
Calculating -------------------------------------
Fixnum 11.542M (± 1.0%) i/s - 58.021M in 5.027510s
Bignum 3.214M (± 0.9%) i/s - 16.154M in 5.026065s
As we can see, working with Fixnums (integers stored directly in the 8-byte reference) is about 3.6 times faster in this addition benchmark (11.5M vs. 3.2M iterations per second). I expect an even bigger difference for more complex math operations.
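One caveat about the benchmark above: each iteration also evaluates `2**60` and `2**64`, so it measures exponentiation (and, in the Bignum case, allocating the operands) as well as addition. A dependency-free sketch that precomputes the operands and times only `+` with a monotonic clock might look like this; the iteration count is arbitrary, and the numbers will vary by machine and Ruby version:

```ruby
# Times `iterations` additions of two preallocated operands, so the
# exponentiation that creates them is not part of the measurement.
def time_additions(a, b, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { a + b }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

fix = 2**60 # a Fixnum on 64-bit MRI; fix + fix (2**61) is still a Fixnum
big = 2**64 # a Bignum; every big + big allocates a new Bignum

iterations = 1_000_000
fix_seconds = time_additions(fix, fix, iterations)
big_seconds = time_additions(big, big, iterations)
puts format('Fixnum: %.3fs  Bignum: %.3fs  ratio: %.1fx',
            fix_seconds, big_seconds, big_seconds / fix_seconds)
```

The loop overhead of `Integer#times` is included in both measurements, so the ratio is a rough indication rather than the exact cost of a single addition.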
For better or worse, Ruby developers rarely need to think about memory, thanks to the garbage collector (GC). Unfortunately, this freedom from manual memory management can sometimes hit us.
For example, Ruby rarely returns freed memory to the operating system, so it's better to avoid memory bloat in the first place. Memory bloat is usually hard to deal with, so why not eliminate potential issues when it costs nothing?
In my case, creating millions of random numbers within a few seconds, each occupying a slot in the object space, wasn't ideal. Bounding the random numbers between -2⁶² and 2⁶²-1 keeps every one of them a Fixnum, which makes memory consumption smoother.
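A rough way to see the difference, assuming a 64-bit MRI build: draw integers from the range above and from a range that contains only Bignums, then sum `ObjectSpace.memsize_of` over the results. This counts only the memory held by the resulting integers, not the Array or any temporary objects created while generating them; `integer_heap_bytes` is a hypothetical helper:

```ruby
require 'objspace'

# Hypothetical helper: heap bytes held by the integers themselves
# (the Array that stores them is not counted).
def integer_heap_bytes(numbers)
  numbers.sum { |n| ObjectSpace.memsize_of(n) }
end

FIXNUM_MIN = -(2**62)
FIXNUM_MAX = 2**62 - 1
COUNT = 100_000

# Every value in the Fixnum range is an immediate, so it needs no heap slot.
bounded = Array.new(COUNT) { rand(FIXNUM_MIN..FIXNUM_MAX) }
# Every value in [2**62, 2**64) is a Bignum, so each one is a heap object.
unbounded = Array.new(COUNT) { rand((2**62)...(2**64)) }

puts "bounded:   #{integer_heap_bytes(bounded)} bytes"
puts "unbounded: #{integer_heap_bytes(unbounded)} bytes"
```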