Is Anything Really Random?
Understanding how we can program randomness
Hear me out ... the question “Is Anything Really Random?” might sound more of a philosophical question; and to be fair, it usually is interpreted in such a way. But, that’s a topic for a whole other day. What I am trying to get at right now is - how do random functions even work? Sure, from a practical standpoint, you just import some library or even just an inbuilt function depending on the language to generate a random number between particularly bounds that you specify. But, how does a computer produce a random number?
In math, a function should always produce the same output if given the same input. In computer science, this more or less holds true except for the fact that we have to account for the state of the program (i.e. variables) and how that can change from time to time. However, in principle this must hold as this what makes math and computer incredibly useful. If we tell a human to do the sum: 109 + 11 one hundred times, the human will likely do it right most of the time but it might miscalculate every once in a while. Now, if you were to give a calculator, 109 + 11, you expect to get out 120 every single time. If it doesn’t do that, then it’s not a useful calculator anymore. Hence, one reason why computers are so useful is that they are meant to behave deterministically. For example, if we were writing a function to sum 109 and 11, we can trace what happens line by line and always prove that it will get 120 back since the exact same steps were followed.
So if computers function deterministically. The random function then raises a contradiction to our current assessment of computers. How do computers generate different outputs given the same input (without a change in the programs overall state)?
Randomness Everywhere
While it may seem that it is random to talk about random numbers (pun intended), random numbers are everywhere. Random numbers are extremely important in cryptography as it allows for people to have different “private” keys. If the computer generated the same private key for everyone, that wouldn’t be secure at all. In fact, that is how the Ethereum blockchain and may other blockchains work - where a user accounted is created by generating a random string (which follows a required convention) that becomes a private key. If the computer kept generating the same string, the everyone would be the same user. Thus, randomness is used so that the chance of the exact same private key being generated are astronomically low.
Apart from that, randomness is sprinkled throughout practically all use cases: from RnG mechanics in video games (ex. the spread of loot in Battle Royal games like Fortnite and Apex Legends and online gambling to a random quote generator.
True Random Numbers
When coming up with random numbers, there usually must be a source of entropy (degree of uncertainty) where more entropy is better because that means more randomness. Generating entropy is rather difficult in a computer since it aims to be deterministic at its core. However, nature provides us with inherent randomness due to the laws of quantum mechanics. This is usually done by observing quantum phenomena which is usually atomic and subatomic phenomena. Examples include radioactive decay and radio noise.
Another physical source of entropy can be found right in our some of computers: the head of a read/write hard drive (since the OS requires different resources based on what the user is doing). This leads into the another source of great randomness: human behaviour. A common way is to track the clicks, mouse movements and keyboard strokes of a user. Checkout (/dev/random on Reddit).
All of these processes use some real world physical process as a source of true randomness since we don’t know or cant calculate a governing equation or algorithm for these processes. This is known as a hardware random number generator (HRNG) or true random number generator (TRNG). Hence, in this scenario, computers achieve randomness by observing some type of external random phenomena in the real world.
Some Random Numbers are Imposters
You can already see how generating truly random numbers might be an issue as they rely on some type of external physical phenomena. This may not always be accessible to a computer, not to mention the resources it would take to setup some sort of device to monitor these physical phenomena. This where pseudorandom number generation (PRNG) or deterministic random bit generator (DRBG) comes in.
In pseudorandom number generation, an algorithm is used for generating numbers that attempt to emulate that of truly randomly generated numbers. The basis of PRNG is a seed (you have probably heard of this word if you have played Minecraft). We can think of the seed as the first number (or any type of initial input) in a recursive sequence of numbers where the next numbers emulate a sequence of random numbers.
This is not truly random because given the same initial number (seed), the computer will always produce the exact same sequence of numbers. This is similar to how in Minecraft, if you type in a particular seed, it will produce the exact same initial Minecraft world everytime. Keep in mind that the seed itself can be a truly random number or constantly use a unique number. For example, one could use date-time by letting the seed be the number of milliseconds since a particular date. This would ensure that ever time your code runs, the seed will be different each time.
A method that was prominent in the 1900s and still is used in varying capacities today is using linear congruence to define recursively define a sequence of numbers. One of the reasons it became so well-known was that it was simple to understand and implement while also being relatively fast in execution.
Here is the formula for the a sequence of pseudorandom numbers denoted by S. S_n+1 simply means the next number in the sequence while S_n is the current number in the sequence and S_0 would denote the first number in the sequence.
Note: If this confusing please read up on recursive sequences or more generally recurrence relation. If you are confused by the mod, read up on modulus and modular arithmetic.
Lets go over the constants in this formula:
- m: this is the modulus and you can think of this as the upper limit as all numbers in the sequence will be LESS THAN m. It can be any value greater than 0.
- a: this can thought of just as a scaling value. I generally like to think of this as the variation factor. As the higher this number is, the consequent numbers in the sequence have a higher chance of being “far” apart in the sequence relative to m. However, for the most reliable results set a to be less than m since if this value is greater than m, in some cases, we can get a visibly easy pattern (ex. if m was 1 and a was 2 with c=0, and if the seed is 1 then the sequence of numbers would just be 1,1,1,…; think about why this would be the case).
- c: this can be thought as the increment factor. The higher this is the more variation as well. For similar reasons as to a, it is best if c < m.
The aforementioned 3 constants usually make up a PRNG algorithm. Wikipedia has this great table on some popular algorithms and what constants they use. Linear congruential generator - Wikipedia.
S_0: this is the SEED. this is the constant that should vary every time you run a PRNG algorithm. The bounds on this is usually that it must be less than m so that the entire sequence (including the seed) is less than m. A quick way to get guarantee that your seed number is less than m is just to do q mod m where q represents any seed number that is positive.
The bounds presented for a and c assume that we want a sequence of only positive numbers. If you want the sequence to have numbers whole absolute values are less than m (i.e -m < S_i < m; where S_i just denotes any number in the sequence), than you can let a and c be negative.
Coding Random
Now let’s code our on random number generator. I’ll be using Python but it can implemented in any language (however, some aspects like getting the current time might be more cumbersome in other languages). We will be trying to implement a generic random function where each call to the random function gives back a single random number.
First, let’s try to come up with a sequence of random numbers. Let’s break down the the two main aspects that we need to implement:
- A way of getting the seed (the initial value) - we can use the current date-time (in milliseconds) as a seed
- The algorithm - We can write a function to implement the recurrence relation involving the modulus. I’ll be using the following constant values: m = 2^64, a = 75, b = 74 (this is the one from the MMIX machine by famous computer scientist Donald Knuth; I could use much larger values but I am trying to not use to too much computational power)
Let’s implement the seed function first. We use the inbuilt time library in python to get the current time in seconds and convert that to milliseconds and return that value:
import time def get_seed(): time_in_seconds = time.time() # gives back a decimal number time_in_millseconds = round(time_in_seconds * 1000) # round so that we get an integer return time_in_millseconds
Now, lets implement the algorithm:
def create_random_seq(seed, sequence_len = 10): current_num = seed # S_n next_num = 0 # S_n+1, initialized to 0 output_sequence = [] # return back a list that represents the sequence for i in range(sequence_len): next_num = ((a * current_num) + c) % m # mod in python is % output_sequence.append(next_num) current_num = next_num return output_sequence
This seems pretty good and when we run
print(create_random_seq(get_seed())) """ Recorded Output: [11260684675543830605, 17649827562155479512, 8226703646984096839, 3919629916759827146, 371871994021827793, 5046412414368422156, 7855751955641916523, 7697906753102894366, 14585259934290631061, 4383037009496517504] """
These numbers seem pretty random, so our job on that end is done. But there is one problem. When a person tries to use this random function, they ideally want a list of numbers where they can specify the lower and upper bound. We could change our constants and our algorithm to account for this. That would take extra and solve the issue, but, that could take extra work on our end when we acctually dont need to worry about it at all.
Instead, if we just normalize the results to be between 0 and 1, the end user/developer can figure out for themselves how they want to scale the random numbers that we are giving them. This is super simple on our end as we just divide every number by m when adding to the sequence. Thus, we get the following result: [0.21706724282182324, 0.7168112287081746, 0.7478098275028419, 0.869083540648827, 0.0363289524357985, 0.2159418389276548, 0.8239986160757405, 0.3471738892684373, 0.8294226460955662, 0.7268643832890157]
. That is perfect!
Now, back to the note about returning a single number instead a sequence of numbers. A naive way of going about this is just taking a seed and just doing mod m and then divide by 0:
def naive_single_random_num(): return (get_seed() % m) / m
This would be really insecure because if a developer sets the random function to run at a certain interval, they will get the same number every time. Thus, to actually make this more random from n algorithmic perspective we could just generate the sequence of 10 numbers and then just keep and index of where we are in the list and whenever we call the function, return the number at the current index and increment the index. Whenever, we have gone through the entire list, we just make a new list using a new seed. Here is that implementation:
index = 10 # initialize at 10 since it will force the function to create a new sequence random_sequence = [] def random(): if (index >= 10): random_sequence = create_random_seq(get_seed()) index = 0 random_num = random_sequence[index] index += 1 return random_num
And below is the final code. If you are confused by the global index and global random_sequence thats just python syntax indicating that it will write to the global variables instead of creating new local variables.
import time m = 2**64 a = 6364136223846793005 c = 1442695040888963407 index = 10 # initialize at 10 since it will force the function to create a new sequence random_sequence = [] def random(): global index global random_sequence if (index >= 10): random_sequence = create_random_seq(get_seed()) index = 0 random_num = random_sequence[index] index += 1 return random_num def get_seed(): time_in_seconds = time.time() # gives back a decimal number time_in_millseconds = round(time_in_seconds * 1000) # round so that we get an integer return time_in_millseconds def create_random_seq(seed, sequence_len = 10): current_num = seed # S_n next_num = 0 # S_n+1, initialized to 0 output_sequence = [] for i in range(sequence_len): next_num = ((a * current_num) + c) % m # mod in python is % normalized_num = round(next_num / m, 15) # divide by m to normalize the numbers and round to 15 decimals output_sequence.append(normalized_num) current_num = next_num return output_sequence # This code gets 100 random numbers for i in range(100): print(random())
And, here is the result:
0.678923028768546 0.933411422367772 0.248846503676038 0.270604153006309 0.183640006284917 0.718001827472825 0.078128812380895 0.929713566485907 0.426064993258875 0.629836054396439 ...
Now the only problem with this is that whenever another file tries to call this random function, it would create a new sequence of numbers. Hence, it would be ideal if the sequence of random numbers persists (for example, by saving permanently allocating space for it somewhere on the machine). But that’s beyond the scope of this random adventure into randomness.