C is Compulsory
Why C is still a great language for learners.
Here are some key statistics from the 2021 Stack Overflow Developer Survey, which collected responses from 80,000 developers about their preferences and usage of various technologies:
You can easily interpret these results as:
- JS and Python are among the most commonly used programming languages
- C is declining in popularity
- C is a pretty dreaded language
- Also, C is one of the lowest-paying languages careerwise
For an article that calls C essential, why am I showing how C is becoming less used (particularly among new developers), less favored among developers, and one of the lowest-paying technologies? It seems that C's glory days are well behind it and that it will fade into oblivion. Yet, if you peek under the hood, C has been the quintessential programming language for decades and will most likely continue to be so for decades to come. JS and Python are the cool kids on the block, but C is the OG.
A Look Back
As you probably already know, C is a general-purpose imperative programming language. What some new developers might not know is that C first appeared in 1972 (happy 50th anniversary, C!) and was made by Dennis Ritchie while working at Bell Labs (fun fact: nine Nobel Prizes have been awarded for work completed at Bell Laboratories; in short, it is 🐐d). It was initially developed to be used in the UNIX operating system and was a successor to the B language (which was also made by Dennis Ritchie; maybe he was trying to teach people their ABCs?). Because it was built for an OS (which needs to be as efficient as possible), it was, by design, efficient at compiling down to machine language. Then, in the 1980s, it picked up steam as a general-purpose programming language.
C Today
While it may seem like C is no longer as relevant as it once was, it is present in almost all major areas of computer science today. It powers all of the most popular operating systems, including the aforementioned UNIX-based OSs which birthed the language, such as macOS and Linux, as well as OSs that came after, like Windows, iOS, and Android. In addition, most databases are written in C (as well as C++; but C++ is a whole other can of worms for another day).
Most of the languages that are popular today, like Python and JavaScript, were built using C to different extents. So even if you hate writing C code yourself, you are still running C code under the hood.
At this point you may be wondering: how exactly did C get so popular? There are (and probably were) a bunch of different languages that can accomplish similar goals (e.g. you can build a web server in both Python and JavaScript), so why exactly is C such a standout among programming languages? There were probably many reasons for C rising to prominence the way it has over the past half-century. But I think the most important one comes down to the philosophy of C.
C Tells ~~No~~ Only Some Lies
One of the biggest practical reasons for using C is its efficiency, in terms of both time and space. The executable machine instructions produced by C (called binaries) are much smaller when compared to those of other languages. Additionally, C programs typically use less memory and run faster than programs in most other languages. A big part of this is that a language like Python doesn't compile your code down to machine code at all: the standard Python interpreter (CPython) is itself a C program that reads and executes your Python code at runtime, adding overhead that a compiled C binary doesn't have.
The other reason is that C is a "lower-level" programming language. This means that C code reflects much more accurately how the machine actually executes instructions, compared to JavaScript or Python.
The reality is that the computer only executes binary instructions, which differ by CPU architecture and other technical details of the actual hardware behind the machine (GPU, RAM, etc.). Writing actual machine code is highly impractical: you would have to code in 1s and 0s (imagine autocomplete for 1s and 0s) and write completely different code depending on the hardware you want to run your program on. But at the same time, modern "higher-level" languages like Python and JavaScript lie to you about what's going on (layers of abstraction). Hey, don't get me wrong, I would rather get lied to by Python any day of the week than write machine code (yes, it is a toxic relationship), but what if there was a middle ground, where your code was not so close to the machine that you have to write machine code, but also not so abstract that you don't understand what's going on and the implications that come with it?
That's where your savior C comes into the picture. In my opinion, C strikes a near-perfect balance between reality and abstraction. It abstracts away the truly nasty work, like writing in binary and understanding how every CPU architecture in existence works, but still retains core concepts like memory management, which gives us more control over and understanding of how our code works. C essentially says: I can convert whatever code you give me into machine code for basically all relevant processor architectures and hide some other very nasty things so you don't have to deal with them (via the C compiler), but everything else you are going to have to do yourself. Technically, C lies too, but without some lies, all developers would have to start speedrunning machine code (that's bound to be the next big subcategory on Twitch).
Now, you may be saying: hold on a minute, it seems like C is a lot more work than other languages like Python and JS. And you wouldn't be wrong. Here are code snippets of a simple function that adds two numbers together:
JavaScript:
// handles both ints and floats by default
function addNums(x, y) {
  return x + y
}
C:
// For integers
int addInts(int x, int y) {
  return x + y;
}

// For decimals
float addDecimalNums(float x, float y) {
  return x + y;
}
So why should I use C and do more work? I think I'll just stick to my more robust and concise Python and JS; they do what C can do much more easily. Hey, at first I thought the same thing. When the first language I was taught in high school was C, while the same course at other schools taught "better" languages like Python, I was furious. But after quite some time and after widening my programming horizons, I think C is one of the best languages for understanding foundational computer science topics and for thinking with a "first principles" approach, which is essential to solving any type of problem. And this, in turn, has made me a much more knowledgeable and overall better developer, even when I'm just typing away some JSX in a React project or writing a web server using Flask in Python.
To demonstrate this, I'll give a series of lies that Python and JS tell you, and then we can dispel them using C. These technical examples will require some familiarity with basic data structures like arrays and linked lists, and with some basic programming concepts.
Arrays Can Have Indefinite Length
If you don't know what arrays are, chances are this article isn't applicable to you. Just as a recap, one of the main reasons for choosing an array is constant-time lookup; i.e., it's super fast to look up any index, whether it be the first or the one-thousandth. You are probably used to initializing and modifying arrays like this:
JavaScript:
let my_array = [1,2,3]  // create array
my_array.push(4)        // add 4 to the end of the array
my_array.splice(2, 1)   // remove one element starting from the 2nd index
Python:
my_array = [1,2,3]    # create array
my_array.append(4)    # add 4 to the end of the array
my_array.remove(3)    # remove the first occurrence of 3 from the array (index 2 in our case)
In both of the code samples, by the end of the block the array will look like [1,2,4]. Keep in mind these are just a few of the many ways of adding/removing elements from arrays in Python and JS. But one of the biggest problems is that, initialized and modified this way, arrays appear to behave more like linked lists (since we can keep adding elements to them indefinitely). However, somehow we still get constant-time lookups, which you cannot get with linked lists. So it seems like Python/JS just came up with some new and exciting data structure that can grow as we need it to while giving us blazing-fast data retrieval. Seems nearly perfect! If C doesn't have this, what in the world is the point of using C? Well, ignorance is bliss. Python and JS don't actually have unlimited-size arrays; they instead have fixed-size arrays which are resized once we exceed their capacity. Let's check this out in C.
Here is how you would initialize and modify an array in C:
int my_array[3]; // create array of integers with size 3
my_array[0] = 1;
my_array[1] = 2;
my_array[2] = 3;
my_array[2] = 4; // can't append a 4th element, so overwrite the last one
After running this code, my_array would look like [1,2,4], just as it did with the code samples before, but how we got there was completely different. Firstly, when I initialized the array, I had to specify what type the elements would be and explicitly state the size of the array (3 in this case). Then I had to go through the indices one by one and set them to a certain value (you could use the syntax int my_array[] = {1,2,3}; but this is still fixed-size, as C just counts the number of elements between the curly brackets and sets the size to that). Then, since I can't add another element to the array, I just changed its last element. At first, this way of doing things seems like a major bottleneck. What if I don't know the number of elements I will need the array to hold when I first initialize it?
Directly, there is nothing we can do about that, since we must allocate the amount of memory our array takes up at initialization. In memory this looks like the following (where "Other memory" may be occupied by other variables):
So it seems like we are at a standstill... However, we are missing one key piece: we can't expand our memory by just continuously adding to our array, buuuuttttt we can create a new array every time we run out of space! At first thought, that seems wildly inefficient, but we will see how it is actually still constant time (on average) if done properly, and this is the core idea behind how other languages pull off dynamically sized arrays.
The high-level overview is that we first initialize an array of a reasonable size (note: this is NOT the max capacity of the array, just a starting limit). Then, if we run out of array space, we create a new array with double the size, copy over the elements from the previous array, and free the previous array. This new array will now have half of its slots still available, and once we use those up, we just repeat the doubling process.
'There is no way that this is constant time' is probably what you are thinking. But copying all of the contents into a new array every time we double actually still averages out to constant time per insertion, by amortized time analysis. Here is an excellent read on how the time complexity works out: Dynamic Array Amortized Analysis | Interview Cake. So, whenever someone tells you unbounded arrays exist, you know how the magic trick actually works.
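To make this concrete, here is a minimal sketch of how such a dynamic array could be implemented in C. The struct and function names are my own, purely for illustration:

#include <stdio.h>
#include <stdlib.h>

struct DynArray {
    int *data;     // the backing fixed-size array
    int size;      // number of elements currently stored
    int capacity;  // number of slots currently allocated
};

void init(struct DynArray *a) {
    a->capacity = 4; // a reasonable starting size, NOT a max capacity
    a->size = 0;
    a->data = malloc(a->capacity * sizeof(int));
}

void push(struct DynArray *a, int value) {
    if (a->size == a->capacity) {
        // Out of space: double the capacity and copy everything over.
        // realloc handles the "new array + copy + free the old one" dance.
        a->capacity *= 2;
        a->data = realloc(a->data, a->capacity * sizeof(int));
    }
    a->data[a->size++] = value;
}

int main(void) {
    struct DynArray a;
    init(&a);
    for (int i = 1; i <= 10; i++)
        push(&a, i); // capacity silently grows 4 -> 8 -> 16
    printf("size = %d, capacity = %d\n", a.size, a.capacity);
    free(a.data);
    return 0;
}

Every push is cheap except the occasional doubling, which is exactly what the amortized analysis above accounts for.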
Variables Can Be of Any Type
In Python and JS, you don't even need to bother with specifying what type of variable you are initializing (i.e. string, int, decimal, etc.). You can do things like this:
my_var = 5
my_var = "hello"
my_var = [1,2,3]
my_var = {"is this magic": "yes"}
However, in C you have to declare the type of a variable when you initialize it, and you can't change that type after the fact. Something like this:
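// A minimal sketch of the restriction:
int my_var = 5;   // my_var is an int, and stays an int
my_var = "hello"; // the compiler rejects this: can't store a string in an int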
This seems like a huge inconvenience. I regularly change the type of my variables when programming in Python and JS. So why doesn't C allow this? Once again, it's because memory doesn't work that way. Since every single piece of code and data is stored as binary (aka 0s and 1s), how does the computer know if a given set of 0s and 1s represents an integer, a character, or a decimal? Also, different data types take different amounts of memory to store. This is pretty easy to imagine: would it take the same amount of space to store the integer 42 as the string "Hello World"? This is why the machine must know the type of the variable it is dealing with at all times.
So how do Python and JS deal with this? Whenever we change the type of a variable, the old value and its slot in memory are freed, and the variable is reinitialized with a new slot in memory that reflects the size required by the new type. Here is what it would look like if we go from an integer to a string.
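In C terms, that rebinding is roughly what you would have to do by hand. Here is a rough sketch of the idea (not how Python literally implements it):

#include <stdlib.h>
#include <string.h>

int main(void) {
    // my_var = 5  ->  allocate a slot big enough for an int
    void *my_var = malloc(sizeof(int));
    *(int *)my_var = 5;

    // my_var = "hello"  ->  free the old slot and allocate a new,
    // bigger one sized for the string
    free(my_var);
    my_var = malloc(strlen("hello") + 1);
    strcpy(my_var, "hello");

    free(my_var);
    return 0;
}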
There is a lot more nuance to this, as different languages implement it differently to optimize for specific scenarios.
Numbers Are Unbounded In Length
Numbers are the foundation upon which computer science is built, but a lot of languages lie about how numbers are really stored. For example, in Python, you can do:
my_number = 1000000000000  # one trillion
print(my_number)
That's trivial. But, if you were to do this in C:
int my_number = 1000000000000; // one trillion
printf("%d", my_number);
This code snippet won't do what you want: the value simply doesn't fit, and the compiler will typically warn about the overflow (what you actually end up with depends on the implementation). Integers in C have an upper bound, which on most platforms is 2^31 - 1. At first glance, that may seem like a random number, but this upper bound gives a lot of insight into how numbers are stored on the machine. Here is what typical RAM/memory looks like:
The width you see of 32 bits or 64 bits is how many bits (0s and 1s) can be stored in one memory slot. On most platforms, a C int is 32 bits wide, which means we can only store 32 0s/1s in one slot. So by this logic, shouldn't 2^32 be the upper bound for integers in C? Firstly, with 32 bits we can only go up to 2^32 - 1 (this is when all of the bits are 1s; think of it this way: with only 1 decimal digit, we can only go up to 10^1 - 1 = 9). Even then, we need 1 bit reserved to indicate whether a number is positive or negative. This is achieved using two's complement (read more here: Two's Complement). Thus, this leaves us with 2^31 - 1. So you may think that Python and JS are able to store numbers beyond 2^31 - 1 simply because they have more "width" in their memory slots. That may help on some 64-bit machines, but Python and JS have a cleverer trick up their sleeve.
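If you want to see these bounds for yourself, the constants in limits.h expose them; a quick sketch:

#include <stdio.h>
#include <limits.h>

int main(void) {
    printf("int is %zu bits wide\n", sizeof(int) * 8);
    printf("INT_MAX = %d\n", INT_MAX); // 2147483647, i.e. 2^31 - 1
    printf("INT_MIN = %d\n", INT_MIN); // -2147483648, i.e. -2^31
    return 0;
}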
Let's say I gave you a small slip of paper and told you to write 1 on it. No problem. Now let's say I told you to write 100 trillion on that piece of paper in full (you can't just write 10^14, since each digit's information has to be preserved). You could start writing 10000... but you would have to stop at some point because you ran out of paper. But now, what if I gave you a stack of these small paper slips? How would you write down the number then?
Our first guess might be to put a group of 4 digits of the number on each slip. This ensures there are at most 4 digits per slip (so every group fits), and by scaling each group by the right power of ten and adding them together, we get the original big number back. Thus 123456789 would become three slips: 1, 2345, and 6789.
Here is how the code would look:
#include <stdlib.h>

// Data structure: (implementation of the Linked List ADT)
struct NumGroup {
    int num;                // one group of up to 4 digits
    struct NumGroup *next;  // the next group of the number
};

// Constructor for a new NumGroup
struct NumGroup *new_num(int num, struct NumGroup *nxt) {
    struct NumGroup *num_group = malloc(sizeof(struct NumGroup));
    num_group->num = num;
    num_group->next = nxt;
    return num_group;
}
We can think of big numbers as a sequence of smaller numbers that individually fit in memory slots.
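Putting the pieces together, here is a sketch (again with my own hypothetical names, repeating the constructor from above so it stands alone) of how 123456789 could be built and printed:

#include <stdio.h>
#include <stdlib.h>

struct NumGroup {
    int num;
    struct NumGroup *next;
};

struct NumGroup *new_num(int num, struct NumGroup *nxt) {
    struct NumGroup *g = malloc(sizeof(struct NumGroup));
    g->num = num;
    g->next = nxt;
    return g;
}

int main(void) {
    // 123456789 split into groups of four digits: 1 | 2345 | 6789
    struct NumGroup *big = new_num(1, new_num(2345, new_num(6789, NULL)));

    printf("%d", big->num); // most significant group, no padding
    for (struct NumGroup *g = big->next; g != NULL; g = g->next)
        printf("%04d", g->num); // later groups zero-padded to 4 digits
    printf("\n"); // prints 123456789
    return 0;
}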
While handling big numbers is obviously important in any language, making your language handle these large integers automatically is a lot of extra work compared to just capping integer size and letting programmers implement a system like this for themselves. Let's be honest: if I were writing a new language myself, I would cap integer size any day of the week.
At this point, you will likely either hate Python and JavaScript for deceiving you, or like them for providing a much more comfortable programming experience built on the lies they tell you at every turn.
Hey, I don't blame anyone for choosing either side. In fact, I actually enjoy using C in some places (like practicing CS fundamentals and doing data structures and algorithms practice), since it usually gives me a more complete understanding of what is going on, and faster runtimes, at the cost of ripping my hair out at the thought of how much simpler this would be to implement in Python. I also heavily use Python and JavaScript in my personal projects, which usually take the form of web development and are thus meant to be much more practical in nature; even then, I still occasionally end up specifying variable types or trying to allocate/deallocate memory, only to have JS and Python yell at me.
However, I still think everyone should dabble in C to get a better understanding of how our computers actually work, and to get better at thinking from the ground up (known as the first-principles approach). As a concluding note: many great languages have been developed that are similar to C in the context of this article (i.e. more accurate in terms of how machines execute, while still abstracting away some much-needed things), like Go and Rust, which come with modern benefits and fix some of C's shortcomings. In fact, I would recommend Rust over C if you are planning to actually build a project in a lower-level language. I used C as the main reference instead of these languages because:
- C was the language I was taught and have more familiarity with
- Right now, C is much more widely used than comparable languages
- There are quite a few more resources and a larger community for C development due to it being around for so long
This article is fully inspired by my time taking CS 146 taught by Prof. Brad Lushman at the University of Waterloo. I really enjoyed the course and would encourage anyone else who has the chance to also take it.