Just Enough C for PHP: Variables and Types

astorm

This entry is part 3 of 6 in the series Just Enough C for PHP. Earlier posts include Just Enough C for PHP, and Just Enough C for PHP: Running C Programs. Later posts include Just Enough C for PHP: C Macros, There's no Such Thing as PHP, and Just Enough C for PHP: Make Basics.

Today’s episode of Just Enough C for PHP is light on code and heavy on core concepts. First, we’re going to describe using basic variables in a C program. Then, we’re going to take a detour into how C manages the computer’s memory for those variables.

You may be tempted to skip the second part, but we recommend you make time for it. While you can write some C code without understanding how C manages memory, memory management is one of those Rubicons that every C programmer needs to cross eventually.

Declaring Variables in C

If you’re coming from a language like PHP, you’re used to declaring variables like this

$foo    = 'bar';
$baz    = 1;
$test   = 1.234;

Depending on your background, you may (or may not) know that each of the above variables has a type. The first is a string, the second is an integer, and the third is a float (i.e. a number with a decimal point). String, integer, and float are all types.

Declaring variables in C is more complicated. You can’t just say

#include <stdio.h>
int main()
{
    //this program won't work: none of these variables were declared
    foo = "bar";
    baz = 1;
    test = 1.234;
    return 0;
}

Compiling the above program will result in an error that looks something like this.

$ clang main.c 
main.c:5:5: error: use of undeclared identifier 'foo'
    foo = "bar";

Unlike many modern languages, C requires you to explicitly declare each variable with a type before you can assign a value to it.

For example, here’s a small program that declares three variables.

#include <stdio.h>
int main()
{
    char foo    = 'b';
    int baz     = 1;
    float test  = 1.234;

    printf("Program Done\n");
    return 0;
}

In the above program, the first variable is a char (a single character). The second variable is an int (an integer, i.e. a whole, non-decimal number). The third is a float (a floating point, or decimal, number).

The eagle-eyed among you may have noticed we didn’t include a string variable. Strings are a bit of a special case in C, and we’ll explain why in a later article. There are also other types we’ll talk about in future articles, but for now let’s stick to char, int, and float.

Types present the first challenge you’ll face when learning to program in C. With modern languages, you can easily get started without understanding types. C, on the other hand, forces you to understand (or at least use) these types up front.

Why Types?

So why does C make you declare types? Because the compiler needs to tell your computer how much RAM (i.e. computer memory) to set aside to store the variables. The compiler also needs to know how that RAM should be treated by other parts of your program.

If you’ve moved in and around programming circles long enough, you’ve probably heard someone say it’s all just ones and zeros. If you spend your day in a higher level programming language, you may not think about these ones and zeros much.

The ones and zeros are more than a metaphor. Everything on your computer can be traced back to a set of “bits”, each one either on or off. Your computer groups bits into sets of eight, and we call eight bits a byte. You might represent a single byte of memory like this

on on off off on off off on

That’s 8 bits, each with an on/off value. You also might represent a byte like this

11001001

that is, as a number written in base 2 (a number written in binary format).

As we stated earlier, you can get started writing C programs without being aware of memory. However, some C concepts will require you to be memory aware, and C itself has a culture where not being memory aware will peg you as a beginner-in-need-of-help (in the friendly communities) or a hopeless newb (in the ruder communities). Beyond the social impact, many C APIs and libraries are built with the assumption that you understand memory management.

The rest of this article will be a description of how a compiled C program will store your variables in memory. You don’t need to (and probably won’t) understand these concepts on the first pass. Many C programmers will take their entire careers to really understand the implications of memory management. I’m still working on it myself. Remember that this is hard stuff, and don’t beat yourself up if it doesn’t immediately make sense.

Types and Memory

So, let’s consider the char type. The C standard defines a char as exactly one byte, so the compiler will set aside a single byte of memory for a char. When you say

char foo = 'a';

The compiler will set aside one byte of memory and set that byte’s bits to 01100001. Why 01100001? Because the ASCII standard says that 01100001 (97 in decimal) should be considered the lowercase letter a.

The int type is both simpler and more complex. When you say

int foo = 101;

Your compiler will set aside some memory, and set a byte’s bits to 01100101, because 01100101 is 101 in decimal.

On one hand this is simpler, because (unlike ASCII characters) there’s no need for a translation table. On the other hand — how much memory should the compiler set aside for an integer? We can represent the number 101 in one byte (8 bits) of memory. However — what about 518? Five hundred and eighteen takes a byte of memory, plus two additional bits (10 00000110).

So how much memory will a C program allocate for an integer? That depends on the environment your program was compiled in. The C standard only requires an int to be at least 16 bits (2 bytes) wide, so even compilers for old 8-bit systems typically used 2-byte integers. On most modern systems, including 64-bit ones, an int is 4 bytes, and some of C’s larger integer types use 8 bytes.

If you follow computer chip architecture, these terms may be familiar. While there’s more going on than just the size of an integer, a computer chip with a 64-bit architecture can work with integers that are 8 bytes in size (sometimes referred to as “8 bytes wide”).

C99 and later also provide fixed-width integer types in the <stdint.h> header (int8_t, int64_t, etc.) that let you control exactly how much memory the compiler sets aside for an integer. Even if you’re on a 64-bit system, you can use the int8_t type to use only one byte (8 bits) of memory.

There’s one last wrinkle to integers, and that’s positive and negative numbers. Consider a byte of memory representing 127: 01111111. What would the number -127 (that’s negative one hundred and twenty seven) look like?

We might write -01111111 in this article, but a computer’s RAM has no concept of positive and negative numbers. It’s all just ones and zeros. Instead — the compiler uses the leftmost bit (usually — but let’s not get sidetracked) to represent positive and negative. A 0 is considered positive, and a 1 is considered negative. This means -127 could be represented as 11111111.

We say could be because there are actually several ways a computer can represent positive and negative numbers. The system we’ve described is called signed magnitude representation. Two other systems are called ones’ complement and two’s complement. Their implementation (and the reasons for them) are a little beyond our scope for today, but the short version is that they make certain operations on your integers, like addition and subtraction, simpler for the hardware. Nearly every computer you’ll work with today uses two’s complement, where 11111111 is -1 rather than -127.

This means a single signed byte using signed magnitude representation has a range of -127 to 127, because only 7 bits are left to represent the number itself. The 8th bit keeps track of the number’s positivity or negativity. Larger integers have the same problem, albeit with a much larger range of numbers. To work around these limited ranges, C also has something called an unsigned integer.

unsigned int x;

When you use the unsigned keyword, you’re telling the compiler to set aside the normal amount of memory for a number, but not to keep track of its positivity or negativity. This frees the leftmost bit to represent part of the number. In systems that use signed magnitude representation, a signed byte of 11111111 is negative one hundred and twenty seven. However, if we told the compiler to treat that byte as unsigned, 11111111 is equal to 255. If you ever played RPGs on the old 8-bit NES and wondered why so many inventory limits were 255, now you know.

Decimal Numbers

If integers seemed like a mouthful, then decimal numbers (e.g. 12.3) are like two holiday parties on the same day. Decimal numbers have all the fun of integers, but they also have a fundamental problem: if all we have are ones and zeros, how do we store numbers with decimals?

A naive approach might be to use two bytes of memory, one for the whole number and the other for the decimal. In this ad hoc system, 12.3 becomes

00001100 00000011

The 00001100 representing the 12, the 00000011 representing the .3.

Of course, this is an awful system. How would we represent 12.999? What about numbers larger than 255? What about the positive/negative problem we already saw with integers? If you asked a bunch of smart people how to solve this problem, you’d probably end up with as many answers as people.

Fortunately, in 2017, C is pretty settled in how it stores floating point numbers in memory. Unfortunately, the answer involves math that falls outside what we’re ready to cover in this particular article. The short version is that a floating point number is stored as a few whole-number fields: a sign bit, an exponent, and a fraction. The long answer is the Wikipedia article on floating point arithmetic, in particular the Institute of Electrical and Electronics Engineers (IEEE) 754 section. The stand-alone IEEE 754 article is also worth a look.

While we can’t cover floating point memory representation today, there are two things worth mentioning. The first is that, while it’s possible to represent a decimal number using whole numbers (and therefore in memory with bits), most decimal numbers can’t be stored exactly, so you can’t do arithmetic on them with 100% accuracy. While C will allow you to add, subtract, multiply, and divide floating point numbers, the answers you get may be slightly inaccurate. For example, if you try adding the following numbers

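#include <stdio.h>
int main()
{
    float answer;
    //the tiny number is far too small to change 1.0 at this precision,
    //so it is simply lost during the addition
    answer = 1.0 + .0000000000000000000000001;

    printf("%f\n", answer);
    printf("equal to 1.0? %s\n", answer == 1.0f ? "yes" : "no");
    return 0;
}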

you’ll probably end up with 1.000000 as the answer.

The second thing you’ll want to know is that C has three types for representing decimal numbers.

float x;
double y;
long double z;

Each of these types typically uses more bytes than the one before it, and your computer uses those extra bytes to provide more precision for floating point math. The tradeoff is that they use more memory, and operations on them may take longer to complete. In small programs these costs are trivial. In larger, long-running programs, or programs under high load, they can affect the performance and scalability of your program.

Wrap Up

Phew (and we didn’t even cover endianness)! Not a lot of programming this week, but it’s better to get the memory conversation over sooner rather than later. When you start thinking about how any particular variable is represented in memory, you’re starting to think like a C programmer.

It’s also worth noting that if you’ve used any programming or database system in the past 20 years, you’re probably familiar with limits on the size of your numbers or how many characters you can store in a field. These limits aren’t the arbitrary whims of the programmers who created that application. Often, those programs were implemented in C (or a C-like language), and the limits of each variable type are passed on to the system’s users.

Next time we’ll talk a bit about how variable types interact in C programs, and cover a few basic techniques for debugging your program. It’ll be a heck-of-a-lot lighter than this time, we promise.

Permalink: https://alanastorm.com/just-enough-c-for-php-variables-and-types/

Originally Posted: 6th December 2017
