A Tour of C++: Arrays, Pointers, and References Under the Hood

Introduction

Continuing my series on Bjarne Stroustrup’s A Tour of C++, 2nd Edition, I elaborate in depth on his treatment of arrays, pointers, and references. The aspects of arrays, pointers, and references that a developer needs to internalize are discussed from first principles, then illustrated in exercises with code, program output, and diagrams.

Stroustrup does an amazing job of presenting the main aspects of arrays, pointers, and references clearly and concisely. Here I expand on his treatment, especially with printouts of variable values and memory dumps that show what is going on “under the hood,” and with simple but revealing illustrated code exercises at the end.

This is by no means an exhaustive survey of arrays, pointers, and references. It takes its cues for content from what the book covers. Look for further articles covering areas not tackled here.

The Data Structure

I use a struct variable named st to pack my 4 byte array, integer, pointer (4 bytes in 32 bit x86 programs, 8 bytes in 64 bit x64 programs), and reference members neatly in memory, one right after the other. I use #pragma pack to make sure there is no padding after the 4 byte array when compiled as 64 bit (x64), or anywhere else within the structure. Here is the code that simultaneously declares the struct, instantiates it as st, and initializes its members:

#pragma pack(push, 4)
struct {
  char a[4] { 0, 1, 2, 3 };
  char* pa{ a };

  int x{ 0x04030201 };
  int y{ 0x08070605 };

  int* p1{ &x };
  int* p2{ &y };

  int& rx{ x };
  int& ry{ y };
} st;
#pragma pack(pop)

NOTE: At this point, an experienced programmer might want to skip ahead to the Deep Dive below, unless you want a good refresher!

There is a lot going on here. Briefly, I declare each member inside the structure and give it an initial value in the same statement, e.g.:

struct {
  ...
  int x{ 0x04030201 };
  ...
  int* p1{ &x };
  ...
  int& rx{ x };
} st;

Just above the closing #pragma pack(pop) line at the bottom, I instantiate (i.e., create an instance of) the declared struct by following the closing curly brace with the name st. This allocates storage for the struct and fills it with the initial values; code that follows can then use it through the name st.

When the struct variable st is instantiated, the integer x is initialized to 0x04030201, the pointer variable p1 is initialized with the address of the object x so that it points to x, and the reference variable rx is initialized to also refer to the object x. We will build up to a full understanding of this initialization code as we work through the article.
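As a quick sanity check on those relationships, the sketch below repeats the struct (so it compiles on its own) and compares each pointer and reference with the member it should refer to. The function name initialRelationshipsHold() is my own, hypothetical, and not part of the article's code; the sketch assumes a compiler that accepts #pragma pack, such as Visual Studio, GCC, or Clang.

```cpp
#include <cassert>

// Same layout as the article's struct, repeated so this sketch compiles on its own.
#pragma pack(push, 4)
struct {
  char a[4] { 0, 1, 2, 3 };
  char* pa{ a };

  int x{ 0x04030201 };
  int y{ 0x08070605 };

  int* p1{ &x };
  int* p2{ &y };

  int& rx{ x };
  int& ry{ y };
} st;
#pragma pack(pop)

// Hypothetical helper: returns true if every pointer and reference member
// starts out aimed at the member it was initialized with.
bool initialRelationshipsHold() {
  return st.pa == &st.a[0]    // pa holds the address of a[0]
      && st.p1 == &st.x       // p1 holds the address of x
      && st.p2 == &st.y       // p2 holds the address of y
      && &st.rx == &st.x      // taking the address of rx yields the address of x
      && &st.ry == &st.y      // likewise for ry and y
      && st.rx == 0x04030201  // reading rx reads x
      && *st.p2 == st.ry;     // pointer and reference both reach y's value
}
```

Called from main() before st is modified, initialRelationshipsHold() should return true.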

Below are some examples of using st and the members within it:

int i{ st.x };    // i now equals 0x04030201, which is the value in x
int* p3{ st.p1 }; // p3 points to the same thing as p1 (that is, st.x)
*p3 = *st.p2;     // The contents of what p3 points to now equals the contents
                  // of what st.p2 points to (that is, st.x == st.y)

Array and Pointer Basics

The first two elements declared and initialized within the struct st variable are an array of four characters and a pointer to a character. We elaborate on arrays and pointers below. Here is a quick example code snippet:

struct {
  char a[4] { 0, 1, 2, 3 };
  char* pa{ a };
...
} st;
...
st.a[1] = 5; // second array element (st.a[1]) now equals 5
*st.pa = 12; // first array element (st.a[0]) now equals 12

Arrays

An array is a group of objects of the same type (and hence size). In machine memory, as it is presented to the program, they are stored contiguously.

char a[4]{ 'a', 'b', 'c', 'd' };

In the code above, using the built-in type char (a 1 byte character), the name a, and the unary suffix declarator operator [ ] with a 4 inside it, we tell the compiler to declare a variable named a that is an array of 4 characters (a length of 4 elements). The initializer list between the curly braces { } to the right tells the compiler to initialize the four elements of the array, in order, with the comma-separated values.

We can leave out the 4 and let the compiler figure out the number of elements in the array as well based on the number of elements in the initializer list:

char a[] { 'a', 'b', 'c', 'd' };

However, we cannot do this for an array declared as a member of a struct, because the type of a non-static data member must be completely specified; its length cannot be deduced from the initializer:

struct {
  char a[]{ 'a', 'b', 'c', 'd' }; // "Error(active) E0070 incomplete type is not allowed" - Visual Studio
} st;

In an expression, as opposed to a declaration, the [ ] unary suffix operator means “contents of,” allowing you to get or set the value stored in the array element indicated by the zero-based index expression inside it. So, a[0] is the first element in the above array, a[1] the second, and so on:

char a[]{ 'a', 'b', 'c', 'd' }; // declare and initialize a 4 element char array
char c{ a[0] }; // initialize the char variable c to the contents of the first element in array a
a[3] = 'q';     // a[3], the fourth element in the array a, now equals 'q'
c = a[3];       // c now equals 'q'
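Whether the length is written explicitly or deduced from the initializer list, the element count can be recovered in code. This sketch uses hypothetical helper names and shows two common ways: dividing the array's total size by the size of one element, and std::size from <iterator>, which works on built-in arrays in C++17 and later.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator> // std::size (C++17)

char letters[]{ 'a', 'b', 'c', 'd' }; // length deduced from the initializer list

// Traditional approach: total bytes in the array / bytes per element.
std::size_t countBySizeof() {
  return sizeof(letters) / sizeof(letters[0]);
}

// C++17 approach: std::size reads the length from the array type itself.
std::size_t countByStdSize() {
  return std::size(letters);
}
```

Both functions should return 4 here. Note that neither trick works on a pointer: once an array name has been converted to a char*, the length is no longer part of the type.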

Pointers

char a[]{ 'a', 'b', 'c', 'd' }; // declare and initialize a 4 element char array
char* pa{ a }; // declare pointer-to-char variable pa and point it to a[0]
pa = &a[3];    // pa now contains the address of a[3], hence, pointing to fourth element in a

Above, we declare a pointer to char using the declarator operator *, which simply means “pointer to.” So, pa is a variable of type pointer-to-char. We then point it to the beginning of the array a simply by naming a: in an expression, an array name converts (“decays”) to a pointer to its first element, which puts the address of a[0] in pa. We then use the & unary prefix operator, which means “address of,” to change the address in pa to the address of the fourth element of a, a[3].

We could also have used the & unary prefix operator with a[0] in the second line above to put exactly the same address in pa as we get by using the array name a alone. The two are equivalent:

char a[]{ 'a', 'b', 'c', 'd' }; // declare and initialize a 4 element char array
char* pa{ &a[0] }; // declare pointer-to-char variable pa and point it to a[0]
char c{ *pa };     // declare a char variable c, initialize it with a[0], which is the character 'a'
pa = a;            // no change to address in pointer pa - a equivalent to &a[0]
c = *pa;           // contents of a[0], the character 'a', just reassigned to c - no change in c

When used in an expression, as opposed to a declaration, the * unary prefix operator “dereferences” the pointer. That is, instead of yielding the address stored in the pointer, it yields the object the pointer points to, which can then be read or assigned. This is similar to the [ ] unary suffix operator for arrays discussed previously, except that no index is used: it simply gets the “contents of” whatever the pointer points to at the time:

char a[]{ 'a', 'b', 'c', 'd' }; // declare and initialize a 4 element char array
char* pa1{ a };     // pa1 now points to a[0]
*pa1 = 'q';         // The contents of a[0] is now 'q'
pa1 = &a[3];        // pa1 now points to a[3], the fourth element of array a
char c{ *pa1 };     // Declare a char variable c and initialize it with the contents of a[3]
char* pa2{ &a[2] }; // Declare a pointer to char variable pa2 and initialize it with the address of a[2]
pa1 = pa2;          // Put the address in pa2 in pa1.  pa1 now points to a[2]
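The similarity between [ ] and * is more than a resemblance: in C++, the subscript expression a[i] is defined as *(a + i). The sketch below, built around a hypothetical function subscriptMatchesDereference(), checks that both forms name the same element, and the same address, for every index of a small array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true if subscripting and dereferencing
// pointer arithmetic agree for every element of a small char array.
bool subscriptMatchesDereference() {
  char a[]{ 'a', 'b', 'c', 'd' };
  for (std::size_t i = 0; i < sizeof(a); ++i) {
    // a decays to &a[0]; adding i moves i elements along; * gets the contents.
    if (a[i] != *(a + i)) return false;
    // The addresses agree too: &a[i] is the same pointer as a + i.
    if (&a[i] != a + i) return false;
  }
  return true;
}
```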

NOTE: I use the term “reference” above, but not in the same sense I use it below for reference types. Both pointers and references “reference” something else, or “point to” something else, but how you use them is different.

To help with the diagram below, note that characters are actually stored as numbers in the machine storage accessed through the array variable or through pointers to the array. For simple one byte char arrays, these are the character codes (ASCII values, on the platforms used here) of the characters that would be output to the console. We can also assign plain numbers directly to the char elements of the array instead of character literals:

char a[]{ 0x05, 0x11, 0x07, 0x12, 0x39, 0x4B, 0x7F, 0x02 }; // Declare and initialize array of 8 numbers (char type)
char* pa{ &a[2] }; // pa points to a[2], the third element in array a
char c = *pa;      // char variable c == 0x07, the "contents of" what pa points to
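To see that character literals really are small numbers, the sketch below (the function name charsAreSmallNumbers() is hypothetical) compares letters with their codes and stores a number that reads back as a letter. It assumes an ASCII-compatible character set, as on the usual Windows and Linux x86/x64 toolchains.

```cpp
#include <cassert>

// Hypothetical helper. Assumption: ASCII-compatible execution character set.
bool charsAreSmallNumbers() {
  char a[]{ 'a', 'b', 'c', 'd' };
  // Each character literal is stored as its code: 'a' is 0x61, 'b' is 0x62, ...
  bool codesMatch = a[0] == 0x61 && a[1] == 0x62 && a[2] == 0x63 && a[3] == 0x64;
  // A plain number stored in a char element reads back as a character.
  a[0] = 0x41; // 0x41 is the ASCII code for 'A'
  return codesMatch && a[0] == 'A';
}
```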

See the diagram below to help visualize this:

Array and Pointer Basics Diagram

Pointers are variables that hold addresses, and those addresses are essentially numbers. You can apply the increment and addition operators to them, as long as the result stays within the same array (or points just one element past its end):

char a[]{ 0x05, 0x11, 0x07, 0x12, 0x39, 0x4B, 0x7F, 0x02 }; // Declare and initialize array of 8 numbers (char type)
char* pa{ a };  // pa points to first element of a, a[0]
pa++;           // increment address in pa by 1 char (byte) - points to a[1]
char c{ *pa };  // c == 0x11 - the contents of a[1]
char* pa2{ a }; // pa2 points to a[0], first element of a
pa2 += 7;       // pa2 points to a[7], eighth (last) element of a
*pa2 = 3;       // "contents of" pa2 (a[7]) now equals 3

Finally, pointer arithmetic works in units of the pointed-to type: incrementing a pointer adds the size in bytes of the type it is declared to point to. Above, the type was char, so the increment added one byte, the size of a char. For an array of int, each increment adds 4 bytes, the size of an int on the x86 and x64 platforms used here, and adding n to the pointer adds n * 4 bytes to the address stored in it:

int a[]{ 0x04020100, 0x07060504, 0x0B0A0908 }; // Declare and initialize array of 3 integers
int* pa{ a };  // pointer to int pa points to first element of a, a[0]
pa++;          // increment the address in pa by 1 int (4 bytes) - points to a[1]
int i{ *pa };  // i == 0x07060504 - the contents of a[1]
int* pa2{ a }; // pa2 points to a[0], first element of a
pa2 += 2;      // pa2 points to a[2], the third element of a.  The address stored in
                // pa2 is now 8 (2 * 4) bytes higher than it was before the addition
++(*pa2);      // "contents of" pa2 (a[2]) now equals 0x0B0A0909 (notice the ending 9);
               // the address in the pointer variable pa2 stays the same
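To confirm that an increment moves an int* by sizeof(int) bytes, the sketch below converts two neighboring addresses to char* and subtracts them, since subtracting char* values counts bytes. The function bytesPerIncrement() is a hypothetical helper; on the x86 and x64 builds discussed here, where int is 4 bytes, it should return 4.

```cpp
#include <cassert>
#include <cstddef>

int a[]{ 0x04020100, 0x07060504, 0x0B0A0908 };

// Hypothetical helper: how many bytes the address in an int* moves when the
// pointer is incremented once, measured by viewing both addresses as char*.
std::ptrdiff_t bytesPerIncrement() {
  int* p = a;
  int* q = p + 1; // one int further along
  return reinterpret_cast<char*>(q) - reinterpret_cast<char*>(p);
}
```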

Reference Basics

We now delve into reference types. References are similar to pointers and simpler to use, but somewhat more opaque, hiding some of the details that are explicit with pointers.

They can therefore be less error-prone, although not always, as we will see in a future article about pointers and references as parameters.

In the declaration we use the declarator operator &, which means “reference to”:

int x;        // declare integer variable x
int& rx{ x }; // rx is a reference variable referring to the integer variable x

Unlike a pointer, you don’t need the unary prefix operator * to dereference (get the “contents of”) the reference:

int x;        // declare x
int& rx{ x }; // declare the reference rx and initialize it to refer to x
rx = 3;       // x (not rx) now equals 3
int y{ rx };  // Declare an integer y and initialize it to 3, the contents of x

Also, the reference must be initialized with the variable it refers to and cannot be changed to refer to a different variable later:

int x{ 1 };    // declare x, initialize value to 1
int y{ 2 };    // declare y, initialize value to 2
int& rx{ x };  // declare and initialize reference to x
int& ry{ y };  // declare and initialize reference to y
rx = y;        // doesn't change the reference rx - puts the value of y in x
rx = 3;        // doesn't change the reference rx - puts 3 in x
rx = ry;       // does not make rx reference y, just puts value of y in x
int* px{ &x }; // declare and initialize a pointer to x
ry = px;       // Error(active) E0513 a value of type "int *" cannot be assigned to an entity of type "int" - Visual Studio

Finally, incrementing or adding to a reference does not have the effect it has on a pointer. The reference itself cannot be manipulated, only the object it refers to. So, if the reference refers to a numeric variable, we can apply any arithmetic allowed for that variable through the reference, and it changes the variable, not the reference:

int x{ 1 };   // declare x and initialize to 1
int& rx{ x }; // declare and initialize reference to x
rx++;         // x == 2; rx unchanged
rx *= 6;      // x == 12; rx unchanged
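The following sketch (the function name referenceCannotBeReseated() is hypothetical) puts a pointer and a reference side by side: assigning to the pointer changes which object it points to, while assigning to the reference copies a value into x, and &rx keeps producing the address of x throughout.

```cpp
#include <cassert>

// Hypothetical helper: returns true if all the behaviors described hold.
bool referenceCannotBeReseated() {
  int x{ 1 };
  int y{ 2 };
  int* p{ &x };
  int& rx{ x };

  p = &y;  // the pointer now holds y's address; x is untouched
  rx = y;  // NOT a rebinding: copies y's value (2) into x

  bool pointerMoved = (p == &y);
  bool referenceStayed = (&rx == &x); // rx still names x
  bool xWasOverwritten = (x == 2);

  rx++;    // increments x (now 3); the reference itself is unchanged
  return pointerMoved && referenceStayed && xWasOverwritten && x == 3 && y == 2;
}
```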

The Data Structure Again

We will now, armed with our new understanding, delve into the struct again:

#pragma pack(push, 4)
struct {
  char a[4] { 0, 1, 2, 3 };
  char* pa{ a };

  int x{ 0x04030201 };
  int y{ 0x08070605 };

  int* p1{ &x };
  int* p2{ &y };

  int& rx{ x };
  int& ry{ y };
} st;
#pragma pack(pop)

A struct is a container that encapsulates member objects. Its members are accessed by prefixing them with the name of the struct variable they belong to, or through a pointer to that variable:

struct myStruct { // declare the struct and give it a name
  int x;          // single member variable inside the struct
} st;             // instantiate the declared struct as the variable st

struct myStruct* stp{ &st }; // declare a pointer to myStruct and point it to st
st.x = 3;      // set the member object x within the struct st to 3 using the name
int i{ 5 };    // declare a variable of type int and initialize it with the value 5
i = stp->x;    // using the pointer to the struct st, put the contents of member element x in i (i == 3)
(*stp).x = 12; // illustrates explicitly what the arrow operator above does - dereference the pointer,
               // then access the member inside by name - puts the value 12 in st.x

When you access the members directly through the name of the struct variable, you use the name plus a dot as the prefix. This is referred to as dot notation, and the dot itself is called the member of object operator. When you access the struct through a pointer to it, you use the pointer's name followed by the arrow operator -> (more properly, the member of pointer operator) as the prefix. Above, we gave the struct a name (myStruct) so we can use that name to declare a pointer variable (stp) of type “pointer-to-myStruct”.

The last line in the code above is illustrative only; it would not be good form in production code. It simply shows explicitly what the arrow operator -> effectively does: it dereferences the stp pointer variable, then accesses the member named to its right using dot notation.

NOTE: I will be leaving off the st. prefix in most of the discussions and diagrams below because it is cumbersome to include in the diagrams. So, st.x becomes x, and st.ry becomes ry, etc. Just remember that every variable I discuss, except for the ii index variable and the struct st variable itself, needs the st. in front of it in the real code.

Below is the diagram that shows the data structure I will be using for all the exercises (see the full struct declaration above). To the left is a memory dump for an x86 (32 bit Intel) program, and to the right are the values of each member as they would be output in hex along with the addition of the 0x prefix to indicate the numbers are hex.

I do not dereference the pointer variables, so they show the actual memory address of the objects in memory the pointers point to. These addresses are stored in the pointer variables themselves:

Arrays, Pointers and References Representations Diagram

Why struct? And `#pragma pack`

The reason I use a struct will be clearly illustrated in the deep dive and exercises below: I want to pack all my variables together so I can do a memory dump of all of them to explore and understand machine storage.

When we use a struct, the members are laid out in memory in declaration order, with nothing between them except possible padding (no other variables intervene). I also use #pragma pack to set 4 byte packing, which removes the padding after the 4 byte array and between any of the other 4 (or 8, for x64) byte members:

#pragma pack(push, 4)
struct {
  ...
} st;
#pragma pack(pop)

#pragma pack is a compiler extension rather than standard C++; Visual Studio supports it, as do GCC and Clang. The starting #pragma pack pushes the current packing value onto a stack, then sets the packing value to 4, which caps each member's alignment at 4 bytes. Because every member's size is a multiple of 4, each member then starts exactly where the previous one ends. When compiled for x64, even though the pointers and the addresses stored for the references are 64 bit (8 bytes), there is still no padding: 8 is a multiple of 4, and under this setting the 8 byte members need only 4 byte alignment. The ending #pragma pack pops the saved value off the stack and makes it the active packing value again, restoring everything to how it was before the starting #pragma pack.
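The packing effect can be measured directly. This sketch uses two hypothetical structs, Unpacked and Packed, holding only the first two members of st, and offsetof from <cstddef> to report where pa starts. I use only these two members because offsetof is only guaranteed to work on standard-layout types, and a struct with reference members, like st, is not one. It assumes a compiler that accepts #pragma pack.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical structs mirroring the first two members of st.
struct Unpacked {
  char a[4];
  char* pa;
};

#pragma pack(push, 4)
struct Packed {
  char a[4];
  char* pa;
};
#pragma pack(pop)

// With pack(4), pa starts right after the 4 byte array on both x86 and x64.
constexpr std::size_t packedOffset = offsetof(Packed, pa);

// Without it, pa is aligned to alignof(char*): offset 4 on x86, 8 on x64,
// leaving 4 bytes of padding after the array in a 64 bit build.
constexpr std::size_t unpackedOffset = offsetof(Unpacked, pa);
```

On an x64 build, packedOffset is 4 and sizeof(Packed) is 12, while unpackedOffset is 8; on x86 all three layouts coincide because pointers are only 4 bytes.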

Contents of the `struct`

The following code snippets are all taken from member element declarations inside the struct declaration above. I leave out the surrounding code to avoid repetition.

We will use the following to explore arrays and pointers used with arrays:

char a[4] { 0, 1, 2, 3 }; // declare and initialize the array element
char* pa{ a };            // declare and point the array pointer element
                          // to the first array element

Next, we declare the member integer variables we use to further explore pointers and also explore references:

int x{ 0x04030201 }; // declare and initialize member object int x
int y{ 0x08070605 }; // and int y

Next, the two pointers that will variously point to x and y:

int* p1{ &x }; // declare a pointer to int and initialize with address of st.x
int* p2{ &y }; // declare a pointer to int and initialize with address of st.y

Finally, the two references to x and y:

int& rx{ x }; // declare a reference to x and initialize as reference to x
int& ry{ y }; // declare a reference to y and initialize as reference to y

If any of this is opaque to you, refer to the discussions above about arrays, pointers, references and structs.

Deep Dive

Now, we will get into some more exploratory coding and do a deep dive, discussing the basics mentioned above in greater detail and bringing in subjects such as how the objects are stored in machine memory (from the point of view of what the application sees). I purposefully do not delve into virtual memory and paging here as that will be for another article.

Initial Discussion

Throughout this discussion, and in the illustrated exercises below, the subroutine that outputs everything uses an index variable ii to get the contents of a specific char in the 4 byte array a. I use the size_t type for ii because it is an unsigned type large enough to index any array on the target architecture (32 bits for x86 and 64 bits for x64 in my case, both Intel):

size_t ii = 0;         // declare and initialize index variable ii
st.a[ii] = 'x';        // set a[ii] to printable character
std::cout << st.a[ii]; // example only: output first char array element at index 0

// output - contents of a[0]
x

In my test and illustration code, I use a subroutine called printAll() to print everything out after we manipulate various members of st. It calls a utility function numToHexStr() to turn numbers into zero-padded hex strings, and stHexDump() to print a hex formatted memory dump of st (stHexDump() also calls numToHexStr()). See the links to my GitHub repo, and edit and run the code in coliru, if you want to see and experiment with all the code.
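For readers who want a dump without fetching the repo, here is a minimal sketch of a byte-level hex dump. It is not the author's stHexDump(); the template hexDump() is a hypothetical stand-in that views any object as an array of unsigned char (which C++ permits for inspecting an object's bytes) and prints two uppercase hex digits per byte, lowest address first, grouped in fours.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders the bytes of any object as hex, lowest address
// first, two digits per byte, with a space after every 4 bytes.
template <typename T>
std::string hexDump(const T& object) {
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&object);
  std::ostringstream out;
  out << std::hex << std::uppercase << std::setfill('0');
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    if (i > 0 && i % 4 == 0) out << ' ';  // group bytes in fours
    out << std::setw(2) << static_cast<unsigned>(bytes[i]);
  }
  return out.str();
}
```

Passing st to hexDump() would list the struct's bytes in the same low-to-high address order as the dump lines in the printouts, just without the | delimiters.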

Initial Printout and Explanation

Below is the printout of the struct st immediately after declaration, instantiation, and initialization, with no changes made yet (remember, I leave off the struct prefix st. in my printouts and diagrams for clarity and conciseness, so st.x becomes just x):

printAll("Initial");

// output
Initial
 pa: 0x00D7755C  ii: 0x00000000
*pa: 0x00     a[ii]: 0x00
  x: 0x04030201   y: 0x08070605
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
 rx: 0x04030201  ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

Here is a graphical depiction of this (again):

Representations Diagram

Array and Array Pointer

First, I discuss the array a, the pointer to its various elements pa, and the index variable ii:

 pa: 0x00D7755C  ii: 0x00000000
*pa: 0x00     a[ii]: 0x00
  x: 0x04030201   y: 0x08070605
...
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

pa contains an address, 0x00D7755C, which points to the first element of the first member in the memory dump of st at the bottom (far left): the array a. Note the values 00010203 in the memory dump: a[0] equals 0, a[1] equals 1, and so on, from lowest address to highest, left to right.

Each char in the array a is represented by two hexadecimal digits; each digit is a 4 bit half byte (or nibble), and the two together make up the 1 byte (8 bit) char. ii is the integer 0 (0x00000000; size_t is 4 bytes in this x86 build), *pa is the contents of the first element of a, which pa points to (0x00), and a[ii] is the contents of that same first element, since ii is initially zero.

Data Storage and Representations

In the printout of the variable values above, formatted as regular hex numbers, the bytes run from most-significant byte (MSB) to least-significant byte (LSB), left to right. On this machine that is the opposite of the memory dump, which runs from LSB on the left to MSB on the right. Therefore, the third group of hex digits in the dump, which shows the contents of x, is in the opposite byte order from the printout of x.

The architecture on my machine is Intel, so it is little-endian. That is, the LSB (01 in 0x04030201) occupies the lowest address spot in machine memory. Since the memory dump printout is low address left to high address right, we get (… 01020304 …) as the dump image of the int x.

In contrast, on big-endian machines the MSB occupies the lowest addressed spot (to the left in the memory dump), which is 04 in 0x04030201, so we would get (… 04030201 …) in the memory dump if I ran my program on big-endian hardware. In that sense, big-endian memory dumps are visually less confusing, because the bytes appear in the same order as in the standard numeric representation of the variable.

However, on little-endian hardware, notice that the two integers x and y in the memory dump flow numerically from lowest to highest, even across the boundary between them: (… 01020304 05060708 …). So, in that sense, little-endian can actually be less confusing.

Little-endian does seem to me to model reality better than big-endian, though, since the LSB (the least “important” byte) is in the low position and the MSB is in the high position. The number printouts are always big-endian because we naturally write numbers from the most significant digit on the left to the least significant digit on the right.

Endianness mainly comes into play when communicating between machines of different endianness, or when a little-endian machine handles network communications. Network protocols typically use big-endian order, so big-endian is also called network byte order.
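A program can check its own byte order, and can produce network byte order without depending on it. In this sketch, the hypothetical isLittleEndian() inspects the lowest-addressed byte of the value 1, and the hypothetical toNetworkOrder() writes a 32 bit value out most significant byte first using shifts, which give the same result on any host. (C++20 also offers std::endian in <bit> for the detection step; the memcpy approach below works in earlier standards too.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true on a little-endian machine, where the byte at the
// lowest address of the value 1 is its least significant byte.
bool isLittleEndian() {
  const std::uint32_t one = 1;
  unsigned char lowest;
  std::memcpy(&lowest, &one, 1); // copy only the lowest-addressed byte
  return lowest == 1;
}

// Hypothetical helper: stores value in out[0..3] in big-endian (network)
// byte order. Shifts work on values, not memory, so this is host-independent.
void toNetworkOrder(std::uint32_t value, unsigned char out[4]) {
  out[0] = static_cast<unsigned char>(value >> 24); // MSB first
  out[1] = static_cast<unsigned char>(value >> 16);
  out[2] = static_cast<unsigned char>(value >> 8);
  out[3] = static_cast<unsigned char>(value);       // LSB last
}
```

On the little-endian Intel machine used here, isLittleEndian() returns true, and toNetworkOrder(0x04030201, buf) fills buf with 04 03 02 01, the same order a big-endian memory dump would show.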

Each two hex-digit byte (nibble “couplet”) is written in the same order in both the variable printout and the memory dump (most significant nibble on the left), even in a little-endian memory dump. Therefore, on a little-endian machine, the digits within each couplet run in the opposite direction to the bytes of a number in the memory dump, which go from low address (least significant) on the left to high address on the right.

You could say the individual byte nibble couplets are always in big-endian format. This is because they are printed out as hex numbers, and as discussed above, when we write a number, we write it out most significant digit left to least significant digit right.

Note that the array initializer list and the indexing of the array of 1 byte chars agree in direction with the memory dump (low on the left, high on the right). This is because, with arrays, lower indexed elements occupy lower addresses and higher indexed elements occupy higher addresses.

If the array were of a larger type, like int, each individual int in a little-endian machine would be backward from its number representation, but the lowest indexed int as a whole would be the lowest in memory (to the left in the memory dump), then higher indexed ints would be at higher addresses (to the right in the memory dump).
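This sketch copies the raw bytes of a two element int array, using the same values as x and y, so the arrangement can be inspected. The function rawBytesOfIntArray() is hypothetical and assumes a 4 byte int (it refuses to compile otherwise). On a little-endian machine the copied bytes come out as 01 02 03 04 05 06 07 08: each int is reversed relative to its written value, yet element 0 still precedes element 1.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies the bytes of a two-int array, lowest address first.
void rawBytesOfIntArray(unsigned char out[8]) {
  int arr[2]{ 0x04030201, 0x08070605 };
  static_assert(sizeof(arr) == 8, "sketch assumes a 4 byte int");
  std::memcpy(out, arr, sizeof(arr)); // byte-for-byte copy of the array's storage
}
```

On big-endian hardware the same call would yield 04 03 02 01 08 07 06 05.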

These oddities are not unique to my program: debuggers and other memory dump tools typically display memory the same way, reflecting both how machines store data and how we humans are used to writing numbers. Dealing with them gets easier with practice.

Variable values (number representations) are padded with zeroes on the left to a fixed width for clarity (8 digits for the 4 byte values, 2 digits for the 1 byte chars), and have the 0x prefix indicating that they are in hex. Note that the hex dump reflects, in a still human readable way (hex digits), how the machine actually presents the values of the various members in memory to the program. In the machine they are stored as individual bits (1s and 0s). The variable value printouts are an even more human friendly way to visualize these contents, in the way we grew up reading and writing numbers (albeit, here in hex!).

`x` and `y` - The Integers

We now turn to the int variables x and y, which are used later in exercises for pointers and references:

  x: 0x04030201   y: 0x08070605
...
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

You can see clearly that the storage allocated for the member objects x and y holds the values 0x04030201 and 0x08070605, respectively. The nibble couplets (04, 03, 02, 01, etc.) go from largest to smallest, left to right, in the variable printouts (0x04030201 and 0x08070605, respectively), but from smallest to largest, left to right, in the memory dump (01020304 and 05060708, respectively), which prints out from lowest to highest address, left to right, again because I am using little-endian hardware.

`p1` and `p2` - The Pointers

We now turn to the pointers p1 and p2, which point variously to x and y throughout the exercises:

 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
...
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

Initially, p1 points to x and p2 points to y. First, we note *p1 and *p2 return the contents of what the pointers point to (0x04030201 and 0x08070605 for x and y, respectively, coinciding with both the variable printouts and the memory dumps).

Now we get to some real meat: what is actually stored in memory for these pointers is the memory address of x and y. The address in p1 is 0x00D77564 - pointing to x, which occupies 4 bytes. p2, pointing to y, is 0x00D77568 - exactly 4 bytes higher in memory than the address in p1.

We can see clearly in the memory dump that x is adjacent to y, and that y is four bytes (the width of x) higher than x. So the pointers p1 and p2 initially do hold the real memory addresses of the objects x and y.

Now, remember the array and pointer to it above?

 pa: 0x00D7755C  ii: 0x00000000
...
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

That array pointer holds 0x00D7755C, the address of the beginning of the array, exactly 8 bytes lower in memory than the address of x (0x00D77564). This is the 4 byte width of the array itself plus the 4 byte width of the pointer pa in this 32 bit build (on x64, the 8 byte pointer would make the gap 12 bytes). Starting to make sense?
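The same arithmetic can be done in code. This sketch repeats the packed struct and, in the hypothetical helpers bytesFromArrayToX() and bytesFromXToY(), subtracts addresses converted to const char* so the differences are in bytes. Subtracting addresses of different struct members this way is not something the standard strictly blesses, but it works in practice on mainstream compilers and mirrors what the memory dump shows. bytesFromArrayToX() should equal sizeof(st.a) + sizeof(st.pa): 8 in a 32 bit build, 12 in a 64 bit one.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the article's struct, repeated so this sketch compiles on its own.
#pragma pack(push, 4)
struct {
  char a[4] { 0, 1, 2, 3 };
  char* pa{ a };

  int x{ 0x04030201 };
  int y{ 0x08070605 };

  int* p1{ &x };
  int* p2{ &y };

  int& rx{ x };
  int& ry{ y };
} st;
#pragma pack(pop)

// Hypothetical helper: byte distance from the start of a (held in pa) to x (held in p1).
std::ptrdiff_t bytesFromArrayToX() {
  return reinterpret_cast<const char*>(st.p1) - reinterpret_cast<const char*>(st.pa);
}

// Hypothetical helper: byte distance from x to y, i.e., the width of x.
std::ptrdiff_t bytesFromXToY() {
  return reinterpret_cast<const char*>(st.p2) - reinterpret_cast<const char*>(st.p1);
}
```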

`rx` and `ry` - The References

Now we deal with the final members of the struct st - the references rx and ry:

 rx: 0x04030201  ry: 0x08070605
...
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

Here, when we print out the reference variables rx and ry, the values printed are the values of the objects they refer to (x and y, that is, 0x04030201 and 0x08070605), not the addresses of x and y that the pointers p1 and p2 print out above. There is no need for the unary prefix operator * to dereference them, and they can be assigned directly to a variable of type int.

But here is where it gets interesting: the hex memory dump clearly shows that rx and ry hold exactly the same addresses as p1 and p2 in their respective memory slots. rx, like p1, holds an address four bytes lower than the one in ry (64 in the lowest-addressed, far left, nibble couplet for rx, compared to 68 for ry). For rx and ry, we only know this from the memory dump, since printing them as variables prints the values of what they refer to, not the addresses stored in their memory slots.

Even though references are simpler and less cumbersome to use, since they never require dealing with addresses or the dereference operator, the compiler implementation (under the hood) places the address of the referenced variable in a memory slot belonging to the reference variable itself, just like a pointer variable. (The C++ standard leaves it unspecified whether a reference requires storage; for struct members like these, mainstream compilers store an address, as the dump shows.)

You cannot initialize a reference with the address returned by the & unary prefix operator, you do not normally dereference a reference with the * unary prefix operator, and you cannot rebind a reference to a different object.

Note that applying the & unary prefix operator to the reference variable itself, as in &rx, yields the address stored in the reference's memory slot, which is the address of the object it refers to (x), so it can be assigned to a pointer variable. You will see this in one of the exercises below, where I discuss it a bit more.
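This sketch shows both sides: taking &st.rx yields the address of x, and, as an implementation-specific experiment, copying the bytes out of rx's own slot in st yields that same address. Both helper names are hypothetical. The second function depends on the no-padding layout that #pragma pack(push, 4) produces and on the compiler storing the reference as an address, which the C++ standard does not require, so treat it as exploration, not portable code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same layout as the article's struct, repeated so this sketch compiles on its own.
#pragma pack(push, 4)
struct {
  char a[4] { 0, 1, 2, 3 };
  char* pa{ a };

  int x{ 0x04030201 };
  int y{ 0x08070605 };

  int* p1{ &x };
  int* p2{ &y };

  int& rx{ x };
  int& ry{ y };
} st;
#pragma pack(pop)

// Portable: applying & to a reference gives the address of the object it refers to.
int* addressViaReference() {
  int* px{ &st.rx };
  return px;
}

// Implementation-specific exploration: copy the bytes of rx's own slot out of st.
// Assumes no padding (pack(4)) and that the compiler stores rx as an address.
int* addressStoredInRxSlot() {
  // rx follows a, pa, x, y, p1, and p2 with no padding in between.
  const std::size_t rxOffset = sizeof(st.a) + sizeof(st.pa) + sizeof(st.x) +
                               sizeof(st.y) + sizeof(st.p1) + sizeof(st.p2);
  const unsigned char* base = reinterpret_cast<const unsigned char*>(&st);
  int* stored{ nullptr };
  std::memcpy(&stored, base + rxOffset, sizeof(stored)); // read the hidden address
  return stored;
}
```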

Arrays-Pointers Exercises

We now embark on the last section - the illustrated exercises. First we illustrate arrays and pointers. Let’s jump right in:

Exercise 1: Arrays-Pointers

ii = 1; st.pa = &st.a[2];
printAll("ii = 1; st.pa = &st.a[2];");

// output
ii = 1; st.pa = &st.a[2];
 pa: 0x00D7755E  ii: 0x00000001
*pa: 0x02     a[ii]: 0x01
  x: 0x04030201   y: 0x08070605
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
 rx: 0x04030201  ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5E75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

We set the array index variable ii to 1 (the second array member) then point pa, the pointer to the character array, to the address of the third array member (a[2]). In the variable printout, we see that a[ii] shows 0x01 as the contents, and *pa (the contents of the third array element) as 0x02.

The address stored in pa jumped up by two bytes, from 0x00D7755C to 0x00D7755E, as we would expect going from the 1st element to the 3rd. See the diagram below for a graphic of all this:

Exercise 1 Diagram: Arrays-Pointers

Exercise 2: Arrays-Pointers

st.a[ii] = 4; *st.pa = 5;
printAll("st.a[ii] = 4; *st.pa = 5;");

// output
st.a[ii] = 4; *st.pa = 5;
 pa: 0x00D7755E  ii: 0x00000001
*pa: 0x05     a[ii]: 0x04
  x: 0x04030201   y: 0x08070605
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
 rx: 0x04030201  ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00040503 5E75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

We see above that setting a[ii] to 4 results in 0x04 being placed in the second element of array a (zero-based index 1, ii having been set in the previous exercise). Also, setting the contents of the element pointed to by pa to 5 results in the third element being 5. This is reflected both in the memory dump and in the variable printouts:

Exercise 2 Diagram: Arrays-Pointers

Exercise 3: Arrays-Pointers

st.a[1] = 1; st.pa = &st.a[0]; *(st.pa + 2) = 2;
printAll("st.a[1] = 1; st.pa = &st.a[0]; *(st.pa + 2) = 2;");

// output
st.a[1] = 1; st.pa = &st.a[0]; *(st.pa + 2) = 2;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x04030201   y: 0x08070605
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
 rx: 0x04030201  ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|

First, we directly address the second element of the array with a literal index (a[1]), setting that element’s value back to 1; in the previous exercise, we had set it to 4 using ii. Note that a[ii], with ii still 1, is 1 again, and in the memory dump the second byte of the array is 01 again.

We then point pa back to a[0], using &st.a[0] to get the address of the first array element, then set *(st.pa + 2), the element two positions past a[0] (the same as a[2], the third element), back from 5 to 2:

Exercise 3 Diagram: Arrays-Pointers
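The equivalence used above can be checked in isolation. This is a minimal sketch with a hypothetical function name: it writes through *(pa + 2) and then reads the same element back as a[2], pa[2], and *(a + 2), which the language defines to be the same access.

```cpp
#include <cassert>

// Writes through *(pa + 2), then reads the same element back three ways.
bool pointerArithmeticMatchesSubscript() {
    char a[4] = {0, 1, 5, 3};  // a[2] holds 5, as after Exercise 2
    char* pa = &a[0];          // same address as plain `a` (array-to-pointer decay)
    *(pa + 2) = 2;             // two elements past a[0], i.e. a[2]
    return pa == a && a[2] == 2 && pa[2] == 2 && *(a + 2) == 2;
}
```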

Pointers-References Exercises

Now we start on the pointer and reference exercises, which use the int members x and y of struct st, along with the pointers p1 and p2 and the references rx and ry that refer to them.

First, we set x and y to 1 and 2, respectively, to make the diagrams easier to deal with:

st.x = 1; st.y = 2;
printAll("st.x = 1; st.y = 2;");

// output
st.x = 1; st.y = 2;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000001   y: 0x00000002
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000001 *p2: 0x00000002
 rx: 0x00000001  ry: 0x00000002
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01000000 02000000 6475D700 6875D700 6475D700 6875D700|

Here is a diagram that shows the state of affairs of these data members immediately after this is done:

Exercise Initial Diagram: Pointers-References

Exercise 1: Pointers-References

*st.p2 = *st.p1;
printAll("*st.p2 = *st.p1;");

// output
*st.p2 = *st.p1;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000001   y: 0x00000001
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000001 *p2: 0x00000001
 rx: 0x00000001  ry: 0x00000001
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01000000 01000000 6475D700 6875D700 6475D700 6875D700|

Here, we just use the * unary prefix (dereference) operator on both sides: the value that p1 points to is copied into the object that p2 points to. Since p1 points to x and p2 points to y, now x == y == 0x00000001.

Of note is that x and y print out as the number 0x00000001, with the least significant byte (LSB) 01 on the far right, while the memory dump shows them as 01000000, with the LSB 01 on the far left, at the lowest memory address. The difference in direction on my little-endian machine is very clear here.

See the discussion of little-endian and big-endian number and data representations in the deep-dive section above if you are still unclear as to why. See the diagram below:

Exercise 1 Diagram: Pointers-References
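To see the same byte ordering without a full memory dump, here is a minimal sketch that copies an int's object representation into a byte array in memory order (lowest address first). The helper names are hypothetical; std::memcpy is the standard, well-defined way to inspect those bytes.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Returns the bytes of an int in memory order, lowest address first.
std::array<unsigned char, sizeof(int)> bytesInMemory(int value) {
    std::array<unsigned char, sizeof(int)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);  // copy the object representation
    return bytes;
}

// True when the least significant byte sits at the lowest address.
bool isLittleEndian() {
    return bytesInMemory(1)[0] == 1;
}
```

On a little-endian x86 machine like mine, bytesInMemory(1) yields the bytes 01 00 00 00, matching the 01000000 in the dump above.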

Exercise 2: Pointers-References

st.y = 2;
printAll("st.y = 2;");

// output
st.y = 2;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000001   y: 0x00000002
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000001 *p2: 0x00000002
 rx: 0x00000001  ry: 0x00000002
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01000000 02000000 6475D700 6875D700 6475D700 6875D700|

In this simple exercise, we just set the integer y back to 2. y, *p2, ry, and the memory dump for y all reflect this change. Here is a diagram:

Exercise 2 Diagram: Pointers-References

Exercise 3: Pointers-References

st.p2 = st.p1;
printAll("st.p2 = st.p1;");

// output
st.p2 = st.p1;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000001   y: 0x00000002
 p1: 0x00D77564  p2: 0x00D77564
*p1: 0x00000001 *p2: 0x00000001
 rx: 0x00000001  ry: 0x00000002
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 01000000 02000000 6475D700 6475D700 6475D700 6875D700|

Here we see a lot more change. The address in pointer p2 is now set equal to the address in pointer p1. So, both p2 and p1 now point to the integer x, and both contain the memory address of x (0x00D77564).

No longer do we have p2 pointing to 0x00D77568 (4 bytes, or one int width, higher than p1). *p1 and *p2 now equal each other (0x00000001) and both equal x, because both pointers now point to x.

y, ry and the memory dump for y all remain the same. See diagram below:

Exercise 3 Diagram: Pointers-References
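Here is a small sketch of this aliasing, carried one step further into the write that Exercise 4 performs. IntsAndPointers and aliasAndWrite are hypothetical names; the struct mirrors only the int and pointer members of st, without #pragma pack, since its layout is not inspected here.

```cpp
#include <cassert>

// Hypothetical stand-in for st.x, st.y, st.p1 and st.p2.
// Note: copying this struct would copy pointers into the *original* object.
struct IntsAndPointers {
    int x = 1;
    int y = 2;
    int* p1 = &x;
    int* p2 = &y;
};

// Copies p1 into p2 (Exercise 3), then writes through p2 and to y (Exercise 4).
void aliasAndWrite(IntsAndPointers& s) {
    s.p2 = s.p1;  // copies the address of x; p2 no longer points to y
    *s.p2 = 3;    // therefore this writes to x, not y
    s.y = 4;      // y is now changed only through its name
}
```

Assigning one pointer to another copies the address, not the pointed-to value, so afterwards both pointers are just two names for x.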

Exercise 4: Pointers-References

*st.p2 = 3; st.y = 4;
printAll("*st.p2 = 3; st.y = 4;");

// output
*st.p2 = 3; st.y = 4;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000003   y: 0x00000004
 p1: 0x00D77564  p2: 0x00D77564
*p1: 0x00000003 *p2: 0x00000003
 rx: 0x00000003  ry: 0x00000004
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 03000000 04000000 6475D700 6475D700 6475D700 6875D700|

Now I use the “contents of” (dereference) unary prefix operator * to set what p2 points to (now x) to 3, then set y to 4. *p1, *p2 (which both give the contents of x), x, rx, and the memory dump of x are all now 0x00000003, while y, ry, and the dump of y are all now 0x00000004. See the diagram below:

Exercise 4 Diagram: Pointers-References

Exercise 5: Pointers-References

st.p2 = &st.y;
printAll("st.p2 = &st.y;");

// output
st.p2 = &st.y;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000003   y: 0x00000004
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000003 *p2: 0x00000004
 rx: 0x00000003  ry: 0x00000004
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 03000000 04000000 6475D700 6875D700 6475D700 6875D700|

Here, I point p2 back to y. The address in p2 is now 0x00D77568, 4 bytes higher than the address of x (0x00D77564). *p2, ry, y and the memory dump of y all now coincide with each other, with a value of 0x00000004 (memory dump showing 04000000).

Exercise 5 Diagram: Pointers-References
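The 4-byte gap between the two addresses can be reproduced with a short sketch. TwoInts and gapBetweenP1AndP2 are hypothetical names, and converting pointers to std::uintptr_t to subtract them is implementation-defined; it behaves as expected on mainstream compilers, which do not pad between consecutive int members.

```cpp
#include <cassert>
#include <cstdint>

struct TwoInts { int x = 3; int y = 4; };  // hypothetical stand-in for st.x and st.y

// Points p1 at x and p2 at y (as in Exercise 5), then measures the byte gap
// between the two addresses by converting them to integers.
std::uintptr_t gapBetweenP1AndP2(TwoInts& s) {
    int* p1 = &s.x;
    int* p2 = &s.y;
    return reinterpret_cast<std::uintptr_t>(p2) - reinterpret_cast<std::uintptr_t>(p1);
}
```

On my machine the result matches the dump: 0x00D77568 - 0x00D77564 = 4, one int width.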

Exercise 6: Pointers-References

st.ry = st.rx;
printAll("st.ry = st.rx;");

// output
st.ry = st.rx;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000003   y: 0x00000003
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000003 *p2: 0x00000003
 rx: 0x00000003  ry: 0x00000003
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 03000000 03000000 6475D700 6875D700 6475D700 6875D700|

Here, we use the references rx and ry to access the variables they refer to (x and y, respectively). Assigning rx to ry does not change ry to refer to what rx refers to; instead, y (which we access through ry) is set to the value in x (accessed through rx).

x, y, rx, ry, and the memory dumps of x and y now all hold the same value. Note that the addresses in the memory dumps of rx and ry do not become equal to each other. The address stored in rx is still the same as the one in p1, which points to x, and the address stored in ry is still the same as the one in p2, which points to y, four bytes (one int width) higher than that of rx.

Exercise 6 Diagram: Pointers-References
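The same behavior can be checked with plain local variables. In this minimal sketch (referencesAreNotRebound is a hypothetical name), ry = rx copies a value, the references keep their original targets, and a later write through ry (as in Exercise 7) still lands in y.

```cpp
#include <cassert>

// Mirrors Exercises 6 and 7 with local variables.
bool referencesAreNotRebound() {
    int x = 3, y = 4;
    int& rx = x;
    int& ry = y;

    ry = rx;  // copies x's value into y; ry is NOT rebound to x
    bool afterCopy = (&ry == &y) && (&rx == &x) && x == 3 && y == 3;

    ry = 6;   // still writes to y, leaving x untouched
    bool afterWrite = (x == 3) && (y == 6);

    return afterCopy && afterWrite;
}
```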

Exercise 7: Pointers-References

st.x = 5; st.ry = 6;
printAll("st.x = 5; st.ry = 6;");

// output
st.x = 5; st.ry = 6;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000005   y: 0x00000006
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000005 *p2: 0x00000006
 rx: 0x00000005  ry: 0x00000006
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 05000000 06000000 6475D700 6875D700 6475D700 6875D700|

Here, I set x to 5 directly. Then, I set y through ry to 6. Notice that the changes flow throughout all the variable, pointer contents, and reference printouts as well as the memory dumps. No addresses change in pointers or references.

Exercise 7 Diagram: Pointers-References

Exercise 8: Pointers-References

st.p2 = &st.rx;
printAll("st.p2 = &st.rx;");

// output
st.p2 = &st.rx;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000005   y: 0x00000006
 p1: 0x00D77564  p2: 0x00D77564
*p1: 0x00000005 *p2: 0x00000005
 rx: 0x00000005  ry: 0x00000006
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 05000000 06000000 6475D700 6475D700 6475D700 6875D700|

Here, we place the address held in rx itself (obtained with &st.rx) in p2, pointing p2 to x instead of y. The addresses in p2 and p1 now equal each other, and are the same as the address in the memory dump for rx. x, *p1, *p2, rx, and the memory dump of x all reflect the same number, the contents of x (5).

This is very interesting. Even though rx is a reference and cannot be manipulated or viewed like a pointer such as p1 or p2, applying the unary prefix operator & to it gets us the underlying address stored in the implementation of rx (the address of x), which we can assign to p2.

If we took the address of p1 or p2 instead, we would get the addresses of the pointers themselves, not the addresses of x or y that they store. So, the address-of operator works differently on references than on pointers: &rx yields the address stored in rx's slot, while &p1 yields the location of p1's own slot. Put another way, getting the address value stored in a reference takes one more operator (&rx) than getting the one stored in a pointer (plain p1).

Exercise 8 Diagram: Pointers-References
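A minimal sketch of that contrast follows. PtrAndRef is a hypothetical subset of st with one int, a pointer to it, and a reference to it; the two check functions are likewise made up for this illustration.

```cpp
#include <cassert>

// Hypothetical subset of st: one int, a pointer to it, and a reference to it.
struct PtrAndRef {
    int x = 5;
    int* p1 = &x;
    int& rx = x;
};

// Exercise 8: &rx yields the address of x, the same value p1 holds.
bool addressOfReferenceIsReferent(PtrAndRef& s) {
    int* p2 = &s.rx;
    return p2 == &s.x && p2 == s.p1;
}

// By contrast, &p1 yields where the pointer itself lives; dereferencing
// that once gets back the address of x that p1 stores.
bool addressOfPointerIsItsOwnSlot(PtrAndRef& s) {
    int** pp = &s.p1;
    return static_cast<void*>(pp) != static_cast<void*>(&s.x) && *pp == &s.x;
}
```

Note the types: &s.rx is an int*, just like s.p1, while &s.p1 is an int**.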

Exercise 9: Pointers-References

*st.p2 = 7; st.ry = 8;
printAll("*st.p2 = 7; st.ry = 8;");

// output
*st.p2 = 7; st.ry = 8;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000007   y: 0x00000008
 p1: 0x00D77564  p2: 0x00D77564
*p1: 0x00000007 *p2: 0x00000007
 rx: 0x00000007  ry: 0x00000008
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 07000000 08000000 6475D700 6475D700 6475D700 6875D700|

We directly put the value 7 in the variable pointed to by p2 (currently x), then, using the reference ry, put 8 into y. Note how all the variable printouts and memory dumps now coincide with x == 7 and y == 8. No pointer or reference addresses change.

Exercise 9 Diagram: Pointers-References

Exercise 10: Pointers-References

st.p2 = &st.y; st.rx = 9; *st.p2 = 10;
printAll("st.p2 = &st.y; st.rx = 9; *st.p2 = 10;");

// output
st.p2 = &st.y; st.rx = 9; *st.p2 = 10;
 pa: 0x00D7755C  ii: 0x00000001
*pa: 0x00     a[ii]: 0x01
  x: 0x00000009   y: 0x0000000A
 p1: 0x00D77564  p2: 0x00D77568
*p1: 0x00000009 *p2: 0x0000000A
 rx: 0x00000009  ry: 0x0000000A
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 09000000 0A000000 6475D700 6875D700 6475D700 6875D700|

Finally, in this last exercise, we point p2 back to y, put 9 in x through rx, and put 10 in y through p2.

Exercise 10 Diagram: Pointers-References
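To tie the pointer and reference exercises together, here is a sketch that replays Exercise 10 on a hypothetical stand-in for the int part of st, starting from the state left by Exercise 9 (x == 7, y == 8, p2 pointing to x). IntPart and exercise10 are illustrative names; no #pragma pack is used because the layout is not inspected.

```cpp
#include <cassert>

// Hypothetical stand-in for the int part of st, in its post-Exercise 9 state.
// Note: this struct is not copy-assignable because of its reference members.
struct IntPart {
    int x = 7;
    int y = 8;
    int* p1 = &x;
    int* p2 = &x;  // p2 was pointed at x in Exercise 8
    int& rx = x;
    int& ry = y;
};

// Exercise 10: repoint p2 at y, write x through rx, write y through p2.
void exercise10(IntPart& s) {
    s.p2 = &s.y;
    s.rx = 9;
    *s.p2 = 10;
}
```

After the call, every way of reaching x reads 9 and every way of reaching y reads 10 (0x0000000A), matching the printout above.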

Conclusion

This has been quite a ride! This article turned out to be much longer than I initially thought it would be. The best value found here is in the in-depth section and in the exercises, with their code printouts and diagrams, especially in seeing how the machine stores, and how code and printouts represent, the various types of data.

Find the complete code I used for this article and file bug reports on my GitHub repo, and edit and run the code on Coliru.

Stay tuned for other exciting and elucidating insights as I work through this great book and do further explorations in the future.