A Tour of C++: Arrays, Pointers, and References Under the Hood
Introduction
Continuing my series on Bjarne Stroustrup’s A Tour of C++, 2nd Edition, I elaborate in depth on his treatment of arrays, pointers, and references. The aspects of these features that a developer needs to internalize are discussed from first principles, then illustrated in exercises with code, program output, and diagrams.
Stroustrup presents the main aspects of arrays, pointers, and references clearly and very concisely. Here I expand on his treatment, using printouts of variable values and memory dumps to show what is going on “under the hood,” with simple but revealing illustrated code exercises at the end.
This is by no means an exhaustive survey of arrays, pointers, and references. It takes its cues for content from what the book covers. Look for further articles to cover areas not tackled in this one.
The Data Structure
I use a struct data structure variable named st to neatly pack my 4 byte array, integer, pointer (4 bytes each for 32 bit x86 programs), and reference elements in memory, one right after the other. I use #pragma pack to make sure there is no padding after the 4 byte array element when compiled as 64 bit (x64), or anywhere else within the data structure. Here is the code that simultaneously declares, initializes, and instantiates the struct and st:
NOTE: At this point, an experienced programmer might want to skip ahead to the Deep Dive below, unless you want a good refresher!
There is a lot going on here. Briefly, in the declaration, I declare each element inside the data structure then initialize them as well, e.g.:
Just above the last #pragma pack line at the bottom, I instantiate (i.e., create an instance of) the declared struct by following the closing curly brace with the name st.
This actually allocates the memory storage for the struct and fills it with the initial values, making it available to the code that follows through the name st.
When the struct variable st is instantiated, the integer x is initialized to 0x04030201, the pointer variable p1 is initialized with the address of the object x so that it points to x, and the reference variable rx is initialized to also reference the object x. We will build up to a full understanding of everything going on in this initialization code as we work through the article.
Below are some examples of using st and the variables within it:
Array and Pointer Basics
The first two elements declared and initialized within the struct st variable are an array of four characters and a pointer to a character. We elaborate on arrays and pointers below. Here is a quick example code snippet:
Arrays
An array is a group of objects of the same type (and hence the same size). In machine memory as it is presented to the program, the elements are stored contiguously.
In the code above, using the built-in type char (a 1 byte character), the name a, and the unary suffix declarator operator [ ] with a 4 inside it, we tell the compiler that we want to declare a variable named a that is an array of 4 characters (has a length of 4 elements).
The initializer list between the curly braces { } to the right tells the compiler to initialize each of the four elements in the array with its respective value from the comma-separated list.
We can leave out the 4 and let the compiler figure out the number of elements in the array as well based on the number of elements in the initializer list:
However, when declared inside an enclosing struct, we cannot do this because the declaration must be completely specified:
In an expression, as opposed to a declaration, the [ ] unary suffix operator means “contents of,” allowing you to get or set the value stored in the array element indicated by the zero-based index expression inside it. So, a[0] is the first element in the above array, a[1] the second, and so on:
Pointers
Above, we declare a pointer to a char type using the unary suffix declarator operator *, which simply means “pointer to.” So, we have just declared a variable of type pointer-to-char. We then point it to the beginning of the array variable a by naming a directly, which puts the address of a[0] in pa. We then use the & unary prefix operator, which means “address of,” to tell the compiler to change the address in pa to the address of the fourth element in the char array variable a, indicated by a[3].
We could also have used the & unary prefix operator with a[0] in the second line above to put the exact same address in the pointer pa as we do using the array variable a alone, pointing it to the same place. They are equivalent:
When used in an expression as a prefix, as opposed to in a declaration as a suffix, the * unary prefix operator “dereferences” the pointer.
That is, instead of yielding the pointer itself (a reference to something), the expression yields the contents of the object referred to.
This is similar to the previously discussed [ ] unary suffix operator for arrays. However, no index is used: it just gets the “contents of” whatever the pointer is pointing to at the time it is used:
NOTE: I use the term “reference” above, but not in the same sense I use it below for reference types. Both pointers and references “reference” something else, or “point to” something else, but how you use them is different.
To help with the diagram below, we note that characters are actually stored as numbers in the machine storage accessed by the array variable or pointers to the array. For simple one byte char arrays, these are the ASCII values of the characters that would be output to the console.
We can assign regular numbers directly to the char elements in the array instead of ASCII characters:
See the diagram below to help visualize this:
Pointers are variables with addresses in them, and those addresses are just numbers. You can manipulate them like numbers with increment operators and addition:
Finally, pointer arithmetic works in units of the pointed-to type: incrementing a pointer adds the size in bytes of the type the pointer is declared to point to.
Above, the type was char, so each increment advanced the address by one byte, the size of a char. However, if we have an array of integers (int), each increment advances the address by 4 bytes, because integers are 4 bytes in size on the x86 and x64 targets used here. Adding a number n adds n multiplied by 4 to the address stored in the pointer:
Reference Basics
We now delve into reference types. References are similar to pointers and simpler to use, but somewhat more opaque, hiding some of the details that are explicit with pointers.
They can therefore be less error prone to use, although not always, as we will see in a future follow-on article about pointers and references as parameters.
In the declaration we use the unary suffix declarator operator &, which means “reference to”:
Unlike a pointer, you don’t need the unary prefix operator * to dereference (get the “contents of”) the reference:
Also, the reference must be initialized with the variable it refers to and cannot be changed to refer to a different variable later:
Finally, incrementing or adding to a reference does not have the same effect it has on a pointer. The reference cannot itself be manipulated, only the object referred to. So, if the reference refers to a number variable, we can do any normal mathematical manipulation allowed for that variable, but it will not affect the reference itself:
The Data Structure Again
We will now, armed with our new understanding, delve into the struct again:
A struct is a container that encapsulates member objects. Its member elements must be referred to by prefixing the name of the struct variable they are in:
When the members are accessed directly through the name of the struct variable, you use the name plus a dot as the prefix to refer to its inner elements. This is referred to as dot notation. The dot itself is called the member of object operator. When a struct is accessed through a pointer to the struct, we use the pointer’s name followed by the arrow operator -> (more properly, the member of pointer operator) as the prefix to reference its inner elements. Above, we gave the struct a name (myStruct) so we can use that name to make a pointer variable (stp) of the type “pointer-to-myStruct”.
The last line in the code above is illustrative only; it would not really be good form to use in production code. It simply shows explicitly what the arrow operator -> effectively does: it dereferences the stp pointer variable, allowing access to the contents of the struct element named to the right of it using dot notation.
NOTE: I will be leaving off the st. prefix in most of the discussions and diagrams below because it is cumbersome to include in the diagrams. So, st.x becomes x, st.ry becomes ry, etc. Just remember that every variable I discuss, except for the ii index variable and the struct st variable itself, needs the st. in front of it in the real code.
Below is the diagram that shows the data structure I will be using for all the exercises (see the full struct declaration above). To the left is a memory dump for an x86 (32 bit Intel) program, and to the right are the values of each member as they would be output in hex, with the 0x prefix added to indicate that the numbers are hex.
I do not dereference the pointer variables, so they show the actual memory addresses of the objects the pointers point to. These addresses are stored in the pointer variables themselves:
Why struct? And #pragma pack
The reason I use a struct will become clear in the full deep dive treatment and exercises below: I want to pack all my variables together so I can do a memory dump of all of them for purposes of exploration and understanding machine storage.
When we use a struct, we guarantee that the variables inside it will be together in memory (not out of order or separated by other intervening variables).
I also use a #pragma pack to ensure 4 byte packing, which leaves no padding after the 4 byte array or any other 4 (or 8, for x64) byte variables:
The starting #pragma pack pushes whatever the current pack value is onto a stack, then selects a 4 byte packing value so that the 4 byte variables will not have any padding between them. When compiled for x64, even though the pointers and the addresses stored for the references are 64 bit, or 8 bytes, there is still no padding because 8 is a multiple of 4. The ending #pragma pack simply pops the pushed pack value off the stack and back into the active pack value, returning everything to what it was before the starting #pragma pack.
Contents of the struct
The following code snippets are all taken from member element declarations inside the struct declaration above. I leave out the surrounding code to avoid repetition.
We will use the following to explore arrays and pointers used with arrays:
Next, we declare the member integer variables we use to further explore pointers and also explore references:
Next, the two pointers that will variously point to x and y:
Finally, the two references to x and y:
If any of this is opaque to you, refer to the discussions above about arrays, pointers, references, and structs.
Deep Dive
Now, we will get into some more exploratory coding and do a deep dive, discussing the basics mentioned above in greater detail and bringing in subjects such as how the objects are stored in machine memory (from the point of view of what the application sees). I purposefully do not delve into virtual memory and paging here as that will be for another article.
Initial Discussion
Throughout this discussion, and below in the illustrated exercises, we use an index variable ii to get the contents of a specific char in the 4 byte array a in the subroutine that outputs everything. The size_t type is used for this purpose to choose the appropriate index type for the target program/machine architecture (32 bit x86 or 64 bit x64 in my case, both Intel):
In my test and illustration code, I use a subroutine called printAll() to print everything out after we manipulate various member objects in the struct st.
It calls a utility function numToHexStr() to turn numbers into zero-padded hex strings, and stHexDump(), which prints out a hex formatted memory dump of the struct st and itself calls numToHexStr() as well. See the links to my GitHub repo, and edit and run the code in Coliru, if you want to see and experiment with all the code.
Initial Printout and Explanation
Below is the printout of the struct st immediately after declaration, instantiation, and initialization, with no changes made yet (remember, I leave off the struct prefix st. in my printouts and diagrams for clarity and conciseness, so st.x becomes just x):
printAll("Initial");
// output
Initial
pa: 0x00D7755C ii: 0x00000000
*pa: 0x00 a[ii]: 0x00
x: 0x04030201 y: 0x08070605
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
rx: 0x04030201 ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
Here is a graphical depiction of this (again):
Array and Array Pointer
First, I discuss the array a, the pointer to its various elements pa, and the index variable ii:
pa: 0x00D7755C ii: 0x00000000
*pa: 0x00 a[ii]: 0x00
x: 0x04030201 y: 0x08070605
...
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
pa contains an address, 0x00D7755C, which points to the first element of the first variable in the memory dump of struct st at the bottom, to the far left: the array a (note the values 00010203 in the memory dump: a[0] equals 0, a[1] equals 1, and so on, from lowest address to highest, left to right).
Each char object in the array a is represented by two hexadecimal digits, each single digit being a 4 bit half byte (or nibble), both together making up the 1 byte (8 bit) char. ii is a 4 byte integer 0 (0x00000000), *pa is the contents of the element pa points to, which is the first element in the array a (0x00), and a[ii] is the contents of the first element as well, since ii is initially zero.
Data Storage and Representations
In the printout of the values of the variables above, formatted as regular hex numbers, the direction is most-significant byte (MSB) to least-significant byte (LSB) from left to right, opposite the memory dump direction, which runs LSB left to MSB right.
Therefore, you will note that the third group of hex digits in the dump, which shows the contents of the member object x, is in the opposite order of the variable number representation printout of x.
The architecture on my machine is Intel, so it is little-endian. That is, the LSB (01 in 0x04030201) occupies the lowest address spot in machine memory. Since the memory dump printout is low address left to high address right, we get (… 01020304 …) as the dump image of the int x.
In contrast, there are big-endian machines where the MSB occupies the lowest addressed spot (to the left in the memory dump), which is 04 in 0x04030201, so we would get (… 04030201 …) in the memory dump if I ran my program on big-endian hardware.
Hence, in that sense, big-endian memory dumps are less confusing visually because they are printed in the same order as the standard numeric representation of the variable, not in the opposite order.
However, for little-endian, notice that the two adjacent integers x and y in the memory dump numerically flow from lowest to highest even across the two separate integers: (… 01020304 05060708 …). So, in that sense, little-endian can actually be less confusing.
Little-endian does seem to me to model reality better than big-endian, though, since the LSB (the least “important” byte) is in the low position and the MSB is in the high position. The number printouts are big-endian always because we naturally write number values from left to right (most significant digit left to least significant digit right, hence, essentially big-endian).
Endian-ness mainly comes into play when communicating between machines that differ in endian-ness from each other, or when dealing with networked communications on little-endian machines. Network protocols typically speak in big-endian orientation, so big-endian is also called network byte order.
Each two hex-digit byte (nibble “couplet”) is in the same order in both the variable number representation and the memory dump (most significant nibble to the left), even on a little-endian memory dump printout. Therefore, the digits within the individual couplets are backward in arrangement when compared to overall direction of the memory dump which goes from low address left to high address right on a little-endian machine.
You could say the individual byte nibble couplets are always in big-endian format. This is because they are printed out as hex numbers, and as discussed above, when we write a number, we write it out most significant digit left to least significant digit right.
Note that the array initialization list and referencing of the array of chars, which are 1 byte each, agrees with the memory dump of it in direction (low left to high right). This is because with arrays, the lower indexed values occupy the lower addresses, and the higher indexed values occupy higher addresses.
If the array were of a larger type, like int, each individual int on a little-endian machine would be backward from its number representation, but the lowest indexed int as a whole would be the lowest in memory (to the left in the memory dump), then higher indexed ints would be at higher addresses (to the right in the memory dump).
All these oddities are not just my program, though - this is typically how it is done in debugger dumps as well, and other programs that do memory dumps, and this reflects the realities of how machines store the data and how we are used to writing out numbers as humans. Dealing with these oddities gets easier with practice.
All variable values (number representations) are padded with zeroes to the left to form 8 digit printouts for clarity, and have the 0x prefix indicating that they are in hex format. Note that the hex dump of memory reflects, in a still human readable way (hex digits), the actual way the machine presents the values for the various member objects in memory to the program. In the machine they are stored as individual bits (1s and 0s). The variable value printouts reflect an even more human friendly way to visualize these contents in the way we grew up reading and writing numbers (albeit, here in hex!!).
x and y - The Integers
We now turn to the int variables x and y, which are used later in exercises for pointers and references:
x: 0x04030201 y: 0x08070605
...
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
You can see clearly that the storage allocated for the member objects x and y holds the values 0x04030201 and 0x08070605, respectively. The nibble couplets (04, 03, 02, 01, etc.) go from largest to smallest, left to right, in the variable printouts (0x04030201 and 0x08070605, respectively), but from smallest to largest, left to right, in the memory dump (01020304 and 05060708, respectively), which prints out from lowest to highest address, left to right, again because I am using little-endian hardware.
p1 and p2 - The Pointers
We now turn to the pointers p1 and p2, which point variously to x and y throughout the exercises:
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
...
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
Initially, p1 points to x and p2 points to y. First, we note that *p1 and *p2 return the contents of what the pointers point to (0x04030201 and 0x08070605 for x and y, respectively, coinciding with both the variable printouts and the memory dumps).
Now we get to some real meat: what is actually stored in memory for these pointers is the memory address of x and y. The address in p1 is 0x00D77564, pointing to x, which occupies 4 bytes. The address in p2, pointing to y, is 0x00D77568, exactly 4 bytes higher in memory than the address in p1.
We can see clearly in the memory dump that x is adjacent to y, and y is four bytes higher than x (the width of x higher). So, we see that the addresses stored in the pointers p1 and p2 really are the addresses in memory of the objects x and y.
Now, remember the array and pointer to it above?
pa: 0x00D7755C ii: 0x00000000
...
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
That array pointer, and hence the address of the beginning of the array, is 0x00D7755C, exactly 8 bytes lower in memory than the address of x (0x00D77564).
This is the 4 byte width of the array itself plus the four byte width of the pointer to that array. Starting to make sense?
rx and ry - The References
Now we deal with the final members of the struct st, the references rx and ry:
rx: 0x04030201 ry: 0x08070605
...
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
Here, when we print out the reference variables rx and ry, the values printed are the values of what they refer to (x and y, or 0x04030201 and 0x08070605, respectively), not the addresses of the variables x and y, which is what the pointers p1 and p2 print out above.
There is no need for a unary prefix operator * to dereference them, and they can be directly assigned to a variable of type int.
But here is where it gets interesting: we can see clearly in the hex memory dump that rx and ry hold the exact same addresses as p1 and p2 in their respective memory locations. rx, like p1, holds an address that is four bytes lower than that of ry (compare the lowest, far left, nibble couplet: 64 for rx and 68 for ry).
For rx and ry, we only know this from the memory dump, since printing them out as variables prints the values of what they reference, not the addresses contained in their memory slots.
Even though references are simpler and less cumbersome to use, at least in never needing to deal with addresses or the dereference operator as with pointers, the compiler implementation used here (under the hood) places the address of the referenced variable in a memory slot for the reference variable itself, just like a pointer variable. The C++ standard does not require a reference to occupy storage, so this is what these builds do for a struct member rather than a language guarantee.
You cannot initialize references with the & unary prefix operator that returns the address, nor do you normally dereference them with the * unary prefix operator, and you cannot reassign a reference to refer to a different location.
Note that you can get at the address stored in the actual memory slot held by the reference variable by using the & unary prefix operator on the reference variable itself, like &rx, for instance, to assign it to a pointer variable. You will see this in one of the exercises below, where I discuss it a bit more.
Arrays-Pointers Exercises
We now embark on the last section - the illustrated exercises. First we illustrate arrays and pointers. Let’s jump right in:
Exercise 1: Arrays-Pointers
ii = 1; st.pa = &st.a[2];
printAll("ii = 1; st.pa = &st.a[2];");
// output
ii = 1; st.pa = &st.a[2];
pa: 0x00D7755E ii: 0x00000001
*pa: 0x02 a[ii]: 0x01
x: 0x04030201 y: 0x08070605
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
rx: 0x04030201 ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5E75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
We set the array index variable ii to 1 (the second array member), then point pa, the pointer to the character array, to the address of the third array member (a[2]). In the variable printout, we see that a[ii] shows 0x01 as its contents, and *pa (the contents of the third array element) shows 0x02.
The address stored in the pointer pa jumped up by two bytes, from 0x00D7755C to 0x00D7755E, as we would expect going from the 1st element to the 3rd.
See the diagram below for a graphic of all this:
Exercise 2: Arrays-Pointers
st.a[ii] = 4; *st.pa = 5;
printAll("st.a[ii] = 4; *st.pa = 5;");
// output
st.a[ii] = 4; *st.pa = 5;
pa: 0x00D7755E ii: 0x00000001
*pa: 0x05 a[ii]: 0x04
x: 0x04030201 y: 0x08070605
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
rx: 0x04030201 ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00040503 5E75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
We see above that setting a[ii] to 4 results in 0x04 being placed in the second element of array a (zero-based index 1, ii having been set in the previous exercise). Also, setting the contents of the element pointed to by pa to 5 results in the third element being 5. This is reflected both in the memory dump and in the variable printouts:
Exercise 3: Arrays-Pointers
st.a[1] = 1; st.pa = &st.a[0]; *(st.pa + 2) = 2;
printAll("st.a[1] = 1; st.pa = &st.a[0]; *(st.pa + 2) = 2;");
// output
st.a[1] = 1; st.pa = &st.a[0]; *(st.pa + 2) = 2;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x04030201 y: 0x08070605
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x04030201 *p2: 0x08070605
rx: 0x04030201 ry: 0x08070605
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 01020304 05060708 6475D700 6875D700 6475D700 6875D700|
First, we directly address the second element in the array using a number (a[1]), setting that element’s value back to 1. Above, we had set it to 4 using ii.
Note that the value of a[ii], with ii still 1, is 1 again, and in the memory dump the second nibble couplet in the array is now 01 again.
We then point pa back to a[0] using &st.a[0] to get the address of the first array element, then set the contents of (st.pa + 2), that is, 2 elements beyond a[0], which is the same as a[2], the third element, back to 2 from 5:
Pointers-References Exercises
Now we start on the pointer and reference exercises, which use the integer struct st members x and y as well as the pointers p1 and p2 and the references rx and ry that refer to them.
First, we set x and y to 1 and 2, respectively, to make the diagrams easier to deal with:
st.x = 1; st.y = 2;
printAll("st.x = 1; st.y = 2;");
// output
st.x = 1; st.y = 2;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000001 y: 0x00000002
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000001 *p2: 0x00000002
rx: 0x00000001 ry: 0x00000002
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 01000000 02000000 6475D700 6875D700 6475D700 6875D700|
Here is a diagram that shows the state of affairs of these data members immediately after this is done:
Exercise 1: Pointers-References
*st.p2 = *st.p1;
printAll("*st.p2 = *st.p1;");
// output
*st.p2 = *st.p1;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000001 y: 0x00000001
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000001 *p2: 0x00000001
rx: 0x00000001 ry: 0x00000001
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 01000000 01000000 6475D700 6875D700 6475D700 6875D700|
Here, we just use the * unary prefix operator to set the contents of what p2 points to to the contents of what p1 points to (p1 points to x, p2 points to y, so now x == y == 0x00000001).
Of note is that the variables x and y print out as the number 0x00000001, with the LSB 01 nibble couplet on the far right, but appear in the memory dump as 01000000, with the LSB 01 nibble couplet on the far left in the lowest memory address position. So, the difference in direction on my little-endian machine is very clear here.
See the discussion of little-endian and big-endian number and data representations in the deep dive section above if you are still unclear as to why. See diagram below:
Exercise 2: Pointers-References
st.y = 2;
printAll("st.y = 2;");
// output
st.y = 2;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000001 y: 0x00000002
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000001 *p2: 0x00000002
rx: 0x00000001 ry: 0x00000002
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 01000000 02000000 6475D700 6875D700 6475D700 6875D700|
In this simple exercise, we just set the integer y to 2. y, *p2, ry, and the memory dump for y all reflect this change.
Here is a diagram:
Exercise 3: Pointers-References
st.p2 = st.p1;
printAll("st.p2 = st.p1;");
// output
st.p2 = st.p1;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000001 y: 0x00000002
p1: 0x00D77564 p2: 0x00D77564
*p1: 0x00000001 *p2: 0x00000001
rx: 0x00000001 ry: 0x00000002
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 01000000 02000000 6475D700 6475D700 6475D700 6875D700|
Here we see a lot more change. The address in pointer p2 is now set equal to the address in pointer p1. So, both p2 and p1 now point to the integer x, and both contain the memory address of x (0x00D77564).
No longer do we have p2 pointing to 0x00D77568 (4 bytes higher, or one int-length higher, than p1). *p1 and *p2 now equal each other (0x00000001) and both equal x, because they are showing the contents of x, which they now both point to.
y, ry, and the memory dump for y all remain the same. See diagram below:
Exercise 4: Pointers-References
*st.p2 = 3; st.y = 4;
printAll("*st.p2 = 3; st.y = 4;");
// output
*st.p2 = 3; st.y = 4;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000003 y: 0x00000004
p1: 0x00D77564 p2: 0x00D77564
*p1: 0x00000003 *p2: 0x00000003
rx: 0x00000003 ry: 0x00000004
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 03000000 04000000 6475D700 6475D700 6475D700 6875D700|
Now I use the “contents of” unary prefix operator * to change the contents of what p2 points to (which is x) to 3, then set y equal to 4. *p1 and *p2 (which both give the contents of x), x, rx, and the memory dump of x are all now 0x00000003, and y, ry, and the dump of y are now all 0x00000004. See diagram below:
Exercise 5: Pointers-References
st.p2 = &st.y;
printAll("st.p2 = &st.y;");
// output
st.p2 = &st.y;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000003 y: 0x00000004
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000003 *p2: 0x00000004
rx: 0x00000003 ry: 0x00000004
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 03000000 04000000 6475D700 6875D700 6475D700 6875D700|
Here, I point p2 back to y. The address in p2 is now 0x00D77568, 4 bytes higher than the address of x (0x00D77564). *p2, ry, y, and the memory dump of y all now coincide with each other, with a value of 0x00000004 (memory dump showing 04000000).
Exercise 6: Pointers-References
st.ry = st.rx;
printAll("st.ry = st.rx;");
// output
st.ry = st.rx;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000003 y: 0x00000003
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000003 *p2: 0x00000003
rx: 0x00000003 ry: 0x00000003
Hex Dump of Structure Memory - Low Address ----> High Address
a pa x y p1 p2 rx ry
|00010203 5C75D700 03000000 03000000 6475D700 6875D700 6475D700 6875D700|
Here, we use the references rx and ry to access the variables they refer to (x and y, respectively). Assigning rx to ry does not change the reference ry to refer to what rx refers to. Instead, the contents of y (which we access through ry) are set to the value in x (accessed through rx).
x, y, rx, ry, and the memory dumps of x and y now all hold the same value. Note that the addresses in the memory dumps of rx and ry do not change to be equal to each other. The address in the memory dump of rx is still the same as p1, which points to x, and that of ry is still the same as p2, which points to y and is four bytes (one int width) higher than that of rx.
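The same behavior can be checked outside the struct. This is a minimal sketch using local variables rather than the article's st; the function and struct names are hypothetical.

```cpp
#include <cassert>

// Final values of x and y after the experiment.
struct RefAssignResult {
    int x;
    int y;
};

RefAssignResult assignThroughReferences() {
    int x = 3;
    int y = 4;
    int& rx = x;
    int& ry = y;

    ry = rx;  // copies the value of x into y; ry is not rebound

    // Both references still name their original variables.
    assert(&ry == &y);
    assert(&rx == &x);

    x = 5;            // if ry now referred to x, reading ry would give 5
    assert(ry == 3);  // it still reads y, which holds the copied 3

    return {x, y};
}
```

After the call, x is 5 and y is 3: the assignment copied a value once and left ry bound to y.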
Exercise 7: Pointers-References
st.x = 5; st.ry = 6;
printAll("st.x = 5; st.ry = 6;");
// output
st.x = 5; st.ry = 6;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000005 y: 0x00000006
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000005 *p2: 0x00000006
rx: 0x00000005 ry: 0x00000006
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 05000000 06000000 6475D700 6875D700 6475D700 6875D700|
Here, I set x to 5 directly. Then, I set y to 6 through ry. Notice that the changes show up in all the variable, pointer-contents, and reference printouts as well as in the memory dumps. No addresses change in the pointers or references.
Exercise 8: Pointers-References
st.p2 = &st.rx;
printAll("st.p2 = &st.rx;");
// output
st.p2 = &st.rx;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000005 y: 0x00000006
p1: 0x00D77564 p2: 0x00D77564
*p1: 0x00000005 *p2: 0x00000005
rx: 0x00000005 ry: 0x00000006
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 05000000 06000000 6475D700 6475D700 6475D700 6875D700|
Here, we take the address of rx, which is the address of x that rx holds, and place it in p2, pointing p2 to x instead of y. The addresses in p2 and p1 are now equal, and are the same as the address in the memory dump for rx. *p1, *p2, rx, x, and the memory dump of x all show the same number, the contents of x (5).
This is very interesting. Even though rx is a reference and cannot be manipulated or viewed like a pointer such as p1 or p2, applying the unary prefix operator & to it does get at the address stored in the implementation of rx (the address of x), which we can assign to p2.
If we took the address of p1 or p2 instead, we would get the address of the pointers themselves, not the address of x or y that they store.
So the address-of operator works differently on references than it does on pointers. A reference is implicitly dereferenced wherever it is used, so &rx means &x. A pointer is not, so &p1 is the address of p1 itself, and the address it stores is read by naming p1 with no operator at all.
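The difference can be verified with a few comparisons. This is a minimal sketch with local variables instead of the article's st; the names AddressCheck and compareAddressOf are hypothetical.

```cpp
#include <cassert>

// Outcome of each address comparison.
struct AddressCheck {
    bool refGivesAddressOfX;        // &rx == &x
    bool pointerHoldsAddressOfX;    // p1 == &x
    bool pointerLivesAtAddressOfX;  // &p1 compared with &x
};

AddressCheck compareAddressOf() {
    int x = 5;
    int* p1 = &x;
    int& rx = x;

    int* fromRef = &rx;   // a reference is an alias, so this is &x
    int** fromPtr = &p1;  // address of the pointer object itself

    // Dereferencing &p1 once gets back to the address stored in p1.
    assert(*fromPtr == &x);

    return {fromRef == &x,
            p1 == &x,
            static_cast<void*>(fromPtr) == static_cast<void*>(&x)};
}
```

The first two members come back true and the third false, because p1 and x are distinct objects at distinct addresses. Note also the types: &rx is an int*, while &p1 is an int**, so only the former can be assigned to p2.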
Exercise 9: Pointers-References
*st.p2 = 7; st.ry = 8;
printAll("*st.p2 = 7; st.ry = 8;");
// output
*st.p2 = 7; st.ry = 8;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000007 y: 0x00000008
p1: 0x00D77564 p2: 0x00D77564
*p1: 0x00000007 *p2: 0x00000007
rx: 0x00000007 ry: 0x00000008
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 07000000 08000000 6475D700 6475D700 6475D700 6875D700|
We put the value 7 directly into the variable pointed to by p2 (currently x), then, using the reference ry, put 8 into y. Note how all the variable printouts and memory dumps now agree with x == 7 and y == 8. No pointer or reference addresses change.
Exercise 10: Pointers-References
st.p2 = &st.y; st.rx = 9; *st.p2 = 10;
printAll("st.p2 = &st.y; st.rx = 9; *st.p2 = 10;");
// output
st.p2 = &st.y; st.rx = 9; *st.p2 = 10;
pa: 0x00D7755C ii: 0x00000001
*pa: 0x00 a[ii]: 0x01
x: 0x00000009 y: 0x0000000A
p1: 0x00D77564 p2: 0x00D77568
*p1: 0x00000009 *p2: 0x0000000A
rx: 0x00000009 ry: 0x0000000A
Hex Dump of Structure Memory - Low Address ----> High Address
 a        pa       x        y        p1       p2       rx       ry
|00010203 5C75D700 09000000 0A000000 6475D700 6875D700 6475D700 6875D700|
Finally, in this last exercise, we point p2 back to y, put 9 in x through rx, and 10 in y through p2.
Conclusion
This has been quite a ride! This article turned out to be much longer than I initially thought it would be. The best value found here is in the in-depth section and in the exercises, with their code printouts and diagrams, especially in seeing how the machine stores, and how code and printouts represent, the various types of data.
Find the complete code I used for this article, and file bug reports, on my GitHub repo. You can also edit and run the code on Coliru.
Stay tuned for other exciting and elucidating insights as I work through this great book and do further explorations in the future.