Introduction to Pointers
A pointer is nothing more than a variable that holds a memory address. When used properly, a pointer holds a valid memory address that contains an object, which is compatible with the type of the pointer. Like references in C#, all pointers in a particular execution environment have the same size, regardless of the type of data the pointer points to. For example, when a program is compiled for and run on a 32-bit operating system, a pointer will typically be 4 bytes (32 bits).
Pointers can point to any memory address. You can, and frequently will, have pointers to objects that are on the stack. You can also have pointers to static objects, to thread local objects, and, of course, to dynamic (i.e., heap allocated) objects. When programmers with only a passing familiarity with pointers think of them, it’s usually in the context of dynamic objects.
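As a minimal sketch of the point above, the following shows one function receiving pointers to objects with static, thread-local, automatic (stack), and dynamic storage. The names g_staticCounter, t_threadCounter, Increment, and SumAfterIncrement are hypothetical and exist only for this illustration.

```cpp
#include <memory>

// Hypothetical objects used only for this illustration.
static int g_staticCounter = 1;        // static storage duration
thread_local int t_threadCounter = 2;  // thread storage duration

// Adds one to whatever int the pointer refers to. The function neither
// knows nor cares where that int lives.
void Increment(int* p_value)
{
    ++(*p_value);
}

// Passes a pointer to each kind of object to Increment and returns the
// sum of the four incremented values.
int SumAfterIncrement()
{
    int localValue = 3;                          // automatic (stack) storage
    std::unique_ptr<int> heapValue(new int(4));  // dynamic (heap) storage

    Increment(&g_staticCounter);
    Increment(&t_threadCounter);
    Increment(&localValue);
    Increment(heapValue.get());

    return g_staticCounter + t_threadCounter + localValue + *heapValue;
}
```

On the first call the four values become 2, 3, 4, and 5. The static and thread-local objects keep their values between calls, so later calls return larger sums.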
Because of potential leaks, you should never allocate dynamic memory outside of a smart pointer. The C++ Standard Library provides two smart pointers that you should consider: std::shared_ptr and std::unique_ptr.
By putting dynamic duration objects inside one of these, you guarantee that when the std::unique_ptr, or the last std::shared_ptr that contains a pointer to that memory, goes out of scope, the memory will be freed so it won't leak. The correct version of delete (delete or delete[]) is used as long as the smart pointer's type matches the allocation; for example, memory allocated with new[] belongs in a std::unique_ptr<T[]>. That’s the RAII pattern from the previous chapter in action.
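The following sketch illustrates that cleanup, assuming a hypothetical Tracked type that counts its live instances so the effect of leaving scope can be observed. Tracked and CountWhileInScope are illustrative names, not part of any library.

```cpp
#include <memory>

// Hypothetical type that counts live instances so cleanup is observable.
struct Tracked
{
    static int liveCount;
    Tracked() { ++liveCount; }
    ~Tracked() { --liveCount; }
};
int Tracked::liveCount = 0;

// Creates objects owned by smart pointers and returns how many are alive
// while those smart pointers are still in scope.
int CountWhileInScope()
{
    // Sole owner of a single object; released with delete.
    std::unique_ptr<Tracked> single(new Tracked());

    // The array form of unique_ptr releases its memory with delete[].
    std::unique_ptr<Tracked[]> many(new Tracked[3]);

    // Shared ownership: the object lives until the last shared_ptr goes away.
    std::shared_ptr<Tracked> shared = std::make_shared<Tracked>();
    std::shared_ptr<Tracked> alias = shared;  // same object, two owners

    return Tracked::liveCount;  // 1 + 3 + 1
}
```

After CountWhileInScope returns, every smart pointer has gone out of scope, so Tracked::liveCount is back to zero without a single explicit delete.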
Only two things can happen when you do RAII right with smart pointers: either the allocation succeeds, and the memory will be properly freed when the smart pointer goes out of scope, or the allocation fails, in which case no memory was allocated and thus nothing can leak. In practice, the latter situation should be quite rare on modern PCs and servers due to their large memory and their provision of virtual memory.
If you don’t use smart pointers, you're just asking for a memory leak. Any exception between allocating the memory with new or new[] and freeing the memory with delete or delete[] will likely result in a memory leak. If you aren’t careful, you could accidentally use a pointer that was already deleted, but was not set equal to nullptr. You would then be accessing some random location in memory and treating it like it’s a valid pointer.
The best thing that could happen in that case is for your program to crash. If it doesn’t, then you’re corrupting data in strange, unknown ways and possibly saving those corruptions to a database or pushing them across the web. You could be opening the door to security problems too. So use smart pointers and let the language handle memory-management issues for you.
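To make the exception hazard concrete, here is a small sketch that contrasts a raw new/delete pair with a std::unique_ptr when an exception is thrown in between. Resource, MightThrow, LeakyWork, SafeWork, and LiveCountAfterFailure are hypothetical names introduced for this example.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical type that counts live instances so leaks are observable.
struct Resource
{
    static int liveCount;
    Resource() { ++liveCount; }
    ~Resource() { --liveCount; }
};
int Resource::liveCount = 0;

// Stands in for any operation that can fail between allocation and cleanup.
void MightThrow(bool shouldThrow)
{
    if (shouldThrow)
    {
        throw std::runtime_error("simulated failure");
    }
}

// Raw pointer version: if MightThrow throws, the delete is never reached
// and the Resource is leaked.
void LeakyWork(bool shouldThrow)
{
    Resource* p_resource = new Resource();
    MightThrow(shouldThrow);
    delete p_resource;
}

// Smart pointer version: the unique_ptr destructor runs during stack
// unwinding, so the Resource is freed even when MightThrow throws.
void SafeWork(bool shouldThrow)
{
    std::unique_ptr<Resource> resource(new Resource());
    MightThrow(shouldThrow);
}

// Runs the given work function with a forced failure, swallows the
// exception, and returns how many Resource objects are still alive.
int LiveCountAfterFailure(void (*work)(bool))
{
    try
    {
        work(true);
    }
    catch (const std::runtime_error&)
    {
        // Ignored on purpose; only the cleanup behavior matters here.
    }
    return Resource::liveCount;
}
```

Calling LiveCountAfterFailure(&SafeWork) leaves no live Resource behind, whereas the same call with &LeakyWork would leave one stranded object for every failed run.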
Const Pointer
A const pointer takes the form SomeClass* const someClass2 = &someClass1; in other words, the * comes before const. The result is that the pointer itself cannot point to anything else, but the data the pointer points at remains mutable. This is not likely to be very useful in most situations.
Pointer to Const
A pointer to const takes the form const SomeClass* someClass2 = &someClass1; in this case the * comes after const. The result is that the pointer can point to other things, but you cannot modify the data it points to. This is a common way to declare parameters that you simply want to inspect without modifying their data.
Const Pointer to Const
A const pointer to const takes the form const SomeClass* const someClass2 = &someClass1; here, the * is sandwiched between two const keywords. The result is that the pointer cannot point to anything else, and you cannot modify the data it points to.
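The three forms can be compared side by side in a short sketch. SomeClass here is a hypothetical struct with one public int, and the commented-out lines are the ones a compiler would reject.

```cpp
// Hypothetical type used only for this illustration.
struct SomeClass
{
    int value;
};

// A pointer-to-const parameter: the function can inspect the object but
// cannot modify it through p_object.
int Inspect(const SomeClass* p_object)
{
    return p_object->value;
}

// Exercises all three const forms and returns the sum of the values seen.
int ConstFormsDemo()
{
    SomeClass first{1};
    SomeClass second{2};

    // Const pointer: the address is fixed, the data is mutable.
    SomeClass* const constPtr = &first;
    constPtr->value = 10;
    // constPtr = &second;            // error: the pointer itself is const

    // Pointer to const: the address can change, the data is read-only.
    const SomeClass* ptrToConst = &first;
    ptrToConst = &second;
    // ptrToConst->value = 20;        // error: the data is const via this pointer

    // Const pointer to const: neither the address nor the data can change.
    const SomeClass* const constPtrToConst = &first;
    // constPtrToConst = &second;     // error
    // constPtrToConst->value = 30;   // error

    return Inspect(constPtr) + Inspect(ptrToConst) + Inspect(constPtrToConst);
}
```

Here constPtr and constPtrToConst both see first (now 10), while ptrToConst was re-pointed at second (2).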
Const-Correctness and Const Member Functions
Const-correctness refers to using the const keyword to decorate both parameters and functions so the presence or absence of the const keyword properly conveys any potential side effects. You can mark a member function const by putting the const keyword after the declaration of the function’s parameters.
For example, int GetSomeInt(void) const; declares a const member function, that is, a member function that does not modify the data of the object it belongs to. The compiler will enforce this guarantee. It will also enforce the guarantee that when you pass an object into a function that takes it as const, that function cannot call any non-const member functions of that object.
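A brief sketch of both guarantees follows, assuming a hypothetical Counter class with one const and one non-const member function. The commented-out call is the one the compiler would refuse.

```cpp
// Hypothetical class used only for this illustration.
class Counter
{
public:
    explicit Counter(int start) : m_count(start) {}

    // Const member function: promises not to modify the object.
    int GetCount() const { return m_count; }

    // Non-const member function: modifies the object.
    void Increment() { ++m_count; }

private:
    int m_count;
};

// Takes the object as const, so only const member functions may be called.
int ReadTwice(const Counter& counter)
{
    // counter.Increment();   // error: Increment is not a const member function
    return counter.GetCount() + counter.GetCount();
}
```

A caller that sees ReadTwice(const Counter&) in a header can rely on the counter being unchanged afterwards, which is the kind of information const-correctness is meant to convey.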
Designing your program to adhere to const-correctness is easier when you start doing it from the beginning. When you adhere to const-correctness, it becomes easier to use multithreading, since you know exactly which member functions have side effects. It’s also easier to track down bugs related to invalid data states. Others who are collaborating with you on a project will also be aware of potential changes to the class’ data when they call certain member functions.
The *, &, and -> Operators
When working with pointers, including smart pointers, three operators are of interest: *, &, and ->.
The indirection operator, *, de-references a pointer, meaning you work with the data that is pointed to, instead of the pointer itself. For the next few paragraphs, let’s assume that p_someInt is a valid pointer to an integer with no const qualifications.
The statement p_someInt = 5000000; would not assign the value 5000000 to the integer that is pointed to. In standard C++ it does not even compile, because an integer is not implicitly convertible to a pointer. If you forced the conversion with a cast, you would set the pointer to point to the memory address 5000000 (0x004C4B40 on a 32-bit system). What is at memory address 0x004C4B40? Who knows? It could be your integer, but chances are it is something else. If you are lucky, it is an invalid address, and the next time you try to dereference p_someInt your program will crash. If it is a valid data address, though, you will likely corrupt data.
The statement *p_someInt = 5000000; will assign the value 5000000 to the integer pointed to by p_someInt. This is the indirection operator in action; it takes p_someInt and replaces it with an L-value that represents the data at the address pointed to (we’ll discuss L-values soon).
The address-of operator, &, fetches the address of a variable or a function. This allows you to create a pointer to a local object, which you can pass to a function that wants a pointer. You don’t even need to create a local pointer to do that; you can simply use your local variable with the address-of operator in front of it as the argument, and everything will work just fine.
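The two operators can be seen working together in the sketch below. DoubleInPlace and IndirectionAndAddressOf are hypothetical names; the first stands in for any function that wants a pointer.

```cpp
// Hypothetical helper that expects a pointer to the int it should change.
void DoubleInPlace(int* p_value)
{
    *p_value = *p_value * 2;
}

// Writes through a pointer, then passes the address of a local variable
// directly to a function, and returns the final value of that variable.
int IndirectionAndAddressOf()
{
    int someInt = 1;

    // Address-of: p_someInt now holds the address of someInt.
    int* p_someInt = &someInt;

    // Indirection: assigns to the int that p_someInt points to.
    *p_someInt = 5000000;

    // No named pointer is needed; &someInt is used directly as the argument.
    DoubleInPlace(&someInt);

    return someInt;
}
```

Because p_someInt and &someInt refer to the same object, the write through the pointer and the doubling inside DoubleInPlace both land on someInt.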
Pointers to functions are similar to delegate instances in C#. Given this function declaration: double GetValue(int idx); this would be the right function pointer: double (*SomeFunctionPtr)(int);
If your function returned a pointer, say like this: int* GetIntPtr(void); then this would be the right function pointer: int* (*SomeIntPtrDelegate)(void); and you shouldn't let the double asterisks bother you. Just remember the first set of parentheses around the * and function pointer name so the compiler properly interprets this as a function pointer rather than a function declaration.
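Both declarations are shown in use in the sketch below. The function bodies, the g_storedInt variable, and the two wrapper functions are assumptions made for illustration; only the signatures come from the text.

```cpp
// Illustrative body; only the signature is taken from the text.
double GetValue(int idx)
{
    return idx * 1.5;
}

// Hypothetical object for GetIntPtr to point at.
static int g_storedInt = 42;

// Illustrative body; only the signature is taken from the text.
int* GetIntPtr(void)
{
    return &g_storedInt;
}

// Calls GetValue through a function pointer.
double CallThroughPointer(int idx)
{
    double (*SomeFunctionPtr)(int) = &GetValue;
    return SomeFunctionPtr(idx);
}

// Calls GetIntPtr through a function pointer, then de-references the
// int* that the call returns.
int ReadThroughReturnedPointer()
{
    int* (*SomeIntPtrDelegate)(void) = &GetIntPtr;
    return *SomeIntPtrDelegate();
}
```

In the last line the function call binds more tightly than the leading *, so the pointer is called first and the returned int* is de-referenced second.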
The -> member access operator is what you use to access class members when you have a pointer to a class instance. It functions as a combination of the indirection operator and the . member access operator. So p_someClassInstance->SetValue(10); and (*p_someClassInstance).SetValue(10); both do the same thing.
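The equivalence can be checked with a short sketch, which also shows that the same two spellings work on a smart pointer. SomeClass and ArrowAndDotAgree are hypothetical names for this example.

```cpp
#include <memory>

// Hypothetical class used only for this illustration.
class SomeClass
{
public:
    void SetValue(int value) { m_value = value; }
    int GetValue() const { return m_value; }

private:
    int m_value = 0;
};

// Mixes the two spellings on a raw pointer and on a smart pointer and
// returns the sum of the values read back.
int ArrowAndDotAgree()
{
    SomeClass instance;
    SomeClass* p_someClassInstance = &instance;

    // Write with ->, read with (*).
    p_someClassInstance->SetValue(10);
    int readWithDot = (*p_someClassInstance).GetValue();

    // Write with (*)., read with ->.
    (*p_someClassInstance).SetValue(20);
    int readWithArrow = p_someClassInstance->GetValue();

    // std::unique_ptr overloads both operators, so the syntax is identical.
    std::unique_ptr<SomeClass> smart(new SomeClass());
    smart->SetValue(30);

    return readWithDot + readWithArrow + (*smart).GetValue();
}
```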
L-values and R-values
It wouldn’t be C++ if we didn’t talk about L-values and R-values at least briefly. L-values are so called because they traditionally appear on the left side of an equal sign. In other words, they are values that can be assigned to—those which will survive the evaluation of the current expression. The most familiar type of L-value is a variable, but it also includes the result of calling a function that returns an L-value reference.
R-values traditionally appear on the right side of an assignment or, perhaps more accurately, they are values that could not appear on the left. They are things such as constants or the result of evaluating an expression. For example, in a + b, a and b might be L-values, but the result of adding them together is an R-value. The same is true of the return value of a function that returns anything other than void or an L-value reference.
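One way to see the distinction is to let overload resolution report it. The sketch below assumes a pair of hypothetical IsLvalue overloads, one taking int& (which binds only to L-values) and one taking int&& (which binds only to R-values); GetReference and GetCopy are likewise illustrative.

```cpp
// Chosen when the argument is an L-value.
bool IsLvalue(int&)
{
    return true;
}

// Chosen when the argument is an R-value.
bool IsLvalue(int&&)
{
    return false;
}

// Hypothetical object and accessors used only for this illustration.
int g_value = 0;

// Returns an L-value reference, so a call to it is an L-value.
int& GetReference()
{
    return g_value;
}

// Returns by value, so a call to it is an R-value.
int GetCopy()
{
    return g_value;
}

// Returns true when every expression below is categorized as the text
// describes.
bool CategoriesMatchText()
{
    int a = 1;
    int b = 2;
    return IsLvalue(a)                  // a named variable
        && !IsLvalue(a + b)             // the result of an expression
        && !IsLvalue(5)                 // a constant
        && IsLvalue(GetReference())     // function returning int&
        && !IsLvalue(GetCopy());        // function returning int
}
```

Because GetReference() is an L-value, a statement such as GetReference() = 7; is legal and assigns to g_value.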
References
References act just like non-pointer variables. Once a reference is initialized, it cannot refer to another object. You also must initialize a reference where you declare it. If your functions take references rather than objects, you will not incur the cost of a copy construction. Since the reference refers to the object, changes to it are changes to the object itself.
Just like pointers, you can also have a const reference. Unless you need to modify the object, you should use const references since they provide compiler checks to ensure that you aren’t mutating the object when you think you aren’t.
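The sketch below illustrates these rules with std::string, where avoiding a copy actually matters. Length, AppendBang, and ReferenceDemo are hypothetical names for this example.

```cpp
#include <cstddef>
#include <string>

// Takes a const reference: no copy is made, and the function cannot
// modify the caller's string.
std::size_t Length(const std::string& text)
{
    return text.size();
}

// Takes a non-const reference: changes are made to the caller's string.
void AppendBang(std::string& text)
{
    text += '!';
}

// Returns the caller-visible string after it was modified by reference.
std::string ReferenceDemo()
{
    std::string message = "hello";

    // A reference must be initialized where it is declared and then
    // refers to message for its whole lifetime.
    std::string& r_message = message;
    AppendBang(r_message);  // modifies message itself

    const std::string& cr_message = message;
    // AppendBang(cr_message);   // error: cannot bind a const reference
    //                           // to a non-const reference parameter

    return cr_message;
}
```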
There are two types of references: L-value references and R-value references. An L-value reference is marked by an & appended to the type name (e.g., SomeClass&), whereas an R-value reference is marked by an && appended to the type name (e.g., SomeClass&&). For the most part, they act the same; the main difference is that the R-value reference is extremely important to move semantics.
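As a rough sketch of why R-value references matter to move semantics, consider a hypothetical Buffer class whose move constructor takes a Buffer&& and steals the source's storage instead of copying it. The class and its member names are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical owner of a dynamically allocated int array.
class Buffer
{
public:
    explicit Buffer(std::size_t size)
        : m_data(new int[size]()), m_size(size)
    {
    }

    // Move constructor: the R-value reference parameter binds to objects
    // that are about to go away, so taking over their storage is safe.
    Buffer(Buffer&& other) noexcept
        : m_data(std::move(other.m_data)), m_size(other.m_size)
    {
        other.m_size = 0;  // leave the source empty but valid
    }

    std::size_t Size() const { return m_size; }
    bool OwnsStorage() const { return m_data != nullptr; }

private:
    std::unique_ptr<int[]> m_data;
    std::size_t m_size;
};
```

Writing Buffer b(std::move(a)); casts a to an R-value so the move constructor is selected. Afterwards b owns the array and a is left empty, with no element copied.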
Pointer and Reference Sample
The following sample shows pointer and reference usage with explanations in the comments.
Sample: PointerSample\PointerSample.cpp
    #include <memory>
    //// See the comment to the first use of assert() in _pmain below.
    //#define NDEBUG 1
    #include <cassert>
    #include "../pchar.h"

    using namespace std;

    void SetValueToZero(int& value)
    {
        value = 0;
    }

    void SetValueToZero(int* value)
    {
        *value = 0;
    }

    int _pmain(int /*argc*/, _pchar* /*argv*/[])
    {
        int value = 0;
        const int intArrCount = 20;

        // Create a pointer to int.
        int* p_intArr = new int[intArrCount];

        // Create a const pointer to int.
        int* const cp_intArr = p_intArr;

        // These two statements are fine since we can modify the data that a
        // const pointer points to.
        // Set all elements to 5.
        uninitialized_fill_n(cp_intArr, intArrCount, 5);
        // Sets the first element to zero.
        *cp_intArr = 0;

        //// This statement is illegal because we cannot modify what a const
        //// pointer points to.
        //cp_intArr = nullptr;

        // Create a pointer to const int.
        const int* pc_intArr = nullptr;

        // This is fine because we can modify what a pointer to const points
        // to.
        pc_intArr = p_intArr;

        // Make sure we "use" pc_intArr.
        value = *pc_intArr;

        //// This statement is illegal since we cannot modify the data that a
        //// pointer to const points to.
        //*pc_intArr = 10;

        const int* const cpc_intArr = p_intArr;

        //// These two statements are illegal because we cannot modify
        //// what a const pointer to const points to or the data it
        //// points to.
        //cpc_intArr = p_intArr;
        //*cpc_intArr = 20;

        // Make sure we "use" cpc_intArr.
        value = *cpc_intArr;

        *p_intArr = 6;

        SetValueToZero(*p_intArr);

        // From <cassert>, this macro will display a diagnostic message and
        // abort if the expression in parentheses evaluates to zero (false).
        // Unlike the _ASSERTE macro, this will run during Release builds. To
        // disable it, define NDEBUG before including the <cassert> header.
        assert(*p_intArr == 0);

        *p_intArr = 9;

        int& r_first = *p_intArr;

        SetValueToZero(r_first);

        assert(*p_intArr == 0);

        const int& cr_first = *p_intArr;

        //// This statement is illegal because cr_first is a const reference,
        //// but SetValueToZero does not take a const reference, only a
        //// non-const reference, which makes sense considering it wants to
        //// modify the value.
        //SetValueToZero(cr_first);

        value = cr_first;

        // We can initialize a pointer using the address-of operator.
        // Just be wary because local non-static variables become
        // invalid when you exit their scope, so any pointers to them
        // become invalid.
        int* p_firstElement = &r_first;

        *p_firstElement = 10;

        SetValueToZero(*p_firstElement);

        assert(*p_firstElement == 0);

        // This will call the SetValueToZero(int*) overload because we
        // are using the address-of operator to turn the reference into
        // a pointer.
        SetValueToZero(&r_first);

        *p_intArr = 3;

        SetValueToZero(&(*p_intArr));

        assert(*p_firstElement == 0);

        // Create a function pointer. Notice how we need to put the
        // variable name in parentheses with a * before it.
        void (*FunctionPtrToSVTZ)(int&) = nullptr;

        // Set the function pointer to point to SetValueToZero. It picks
        // the correct overload automatically.
        FunctionPtrToSVTZ = &SetValueToZero;

        *p_intArr = 20;

        // Call the function pointed to by FunctionPtrToSVTZ, i.e.
        // SetValueToZero(int&).
        FunctionPtrToSVTZ(*p_intArr);

        assert(*p_intArr == 0);

        *p_intArr = 50;

        // We can also call a function pointer like this. This is
        // closer to what is actually happening behind the scenes;
        // FunctionPtrToSVTZ is being de-referenced with the result
        // being the function that is pointed to, which we then
        // call using the value(s) specified in the second set of
        // parentheses, i.e. *p_intArr here.
        (*FunctionPtrToSVTZ)(*p_intArr);

        assert(*p_intArr == 0);

        // Make sure that we get value set to 0 so we can "use" it.
        *p_intArr = 0;
        value = *p_intArr;

        // Delete p_intArr using the delete[] operator since it points to a
        // dynamic array.
        delete[] p_intArr;
        p_intArr = nullptr;

        return value;
    }
Volatile
I mention volatile only to caution against using it. Like const, a variable can be declared volatile. You can even have a const volatile; the two are not mutually exclusive.
Here’s the thing about volatile: It likely does not mean what you think it means. For example, it is not good for multithreaded programming. The actual use case for volatile is extremely narrow. Chances are, if you put the volatile qualifier on a variable, you are doing something horribly wrong.
Eric Lippert, a member of the C# language team at Microsoft, described the use of volatile as, “A sign that you are doing something downright crazy: You're attempting to read and write the same value on two different threads without putting a lock in place.” He's right, and his argument carries over perfectly into C++.
The use of volatile should be greeted with more skepticism than the use of goto. I say this because I can think of at least one valid general-purpose use of goto: breaking out of a deeply nested loop construct upon the completion of a non-exceptional condition. volatile, by contrast, is really only useful if you are writing a device driver or writing code for some type of ROM chip. On that point, you really should be thoroughly familiar with the ISO/IEC C++ Programming Language Standard itself, the hardware specs for the execution environment your code will be running in, and probably the ISO/IEC C Language Standard too.
Note: You should also be familiar with assembly language for the target hardware, so you can look at the code that is generated and make sure the compiler is generating correct code for your use of volatile.
I have been ignoring the existence of the volatile keyword and shall continue to do so for the remainder of this book. This is perfectly safe, since:
- It's a language feature that doesn't come into play unless you actually use it.
- Its use can safely be avoided by virtually everyone.
One last note about volatile: The one effect it is very likely to produce is slower code. Once upon a time, people thought volatile produced the same result as atomicity. It doesn’t. When properly implemented, atomicity guarantees that multiple threads and multiple processors cannot read and write an atomically accessed chunk of memory at the same time. The mechanisms for this are locks, mutexes, semaphores, fences, special processor instructions, and the like. Essentially the only thing volatile does is force the compiler to emit a real memory access every time a volatile variable is read or written, rather than reuse a value it might have cached in a register. It is the memory fetching that slows everything down.
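For contrast, here is a minimal sketch of actual atomicity using std::atomic from the C++ Standard Library, one of the mechanisms in the family listed above. CountWithAtomic is a hypothetical function, and the example assumes a toolchain with thread support enabled.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts threadCount threads that each increment a shared counter
// incrementsPerThread times, and returns the final count.
int CountWithAtomic(int threadCount, int incrementsPerThread)
{
    // Each fetch_add is an indivisible read-modify-write, so no
    // increment can be lost when threads run concurrently.
    std::atomic<int> counter(0);

    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&counter, incrementsPerThread]() {
            for (int i = 0; i < incrementsPerThread; ++i)
            {
                counter.fetch_add(1);
            }
        });
    }

    // Wait for every worker before reading the result.
    for (std::thread& worker : workers)
    {
        worker.join();
    }

    return counter.load();
}
```

Replacing std::atomic<int> with volatile int in this sketch would make the increments a data race: the result could come up short, and the C++ standard treats such a program as having undefined behavior.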
Conclusion
Pointers and references confuse a lot of developers, yet they are very important in a language like C++. It's therefore worth taking your time to grasp the concepts so that you don't run into problems down the road. The next article is all about casting in C++.
This lesson represents a chapter from C++ Succinctly, a free eBook from the team at Syncfusion.