Over on gamedev.net there have been several questions lately about RAII, or the philosophy of Resource Acquisition Is Initialization. The common discussion question is: What does initialized mean, exactly?
Memory Resources and The Bad Old Days
One of the most commonly acquired resources is memory. If you want a variable, like an int or char or float, that memory needs to get acquired. If you are creating a structure instance or a class instance or an array of objects, you need memory for that.
In most old languages and systems, when you acquired the memory it contained whatever happened to be there before. It could be all zeros or all ones. It could be exactly the same object from memory that was freed earlier. It could be data from some other object. Programmers were trusted to assume the memory was not initialized to any particular value.
Security flaw of uninitialized data
The uninitialized data problem has been the source of many security bugs over the years. One that was recently in the news was the “Heartbleed” bug in OpenSSL. The network system let you send a small payload to be stored and then ask for it back, but unfortunately the length you sent and the length you requested could differ. You could send a tiny payload, say a single byte, then request kilobytes back. The reply was padded out with whatever happened to be sitting in memory next to your payload. That block of memory, never initialized for your request, was the key to the bug.
Usually the uninitialized data was garbage: the results of previous operations, or internal structures with no meaning to anyone else. Sometimes, though, it held critical data. It could include account numbers and passwords, or data from documents such as bank numbers, social security numbers, and other confidential information that should not be exposed.
Over the years, operating systems and programming languages have adopted patterns of sanitizing memory by initializing it to a known state for you. Typically when the OS or language initializes memory, the values are set to all zeros. Contrast this with the 1980s and earlier, when you could request a block of data and it would be uninitialized, left with whatever happened to be hanging around. This comes with a small cost: some modern operating systems run a background task that picks up stale memory and writes zeros to it. That only affects memory when it is first assigned by the operating system, though. Many programs reuse memory internally rather than go the slow route of acquiring memory from the operating system, and that reused memory is not zeroed for you. Either way, it is best to know what is going on so you can implement whatever your program needs.
Acquire and Initialize Memory
Initializing the memory in older languages and systems was typically a two-step process. One step to acquire the memory, another step to initialize the memory. In C this would often look something like this:
ptr = malloc(sizeof(SomeStruct));
memset(ptr, 0, sizeof(SomeStruct));
While that sometimes is the right action, the pattern is not universal. If you were going to write known values over the structure you didn’t need to zero the whole thing, since that would mean initializing it twice. So you might get:
ptr = malloc(sizeof(SomeStruct));
ptr->price = pPrice;
ptr->count = pCount;
Unfortunately, if the contents of the structure changed it meant you could leave important pieces with bad data. As projects grow and time passes, other developers might add additional members that you don’t know about, so your code will use the structure with its uninitialized data, never knowing that the uninitialized value is present until someone sends you a bug report.
If the data were not initialized, count could be millions. Price could be negative. Programmers needed to remember to initialize memory with something appropriate.
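One way to close that gap, sketched here in C++ rather than C, is to acquire and initialize in a single step so that members added later still start in a known state. SomeStruct, price, and count come from the snippets above; the discount member and the make_item helper are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <memory>

struct SomeStruct {
    double price;
    int count;
    int discount;  // hypothetical member added later by another developer
};

// Acquire and initialize in one step. make_unique<T>() value-initializes the
// object, which for a plain struct like this one zeroes every member.
std::unique_ptr<SomeStruct> make_item(double price, int count) {
    auto item = std::make_unique<SomeStruct>();
    item->price = price;
    item->count = count;
    return item;  // discount was never mentioned here, yet it is 0
}

void example() {
    auto item = make_item(9.99, 3);
    assert(item->count == 3);
    assert(item->discount == 0);  // known state, not leftover garbage
}
```

The cost is the double initialization mentioned earlier: price and count are zeroed and then overwritten. For a small structure that is usually a fair trade for never reading garbage.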
Modern Programming Languages
In most modern programming languages, an object's constructor initializes all the key values. The constructor is called automatically by the language immediately after the allocation takes place, so there is no chance to forget.
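A minimal C++ sketch of that idea follows. The Item class is hypothetical; only the price and count names are borrowed from the earlier C snippet.

```cpp
#include <cassert>

class Item {
public:
    // Runs automatically on every creation; callers cannot skip it.
    Item() : price_(0.0), count_(0) {}

    double price() const { return price_; }
    int count() const { return count_; }

private:
    double price_;
    int count_;
};

void example() {
    Item on_stack;             // constructor runs here
    Item* on_heap = new Item;  // allocation, then the constructor, in one expression
    assert(on_stack.count() == 0);
    assert(on_heap->price() == 0.0);
    delete on_heap;
}
```

Nothing in the calling code mentions initialization, yet neither object can be observed with leftover values.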
What should I initialize it to?
Generally it is good to have a default constructor or a default initializer. For containers that usually means “empty”. For data that generally means either “nothing” or “intentionally blank”.
You can provide additional constructors that do additional work, but they should not be the default. When someone creates an array of objects you want that creation to be near-instantaneous.
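As a rough illustration, here is a hypothetical NameList container whose default constructor means “empty” and whose second constructor does real work by splitting a comma-separated string. The class and its methods are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class NameList {
public:
    NameList() = default;  // default: empty, no parsing, nothing to go wrong

    // Additional constructor that does real work: splits a comma-separated line.
    explicit NameList(const std::string& csv) {
        std::size_t start = 0;
        while (start <= csv.size()) {
            std::size_t comma = csv.find(',', start);
            if (comma == std::string::npos) comma = csv.size();
            names_.push_back(csv.substr(start, comma - start));
            start = comma + 1;
        }
    }

    bool empty() const { return names_.empty(); }
    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};

void example() {
    std::vector<NameList> lists(500);   // 500 empty lists, no parsing done
    assert(lists[499].empty());
    lists[0] = NameList("ann,bob,cy");  // opt in to the expensive path
    assert(lists[0].size() == 3);
}
```

Creating the 500 lists touches no input at all. The parsing cost is paid only where the caller asks for it.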
System Resources
So that works for memory, but what about system resources?
There are quite a few system resources out there. Mutex objects and semaphores, graphics buffers, shared memory objects, windowing system objects, buttons and images and cursor objects, and so on.
Default initialization should be empty, fast, and never fail.
As the heading says, default initialization should be empty, fast, and never fail. The biggest reason for that is programmers frequently want more than one thing. They want a lot of things. Maybe the programmer wants to create an array of 500 objects.
Let’s say the object is a file buffer. And the programmer wants to create 500 of them.
In this case we’ll assume the programmer did not think about the requirement to be empty, fast, and never fail. Let’s say each one opens up a file on disk. That will require trips out to the file system's allocation tables, then trips out to the disk to open the buffers and start loading data into them. These things take time. Instead of returning near-instantly, perhaps in a few nanoseconds, we have to wait for the disk. Say this disk happens to be an SSD on the local machine. The system looks up the allocation tables, figures out the locations on disk, loads up the buffers, and perhaps takes several hundred microseconds. While hundreds of microseconds may seem fast to the human sitting at the computer, it is ages for the computer. At a few gigahertz, the processor could have run on the order of a million cycles while waiting for the drive.
And that’s a fast disk. The file could just as easily have been located across the network or across the world. The response time could also be very slow for other reasons, such as spinning up a DVD and waiting for the laser to seek to the right location, or spinning the reels on a tape drive until the magnetic head eventually reaches the right spot.
Initializing each object could take ages.
And what about failure? What happens if one of those 500 files happens to not be found, or has a read error? The array will be partially created, some objects valid but other objects invalid. The programmer has no good option other than to destroy the objects, but the programmer doesn’t know which objects have been placed in a valid state and which have uninitialized and potentially hazardous data.
Many modern languages solve this by automatically destroying all the objects that were created successfully and then throwing an exception for you to catch, but it is better not to be in a position where that happens in the first place.
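C++ is one language that behaves this way for arrays. The sketch below uses a hypothetical Fragile type whose default constructor fails on the third object, which is exactly the design to avoid. The counters exist only to make the cleanup visible.

```cpp
#include <cassert>
#include <stdexcept>

struct Fragile {
    static int alive;    // objects currently constructed
    static int created;  // construction attempts so far

    Fragile() {
        if (++created == 3) throw std::runtime_error("third object failed");
        ++alive;
    }
    ~Fragile() { --alive; }
};
int Fragile::alive = 0;
int Fragile::created = 0;

bool try_build_array() {
    try {
        Fragile items[5];  // the third element throws during construction
        (void)items;
        return true;
    } catch (const std::runtime_error&) {
        // The two elements built before the failure were already destroyed.
        assert(Fragile::alive == 0);
        return false;
    }
}
```

The language cleans up correctly, but the caller still ends up with no array at all and no idea which element caused the trouble unless the exception says so.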
Allocating system objects initialized to non-empty
Just like memory objects, it is okay to have a default empty constructor and a constructor that does work.
The file object is a great example. I can create an array of 500 file objects that returns instantly. Then I can call the .Open() function on every one of those objects giving it a file name, and if any one of them fails I can handle it appropriately. The key is that every one of the items is initialized to a known state, the “not associated with a file” state.
The programmer can provide methods such as .AcquireMutex() or .GetVertexArray(), or whatever else the system object needs, and can also provide a second, non-default constructor that calls them. Those methods may be non-empty, or slow, or allowed to fail. They are not the default constructor, so that is okay.
As long as the programmer provides a default constructor that is empty, fast, and doesn’t fail, they can provide whatever alternative methods they want.
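Here is a minimal sketch of the file example in C++. The FileBuffer class and its IsOpen() and Close() methods are hypothetical; the article only names .Open(). It wraps the standard std::fopen and std::fclose calls.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

class FileBuffer {
public:
    FileBuffer() = default;                  // empty, fast, cannot fail
    FileBuffer(const FileBuffer&) = delete;  // one owner per file handle
    FileBuffer& operator=(const FileBuffer&) = delete;
    ~FileBuffer() { Close(); }

    // Slow and allowed to fail: reports the failure instead of throwing.
    bool Open(const std::string& path) {
        Close();
        file_ = std::fopen(path.c_str(), "rb");
        return file_ != nullptr;
    }

    void Close() {
        if (file_ != nullptr) {
            std::fclose(file_);
            file_ = nullptr;
        }
    }

    bool IsOpen() const { return file_ != nullptr; }

private:
    std::FILE* file_ = nullptr;  // known state: "not associated with a file"
};

void example() {
    FileBuffer files[500];  // returns immediately, nothing is opened
    for (const FileBuffer& f : files) {
        assert(!f.IsOpen());
    }
    // A missing file is an ordinary result the caller can handle.
    bool ok = files[0].Open("no_such_directory/no_such_file.bin");
    assert(!ok);
    assert(!files[0].IsOpen());
}
```

Every element of the array starts in the same known state, and a failed Open() leaves the object there, so the caller can retry, skip the file, or report the error one object at a time.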
Cleaning Up, the flip side of RAII.
I’ve read many times, and talked with people who felt, that the main strength of RAII is not the creation of objects but their destruction and the cleanup when they are done.
This article is getting longer than I prefer, so I’ll cover that in the next one.
Continue to the next article, RAII is also about destruction.