Today at work we had a big discussion about annotations in some Java code. These wonderful little inventions save time and make life much easier for programmers, but they are also the source of terrible nightmares. To get into the subject, I’ll need to explain a few things for the audience.
Example of Attributes and Annotations
In programming languages like Java and C# we need to write code to accomplish a task. For example, if you want to save an object to a file, you might write a function that looks something like this:

    public class Foo
    {
        public int a = 0;
        private int b = 0;
        private int c = 3; // DO NOT SAVE OR LOAD
        public String text = "placeholder";

        // A bad example. This has a long way to go before becoming solid code.
        public void SaveMe( StreamWriter writer )
        {
            writer.Write(a);
            writer.Write(b);
            // NOTE TO SELF: Do not save or load c
            writer.Write( text.Length );
            writer.Write(text);
        }
        ...
    }
Then you’ll start building code to load the data from a reader. You’ll need to make sure you read them in the correct order. You write your first simple tests to save and load, and your three values are saved and loaded properly. Great.
Then you’ll try it out and quickly discover a bug: the code crashes when the text is null. So you fix the bug and try again. You move along.
A few days later you want to add another variable to your class, which means writing more save and load logic. Then you run into a new problem: you need to provide default values for the older save files, and you need a way to mark the versions of the files.
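To give a feel for how quickly that hand-written logic piles up, here is a minimal sketch, written in Java rather than the C# above. Everything in it is an assumption for illustration: the class name `VersionedFoo`, the extra field `addedLater`, its default of 42, and the file layout with a version number written first. It is not the original code, just one plausible shape for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical Java counterpart of the Foo class above; all names are illustrative.
class VersionedFoo {
    static final int CURRENT_VERSION = 2;

    int a = 0;
    int b = 0;
    int c = 3;                    // never saved or loaded
    String text = "placeholder";
    int addedLater = 42;          // assumed field that first appeared in version 2

    void save(DataOutputStream out) throws IOException {
        out.writeInt(CURRENT_VERSION);   // version marker goes first
        out.writeInt(a);
        out.writeInt(b);
        out.writeBoolean(text != null);  // presence flag avoids the null crash
        if (text != null) {
            out.writeUTF(text);
        }
        out.writeInt(addedLater);
    }

    static VersionedFoo load(DataInputStream in) throws IOException {
        VersionedFoo foo = new VersionedFoo();
        int version = in.readInt();
        // Fields must be read in exactly the order they were written.
        foo.a = in.readInt();
        foo.b = in.readInt();
        foo.text = in.readBoolean() ? in.readUTF() : null;
        if (version >= 2) {
            foo.addedLater = in.readInt();
        }
        // Older files simply keep the default value of addedLater.
        return foo;
    }

    // Convenience helper: saves to memory and loads the result back.
    static VersionedFoo roundTrip(VersionedFoo original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.save(new DataOutputStream(bytes));
            return load(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Even this toy version needs a version check, a null flag, and a matching read for every write, and each new field adds more of the same.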
Then you later add another type as a variable to your class, and you need to write even more logic to save and load the file.
In short order instead of spending your days writing useful code for your project, you discover that you are spending all your days writing code to save and load obscure data structures and internal parts of objects and other details that are a waste of your time.
So being a resourceful programmer you read up about how to save and load data automatically. You learn the process of reading and writing data is called “serialization”. Since this example is C#, you discover something called a “serializable” attribute. With joy you realize you don’t need to write all that code after all. You can mark the code with an attribute and the tools will take care of it all for you.
In C# these are marked with brackets, such as [Serializable]. In your reading you learn you can mark an entire class with the attribute and it will do everything your simple program needs:
    [Serializable]
    public class Foo
    {
        public int a = 0;
        private int b = 0;
        [NonSerialized] private int c = 3;
        public String text = "placeholder";
        ...
    }

    // Somewhere else where the object is saved
    IFormatter formatter = new BinaryFormatter();
    Stream stream = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.None);
    ...
    formatter.Serialize(stream, myFooInstance);

    // And somewhere the object is loaded
    IFormatter formatter = new BinaryFormatter();
    Stream stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
    ...
    Foo myFooInstance = (Foo) formatter.Deserialize(stream);
Done! No need to write all that nasty code to save and load objects, or to make sure it all happens in the correct order. No need to worry if the value is null or not. No need to write special comments to remember that item c should not be saved or loaded. As long as you only include member objects that are also marked as [Serializable] they will automatically save and load properly.
Over time the code will grow and evolve and need more work, but for now that should be enough to illustrate the point.
Wonderful Magic! How does it work?
In C# these are called “attributes”, and are marked off with brackets above the class, or method, or member variable they apply to. In Java they are called “annotations”, and they start with an @ symbol. They can be used for many reasons, not just saving and loading code.
In our discussion at work, there was some Java code that was automatically loading data from a save file, using the annotation @Autowired.
These annotations and attributes are special markers in the languages that say there is more to the code than what is actually in the code. There is something extra the computer needs to add. It is called “metadata”, meaning information about other information.
When you mark a class with an annotation or attribute, the compiler inserts some extra data for the compiled file. Then at a later point when the code is running, other functions can peek at that special annotation or attribute, and do something special for it.
In the case of the Serializable attribute mentioned in the earlier section, the system library has a bunch of code already written around saving and loading data. Functions like Serialize() and Deserialize() can peek at that special annotation or attribute and use that extra information. They can see that the member variable c is marked as NonSerialized, so they skip over that value. More complicated classes like SerializableDictionary will include extra markers to indicate they need special handling. Since all this is built in, you can rely on an existing library to do all that work so you don’t have to.
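The “peeking” can be sketched in a few lines of Java using the standard reflection API. This is only a rough illustration of the idea, not how any real serialization library is implemented. The annotation `@DoNotSave`, the class `AnnotatedFoo`, and the helper `TinySerializer` are all names I made up for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker. RUNTIME retention keeps the metadata in the compiled
// class so that code can look it up while the program is running.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DoNotSave {}

// Hypothetical Java version of the Foo class from earlier.
class AnnotatedFoo {
    public int a = 0;
    private int b = 0;
    @DoNotSave private int c = 3;
    public String text = "placeholder";
}

class TinySerializer {
    // Collects every instance field of the object except those marked @DoNotSave.
    static Map<String, Object> snapshot(Object target) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // only ordinary instance fields are of interest
            }
            if (field.isAnnotationPresent(DoNotSave.class)) {
                continue; // the marker says: skip this one
            }
            field.setAccessible(true); // allow reading private fields
            try {
                values.put(field.getName(), field.get(target));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }
}
```

Calling `TinySerializer.snapshot(new AnnotatedFoo())` returns entries for a, b and text but not c. Notice that nothing inside `AnnotatedFoo` mentions `TinySerializer` at all; the connection exists only through the marker.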
In the case of the Autowired annotation at work, it meant that the value could be initialized automatically when the program starts. The exact details can be changed easily; a debug build can be automatically “wired” to point to one thing, and a test build can be automatically “wired” to point to another thing. It could be “wired” to automatically load a configuration value from a file. It could be “wired” to automatically equal an existing object. It could be “wired” to load a value from some other code. Using some rules that the code authors built into the code, the annotation is able to automatically figure out what values should be used for a default.
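For readers who have never seen this kind of wiring, here is a minimal sketch of the idea in plain Java. It is emphatically not how the real framework behind @Autowired works internally. The annotation `@Wire`, the `TinyContainer` class, and the `Greeter` and `ReportScreen` types are all hypothetical stand-ins so the example can run without any library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an autowiring annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Wire {}

interface Greeter {
    String greet();
}

class ReportScreen {
    @Wire Greeter greeter; // no constructor or setter in this class ever assigns it
}

// Hypothetical container: remembers one instance per type and hands it out.
class TinyContainer {
    private final Map<Class<?>, Object> registry = new HashMap<>();

    <T> TinyContainer register(Class<T> type, T instance) {
        registry.put(type, instance);
        return this;
    }

    // Fills in every @Wire field of the target from the registry.
    <T> T wire(T target) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Wire.class)) {
                continue;
            }
            Object value = registry.get(field.getType());
            if (value == null) {
                throw new IllegalStateException("Nothing registered for " + field.getType());
            }
            field.setAccessible(true);
            try {
                field.set(target, value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return target;
    }
}
```

A debug build might call `new TinyContainer().register(Greeter.class, () -> "debug greeting")` while a test build registers something else, and `ReportScreen` never changes. The flip side is that a plain `new ReportScreen()` that never passes through `wire()` is left with a null `greeter`.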
There are many excellent libraries out there that use annotations and attributes. They can simplify development of code by reusing something rather than writing it yourself. They can help reduce clutter by eliminating complex functions and automatically figuring out the details. They can be used to implement patterns without inheriting from an existing interface. They can solve many tricky coding nightmares.
Sounds neat. What’s the problem?
It seems every feature in programming has a cost associated. Annotations and attributes fit into the bucket with a cost. It is a cost that in most cases we programmers are willing to pay.
The obvious costs of annotations and attributes are extremely small. They include a tiny marker in the compiled file, usually just a few bytes. The marker usually lives in the static area and is a link to the annotation or attribute. It might be sixteen bytes or so, which isn’t much when you consider that the programs require tens of millions of bytes for all the rest of their executable.
But they have a hidden cost. A cost that is easily forgotten, even by experienced developers.
These features are a hidden dependency.
When the programmer writes the code, all they write is a little marker that says “This class uses this feature”. It might be: “This method needs to operate as a single transaction”, or it might be “This variable needs to be atomic so it works on multi-core computers”, or it might be the serialization functionality or autowire functionality discussed above.
When programmers are reviewing the code and maintaining the code, the annotations and attributes are nearly invisible. We normally don’t think about them. They just exist in the background.
The problem that showed up today was that a value was incorrect. I was asked to help look it over, so this really is a “we” and not the Royal We, where “we” means “me alone”. There was more than one experienced developer looking at this code, all of us collectively scratching our heads.
There was an object that was created and configured and seemed to have all the right values, but when we went to use it somewhere else in code, it had a different and unexpected value. Somewhere in the system, somewhere hidden that we could not see, that value was being modified.
We hunted high and low through that class. Everywhere the value was used, it looked correct. We put breakpoints on all the places it was used, and it was never set to the wrong value, until suddenly and unexpectedly it was different. We put breakpoints on the two places it was set, and it was set correctly. After more head-scratching, we dropped a memory breakpoint on the value so the program would stop any time the memory was modified.
Gotcha!
And with that memory breakpoint, we ran and suddenly the program stopped. We saw that a completely unrelated system was modifying a completely unrelated variable, but the debugger assured us it was the same memory address.
We puzzled over the code for several minutes.
Why is it stopped here? How does this system have anything to do with the other system? The two systems didn’t have anything to do with each other. There were no references to other objects in the other area that we could see. The other system did not import the classes. There was no way it could know about the value.
Then somebody noticed the annotation. Among other things, the item was marked with @Autowired.
The @Autowired marker meant that the system was hooking up a value automatically according to its rules. Unfortunately for the programmers involved, it was in a location we did not expect. It was a perfectly valid location and once we understood the condition it was trivial to fix, but it took much longer to find and fix than it should have. The annotation caused a hidden dependency, a connection between systems that was not immediately obvious.
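I can’t share the real code, but the general shape of this kind of surprise can be sketched in plain Java. Everything below is hypothetical: the `@SharedWire` annotation, the `SharedWiring` helper, and the `AudioSystem`, `SaveSystem` and `Settings` classes are invented for illustration. The only assumption carried over from the story is that automatic wiring can hand the very same object to two systems that never mention each other.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical stand-in for an autowiring annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SharedWire {}

class Settings {
    int volume = 5;
}

// Two systems that never reference each other in their source code.
class AudioSystem {
    @SharedWire Settings settings;

    void mute() {
        settings.volume = 0;
    }
}

class SaveSystem {
    @SharedWire Settings settings;

    int volumeToSave() {
        return settings.volume;
    }
}

// Hypothetical wiring rule: every @SharedWire Settings field receives the SAME object.
class SharedWiring {
    private static final Settings SHARED = new Settings();

    static <T> T wire(T target) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(SharedWire.class) && field.getType() == Settings.class) {
                field.setAccessible(true);
                try {
                    field.set(target, SHARED);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return target;
    }
}
```

After both objects are wired, calling `mute()` on the audio system changes what `volumeToSave()` returns on the save system. Searching either class for a reference to the other turns up nothing, which is what makes a memory breakpoint the fastest way to find the culprit.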
Beware of hidden dependencies.
Over the decades I’ve experienced all kinds of hidden dependencies. I generally recommend to other programmers that we eliminate hidden dependencies as much as possible.
Global values are one type of hidden dependency: Anybody can change them at any time, for any reason, and you cannot easily see why. The professional systems I’ve worked with usually avoid global variables, but occasionally one will slip through with some comments about how deadlines needed to be hit. The few times I’ve seen them used extensively, invariably there are bugs where some programmer accidentally modifies a value unexpectedly. In one case we recently discussed on Gamedev.net, one young programmer discovered his UI system was unexpectedly switching from Unicode mode to ASCII mode. With a bit of digging and memory debugging, the issue was tracked down to the UI state being stored in a global variable. A utility library that reset itself happened to include the UI representation format as one of the values getting reset.
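A rough reconstruction of that kind of bug, sketched in Java, might look like the following. I don’t have the original code, so the names `Globals`, `UiSystem` and `UtilityLibrary` and the details of what gets reset are assumptions. Java has no true globals, so public static fields play that role here.

```java
// Hypothetical global state shared by the whole program.
class Globals {
    static String textMode = "ASCII";
    static int cachedEntries = 0;
}

class UiSystem {
    static void start() {
        Globals.textMode = "Unicode";
    }

    static String describe() {
        return "UI text mode: " + Globals.textMode;
    }
}

// An unrelated utility that "only" cleans up after itself.
class UtilityLibrary {
    static void addEntry() {
        Globals.cachedEntries++;
    }

    static void reset() {
        Globals.cachedEntries = 0;
        Globals.textMode = "ASCII"; // hidden side effect: this also resets the UI mode
    }
}
```

Call `UiSystem.start()`, then `UtilityLibrary.reset()` somewhere far away, and `UiSystem.describe()` reports ASCII again even though no UI code ran in between.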
Class-static and function-static variables are another hidden dependency: any code that modifies the value in one place changes it in every place. The old C libraries had a function called strtok(), a function that tokenizes a string, meaning it breaks down big strings like “Every good boy deserves fudge.” into segments “Every”, “good”, “boy”, “deserves”, “fudge.” Unfortunately the function remembers its position in the string between calls, which in practice means a hidden static variable. As long as no other code calls the function it works just fine. But if another part of code calls the function at the same time, perhaps with “Grizzly bears don’t fly airplanes!”, the results can be unexpected. The result for one side might be “Every”, “good”, “bears”, “fly”, the other location might get “Grizzly”, “don’t”, “airplanes!” The two can fight with each other, replacing the messages and consuming the tokens at an unexpected rate.
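The effect is easy to reproduce. The sketch below is a Java imitation of the strtok() calling convention, not the C function itself: the class `SharedTokenizer` is made up, and it splits on whitespace only. As with strtok(), you pass a string to start and null to continue with the previous one.

```java
// Hypothetical tokenizer that keeps its progress in static variables.
class SharedTokenizer {
    private static String[] tokens = new String[0];
    private static int position = 0;

    // Pass a string to start tokenizing it, or null to continue the previous string.
    static String next(String input) {
        if (input != null) {
            tokens = input.trim().split("\\s+");
            position = 0;
        }
        return position < tokens.length ? tokens[position++] : null;
    }
}
```

If caller A starts with `next("Every good boy deserves fudge.")` and gets “Every”, and then caller B starts with `next("Grizzly bears don't fly airplanes!")` and gets “Grizzly”, caller A’s following `next(null)` returns “bears” instead of “good”. The exact mix depends on how the calls interleave, but neither caller ends up with its own sentence.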
Singletons are another hidden dependency: there is an instance of the object shared everywhere that cannot be easily swapped out, cannot be readily replaced for testing, cannot be instanced, and can have any system mess with it at any time behind your back. Much has been written about the difficulties they can cause; I’ll let you search for that on your own.
Annotations and attributes introduce hidden dependencies as well. The programmer is relying on an external system to do work. The programmer may not be immediately aware of all the consequences of that work, and it is very easy to forget that the link exists at all. These links are very hard to break; you cannot reuse the code unless all the conditions for the link are present and working correctly.
Sometimes it is worth the cost to have these hidden dependencies. We all have deadlines. Programs need to get published. Many times these hidden dependencies are introduced as a way to save development time or to save costs. That is understandable. Players don’t buy a AAA blockbuster game and then complain, “This game uses a singleton pattern inside it, it uses too many globals, that makes it a terrible game”. All they care about is that the program works for them.
Whenever you implement a dependency in your game, keep in mind that it has costs. Sometimes those costs are steep.
Unfortunately, choosing to use hidden dependencies can sometimes have terrible costs. At some point in development you may discover that you need to remove that dependency. If you are late in development and used the dependency in thousands of locations, removing the dependency can require extensive changes and many days of work.
In our case today, counting all the people who looked over the code, nearly two full developer days were consumed.
Hidden dependencies can be useful; they can save time and effort. Just think carefully before you decide to add them to your code base, because sooner or later they will incur a cost when that dependency has a problem. Sometimes the cost is a few development hours, sometimes the cost is several development weeks.
Beware the hidden dependency. Select them carefully and understand their risks. Use them wisely.