Refactoring Legacy Code: Part 2 - Magic Strings & Constants

Old code. Ugly code. Complicated code. Spaghetti code. Jibberish nonsense. In two words, Legacy Code. This is a series that will help you work and deal with it.

We first met our legacy source code in our previous lesson. Then we ran the code to form an opinion about its basic functionality and created a Golden Master test suite to have a raw safety net for future changes. We will continue to work on this code and you can find it under the trivia/php_start folder. The other folder trivia/php_start is with this lesson's finished code.

The time for the first changes have come and what better way to understand a difficult code base than start to extract magic constants and strings into variables? These seemingly simple tasks will give us greater and sometimes unexpected insights into the inner workings of legacy code. We will need to figure out the intentions of the original code author and find the proper names for the pieces of code that we've never seen before.

Magic Strings

Magic strings are strings used directly in various expressions, without being assigned to a variable. This kind of string had a special meaning for the original author of the code, but instead of assigning them to a well named variable, the author thought the meaning of the string was obvious enough.

Identify the First Strings to Change

Let's start by looking at our Game.php and try to identify strings. If you are using an IDE (and you should) or a smarter text editor capable of highlighting source code, spotting the strings will be easy. Here is an image of how the code looks like on my display.

Everything with orange is a string. Finding each string in our source code is very easy this way. So, make sure highlighting is supported and enabled in your editor, whatever your application is.

The first orange part in our code is immediately at line three. However the string contains only a newline character. This should be obvious enough in my opinion, so we can move on.

When it comes to deciding what to extract and what to keep unchanged, there are few thumb-ups, but at the end it is your professional opinion that must prevail. Based on it, you will have to decide what to do with each piece of code you analyze.

So let's analyze lines 32 to 42, the snippet you can see above. For pop, science, and sports questions, there is just a simple concatenation. However, the action to compose the string for a rock question is extracted into a method. In your opinion, are these concatenations and strings clear enough so that we can keep all of them inside our for loop? Or, do you think extracting all strings into their methods would justify the existence of those methods? If so, how would you name those methods?

Update Golden Master and the Tests

Regardless of the answer, we will need to modify the code. It is time to put our Golden Master to work and write our test that actually runs and compares our code with the existing content.

We created another test to compare the outputs, made sure testGenerateOutput() only generates the output once and does nothing else. We also moved the golden master output file "gm.txt" into the current directory because "/tmp" may be cleared when the system reboots. For our actual results, we can still use it. On most UNIX like systems "/tmp" is mounted into the RAM so it is much faster than the file system. If you did well, the tests should pass without a problem.

It is very important to remember to mark our generator test as "skipped" for future changes. If you feel more comfortable with commenting or even deleting it altogether, please do so. It is important that our Golden Master will not change when we change our code. It was generated once and we do not want to modify it, ever, so that we can be sure our newly generated code always compares to the original. If you feel more comfortable doing a backup of it, please proceed to do so.

I will just mark the test as skipped. This will put our test result to "yellow", meaning all tests are passing but some are skipped or marked as incomplete.

Making Our First Change

With tests in place, we can start changing code. In my professional opinion, all the strings can be kept inside the for loop. We will take the code from the createRockQuestion() method, move it inside the the for loop, and delete the method altogether. This refactoring is called Inline Method.

"Put the method's body into the body of its callers and remove the method." ~ Martin Fowler

There are a specific set of steps to do this type of refactoring, as defined by Marting Fowler in Refactoring: Improving the Design of Existing Code:

  • Check that the method is not polymorphic.
  • Find all calls to the method.
  • Replace each call with the method body.
  • Compile and test.
  • Remove the method definition.

We don't have subclasses extending Game, so the first step validates.

There is only a single use of our method, inside the for loop.

We put the code from createRockQuestion() into the for loop in the constructor. We still have our old code. It is now time to run our test.

Our tests are passing. We can delete our createRockQuestion() method.

Finally we should run our tests again. If we missed a call to a method, they will fail.

They should pass again. Congrats! We are done with our first refactoring.

Other Strings to Consider

Strings in the methods add() and roll() are only used to output them using the echoln() method. askQuestions() compares strings to categories. This seems acceptable also. currentCategory() on the other hand returns strings based on a number. In this method, there are a lot of duplicated strings. Changing any category, except Rock would require changing its name in three places, only in this method.

I think we can do better. We can use a refactoring method called Introduce Local Variable and eliminate the duplication. Follow these guidelines:

  • Add a variable with the desired value.
  • Find all uses of the value.
  • Replace all uses with the variable.

This is much better. You probably already observed some future improvements that we could do to our code, but we are only at the beginning of our work. It is tempting to fix everything you find immediately, but please don't. Many times, especially before the code is well understood, tempting changes can lead to dead ends or even more broken code. If you think there is a chance you will forget your idea, just note it on a sticky note or create a task in your project management software. Now we need to continue with our string related problems.

In the rest of the file, all strings are output related, sent into echoln(). For the time being, we will leave them untouched. Modifying them would effect the printing and delivering logic of our application. They are part of the presentation layer mixed with business logic. We will deal with separating different concerns in a future lesson.

Magic Constants

Magic constants are very much like magic strings, but with values. These values can be boolean values or numbers. We will concentrate mostly on numbers used in if statements or return statements or other expressions. If these numbers have an unclear meaning, we need to extract them into variables or methods.

We'll start with GameRunner.php this time and our first focus is the roll() method that gets some random numbers. The previous author did not care to give those numbers a meaning. Can we? If we analyze the code:

It will return a number between one and six. The random part returns a number between zero and five to which we always add one. So it is surely between one and six. Now we need to consider the context of our application. We are developing a trivia game. We know there is some kind of board on which our players must move. And to do so, we need to roll the dice. A die has six faces and it can produce numbers between one and six. That seems like a reasonable deduction.

Isn't that nice? We used the Introduce Local Variable refactoring concept again. We named our new variable $dice and it represents the random number generated between one and six. This also made our next statement sound really natural: Game, roll dice.

Did you run your tests? I didn't mention it, but we need to run them as frequently as possible. If you haven't, this would be a good time to run them. And they should pass.

So, this was a case of nothing more than just exchanging a number with a variable. We took a whole expression that represented a number and extracted it into a variable. This can be technically considered a Magic Constant case, but not a pure case. What about our next random expression?

This is more tricky. What are zero, nine and seven in that expression? Maybe we can name them. At first glance, I have no good ideas for zero and nine, so let's try seven. If the number returned by our random function is equal to seven, we will enter the first branch of the if statement which produces a wrong answer. So maybe our seven could be named $wrongAnswerId.

Our tests are still passing and the code is somewhat more expressive. Now that we managed to name our number seven, it puts the conditional into a different context. We can think of some decent names for zero and nine also. They are just parameters to rand(), so the variables will probably be named min-something and max-something.

Now that is expressive. We have a minimum answer ID, a maximum one and another for the wrong answer. Mystery solved.

But notice that all the code is inside a do-while loop. Do we need to re-assign the answer ID variables each time? I think not. Let's try to move them out of the loop and see if our tests are passing.

Yes. The tests pass like this as well.

It's time to switch to Game.php and look for Magic Constants there also. If you have code highlighting, you surely have constants highlighted in some bright color. Mine are blue and they are pretty easy to spot.

Finding the magic constant 50 in that for loop was quite easy. And if we look at what the code does, we can discover that inside the for loop, elements are pushed to several arrays. So we have some kind of lists, each with 50 elements. Each list represents a question category and the variables are actually class fields defined above as arrays.

So, what can 50 represent? I bet you already have some ideas. Naming is one of the most difficult tasks in programming, if you have more than one idea and you feel uncertain which one to choose, don't be ashamed. I also have various names in my head and I am evaluating possibilities for choosing the best one, even while writing this paragraph. I think we can go with a conservative name for 50. Something along the lines of$questionsInEachCategory or $categorySize or something similar.

That looks decent. We can keep it. And the tests are of course are passing.

What is two? I am sure at this point the answer is clear to you. That is easy:

Do you agree? If you have a better idea, feel free to comment below. And your tests? Are they still passing?

Now, in the roll() method we have some numbers also: two, zero, 11 and 12.

That is pretty clear. We will extract that expression into a method, but not in this tutorial. We are still in the phase of understanding and hunting for magic constants and strings. So what about 11 and 12? They are buried inside the third level of if statements. It is quite difficult to understand what they stand for. Maybe if we look at the lines around them.

If the current player's place or position is greater than 11, then its position will be reduced to the current one minus 12. This sounds like a case of when a player reaches the end of the board or play field and it is repositioned in its initial position. Probably position zero. Or, if our game board is circular, going over the last marked position will put the player into the relative first position. So 11 could be the board size.

Don't forget to replace 11 in both places inside the method. This will force us to move the variable assignment outside of the if statements, right at the first indentation level.

But if 11 is the board's size, what is 12? We subtract 12 from the player's current position, not 11. And why don't we just set the position to zero instead of subtracting? Because that would make our tests fail. Our previous guess that the player will end up in position zero after the code inside the if statement is run, was wrong. Let's say a player is in position ten and rolls a four. 14 is greater than 11, so the subtraction will happen. The player will end up in position 10+4-12=2.

This drives us toward another possible naming for 11 and 12. I think it is more appropriate to call 12 $boardSize. But what does that leave us for 11? Maybe $lastPositionOnTheBoard? A little bit long, but at least it tells us the truth about the magic constant.

I know, I know! There is some code duplication there. It is quite obvious, especially with the rest of the code hidden. But please remember that we were after magic constants. There will be a time for duplicate code also, but not right now.

Final Thoughts

I left one last magic constant in the code. Can you spot it? If you look at the final code, it will be replaced, but of course that would be cheating. Good luck finding it and thanks for reading.



Related Articles