Refactoring and Code Reviews

I’ve been reading Refactoring: 2nd Edition by Martin Fowler as part of the Dev Book Club for the past three months. If you’re unfamiliar with the book, it covers a variety of techniques you can use to make changes to existing code that ultimately makes the code function exactly the same as it did before, but which should (hopefully) leave the code easier to read and understand for you and anyone else who needs to read it.

The 2nd Edition examples are all written in JavaScript, compared with the original book (published in 1999) which instead used Java. This hopefully makes it a little easier to digest for folks who do most of their programming for the web, and particularly for those who are coding primarily with JavaScript itself, which seems to be an ever-growing number of developers these days.

If you do any work developing applications with other people, then you’re probably familiar with the concept of a code review. Code reviews are an opportunity for other developers to look at the code you’ve written and to provide feedback around various aspects of your proposed changes. When providing feedback to others during a code review, outside of confirming that the functionality of the changes works as expected, I generally ask myself a few questions:

  • Do I understand what this function or block of code is doing?
  • Do I think this code will still be clear 1, 3, 6, or 12 months from now?
  • Is this code broken down into small enough parts (e.g., is it only doing one thing, and doing it well)?

A variation of these questions is “Are there opportunities to make this easier to understand?” If there are, then I look to make those recommendations.

This morning, I was poking around in completely random parts of the WordPress Core codebase for no particular reason, other than I was thinking about this Refactoring book, and I wanted to see if there was anything that would jump out at me. WordPress as a project doesn’t really do a lot of refactoring on the code base over time, only lightly changing parts of existing code if it’s specific to a bug or feature (which, as an aside, I think is an extremely good policy to have, and in general, I’m really glad that the contributors to the project do this). Looking at the WP_Widget class, something that’s existed in WordPress since version 2.8, I noticed one method that piqued my interest and led me to write this blog post – the get_settings method. Let’s take a look at it:


/**
 * Retrieves the settings for all instances of the widget class.
 *
 * @since 2.8.0
 *
 * @return array Multi-dimensional array of widget instance settings.
 */
public function get_settings() {

	$settings = get_option( $this->option_name );

	if ( false === $settings ) {
		if ( isset( $this->alt_option_name ) ) {
			$settings = get_option( $this->alt_option_name );
		} else {
			// Save an option so it can be autoloaded next time.
			$this->save_settings( array() );
		}
	}

	if ( ! is_array( $settings ) && ! ( $settings instanceof ArrayObject || $settings instanceof ArrayIterator ) ) {
		$settings = array();
	}

	if ( ! empty( $settings ) && ! isset( $settings['_multiwidget'] ) ) {
		// Old format, convert if single widget.
		$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
	}

	unset( $settings['_multiwidget'], $settings['__i__'] );
	return $settings;
}

Thinking through my short list of code review questions, and with the Refactoring book top-of-mind, I examined this method with a somewhat puzzled look. Take a look at the above again and see which questions come to mind for you – I’ll wait.

Ready?

For me, there were a few things here that caught my eye:

  1. We (of which I mean the WordPress community, because anyone can contribute to this project) mix terminology in here, “settings” and “options”.
  2. The block of code that checks if settings are false takes a bit of mental overhead to understand what it’s doing, in part because it’s checking for an alternate option (née “setting”) name for the get operation, and possibly saving the setting (née option) if it doesn’t exist, which is completely outside the responsibility of this function.
  3. The $settings variable potentially gets reassigned at least three times, making it harder to hold in my mind what values it might contain at any given step in the runtime.
  4. There are some complex conditional statements that might be more easily described with a method name.
  5. There is removal of some array keys every time, even if it may not be necessary, and it’s not entirely clear why they’re being removed.
  6. The single return point at the end of the method adds to the mental overhead of knowing when particular conditions are triggered.
  7. There are comments denoting information about what the code is doing that might be better served with a function that says what it does.

There are likely some other cases here to examine, but even six observations within a single 24-line block of code is quite a bit. This single method is doing quite a bit of work, even if its main job is to simply get the settings for a widget. In plain language, it’s asking the following questions:

  1. Have I saved options using my primary name or secondary name?
  2. Do I need to save options because they haven’t already been initialized?
  3. Are my options saved in a format I expect?
  4. Do I need to convert my options?

These are clear indicators that a function is likely doing too much. Additionally, because there are some questions about why some parts of the code exist, it means changing it to become easier to read has the potential to be fragile, because there’s some context here that maybe only the original programmer who wrote this function understands. In Refactoring, Fowler indicates that his first step is to write tests for code if the tests don’t already exist, and then running the tests in each step. This post won’t go into those details, in particular because of the complexity of getting unit tests up and running with WordPress, but I do want to cover the above questions and discuss suggestions for how this might get refactored to improve readability and reduce the number of questions about a function.

Let’s start with the last half of the function:

if ( ! is_array( $settings ) && ! ( $settings instanceof ArrayObject || $settings instanceof ArrayIterator ) ) {
	$settings = array();
}

if ( ! empty( $settings ) && ! isset( $settings['_multiwidget'] ) ) {
	// Old format, convert if single widget.
	$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
}

unset( $settings['_multiwidget'], $settings['__i__'] );
return $settings;

The first condition is sticking out to me, because it sets the $settings variable to an empty array in the cases where $settings is not already an array and it’s neither an instance of the ArrayObject or ArrayIterator classes. Basically, if it can’t be looped over, we want $settings to be an array here.

If that case is actually true, and $settings becomes an empty array, none of the rest of the function applies: since it’s empty, it won’t get converted, and there are no keys to unset. So why not simply return? We’re done, after all.

/**
 * Retrieves the settings for all instances of the widget class.
 *
 * @since 2.8.0
 *
 * @return array Multi-dimensional array of widget instance settings.
 */
public function get_settings() {

	$settings = get_option( $this->option_name );

	if ( false === $settings ) {
		if ( isset( $this->alt_option_name ) ) {
			$settings = get_option( $this->alt_option_name );
		} else {
			// Save an option so it can be autoloaded next time.
			$this->save_settings( array() );
		}
	}

	if ( ! is_array( $settings ) && ! ( $settings instanceof ArrayObject || $settings instanceof ArrayIterator ) ) {
		return array();
	}

	if ( ! empty( $settings ) && ! isset( $settings['_multiwidget'] ) ) {
		// Old format, convert if single widget.
		$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
	}

	unset( $settings['_multiwidget'], $settings['__i__'] );
	return $settings;
}

That is a subtle change, but it already significantly improves things in a couple of ways:

  1. $settings is not reassigned again, so we don’t have to think about what it holds anymore.
  2. Because we return early, we know that the rest of the function does not run in that case, so we can put it out of mind.

What if we took that a step further, then, and helped provide some context for the why we would return early in that case? Our complex condition is checking that we have something that we can loop over, so perhaps we could say that in a new method?

private function is_not_iterable( $settings ) {
    return ! is_array( $settings ) && ! ( $settings instanceof ArrayObject || $settings instanceof ArrayIterator );
}

Here, we make a new function that says what we’re actually checking, instead of leaving future readers of the code left to understand what we mean by the complex condition. Perhaps they’ve never used an ArrayObject or ArrayIterator class before, but they know what iteration is. So, if they’re reading get_settings to understand how it works, they might not need to look into the is_not_iterable function to understand what’s happening in get_settings. This is a clear separation of concerns, and leaves us with smaller methods to help make our functions a little more readable and with fewer responsibilities.

This does give us two methods. We’ll probably have more by the end of this exercise:

/**
 * Retrieves the settings for all instances of the widget class.
 *
 * @since 2.8.0
 *
 * @return array Multi-dimensional array of widget instance settings.
 */
public function get_settings() {

	$settings = get_option( $this->option_name );

	if ( false === $settings ) {
		if ( isset( $this->alt_option_name ) ) {
			$settings = get_option( $this->alt_option_name );
		} else {
			// Save an option so it can be autoloaded next time.
			$this->save_settings( array() );
		}
	}

	if ( $this->is_not_iterable( $settings ) ) {
		return array();
	}

	if ( ! empty( $settings ) && ! isset( $settings['_multiwidget'] ) ) {
		// Old format, convert if single widget.
		$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
	}

	unset( $settings['_multiwidget'], $settings['__i__'] );
	return $settings;
}	

private function is_not_iterable( $settings ) {
   	return ! is_array( $settings ) && ! ( $settings instanceof ArrayObject || $settings instanceof ArrayIterator );
}

Of course, it’s not a good practice to create methods that indicate they’re not something. That’s what the logical not operator is for. But to keep things working that way, we need to invert the logic and rename our method. Thus:

private function is_not_iterable( $settings ) {
    return ! is_array( $settings ) && ! ( $settings instanceof ArrayObject || $settings instanceof ArrayIterator );
}

Becomes:

private function is_iterable( $settings ) {
    	return is_array( $settings ) || $settings instanceof ArrayObject || $settings instanceof ArrayIterator;
}

Because arrays, ArrayObjects, and ArrayIterators can be looped over. Then we update our caller in get_settings:

if ( ! $this->is_iterable( $settings ) ) {
	return array();
}

This is already the smallest of changes, but now we’ve clarified some of the questions we had around variable reassignment and complex conditions, so hopefully get_settings is already easier to read than before. Next, I’m finding myself interested in the complex conditional block at the start of the method:

$settings = get_option( $this->option_name );

if ( false === $settings ) {
	if ( isset( $this->alt_option_name ) ) {
		$settings = get_option( $this->alt_option_name );
	} else {
		// Save an option so it can be autoloaded next time.
		$this->save_settings( array() );
	}
}

The main thing this block of code is trying to do is to get settings that are saved from the database. It’s potentially using two separate queries via get_option (checking alt_option_name if the first query failed), and it’s also saving the settings if both queries failed. The responsibility of the get_settings method is to retrieve them, so already save_settings feels out-of-place. But, of course, this is a long-standing method in an open-source application so it’s also possible it’s going to have to stay here. That said, I think there are opportunities to clean things up and clarify the responsibilities a little bit, and also potentially remove the duplicate variable assignment. Here’s one idea:

$settings = get_option( $this->option_name, get_option( $this->alt_option_name ) );

if ( false === $settings ) {
	// Save an option so it can be autoloaded next time.
	$this->save_settings( array() );
}

The get_option function in WordPress accepts a second parameter which is the default value if the value cannot be returned. $alt_option_name is null by default on the WP_Widget class, so we can in fact simply try to call get_option on the $alt_option_name property if the first call fails. get_option itself doesn’t actually make a database query if the passed in option is empty, so the performance implications here are small. The benefits here are that we avoid reassigning $settings if it was false the first time, and we get to remove the nested conditional check inside our first conditional. In theory, this is easier to read. Here’s our method now:

/**
 * Retrieves the settings for all instances of the widget class.
 *
 * @since 2.8.0
 *
 * @return array Multi-dimensional array of widget instance settings.
 */
public function get_settings() {

	$settings = get_option( $this->option_name, get_option( $this->alt_option_name ) );

	if ( false === $settings ) {
		// Save an option so it can be autoloaded next time.
		$this->save_settings( array() );
	}

	if ( ! $this->is_iterable( $settings ) ) {
		return array();
	}

	if ( ! empty( $settings ) && ! isset( $settings['_multiwidget'] ) ) {
		// Old format, convert if single widget.
		$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
	}

	unset( $settings['_multiwidget'], $settings['__i__'] );
	return $settings;
}

The next thing that sticks out to me is our two comparisons right after we assign settings. We know now that if we don’t have settings saved in the database that get_option will return false. We also know that false is not iterable. Even though I’d like to get rid of the call to save_settings altogether right now, it’s probably simply best to keep it for the moment, but we can slide that statement into the is_iterable check:

/**
 * Retrieves the settings for all instances of the widget class.
 *
 * @since 2.8.0
 *
 * @return array Multi-dimensional array of widget instance settings.
 */
public function get_settings() {

	$settings = get_option( $this->option_name, get_option( $this->alt_option_name ) );

	if ( ! $this->is_iterable( $settings ) ) {
		// Save an option so it can be autoloaded next time.
		$this->save_settings( array() );
		return array();
	}

	if ( ! empty( $settings ) && ! isset( $settings['_multiwidget'] ) ) {
		// Old format, convert if single widget.
		$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
	}

	unset( $settings['_multiwidget'], $settings['__i__'] );
	return $settings;
}	

Now that we’re down to two conditions, I’m noticing that the second condition performs the conversion if $settings is not empty. The rest of the function, in fact, performs logic with the assumption that settings is not empty, so it seems like something we should include in our early return. For now, let’s slide that condition up to the early return. We have a side-effect of saving an empty setting every time, even if it’s empty, but it’s worth the change to make the rest of the method readable for now:

if ( ! $this->is_iterable( $settings ) || empty( $settings ) ) {
	// Save an option so it can be autoloaded next time.
	$this->save_settings( array() );
	return array();
}

if ( ! isset( $settings['_multiwidget'] ) ) {
	// Old format, convert if single widget.
	$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
}

Now we know we only convert the widget settings if the _multiwidget index is not set.

This is now the state of the refactor:

/**
 * Retrieves the settings for all instances of the widget class.
 *
 * @since 2.8.0
 *
 * @return array Multi-dimensional array of widget instance settings.
 */
public function get_settings() {

	$settings = get_option( $this->option_name, get_option( $this->alt_option_name ) );

	if ( ! $this->is_iterable( $settings ) || empty( $settings ) ) {
		// Save an option so it can be autoloaded next time.
		$this->save_settings( array() );
		return array();
	}

	if ( ! isset( $settings['_multiwidget'] ) ) {
		// Old format, convert if single widget.
		$settings = wp_convert_widget_settings( $this->id_base, $this->option_name, $settings );
	}

	unset( $settings['_multiwidget'], $settings['__i__'] );
	return $settings;
}	

private function is_iterable( $settings ) {
    	return is_array( $settings ) || $settings instanceof ArrayObject || $settings instanceof ArrayIterator;
}

Clearly, there are still unanswered questions and additional opportunities to continue to refine this, but I think the code changes result in some general improvements:

  1. We’re down to only one reassignment of $settings.
  2. We return an empty array in cases where widget settings could not be located or when they were already empty.
  3. Failing cases return early, avoiding any additional processing.
  4. We separated complex conditional logic into a named method that (hopefully) more clearly defines the purpose of the conditional check.

Of course, it’s possible this change raises questions, as well, and it’s possible the logic may have some performance considerations that would require additional refactoring:

  1. Does it seem weird to call get_option as the default to the first get_option call? If not, perhaps this could be extracted into another private method with simple if conditions, returning settings after the first call if it’s not false, then returning false if $alt_option_name is not set, then returning the call to get_option on the $alt_option array property.
  2. I don’t love that we’re calling save_settings on an empty array, but I also don’t love that save_settings is inside the get_settings method to begin with.
  3. It’s still not clear why _multiwidget and __i__ indices on the $settings array need to get unset here. That parsing, again, seems outside the scope of actually getting the widget settings, but clearly it’s an important part of the process of getting the settings for a widget. But, because that behavior isn’t described and it doesn’t have anything to do with getting the settings, refactoring it to be done elsewhere seems like a good idea.
  4. Clearly, I would need to add documentation to the new is_iterable method, and any other methods that get extracted from logic elsewhere. That’s not the point of this exercise, but feel it’s important to call that out. :)

Summary

First, I’d like to encourage you to check out the Refactoring book if you haven’t read it before. This sample exercise on a single method within WordPress core hopefully demonstrates how going through the process of altering code with the goal of improving readability is an important and viable part of the development process. This post just touches on a couple of approaches that Fowler describes in his book, and it’s a worthwhile read for anyone who wants to improve the quality of their code.

Second, I hope this post helps you think about some new questions you may not have considered when writing your own code or reviewing the code of your friends and colleagues. Structuring code into a working state can often be a big enough hurdle, but going the extra mile to make it so that it’s easily understood by you and others today and down the road pays huge dividends in terms of maintainability.