Refactoring the Core Logic

We'll refactor the core logic of multi-git into our new and shiny directory structure in this lesson.

In this lesson, we will not change any functionality; we’ll just move code around. This is an important best practice: always separate structural, formatting, and style changes from functional changes. We will do some planning and then refactor most of the logic into a repo_manager package.

Planning the refactoring

Here is the plan. We will extract the code that interacts with git and manages the repos into a separate package called repo_manager. The repo_manager package will also contain unit tests to make sure it really works. The original multi-git didn’t have any tests. In addition, we will create a helpers package for general-purpose operations like creating directories and adding files.

The main() function will remain in the root directory for now. In the next lesson we will move it into its own directory under the cmd top-level directory.

The reason for these choices is to clearly identify what the core logic is (dealing with multiple repos and executing commands on all repos) and what the command-line interface used to access the core logic is.

Separating them in two steps, first the core logic and later the command-line interface, will eliminate subtle issues of interactions between these two refactorings.

This two-step separation is not really necessary for a small program like multi-git, but if you ever try to break up a large, real-world spaghetti monolith, you will need all the help you can get. Following a methodical approach of refactoring one well-defined aspect at a time can save a lot of time, mistakes, and confusion.

Moving files around

Let’s get to the business of actually refactoring the code. As you recall, in version 0.1 of multi-git the entire program consisted of a single file called main.go.

The repo_manager package

The core logic is all about running a git command against a list of git repositories.

for _, r := range repos {
    // Go to the repo's directory
    os.Chdir(r)

    // Print the command
    fmt.Printf("[%s] %s\n", r, command_string)

    // Execute the command
    out, err := exec.Command("git", git_components...).CombinedOutput()

    // Print the result
    fmt.Println(string(out))

    // Bail out if there was an error and NOT ignoring errors
    if err != nil && !*ignoreErros {
        os.Exit(1)
    }
}

Now, we’ll create a package called repo_manager to handle this task. The main file is called repo_manager.go. Here are the package declaration and the imports we need:

package repo_manager

import (
    "errors"
    "fmt"
    "os"
    "os/exec"
    "strings"
)

Next, let’s define a RepoManager struct to store the list of directories that contain the git repositories and a Boolean flag that determines if errors should be ignored.

type RepoManager struct {
    repos        []string
    ignoreErrors bool
}

The choice of a struct signals that we’re going with an object-oriented approach. An alternative approach, which is also valid in this case, is a functional approach. I like this object-oriented approach here because I anticipate running multiple commands against the same set of git repositories. Therefore, it’s convenient to have an object (struct) that is initialized once with the list of directories and the error handling strategy.

NewRepoManager()

Once we have our struct, we can write a constructor that creates new instances of RepoManager. It’s a function called NewRepoManager(), which follows Go naming conventions.

The NewRepoManager() function accepts a base directory, a list of repo names, and the ignoreErrors Boolean flag. It returns a pointer to a RepoManager object and an error. The first part of the function does various sanity checks, ensuring that the base dir exists, and the repo list is not empty. It returns an error if anything is wrong. Note the named return values.

func NewRepoManager(baseDir string, 
                    repoNames []string, 
                    ignoreErrors bool) (repoManager *RepoManager, err error) {
    _, err = os.Stat(baseDir)
    if err != nil {
        if os.IsNotExist(err) {
            err = fmt.Errorf("base dir: '%s' doesn't exist", baseDir)
        }
        return
    }

    if baseDir[len(baseDir)-1] != '/' {
        baseDir += "/"
    }

    if len(repoNames) == 0 {
        err = errors.New("repo list can't be empty")
        return
    }

Once it verifies the input, it can create an instance of the RepoManager struct with the ignoreErrors Boolean flag:

    repoManager = &RepoManager{
        ignoreErrors: ignoreErrors,
    }

Then it constructs the full directory paths from the base directory and the repository names.

    for _, r := range repoNames {
        path := baseDir + r
        repoManager.repos = append(repoManager.repos, path)
    }

    return
}

Because the function signature names the return values repoManager and err, a naked return statement is enough: it returns the current values of repoManager and err.
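To see named return values and naked returns in isolation, here is a minimal sketch. The checkName() function is hypothetical and not part of multi-git; it only mirrors the shape of NewRepoManager(), with an early naked return on error and a final naked return on success.

```go
package main

import (
	"errors"
	"fmt"
)

// checkName is a hypothetical function that exists only to show named
// return values combined with naked return statements.
func checkName(name string) (normalized string, err error) {
	if name == "" {
		err = errors.New("name can't be empty")
		return // returns the current values: "" and err
	}

	normalized = name + "/"
	return // returns the current values: normalized and nil
}

func main() {
	dir, err := checkName("repos")
	fmt.Println(dir, err)

	dir, err = checkName("")
	fmt.Println(dir, err)
}
```

Naked returns keep short functions compact, but in longer functions it can become harder to tell which values are actually returned, so they are best reserved for functions of about this size.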

The RepoManager struct has two methods: GetRepos() and Exec().

GetRepos()

GetRepos() just returns the list of repos managed by the repo manager. Nothing to write home about…

func (m *RepoManager) GetRepos() []string {
    return m.repos
}

Exec()

The Exec() method is much more interesting because it accepts the git command to execute as a string, executes it in each of the managed repos, and then returns either the output from each repository or an error.

The first part focuses on parsing the git command into its components. During the refactoring, I discovered a bug in multi-git v0.1 where the parsing logic failed for multi-word components (e.g., git commit -m "a comment with more than one word"). Now, this code correctly parses multi-word components that are surrounded by double quotes.

func (m *RepoManager) Exec(cmd string) (output map[string]string, err error) {
    output = map[string]string{}
    var components []string
    var multiWord []string
    for _, component := range strings.Split(cmd, " ") {
        if strings.HasPrefix(component, "\"") {
            multiWord = append(multiWord, component[1:])
            continue
        }

        if len(multiWord) > 0 {
            if !strings.HasSuffix(component, "\"") {
                multiWord = append(multiWord, component)
                continue
            }

            multiWord = append(multiWord, component[:len(component)-1])
            component = strings.Join(multiWord, " ")
            multiWord = []string{}
        }

        components = append(components, component)
    }
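The parsing step can be tried on its own. The sketch below pulls the same idea into a standalone function; the name splitCommand() and the inQuotes flag are assumptions made for illustration and do not appear in multi-git. It follows the logic of the loop above, and it also closes a quoted component that consists of a single word (such as "v1"), a case the loop above appears to leave open because the opening and closing quotes sit on the same component.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommand is a hypothetical helper that splits a command string on
// spaces while keeping double-quoted, multi-word components together.
// It does not handle escaped quotes; an unterminated quote is dropped.
func splitCommand(cmd string) []string {
	var components []string
	var multiWord []string
	inQuotes := false

	for _, c := range strings.Split(cmd, " ") {
		// An opening quote starts a (possibly multi-word) component.
		if !inQuotes && strings.HasPrefix(c, "\"") {
			c = c[1:]
			inQuotes = true
		}

		// Outside quotes every word is its own component.
		if !inQuotes {
			components = append(components, c)
			continue
		}

		// A closing quote ends the quoted component.
		if strings.HasSuffix(c, "\"") {
			multiWord = append(multiWord, c[:len(c)-1])
			components = append(components, strings.Join(multiWord, " "))
			multiWord = nil
			inQuotes = false
			continue
		}

		// Still inside the quotes: keep collecting words.
		multiWord = append(multiWord, c)
	}
	return components
}

func main() {
	parts := splitCommand(`commit -m "a comment with more than one word"`)
	for i, p := range parts {
		fmt.Printf("%d: %s\n", i, p)
	}
}
```

Splitting on single spaces keeps the logic easy to follow, at the cost of not supporting escaped quotes or other shell syntax.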

Before executing commands in the various directories, the code saves the original working directory so it can be restored afterward. A defer statement schedules a function call that runs when the surrounding function returns. Note that the arguments of the deferred call, wd in this case, are evaluated when the defer statement executes. If wd changes later, it will not affect the deferred call.

    // Restore working directory after executing the command
    wd, _ := os.Getwd()
    defer os.Chdir(wd)
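The timing of argument evaluation is easy to verify with a small experiment. In this sketch, deferDemo() is a hypothetical function that records events in a slice instead of changing directories; the paths are made-up values.

```go
package main

import "fmt"

// deferDemo is a hypothetical function that records the order of events
// so the timing of defer's argument evaluation is visible.
func deferDemo() (log []string) {
	wd := "/original"

	// wd is evaluated here, when the defer statement executes.
	defer func(dir string) {
		log = append(log, "restore "+dir)
	}(wd)

	wd = "/changed" // does not affect the already-evaluated argument
	log = append(log, "working in "+wd)
	return
}

func main() {
	for _, entry := range deferDemo() {
		fmt.Println(entry)
	}
	// prints:
	// working in /changed
	// restore /original
}
```

The deferred call still sees "/original" even though wd was reassigned before the function returned, which is exactly why defer os.Chdir(wd) restores the right directory.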

With all the preliminaries out of the way, the Exec() method now iterates over each repository and runs the git command in its directory. If an error occurred and the ignoreErrors flag is false, it returns immediately without continuing to other repositories.

    var out []byte
    for _, r := range m.repos {
        // Go to the repo's directory
        os.Chdir(r)

        // Execute the command
        out, err = exec.Command("git", components...).CombinedOutput()
        // Store the result
        output[r] = string(out)

        // Bail out if there was an error and NOT ignoring errors
        if err != nil && !m.ignoreErrors {
            return
        }
    }
    return
}
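Exec() switches the working directory of the whole process with os.Chdir() and restores it with defer. An alternative worth knowing, which this lesson does not use, is the Dir field of exec.Cmd: it sets the directory for the child process only. The sketch below shows the idea; gitCommandIn() is a hypothetical helper, not part of the repo_manager package.

```go
package main

import (
	"fmt"
	"os/exec"
)

// gitCommandIn is a hypothetical helper: it prepares a git command that
// will run in dir without changing the working directory of the current
// process.
func gitCommandIn(dir string, components []string) *exec.Cmd {
	cmd := exec.Command("git", components...)
	cmd.Dir = dir // the child process starts in this directory
	return cmd
}

func main() {
	// Assumes git is installed; "." stands in for a repository path.
	cmd := gitCommandIn(".", []string{"status", "--short"})
	out, err := cmd.CombinedOutput()
	fmt.Println(string(out), err)
}
```

With this approach there is no working directory to save and restore, and no os.Chdir() error to check. A directory that doesn't exist surfaces as an error when the command is run.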

In the application below, you can browse around and explore how the directories and files are organized. Press the Run button to run the refactored multi-git using the repo_manager package.
