Back

Performing a code review

Earlier today a user with the handle /aliezsid made a post on Reddit asking for a code review. I needed something to do for my cofee break…

by Percy Bolmér, December 30, 2020

go code reviews

Earlier today a user with the handle /aliezsid made a question on Reddit asking for a code review. I needed something to do for my cofee break, so I thought why not.

Aliezsid is asking for a code review for his new project commitlog. I visited his GitHub to see what t project was about, seems its a tool to generate change logs based on commit history for a repository on GitHub. I really like his idea, so I decided to do a small review.

You can read about his projects here.

Reconnaissance

The first thing I do when performing code reviews are always checking out the documentation. I need to gather a understanding of the library and what the idea behind it is.

The README.md specifies that this library is under heavy development, and it states that there is no tests so to be considered unstable. The second thing I usually do during a code review is checkout the tests to see usage examples. Tests are not only a great thing to have to make sure things don’t break, but I find them to be useful when understanding a libraries usage as well.

But as Aliezsid himself has written, aliezsid knows there are no tests at the moment, so leave that be.

Code analysis

The next thing I do is to grab the code.

mkdir codereview-gitcommit
cd codereview-gitcommit
git init
git pull https://github.com/barelyhuman/commitlog

Lets open up main.go. I am going to go see what is happening in the program and walk it through step by step.

path := os.Args[1]
r, err := git.PlainOpen(path)
if err != nil {
		log.Fatal("Error opening Repository: ", err)
}

The first thing we see is that aliezsid expects an Input argument to the program. Aliezsid will then use go-git to open that GitHub repository. If we check what PlainOpen does we will see the following.

// PlainOpen opens a git repository from the given path. It detects if the
// repository is bare or a normal one. If the path doesn't contain a valid
// repository ErrRepositoryNotExists is returned
func PlainOpen(path string) (*Repository, error) {
	return PlainOpenWithOptions(path, &PlainOpenOptions{})
}

So, the program expects an PATH to a GitHub repository, not a URL. That’s good to know, so lets modify this a bit, to make it more clear.

Lets use the golang Flag package to help us.

The reason why I want to use flags is that it will help us with generating an user error, it is also more scalable.

var path string
// Read user input
flag.StringVar(&path, "path", "", "A filepath to a folder containing a github repository")
// Parse Flags
flag.Parse()

// Make sure user has inserted the needed flags
if path == "" {
	flag.Usage()
	os.Exit(0)
}

Now its easier for users to get a grasp of how to use it, when executing the program we will now see

Error output when using gitcommit wrong.

Lets move on. What happens next in the code is that aliezsid opens up the git repository, grabs the git header reference and extracts the commit history.

This is pretty unclear since there is no comments.

	r, err := git.PlainOpen(path)
	if err != nil {
		log.Fatal("Error opening Repository: ", err)
	}
	ref, err := r.Head()

	if err != nil {
		log.Fatal("Unable to get repository HEAD:", err)
	} 
	cIter, err := r.Log(&git.LogOptions{From: ref.Hash()})

	if err != nil {
		log.Fatal("Unable to get repository commits:", err)
	}

Now this code is isolated to the main package and pretty unclear whats going on. I’d like to break this apart some. I’m going to create a new Struct called Repository which will hold these functions. This will make testing a lot easier, and code cleaner and more scalable.


// Repository is a struct that holds an Opened github repository and the reference
type Repository struct {
	path string
	repo *git.Repository
	ref  *plumbing.Reference
}

// Open will open up a git repository
// and load the Head reference
func Open(path string) (*Repository, error) {
	// Open the github repository
	r, err := git.PlainOpen(path)
	if err != nil {
		return nil, fmt.Errorf("%v:%w", "Error opening Repository: ", err)
	}

	// Grab the Git HEAD reference
	ref, err := r.Head()
	if err != nil {
		return nil, fmt.Errorf("%v:%w", "Unable to get repository HEAD:", err)
	}
	return &Repository{
		path: path,
		repo: r,
		ref:  ref,
	}, nil
}

// GetCommits will extract commit history from the git repository
func (r *Repository) GetCommits() ([]*object.Commit, error) {
	// Get the Commit history
	cIter, err := r.repo.Log(&git.LogOptions{From: r.ref.Hash()})

	if err != nil {
		return nil, fmt.Errorf("%v:%w", "Unable to get repository commits:", err)
	}

	var commits []*object.Commit

	err = cIter.ForEach(func(c *object.Commit) error {
		commits = append(commits, c)
		return nil
	})

	if err != nil {
		return nil, fmt.Errorf("%v:%w", "Error getting commits :", err)
	}
	return commits, nil
}

The next thing aliezsid does is to extract the Latest tags from the repository. This is done by extracting the latestTag with the utils package.

	logContainer := new(logcategory.LogsByCategory)
	latestTag, _, err := utils.GetLatestTagFromRepository(repo.repo)

	if err != nil {
		log.Fatal("Error Getting Tag Pairs", err)
	}

	tillLatest := false

	if latestTag != nil {
		if latestTag.Hash().String() == repo.ref.Hash().String() {
			tillLatest = false
		} else {
			tillLatest = true
		}
	}

Actually, we can remove the utils package now and insert this into a function in the Repository struct. Since utils only holds this one function, I feel its more appropriate to insert the logic into our Repository struct.

// GetLatestTag is used to check the latestTag
// it will return a reference to the LatestTag and the PreviousTag
func (r *Repository) GetLatestTag() (*plumbing.Reference, *plumbing.Reference, error) {
	tagRefs, err := r.repo.Tags()
	if err != nil {
		return nil, nil, err
	}

	var latestTagCommit *object.Commit
	var latestTagName *plumbing.Reference
	var previousTag *plumbing.Reference
	var previousTagReturn *plumbing.Reference

	err = tagRefs.ForEach(func(tagRef *plumbing.Reference) error {
		revision := plumbing.Revision(tagRef.Name().String())

		tagCommitHash, err := r.repo.ResolveRevision(revision)
		if err != nil {
			return err
		}

		commit, err := r.repo.CommitObject(*tagCommitHash)
		if err != nil {
			return err
		}

		if latestTagCommit == nil {
			latestTagCommit = commit
			latestTagName = tagRef
			previousTagReturn = previousTag
		}

		if commit.Committer.When.After(latestTagCommit.Committer.When) {
			latestTagCommit = commit
			latestTagName = tagRef
			previousTagReturn = previousTag
		}

		previousTag = tagRef

		return nil
	})
	if err != nil {
		return nil, nil, err
	}

	return latestTagName, previousTagReturn, nil
}

Next aliezsid checks if the latestTag is the same as the repository HEAD.

tillLatest := false

if latestTag != nil {
	if latestTag.Hash().String() == repo.ref.Hash().String() {
		tillLatest = false
	} else {
		tillLatest = true
	}
}

I’d also like to remove the tillLatest check and instead make that it’s own function. This is so that its easier to test and also easier to reuse in case the package grows. Aliezsid also reuses the exact same code in isCommitToNearest function later at line 116, every time we use code more than once, break it out into its own function to avoid duplicate code .

We need to see what’s wanted in both use cases and make sure we can use our new function in both cases.

isCommitToNearestTag also contains a small bug, It double checks an error and uses log.Fatal. The second error check is actually never accessible, since the first error check would cancel since its using log.Fatal. It seems the tillLatest boolean flag is used to force a check against the latestTag.Hash(). So the code will actually perform this check once, only to force it to be performed again later. This is something we can remove so its only performed once, not that we have a optimization problem, but no reason to rerun code.

The function looks like this

func isCommitToNearestTag(repo *git.Repository, commit *object.Commit, tillLatest bool) bool {
	latestTag, previousTag, err := utils.GetLatestTagFromRepository(repo)

	if err != nil {
		log.Fatal("Couldn't get latest tag...", err)
	}
	if err != nil {
		log.Fatal("Couldn't access tag...", err)
	}

	if latestTag != nil {
		if tillLatest {
			return latestTag.Hash().String() == commit.Hash.String()
		}
		return previousTag.Hash().String() == commit.Hash.String()

	}
	return false
}

I’m going to remove the code that does the first tillLatest check completely, it actually serves no purpose (Line 58–79). And when removing that, we will notice that we can remove the fetching of the latest tag altogether. Lets save that for later.

Our new function is going to be called IsCommitLatest and it will accept a pointer to object.Commit.

// IsCommitNearest will check if a commit tag Hash is equal to the current repository HEAD tag
// If the Hashes matches, it will return true
func (r *Repository) IsCommitNearest(commit *object.Commit) (bool, error) {
	latestTag, previousTag, err := r.GetLatestTag()

	if err != nil {
		return false, fmt.Errorf("%v:%w", "Couldn't get latest tag...", err)
	}

	if latestTag != nil {
		// Compare latest tag Hash with the repository HEAD hash
		// Hash() returns a Slice which can be compared without converting to string
		// Noticed by /OfficialTomCruise on reddit comments
		if latestTag.Hash() == r.ref.Hash() {
			return true, nil
		}
		return previousTag.Hash() == commit.Hash, nil

	}
	return false, nil
}

IsCommitNearest can now be used inside the Switch without running twice.

After this is a final iteration that ranges through all the commits and appends them into a log. The logs are handeld by the logcategory package, so lets digest it to learn what it does.

Its also a very small package like utils was. Actually, its one struct with one function attached.

package logcategory

// LogsByCategory - Type to hold logs by each's category
type LogsByCategory struct {
	CI       []string
	FIX      []string
	REFACTOR []string
	FEATURE  []string
	DOCS     []string
	OTHER    []string
}

// GenerateMarkdown - Generate markdown output for the collected commits
func (logContainer *LogsByCategory) GenerateMarkdown() string {
	markDownString := ""

	markDownString += "# Changelog  \n"

	if len(logContainer.CI) > 0 {
		markDownString += "\n\n## CI Changes  \n"

		for _, item := range logContainer.CI {
			markDownString += item + "\n"
		}
	}

	if len(logContainer.FIX) > 0 {
		markDownString += "\n\n## Fixes  \n"
		for _, item := range logContainer.FIX {
			markDownString += item + "\n"
		}
	}

	if len(logContainer.REFACTOR) > 0 {
		markDownString += "\n\n## Performance Fixes  \n"

		for _, item := range logContainer.REFACTOR {
			markDownString += item + "\n"
		}
	}

	if len(logContainer.FEATURE) > 0 {

		markDownString += "\n\n## Feature Fixes  \n"
		for _, item := range logContainer.FEATURE {
			markDownString += item + "\n"
		}
	}

	if len(logContainer.DOCS) > 0 {

		markDownString += "\n\n## Doc Updates  \n"
		for _, item := range logContainer.DOCS {
			markDownString += item + "\n"
		}
	}

	if len(logContainer.OTHER) > 0 {

		markDownString += "\n\n## Other Changes  \n"
		for _, item := range logContainer.OTHER {
			markDownString += item + "\n"
		}
	}

	return markDownString
}

The GenerateMarkDown function looks through the length of all its slices and creates strings based on them, along with a small header. It uses regular string concatenation (oldString += “hello ” + “world ”). It works in this case because the strings are pretty simple, but I’d actually recommend using a String.Builder. You can read about efficient strings.

It also repeats the same thing over and over for each Slice. And each slice in the struct is a []string. So lets instead create a function that does what we need.

// printCategory will output all items inside a Log slice and a title
func printCategory(output *strings.Builder, title string, logs []string) {
	if len(logs) > 0 {
		output.WriteString(fmt.Sprintf("\n\n## %s  \n", title))
		for _, item := range logs {
			output.WriteString(item + "\n")
		}
	}
}

And now the GenerateMarkdown would instead look like this

// GenerateMarkdown - Generate markdown output for the collected commits
func (logContainer *LogsByCategory) GenerateMarkdown() string {
	var output strings.Builder
	output.WriteString("# Changelog \n")

	printCategory(&output, "CI Changes", logContainer.CI)
	printCategory(&output, "Fixes", logContainer.FIX)
	printCategory(&output, "Performance Fixes", logContainer.REFACTOR)
	printCategory(&output, "Feature Fixes", logContainer.FEATURE)
	printCategory(&output, "Doc Updates", logContainer.DOCS)
	printCategory(&output, "Other Changes", logContainer.OTHER)

	return output.String()
}

Also, I’d rather have that in the main package, and remove the extra package of logcategory. So I’m putting all logcategory related logic into a new file in the main package called logcategories.go.

I also recommend keeping the main function clean and easy to view, so lets also create a file called repository.go.

All code related to the Repository struct goes in there.

So for the last part of the code review, the commit iteration mentioned before. It goes through all commits and appends those messages to the appropriate Slice in the logcategory.

You know, since I’d recommend keeping main clean. This is actually something that I’d move into the logcategories file. Also I want to remove it from the main so It can be tested easier.

I find it hard to test things residing in the main.

I recommend always breaking things out so its easier accessible in tests.

I’ve created a new function for LogsByCategory struct which will handle the logic for us.

// AddCommitLog will take a commitHash and a commitMessage and append them to their
// apropriate Slice
func (logContainer *LogsByCategory) AddCommitLog(commitHash, commitMessage string) {
	message := fmt.Sprintf("%s - %s", commitHash, normalizeCommit(commitMessage))

	switch {
	case strings.Contains(commitMessage, "ci:"):
		logContainer.CI = append(logContainer.CI, message)
	case strings.Contains(commitMessage, "fix:"):
		logContainer.FIX = append(logContainer.FIX, message)
	case strings.Contains(commitMessage, "refactor:"):
		logContainer.REFACTOR = append(logContainer.REFACTOR, message)
	// Golang Switch allows multiple values in cases with , separation
	case strings.Contains(commitMessage, "feat:"), strings.Contains(commitMessage, "feature:"):
		logContainer.FEATURE = append(logContainer.FEATURE, message)
	case strings.Contains(commitMessage, "docs:"):
		logContainer.DOCS = append(logContainer.DOCS, message)

	default:
		logContainer.OTHER = append(logContainer.OTHER, message)
	}
}

This is what our final main looks like.

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {

	var path string
	// Read user input
	flag.StringVar(&path, "path", "", "A filepath to a folder containing a github repository")
	// Parse Flags
	flag.Parse()

	// Make sure user has inserted the needed flags
	if path == "" {
		flag.Usage()
		os.Exit(0)
	}

	repo, err := Open(path)
	if err != nil {
		log.Fatal(err)
	}

	commits, err := repo.GetCommits()
	if err != nil {
		log.Fatal(err)
	}

	logContainer := new(LogsByCategory)

	// we no longer need to fetch latestTag here to compare tillLatest.

	// itterate all commits and add them to the log based on hash and Message
	for _, c := range commits {

		logContainer.AddCommitLog(c.Hash.String(), c.Message)

		nearestTag, err := repo.IsCommitNearest(c)
		if err != nil {
			log.Fatal(err)
		}
		if nearestTag {
			break
		}
	}

	fmt.Println(logContainer.GenerateMarkdown())

}

func normalizeCommit(commitMessage string) string {
	var message string
	for i, msg := range strings.Split(commitMessage, "\n") {
		if i == 0 {
			message = msg
			break
		}
	}
	return strings.TrimSuffix(message, "\n")
}

Final thing, aliezsid mentions that aliezsid currently creates his own changelog in git actions. So lets modify .github/workflows/create-binaries.yml since we changed the way the program is executed. Remember we added the -path flag?

Line 30: ./commitlog . > CHANGELOG.txt
// Replaced with
.commitlog -path=. > CHANGELOG.txt

So! I’m done for now.

Hopefully you’ve enjoyed my small code review and my process of making it. PR request submitted.