Performing a code review
Earlier today a user with the handle /aliezsid made a post on Reddit asking for a code review. I needed something to do for my cofee break…
by Percy Bolmér, December 30, 2020
Earlier today a user with the handle /aliezsid made a question on Reddit asking for a code review. I needed something to do for my cofee break, so I thought why not.
Aliezsid is asking for a code review for his new project commitlog. I visited his GitHub to see what t project was about, seems its a tool to generate change logs based on commit history for a repository on GitHub. I really like his idea, so I decided to do a small review.
You can read about his projects here.
Reconnaissance
The first thing I do when performing code reviews are always checking out the documentation. I need to gather a understanding of the library and what the idea behind it is.
The README.md specifies that this library is under heavy development, and it states that there is no tests so to be considered unstable. The second thing I usually do during a code review is checkout the tests to see usage examples. Tests are not only a great thing to have to make sure things don’t break, but I find them to be useful when understanding a libraries usage as well.
But as Aliezsid himself has written, aliezsid knows there are no tests at the moment, so leave that be.
Code analysis
The next thing I do is to grab the code.
mkdir codereview-gitcommit
cd codereview-gitcommit
git init
git pull https://github.com/barelyhuman/commitlog
Lets open up main.go. I am going to go see what is happening in the program and walk it through step by step.
path := os.Args[1]
r, err := git.PlainOpen(path)
if err != nil {
log.Fatal("Error opening Repository: ", err)
}
The first thing we see is that aliezsid expects an Input argument to the program. Aliezsid will then use go-git to open that GitHub repository. If we check what PlainOpen does we will see the following.
// PlainOpen opens a git repository from the given path. It detects if the
// repository is bare or a normal one. If the path doesn't contain a valid
// repository ErrRepositoryNotExists is returned
func PlainOpen(path string) (*Repository, error) {
return PlainOpenWithOptions(path, &PlainOpenOptions{})
}
So, the program expects an PATH to a GitHub repository, not a URL. That’s good to know, so lets modify this a bit, to make it more clear.
Lets use the golang Flag package to help us.
The reason why I want to use flags is that it will help us with generating an user error, it is also more scalable.
var path string
// Read user input
flag.StringVar(&path, "path", "", "A filepath to a folder containing a github repository")
// Parse Flags
flag.Parse()
// Make sure user has inserted the needed flags
if path == "" {
flag.Usage()
os.Exit(0)
}
Now its easier for users to get a grasp of how to use it, when executing the program we will now see
Lets move on. What happens next in the code is that aliezsid opens up the git repository, grabs the git header reference and extracts the commit history.
This is pretty unclear since there is no comments.
r, err := git.PlainOpen(path)
if err != nil {
log.Fatal("Error opening Repository: ", err)
}
ref, err := r.Head()
if err != nil {
log.Fatal("Unable to get repository HEAD:", err)
}
cIter, err := r.Log(&git.LogOptions{From: ref.Hash()})
if err != nil {
log.Fatal("Unable to get repository commits:", err)
}
Now this code is isolated to the main package and pretty unclear whats going on. I’d like to break this apart some. I’m going to create a new Struct called Repository
which will hold these functions. This will make testing a lot easier, and code cleaner and more scalable.
// Repository is a struct that holds an Opened github repository and the reference
type Repository struct {
path string
repo *git.Repository
ref *plumbing.Reference
}
// Open will open up a git repository
// and load the Head reference
func Open(path string) (*Repository, error) {
// Open the github repository
r, err := git.PlainOpen(path)
if err != nil {
return nil, fmt.Errorf("%v:%w", "Error opening Repository: ", err)
}
// Grab the Git HEAD reference
ref, err := r.Head()
if err != nil {
return nil, fmt.Errorf("%v:%w", "Unable to get repository HEAD:", err)
}
return &Repository{
path: path,
repo: r,
ref: ref,
}, nil
}
// GetCommits will extract commit history from the git repository
func (r *Repository) GetCommits() ([]*object.Commit, error) {
// Get the Commit history
cIter, err := r.repo.Log(&git.LogOptions{From: r.ref.Hash()})
if err != nil {
return nil, fmt.Errorf("%v:%w", "Unable to get repository commits:", err)
}
var commits []*object.Commit
err = cIter.ForEach(func(c *object.Commit) error {
commits = append(commits, c)
return nil
})
if err != nil {
return nil, fmt.Errorf("%v:%w", "Error getting commits :", err)
}
return commits, nil
}
The next thing aliezsid does is to extract the Latest tags from the repository. This is done by extracting the latestTag with the utils
package.
logContainer := new(logcategory.LogsByCategory)
latestTag, _, err := utils.GetLatestTagFromRepository(repo.repo)
if err != nil {
log.Fatal("Error Getting Tag Pairs", err)
}
tillLatest := false
if latestTag != nil {
if latestTag.Hash().String() == repo.ref.Hash().String() {
tillLatest = false
} else {
tillLatest = true
}
}
Actually, we can remove the utils package now and insert this into a function in the Repository struct. Since utils only holds this one function, I feel its more appropriate to insert the logic into our Repository struct.
// GetLatestTag is used to check the latestTag
// it will return a reference to the LatestTag and the PreviousTag
func (r *Repository) GetLatestTag() (*plumbing.Reference, *plumbing.Reference, error) {
tagRefs, err := r.repo.Tags()
if err != nil {
return nil, nil, err
}
var latestTagCommit *object.Commit
var latestTagName *plumbing.Reference
var previousTag *plumbing.Reference
var previousTagReturn *plumbing.Reference
err = tagRefs.ForEach(func(tagRef *plumbing.Reference) error {
revision := plumbing.Revision(tagRef.Name().String())
tagCommitHash, err := r.repo.ResolveRevision(revision)
if err != nil {
return err
}
commit, err := r.repo.CommitObject(*tagCommitHash)
if err != nil {
return err
}
if latestTagCommit == nil {
latestTagCommit = commit
latestTagName = tagRef
previousTagReturn = previousTag
}
if commit.Committer.When.After(latestTagCommit.Committer.When) {
latestTagCommit = commit
latestTagName = tagRef
previousTagReturn = previousTag
}
previousTag = tagRef
return nil
})
if err != nil {
return nil, nil, err
}
return latestTagName, previousTagReturn, nil
}
Next aliezsid checks if the latestTag is the same as the repository HEAD.
tillLatest := false
if latestTag != nil {
if latestTag.Hash().String() == repo.ref.Hash().String() {
tillLatest = false
} else {
tillLatest = true
}
}
I’d also like to remove the tillLatest check and instead make that it’s own function. This is so that its easier to test and also easier to reuse in case the package grows. Aliezsid also reuses the exact same code in isCommitToNearest function later at line 116, every time we use code more than once, break it out into its own function to avoid duplicate code .
We need to see what’s wanted in both use cases and make sure we can use our new function in both cases.
isCommitToNearestTag
also contains a small bug, It double checks an error and uses log.Fatal
. The second error check is actually never accessible, since the first error check would cancel since its using log.Fatal
. It seems the tillLatest
boolean flag is used to force a check against the latestTag.Hash()
. So the code will actually perform this check once, only to force it to be performed again later. This is something we can remove so its only performed once, not that we have a optimization problem, but no reason to rerun code.
The function looks like this
func isCommitToNearestTag(repo *git.Repository, commit *object.Commit, tillLatest bool) bool {
latestTag, previousTag, err := utils.GetLatestTagFromRepository(repo)
if err != nil {
log.Fatal("Couldn't get latest tag...", err)
}
if err != nil {
log.Fatal("Couldn't access tag...", err)
}
if latestTag != nil {
if tillLatest {
return latestTag.Hash().String() == commit.Hash.String()
}
return previousTag.Hash().String() == commit.Hash.String()
}
return false
}
I’m going to remove the code that does the first tillLatest
check completely, it actually serves no purpose (Line 58–79). And when removing that, we will notice that we can remove the fetching of the latest tag altogether. Lets save that for later.
Our new function is going to be called IsCommitLatest
and it will accept a pointer to object.Commit.
// IsCommitNearest will check if a commit tag Hash is equal to the current repository HEAD tag
// If the Hashes matches, it will return true
func (r *Repository) IsCommitNearest(commit *object.Commit) (bool, error) {
latestTag, previousTag, err := r.GetLatestTag()
if err != nil {
return false, fmt.Errorf("%v:%w", "Couldn't get latest tag...", err)
}
if latestTag != nil {
// Compare latest tag Hash with the repository HEAD hash
// Hash() returns a Slice which can be compared without converting to string
// Noticed by /OfficialTomCruise on reddit comments
if latestTag.Hash() == r.ref.Hash() {
return true, nil
}
return previousTag.Hash() == commit.Hash, nil
}
return false, nil
}
IsCommitNearest
can now be used inside the Switch without running twice.
After this is a final iteration that ranges through all the commits and appends them into a log. The logs are handeld by the logcategory package, so lets digest it to learn what it does.
Its also a very small package like utils was. Actually, its one struct with one function attached.
package logcategory
// LogsByCategory - Type to hold logs by each's category
type LogsByCategory struct {
CI []string
FIX []string
REFACTOR []string
FEATURE []string
DOCS []string
OTHER []string
}
// GenerateMarkdown - Generate markdown output for the collected commits
func (logContainer *LogsByCategory) GenerateMarkdown() string {
markDownString := ""
markDownString += "# Changelog \n"
if len(logContainer.CI) > 0 {
markDownString += "\n\n## CI Changes \n"
for _, item := range logContainer.CI {
markDownString += item + "\n"
}
}
if len(logContainer.FIX) > 0 {
markDownString += "\n\n## Fixes \n"
for _, item := range logContainer.FIX {
markDownString += item + "\n"
}
}
if len(logContainer.REFACTOR) > 0 {
markDownString += "\n\n## Performance Fixes \n"
for _, item := range logContainer.REFACTOR {
markDownString += item + "\n"
}
}
if len(logContainer.FEATURE) > 0 {
markDownString += "\n\n## Feature Fixes \n"
for _, item := range logContainer.FEATURE {
markDownString += item + "\n"
}
}
if len(logContainer.DOCS) > 0 {
markDownString += "\n\n## Doc Updates \n"
for _, item := range logContainer.DOCS {
markDownString += item + "\n"
}
}
if len(logContainer.OTHER) > 0 {
markDownString += "\n\n## Other Changes \n"
for _, item := range logContainer.OTHER {
markDownString += item + "\n"
}
}
return markDownString
}
The GenerateMarkDown function looks through the length of all its slices and creates strings based on them, along with a small header. It uses regular string concatenation (oldString += “hello ” + “world ”). It works in this case because the strings are pretty simple, but I’d actually recommend using a String.Builder
. You can read about efficient strings.
It also repeats the same thing over and over for each Slice. And each slice in the struct is a []string
. So lets instead create a function that does what we need.
// printCategory will output all items inside a Log slice and a title
func printCategory(output *strings.Builder, title string, logs []string) {
if len(logs) > 0 {
output.WriteString(fmt.Sprintf("\n\n## %s \n", title))
for _, item := range logs {
output.WriteString(item + "\n")
}
}
}
And now the GenerateMarkdown would instead look like this
// GenerateMarkdown - Generate markdown output for the collected commits
func (logContainer *LogsByCategory) GenerateMarkdown() string {
var output strings.Builder
output.WriteString("# Changelog \n")
printCategory(&output, "CI Changes", logContainer.CI)
printCategory(&output, "Fixes", logContainer.FIX)
printCategory(&output, "Performance Fixes", logContainer.REFACTOR)
printCategory(&output, "Feature Fixes", logContainer.FEATURE)
printCategory(&output, "Doc Updates", logContainer.DOCS)
printCategory(&output, "Other Changes", logContainer.OTHER)
return output.String()
}
Also, I’d rather have that in the main package, and remove the extra package of logcategory
. So I’m putting all logcategory
related logic into a new file in the main package called logcategories.go.
I also recommend keeping the main function clean and easy to view, so lets also create a file called repository.go.
All code related to the Repository struct goes in there.
So for the last part of the code review, the commit iteration mentioned before. It goes through all commits and appends those messages to the appropriate Slice in the logcategory.
You know, since I’d recommend keeping main clean. This is actually something that I’d move into the logcategories file. Also I want to remove it from the main so It can be tested easier.
I find it hard to test things residing in the main.
I recommend always breaking things out so its easier accessible in tests.
I’ve created a new function for LogsByCategory
struct which will handle the logic for us.
// AddCommitLog will take a commitHash and a commitMessage and append them to their
// apropriate Slice
func (logContainer *LogsByCategory) AddCommitLog(commitHash, commitMessage string) {
message := fmt.Sprintf("%s - %s", commitHash, normalizeCommit(commitMessage))
switch {
case strings.Contains(commitMessage, "ci:"):
logContainer.CI = append(logContainer.CI, message)
case strings.Contains(commitMessage, "fix:"):
logContainer.FIX = append(logContainer.FIX, message)
case strings.Contains(commitMessage, "refactor:"):
logContainer.REFACTOR = append(logContainer.REFACTOR, message)
// Golang Switch allows multiple values in cases with , separation
case strings.Contains(commitMessage, "feat:"), strings.Contains(commitMessage, "feature:"):
logContainer.FEATURE = append(logContainer.FEATURE, message)
case strings.Contains(commitMessage, "docs:"):
logContainer.DOCS = append(logContainer.DOCS, message)
default:
logContainer.OTHER = append(logContainer.OTHER, message)
}
}
This is what our final main looks like.
package main
import (
"flag"
"fmt"
"log"
"os"
"strings"
)
func main() {
var path string
// Read user input
flag.StringVar(&path, "path", "", "A filepath to a folder containing a github repository")
// Parse Flags
flag.Parse()
// Make sure user has inserted the needed flags
if path == "" {
flag.Usage()
os.Exit(0)
}
repo, err := Open(path)
if err != nil {
log.Fatal(err)
}
commits, err := repo.GetCommits()
if err != nil {
log.Fatal(err)
}
logContainer := new(LogsByCategory)
// we no longer need to fetch latestTag here to compare tillLatest.
// itterate all commits and add them to the log based on hash and Message
for _, c := range commits {
logContainer.AddCommitLog(c.Hash.String(), c.Message)
nearestTag, err := repo.IsCommitNearest(c)
if err != nil {
log.Fatal(err)
}
if nearestTag {
break
}
}
fmt.Println(logContainer.GenerateMarkdown())
}
func normalizeCommit(commitMessage string) string {
var message string
for i, msg := range strings.Split(commitMessage, "\n") {
if i == 0 {
message = msg
break
}
}
return strings.TrimSuffix(message, "\n")
}
Final thing, aliezsid mentions that aliezsid currently creates his own changelog in git actions. So lets modify .github/workflows/create-binaries.yml
since we changed the way the program is executed. Remember we added the -path
flag?
Line 30: ./commitlog . > CHANGELOG.txt
// Replaced with
.commitlog -path=. > CHANGELOG.txt
So! I’m done for now.
Hopefully you’ve enjoyed my small code review and my process of making it. PR request submitted.
If you enjoyed my writing, please support future articles by buying me an Coffee