9  Branching and GitHub Team Workflow

Published

August 30, 2024

Keywords

Git, git branch, GitHub, clone, fork, pull request

9.1 Introduction

9.1.1 Learning Outcomes

  • Use Branching in Git to evolve your code without risking the Main branch.
    • Create new branches, update files, push to GitHub, and merge updates to the Main branch.
  • Apply both Fork and Pull and Shared Repository models for Team workflow.
  • Use Pull Requests and Merge commands for coordinated development.
      • Resolve Merge Conflicts.
  • Select and apply a GitHub workflow model to a group project.

9.1.2 References:

9.1.2.1 Other References

9.2 Motivation for Branching

You have been using an individual workflow model for Git and GitHub for version control of your own work.

You have been building competency in …

  • Creating repositories with a single “Main” branch.
  • Editing the text and code in the files in the repo.
  • Executing the Save, Add, and Commit workflow to create a history.
  • Pushing committed changes to GitHub for cloud-based replication and sharing.

Every push has updated the Main branch baseline on GitHub, even if the new version is still in progress.

That has been fine so far because you are the only one using your code.

However, the individual workflow model can lead to challenges when others are using your code.

  • If you deploy a model for other people to use, every push updates the main branch, even if the new version no longer works. That could make for unhappy users!
  • If you are part of a team, where your code has to work with other developers’ code, two people could update the same text or code, or different parts of the same document. Without care, whichever version gets pushed last will overwrite the other person’s work or create a “merge conflict” that keeps the baseline from updating.

To mitigate the risks of an individual workflow model, use a team workflow model.

Important

Git 2.23, released on 16 August 2019, added two experimental commands, git switch and git restore, to augment and simplify the use of git checkout. Older posts may therefore refer to git checkout instead of the commands you see below.

Go to the terminal window and enter git --version to check which version is on your computer. If it is before 2.23, then update.

  • Go to git downloads to get the newest production version of Git.
  • Or, enter git clone https://github.com/git/git in the console pane to get the source code for the latest development version, which you would then have to build yourself.
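
As a minimal sketch of the version check described above, the following compares the installed version against 2.23. It assumes a sort command that supports -V (version sort), which is an assumption about your system and not part of Git; the variable names are only illustrative.

```shell
# Print the installed version string
git --version

# Extract the version number (third word of the output)
installed=$(git --version | awk '{print $3}')

# Version-sort 2.23 and the installed version; the smaller one comes first
oldest=$(printf '%s\n' "2.23" "$installed" | sort -V | head -n 1)

if [ "$oldest" = "2.23" ]; then
  echo "Git $installed supports git switch and git restore"
else
  echo "Git $installed is older than 2.23; please update"
fi
```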

9.2.1 Solution Development and Deployment

Many data science solutions are deployed for continuous use by others. These include reporting models, optimization models, machine learning models, shiny applications, web sites, you name it.

If other people are using your deployed solution you want to make sure it always works. However, you usually also want to be able to update the solution as new requirements appear or, worse, portions of your code no longer work as packages change and functions get deprecated.

The software development community has a framework for managing the tension and risks between ensuring a continuously working version for users while also being able to update the version. No one likes to fix an airplane while it is in flight.

A simple form of the framework has a version (or baseline) of code in one of three stages: Development, Testing, or Production (in shorthand, “Dev-Test-Prod”), as in Figure 9.1.

Figure 9.1: The three stages in a Dev-Test-Prod framework.
  • Development is the stage where the developers do all their updates to their section or module of code, be it fixes or new features.
    • They also do their own testing to make sure their code is working.
    • Once each developer has their code working, they promote it to the testing stage.
  • Testing is the stage where the testing team checks if the entire set of code works together.
    • They merge the working code from multiple developers into an integrated version.
    • They test if previous functionality still works (“regression testing”) and then test if the bug fixes or new features actually work.
    • Testing should occur in the exact same software/hardware environment that supports the production version.
    • Once the integrated version passes integrated testing, it is promoted to the production stage.
  • Production is the stage where the deployment team replaces the current release with the version promoted by the testing team.
    • The new baseline is staged in the production environment and deployed as the working version in a way to minimize disruption to the users.

There are many variations on this framework but the concept of separating the development baseline from the production baseline is a well-established “best practice.”

Git has additional capabilities to enable new development without disturbing the main branch baseline until the new version is ready.

9.3 Branching in Git and GitHub

Git has the capability to create and synchronize more branches than the Main branch.

  • Each branch is its own version of the code.

Branching allows users to make multiple changes to the code and test them prior to incorporating the new code into the Main branch baseline.

Branching supports best practices for clear separation between Development, Testing, and Production versions of code.

  • Lowers risk of disrupting production code.
  • Allows parallel development of multiple features.

Branching leads naturally to supporting a workflow for Team projects in GitHub.

GitHub has two common workflow models for teams.

  • The Fork and Pull Model
  • The Shared Repository Model

Choosing a workflow model allows for consistent, efficient parallel development by teams while minimizing risk to the working Main code base (or its deployed version of production code).

Example Prep

Navigate in the Terminal Pane to the directory/repo for the life_exp_analysis.qmd file you created earlier in the course.

  • If you do not have a repo and file, create one as in Section 2.6 and create a new GitHub repo for it.
  • Render the .qmd file to html output.
  • Go to your terminal window and navigate so the working directory is set to that directory and repo.
  • We will use that repo for the rest of this section.

9.3.1 Creating a New Branch

Every repo has at least one Branch - the Main branch.

  • The Main is considered sacrosanct - it is the baseline or production version of code - it is known to work.

Branching is the Git functionality which allows you to add new branches (for additional analysis or code features) and then eventually merge them into your Main branch to create a new baseline version in the Main branch.

  • The repo is a collection of snapshots of your code that allow you to go back in time
  • Creating a branch creates a new, additional, set of snapshots that does not affect the Main branch snapshots.

Before you create a new branch:

  1. Ensure your Main branch is up to date with the remote repo Main branch using git pull.
    • You may have made changes to the Main from a different branch.
    • git pull runs git fetch followed by git merge.
    • If there are conflicts, you will have to resolve them and try again until you get a clean pull.
    • GitHub documentation and Google/Stack Overflow can be your friend.
  2. Check for existing branches with git branch -a. It may return something like:
    * main
      remotes/origin/main
  • The * main means the main branch is the current working branch.
  • remotes/origin/main means the remote repo on GitHub also has one branch and it is called Main as well.

Once you are ready to create a new branch:

  1. Create a new local branch with git branch new_branch_name. For this exercise, call it testname.
  • The new_branch_name should be short but descriptive of the contents of the branch.
  • Local guidelines may ask you to connect the name to a code issue, a specification, or bug report ID.
  • Make sure you have committed any changes before switching to a new branch so your Main is up to date.
  • Close any open files before switching to a new branch.
  • If you run git branch -a now, after creating the branch called testname, you will get:
    * main
      testname
      remotes/origin/main
  • Note the * has stayed on main.
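
The steps above can be sketched end to end in a throwaway sandbox. This is only an illustration under stated assumptions: a local bare repository (origin.git) stands in for GitHub so the commands run without a network, and the user name and email are placeholders.

```shell
set -e
# Sandbox setup: a bare repo plays the role of the GitHub remote (assumption)
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > life_exp_analysis.qmd
git add -A
git commit -q -m "Initial commit"
git remote add origin "$sandbox/origin.git"
git push -q --set-upstream origin main

# The steps from this section
git pull -q              # bring the local main up to date with the remote
git branch -a            # lists main and remotes/origin/main
git branch testname      # create the branch; you stay on main
git branch -a            # now also lists testname
```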
Important

When you create a new branch, Git is NOT copying and pasting all your files to a new directory.

You cannot see your branches in your file manager. They only exist in the Git history.

Git creates a branch by adding it to the tree structure Git maintains for all the files that comprise the Git history.

It uses the Git history to create the files as needed in a branch, based on all the bookkeeping that goes on to track all the changes in the Git history across all branches.
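
One way to see that nothing was copied is to ask Git which commit each branch name refers to. The sketch below uses a throwaway repo with placeholder user details; right after git branch testname, both names resolve to the same commit and the directory still holds a single copy of the file.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > life_exp_analysis.qmd
git add -A
git commit -q -m "Initial commit"
git branch testname

# Both branch names print the same commit checksum
git rev-parse main testname

# The file manager view is unchanged: one file, no second directory
ls
```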

9.3.2 Working within a Branch

  1. Change to any branch with git switch branch_name.
  • This moves you to the new branch (or an existing branch of that name).
  • It also changes the structure in the RStudio Files pane since you are now working on a new branch.
  • If you run git branch -a now, after switching to the branch called testname, you will get:
      main
    * testname
      remotes/origin/main
  • Note the * has moved to testname, which is now your current working branch.
  • If you have an open file that exists in the branch, it will switch to that version of the file.
  • If you have an open file that does not exist in the new branch, RStudio will ask you to close it since the directory “no longer exists” - you are on a different branch.
  2. Ensure your branch is up to date with the main with git merge main.
  • This will update your current branch with any code changes (perhaps from other branches) from the local main branch.
  • If there are no changes, it will tell you Already up to date.
  • If you have an open file, it will update it with any changes from the Main file.
  3. Update your analysis/code files as desired. Remember to save, add, and commit as usual.
  • Only now your updates are going into snapshots for the branch.

  • For the exercise, add some additional commentary on the graph and save the file - but don’t close it.

  • For the exercise, delete the html file.

  • Running git status may show something like the following:
    On branch testname
    Changes not staged for commit:
    (use "git add <file>..." to update what will be committed)
    (use "git restore <file>..." to discard changes in working directory)
    modified: 98_github_team_workflow/98a_github_team_workflow.Rmd

  • For the exercise, add all (git add -A) and commit with a comment.

  • For the exercise, switch back and forth between main and testname branches and observe the changes in the .qmd file and the RStudio Files pane.

  • Switch to the testname branch and check with git branch -a.

  • Now it’s time to upload the branch to your repo on GitHub.
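
The exercise above can be rehearsed in a sandbox before trying it on your own repo. This is a sketch, not the course repo: the file contents and user details are placeholders, and echo and rm stand in for editing in RStudio.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > life_exp_analysis.qmd
echo "<html></html>" > life_exp_analysis.html
git add -A
git commit -q -m "Initial commit"
git branch testname

git switch testname                  # 1. move to the branch
git merge main                       # 2. nothing to merge yet
echo "commentary on the graph" >> life_exp_analysis.qmd   # 3. edit the file
rm life_exp_analysis.html            #    and delete the rendered output
git status --short                   # one modified file, one deleted file
git add -A
git commit -q -m "Add commentary and remove html"

git switch main                      # the files return to the main snapshot
ls
git switch testname                  # and back to the branch snapshot
ls
```

Listing the directory after each switch shows the html file reappearing on main and disappearing again on testname.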

9.3.3 Synching your Branch with GitHub

If you go to GitHub for your repo, you can see it shows it is on Main and there is only 1 branch as in Figure 9.2.

Figure 9.2: GitHub repo with only one branch.
  1. Create your new branch on GitHub with git push --set-upstream origin branchname where branchname is “testname”.
  • This creates the upstream (origin) branch and completes the first push to that branch.
  • Refresh your GitHub page and you will see the new branch listed under branches as in Figure 9.3.
  • Use the pull down to switch to branch testname.
Figure 9.3: GitHub with a new branch.
  2. Update your files and push to GitHub as normal.
  • GitHub will show how many commits the branch is ahead of the Main branch, as in Figure 9.4.
Figure 9.4: GitHub new branch commit history compared to Main.
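
A sketch of the first push of a branch follows, again with a local bare repository standing in for GitHub (an assumption so the example runs offline). It uses git switch -c, which creates a branch and switches to it in one step, and git rev-list --count to show the same "commits ahead" figure that GitHub displays.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"   # stand-in for the GitHub repo
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > life_exp_analysis.qmd
git add -A
git commit -q -m "Initial commit"
git remote add origin "$sandbox/origin.git"
git push -q --set-upstream origin main

git switch -q -c testname                  # create the branch and switch to it
echo "commentary" >> life_exp_analysis.qmd
git commit -q -am "Add commentary"

git push --set-upstream origin testname    # first push creates the remote branch
git branch -a                              # now lists remotes/origin/testname
git rev-list --count origin/main..testname # commits testname is ahead of main
```

After the first push, a plain git push from testname is enough because the upstream branch is already recorded.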

9.3.4 Merging Your Updates From a Branch into the Main

  1. When ready, merge the updates in the testname branch into the local Main branch.
  • Save and close all files in the testname branch.
  • Execute a final git add, commit, and push.
  • git status should show:
    On branch testname
    Your branch is up to date with 'origin/testname'.
  • Switch to the main branch with git switch main.
  • Ensure your main is up to date with the remote main with git pull.
    • This step is critical to ensure you have all changes from other people, or from other branches you already merged, to minimize the risk of merge conflicts on GitHub.
  • Merge the changes from the branch into the main baseline with git merge branchname.
    • If main has not changed since you created the branch, Git “fast-forwards” main to include the branch’s commits; otherwise it creates a single merge commit on the local main branch.
    • If there are merge conflicts they will show up at this step.
  • After some messages for the merge, running git status shows something like:
    On branch main
    Your branch is ahead of 'origin/main' by 3 commits.
    (use "git push" to publish your local commits)
  2. Update your upstream/origin main on GitHub with git push.
    • No need to do git add and git commit since git merge branchname already recorded the commits on main.
    • Refresh GitHub and it now shows the branch is up to date with main as in Figure 9.5.
Figure 9.5: GitHub with branch merged into Main.
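
The merge sequence can be sketched as follows, assuming a local bare repository in place of GitHub and three placeholder commits on testname. Because main does not change while the branch is being worked on, this particular merge is a fast-forward.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"   # stand-in for the GitHub repo
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > life_exp_analysis.qmd
git add -A
git commit -q -m "Initial commit"
git remote add origin "$sandbox/origin.git"
git push -q --set-upstream origin main

# Three commits on the branch, then a final push of the branch
git switch -q -c testname
for i in 1 2 3; do
  echo "note $i" >> life_exp_analysis.qmd
  git commit -q -am "Note $i"
done
git push -q --set-upstream origin testname

git switch -q main
git pull -q                 # make sure the local main is current
git merge testname          # bring the branch commits into main
git status -sb              # main is ahead of origin/main by 3 commits
git push -q                 # publish the updated main
git status -sb              # main and origin/main now match
```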

9.3.5 Deleting a Branch

  1. Delete the local and remote branch when finished.
  • Once all the branch changes have been merged, the branch is usually no longer needed.
    • On large scale projects where branches are tied to specific requirements or issues, it is common to delete branches that are completed from a housekeeping perspective.
    • For individual work you can keep reusing a branch for different requirements or you can delete it and create a new branch as needed. Just make sure you keep your branches in sync with the Main branch.
  • Make sure you are not in the branch you want to delete i.e., get to the Main with git switch main.
  • Delete a local branch with git branch -d localBranchName.
  • Then delete the remote (origin) branch with git push origin --delete remoteBranchName, where both names are testname for this exercise.
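
A sketch of the clean-up, under the same assumptions as before (a local bare repository stands in for GitHub; names are placeholders). The branch is merged and pushed first so that git branch -d accepts the deletion.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"   # stand-in for the GitHub repo
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > life_exp_analysis.qmd
git add -A
git commit -q -m "Initial commit"
git remote add origin "$sandbox/origin.git"
git push -q --set-upstream origin main

# A finished branch: committed, pushed, and merged into main
git switch -q -c testname
echo "commentary" >> life_exp_analysis.qmd
git commit -q -am "Add commentary"
git push -q --set-upstream origin testname
git switch -q main
git merge -q testname
git push -q

git branch -d testname               # delete the local branch
git push origin --delete testname    # delete the branch on the remote
git branch -a                        # only main and remotes/origin/main remain
```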

9.4 Team Workflows: Fork & Pull vs. Shared Repos

Teams operate from a shared repository or repo.

Two common models are differentiated by how the team members interact with the shared repo based on their write permissions for the repo Main branch.

  • A large project may have lots of analysts/developers but only a few with write privileges to the Main branch to protect the Main branch.
    • This often leads to a “Fork and Pull” workflow model for updates.
  • Smaller projects often use a “Shared Repository” workflow model where everyone has write privileges.
    • Everyone agrees to follow a common process for updating the text/code so they can avoid/minimize merge conflicts.

Forking and Cloning are ways to get any repo, to which you have access, onto your local computer.

  • If you don’t have write privileges to the Main you create your own copy or Fork of the repo on GitHub.
  • Forking a repo on GitHub creates a separate copy of the repo in the GitHub organization to which you are forking and for which you have write privileges.
    • You have write privileges to your forked copy of the repo
  • When you have access to a GitHub repo you can create a local copy by Cloning; you need write privileges to push your changes back to it.
  • Cloning a repo on GitHub creates a local copy of a GitHub repo in the location you choose, e.g., a directory on your computer or in the cloud.

9.4.1 The Fork & Pull Workflow Model

In a Fork & Pull workflow model, each user creates a forked version of the repo on GitHub under their own organization and then clones the forked repo to their local machine.

  • Users make all the changes they want locally (using branching of course).
  • Users push their updates to their forked repo on GitHub.
  • Users then initiate a Pull Request to ask the owners of the Original Repo to consider adding the update to a branch of the original repo, e.g., the Main.
  • This workflow model reduces the amount of friction for new contributors.
  • It is popular with open source projects because people can work independently, without upfront coordination, without putting the Main code baseline at risk.
  • Anyone can fork the repos for tidyverse packages, make their own changes locally, and then submit pull requests for consideration by the tidyverse development team.
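
The Git side of this model can be sketched offline. Everything here is a stand-in: original.git plays the project you cannot write to, fork.git plays the copy that the Fork button on GitHub would create, and the remote name upstream and branch name fix_typo are hypothetical choices. The Pull Request itself is opened on GitHub and is not shown.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Seed an "original" project owned by someone else
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Maintainer"
git -C seed config user.email "maintainer@example.com"
echo "original analysis" > seed/analysis.qmd
git -C seed add -A
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed original.git

# "Fork": a second server-side copy that you own
git clone -q --bare original.git fork.git

# Clone your fork locally and work on a branch
git clone -q fork.git work
cd work
git config user.name "Contributor"
git config user.email "contributor@example.com"
git remote add upstream "$sandbox/original.git"  # hypothetical name for the original
git switch -q -c fix_typo
echo "a proposed fix" >> analysis.qmd
git commit -q -am "Propose a fix"
git push -q --set-upstream origin fix_typo       # lands in the fork only

git remote -v    # origin points at the fork, upstream at the original
```

The original repo is untouched until its owners accept a pull request, which is what protects the Main baseline in this model.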

9.4.2 The Shared Repository Workflow Model

In a Shared Repository workflow model, each user clones the shared repo to their local machine.

  • Everyone has write privileges to the Main branch.
  • The team agrees on who has responsibility for addressing which requirements.
  • Users create branches and make all the changes they want locally.
  • They could just merge to the Main and push if they wanted. However, …
  • Before merging with the Main, they should also create a Pull Request with others on the team. The pull request will enable others to review the code and discuss the proposed changes before the changes are merged into the Main branch.

9.5 Life Cycle of a Pull Request

Pull Requests initiate discussion about proposed code changes with others on a team.

  • The Pull Request workflow is tightly integrated with the underlying Git repository so anyone can see exactly what changes would be merged if they accept your request.

You can open a Pull Request at any point during the development process:

  • When you have little or no code but want to share some screenshots or general ideas,
  • When you’re stuck and need help or advice, or,
  • When you’re ready for someone to review your work.

By using GitHub’s @mention system in your Pull Request message, you can ask for feedback from specific people.

  • @GitHub_username anywhere in an issue or pull request notifies the person and subscribes them to future updates.
    • e.g., @rressler - what do you think?

Once a Pull Request has been opened, the persons reviewing your changes can enter questions or comments.

  • You can reply as part of the “conversation” about the Pull Request
  • As you get comments you can continue to push updates resolving the comments to your GitHub branch.
  • GitHub will show your new commits and any additional feedback you may receive in the unified Pull Request view.
  • When done, someone with write privileges can merge your code into the Main with the final changes, or you can close the pull request.

9.5.1 Pull Request Life Cycle in Pictures

If you deleted the testname branch, recreate it.

  • Switch to the branch testname.
  • Edit the life_exp_analysis.qmd file, enter some comments, and save it.
  • Add, commit, and push to GitHub so the testname branch is again 1 commit ahead of main as in Figure 9.4.

9.5.1.1 Create a Pull Request

Go to your branch on GitHub and click on the green button Compare & Pull Request seen in Figure 9.6.

Figure 9.6: Start the Pull Request life cycle.
  • You can also click on Pull requests in the menu bar and then click on Compare & pull request or New pull request.
  • Either way will bring up the Open a pull request page.
Figure 9.7: Open a pull request page
  • Check that the pull is in the correct direction - the Main should be on the left and the branch on the right, with the arrow going from the branch to the Main.
    • This page is for pulling from one branch to another within a repo. You can also do a pull request to update a fork of a repo to get the latest changes from the original repo or submit proposed changes to the original repo owners.
  • GitHub automatically compares the two versions and provides its finding if they can be easily merged at the top.
  • You can compare the two versions, especially if there are merge conflicts, using the diff pane at the bottom as in Figure 9.8.
Figure 9.8: Pull Request full page with diff pane at the bottom.
  • Write comments for the reviewers.
  • To select reviewers, click on the gear icon to see a drop down list of team members.
  • Then click on Create pull request to distribute the request as in Figure 9.9.
Figure 9.9: Enter comments, select reviewers, and create the pull request.

9.5.1.2 Act on a Pull Request

If you are a reviewer of a pull request select Pull requests in the repo menu bar to see the list of open pull requests.

You now have several options as in Figure 9.10

  • Review: You can review several aspects of the pull request by using the tab panels
    • Conversation shows you all the comments that have been made on a pull request
    • Commits allows you to delve into the list of commits since the last merge and then the details of individual commits,
    • Checks shows you the results of any GitHub actions that affect the pull request.
    • Files opens up a diff of the changes in the files.
Figure 9.10: Pull Request Choices: review, merge, comment, or close.
  • When reviewing code you can comment on single or multiple lines by using the mouse to highlight them and clicking on the blue + sign that pops up on the left of the lines.
  • That will open up the comment box where you can make suggestions and share with others.
Figure 9.11: Click on a line of code to comment on it.
  • Merge Pull Request: Once you have reviewed the pull request and you want to merge, go to the middle of the page where GitHub provides the output of automated checks.
    • Figure 9.10 shows the Pre-commit CI check has failed. IGNORE THAT. The failure is because this is a private repo and that normally requires a paid plan for pre-commit actions.
    • GitHub also shows that the branch has no merge conflicts with the Main or base branch.
    • When ready to merge, click on Merge pull request.
      • There are three options, but just choose the default one for now. Other organizations may have guidance for which type of merge to use.
    • You will be asked to Confirm merge and enter comments so everyone can see them in the history.
    • Enter comments and click to confirm the merge.
    • The page will update as in Figure 9.12.
      • Note that GitHub also provides the option to automatically delete the branch that generated the pull request from the repo.
      • If you click on it, it will delete the branch and make a commit on the GitHub repo.
    Figure 9.12: GitHub after Merge is confirmed.
  • Comment on or Close the Pull Request: If you decide not to merge, at the bottom of Figure 9.10 you can enter comments and share as to why, or you can enter comments and close the pull request without merging any code.
    • The Pull request will no longer show up in the open list.

9.5.2 After a Pull Request Updates the Baseline, Team Members must use Git to Pull and Merge the Updates Locally

  • Once the GitHub Main branch has been updated with a new merge, tell your teammates.
  • Each team member should pull the update to their local Main, and possibly to their local branch.
  • If in doubt about what the changes did, use git log to see what has changed - the snapshots.
    • By default, with no arguments, git log lists the commits in reverse chronological order, i.e., the most recent commits show up first.
    • git log shows each commit with its SHA-1 checksum, the author’s name and email, the date written, and the commit message.
  • To see what is in a commit, you can copy the SHA-1 checksum and use git checkout checksum to move to that commit. This leaves you in a “detached HEAD” state; use git switch main to return to the branch.
    • BE SURE TO HAVE COMMITTED ALL OF YOUR CURRENT WORK FIRST.
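
The pull-and-inspect steps can be sketched with two local clones of one shared repository. The directory names alice and bob are hypothetical teammates, shared.git stands in for the GitHub repo, and an ordinary push by alice stands in for a merged pull request.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare shared.git                      # stand-in for the GitHub repo
git --git-dir=shared.git symbolic-ref HEAD refs/heads/main

# Teammate alice publishes the first version
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > analysis.qmd
git add -A
git commit -q -m "Initial commit"
git remote add origin "$sandbox/shared.git"
git push -q --set-upstream origin main
cd ..

# Teammate bob clones; then main is updated on the shared repo
git clone -q shared.git bob
cd alice
echo "v2" >> analysis.qmd
git commit -q -am "Add v2 notes"
git push -q
cd ../bob

git pull -q                                  # fetch and merge the updated main
git log --oneline                            # most recent commit is listed first
first=$(git rev-list --max-parents=0 HEAD)   # checksum of the oldest commit
git checkout -q "$first"                     # look at that snapshot (detached HEAD)
cat analysis.qmd                             # the file as it was in the first commit
git switch -q main                           # return to the tip of main
```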

9.5.3 When Things Go Wrong and There is a Merge Conflict

Merge conflicts happen when you try to merge a branch into Main (or another branch) that has competing commits.

  • As an example, the Main branch has changes on the same line that do not match your changes.

  • They may be from different branches or users (someone did not do a git pull before editing the same code someone else was changing).

  • Git will generally try to resolve changes, but when it cannot, it will force you to decide which changes to incorporate in the final merge before you can commit the merge.

A mid-merge failure will output an error message like the following:

error: Entry '<fileName>' would be overwritten by merge. Cannot merge. (Changes in staging area)

If you get the above message, Git should have added changes to your file to show the conflicts.

  • When you open a file, search the file for the conflict marker <<<<<<<.

  • You may get something like this:

    <<<<<<< HEAD
    User 1 changed code and/or text comment 
    User 1 changed code and/or text comment 
    =======
    User 2 code and/or comments
    User 2 code and/or comments
    >>>>>>> 

You will have to edit the proposed changes to select the ones to keep or decide to do more drastic changes as there could be multiple conflicts in the file.

  • The start line for the conflict is indicated by the line <<<<<<< HEAD.

  • The ======= line is the “center” of the conflict.

  • All the content between the <<<<<<< HEAD line and the center ======= line is content that exists in the current Main branch which is where the Git HEAD reference is pointing.

  • All content between the center and >>>>>>> is content present in the (your) proposed merge branch.

  • The merge conflict ends with the >>>>>>> line.

  • As a drastic change, you can use Git commands to overwrite one file or the other. See How To Resolve Merge Conflicts in Git

To resolve merge conflicts one at a time, manually remove the no-longer-wanted code or comments from the file, remove the conflict markers, and save the file.

  • You should collaborate with the teammate who created the original change.

  • If you have more than one merge conflict in your file, scroll down to the next set of conflict markers and repeat the previous step to resolve the conflict.

  • Once all conflicts have been resolved, use the git add command to move the new changes to the staging area, and then git commit -m "fixed merge conflict" to commit the changes. Then git push.

  • If you are working on a simple merge conflict within GitHub, see Resolving a merge conflict on GitHub
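
To practice safely, the sketch below manufactures a conflict in a throwaway repo and then resolves it with the add and commit steps described above. The file name and text are placeholders. For a content conflict like this one, Git typically reports "CONFLICT (content): Merge conflict in notes.txt" followed by "Automatic merge failed; fix conflicts and then commit the result."

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "shared line" > notes.txt
git add -A
git commit -q -m "Initial commit"

# Two branches change the same line in different ways
git switch -q -c testname
echo "User 2 version" > notes.txt
git commit -q -am "User 2 edit"
git switch -q main
echo "User 1 version" > notes.txt
git commit -q -am "User 1 edit"

# The merge stops on the conflict; "|| true" lets the script continue
git merge testname || true
cat notes.txt                  # shows the <<<<<<<, =======, and >>>>>>> markers

# Resolve: write the text to keep (no markers), then add and commit
echo "User 1 and User 2 agreed version" > notes.txt
git add notes.txt
git commit -q -m "fixed merge conflict"
git log --oneline --graph      # the merge commit joins both histories
```

In real work you would edit the file by hand, ideally with the teammate who made the competing change, instead of overwriting it with echo.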

9.6 Group Project Workflow

  • Either model will work for your group project
  • There are trade-offs between them in terms of level of planning and risk to the code - just like real-world work.
  • You have to decide early on as a group which approach to use so individual contributors are using the same method.
  • Both require establishing operating rules for the project to protect the working code.
    • Set clear ownership for each requirement and piece of the App or the code - to include the vignette
    • Agree on how the sections of code will interact with each other.
    • Agree on styles, shared variable/object/function names and even libraries to be used.
    • Agree on a testing approach for integrated testing
  • The Shared Repository model requires less collaboration on merging code but puts the app at risk of someone breaking something that was working before (if they skip the pull request).
    • If that happens, the team has to go back to a working version.
    • No need for a final arbiter to approve merge requests - everyone has push rights.
    • Can result in merge conflicts
  • The Fork and Pull Model requires more collaboration for merging code, using pull requests with feedback prior to merging, but can reduce code and team conflicts.
    • Tends to drive more use of branching which reduces risk to a working baseline.

9.7 Git and GitHub Team Workflow Summary

  • Branching provides a more sophisticated method for using Git to evolve your analysis without putting working code at much risk.
  • It follows the best practice for separating development, testing, and production.
  • Branching allows you to work on multiple items at once, in different branches, without interfering with each other or the production Main branch.
  • Branching also sets up the ability to do team workflows for collaborative, not colliding, work.
  • Large teams use the Fork and Pull model to minimize risk to the production baseline.
  • Small teams usually use a shared repository workflow model.
    • Requires collaboration on assignment of elements (or branches) to be worked.
  • GitHub Pull Requests enable conversations about updates across the team before the Main baseline gets updated.
  • Use branches when you have a working baseline and you want to develop new code while retaining what works until the new code is tested and ready.