9 Branching and GitHub Team Workflow
Git, git branch, GitHub, clone, fork, pull request
9.1 Introduction
9.1.1 Learning Outcomes
- Use Branching in Git to evolve your code without risking the Main branch.
- Create new branches, update files, push to GitHub, and merge updates to the Main branch.
- Apply both Fork and Pull and Shared Repository models for Team workflow.
- Use Pull Requests Merge commands for coordinated development.
- Resolve Merge Conflicts.
- Select and apply a GitHub workflow model to a group project.
9.1.2 References:
- GitHub Flow GitHub (2023b)
- Git Help Git (2023)
- Collaborating with Pull Requests GitHub (2023a)
- Git and GitHub from the Terminal and RStudio Lucet (2023)
9.1.2.1 Other References
- Happy Git and GitHub for the useR Bryan and Hester (2023)
- Resolving a Merge Conflict on GitHub GitHub (2023c)
- Pull Request Helpers from {usethis} Wickham, Bryan, and Barrett (2022)
9.2 Motivation for Branching
You have been using an individual workflow model for Git and GitHub for version control of your own work.
You have been building competency in …
- Creating repositories with a single “Main” branch.
- Editing the text and code in the files in the repo.
- Executing the Save, Add, and Commit workflow to create a history.
- Pushing committed changes to GitHub for cloud-based replication and sharing.
Every push has updated the Main branch baseline on GitHub, even if the new version is still in progress.
That has been fine so far as you are the only one using your code.
However, the individual workflow model can lead to challenges when others are using your code.
- If you deploy a model for other people to use, every commit means overwriting the main branch, even if the version no longer works. That could make for unhappy users!
- If you are part of a team, where your code has to work with other developers’ code, two people could update the same text or code or different parts of the same document. Without care, whichever version gets pushed last will overwrite the other person’s work or create a “merge conflict” which keeps the baseline from updating.
To mitigate the risks of an individual workflow model, use a team workflow model.
Git 2.23, released on 16 August 2019, added two experimental commands, `git switch` and `git restore`, to augment and simplify the use of `git checkout`, so older posts may refer to `git checkout` instead of what you see below.
Go to the terminal window and enter `git --version` to check which version is on your computer. If it is older than 2.23, then update.
- Go to git downloads to get the newest production version of Git.
- Or, enter `git clone https://github.com/git/git` in the console pane to get the source code for the latest development version.
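The version check above can be sketched as a small script. This is a minimal sketch, assuming `git` is on your PATH and reports its version in the usual `git version X.Y.Z` form; the variable names are illustrative only.

```shell
# Compare the installed Git version against 2.23, the release that
# introduced `git switch` and `git restore`.
version_string=$(git --version)            # e.g. "git version 2.43.0"
version=${version_string#git version }     # drop the leading "git version "
major=${version%%.*}                       # text before the first dot
rest=${version#*.}                         # text after the first dot
minor=${rest%%.*}                          # text up to the next dot

if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 23 ]; }; then
  verdict="ok"          # git switch and git restore are available
else
  verdict="update"      # older than 2.23: update before continuing
fi
echo "Git $version: $verdict"
```

If the script prints `update`, install a newer Git before trying the `git switch` commands used in the rest of this chapter.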
9.2.1 Solution Development and Deployment
Many data science solutions are deployed for continuous use by others. These include reporting models, optimization models, machine learning models, shiny applications, web sites, you name it.
If other people are using your deployed solution you want to make sure it always works. However, you usually also want to be able to update the solution as new requirements appear or, worse, portions of your code no longer work as packages change and functions get deprecated.
The software development community has a framework for managing the tension and risks between ensuring a continuously working version for users while also being able to update the version. No one likes to fix an airplane while it is in flight.
A simple form of the framework has a version (or baseline) of code in one of three stages: Development, Testing, or Production, or, in shorthand, “Dev-Test-Prod” as in Figure 9.1.
- Development is the stage where the developers do all their updates to their section or module of code, be it fixes or new features.
- They also do their own testing to make sure their code is working.
- Once each developer has their code working, they promote it to the testing stage.
- Testing is the stage where the testing team checks if the entire set of code works together.
- They merge the working code from multiple developers into an integrated version.
- They test if previous functionality still works (“regression testing”) and then test if the bug fixes or new features actually work.
- Testing should occur in the exact same software/hardware environment that supports the production version.
- Once the integrated version passes integrated testing, it is promoted to the production stage.
- Production is the stage where the deployment team replaces the current release with the version promoted by the testing team.
- The new baseline is staged in the production environment and deployed as the working version in a way to minimize disruption to the users.
There are many variations on this framework but the concept of separating the development baseline from the production baseline is a well-established “best practice.”
Git has additional capabilities to enable new development without disturbing the main branch baseline until the new version is ready.
9.3 Branching in Git and GitHub
Git has the capability to create and synchronize more branches than the Main branch.
- Each branch is its own version of the code.
Branching allows users to make multiple changes to the code and test it prior to incorporating the new code into the Main branch baseline.
Branching supports best practices for clear separation between Development, Testing, and Production versions of code.
- Lowers risk of disrupting production code.
- Allows parallel development of multiple features.
Branching leads naturally to supporting a workflow for Team projects in GitHub.
GitHub has two common workflow models for teams.
- The Fork and Pull Model
- The Shared Repository Model
Choosing a workflow model allows for consistent, efficient parallel development by teams while minimizing risk to the working Main code base (or its deployed version of production code).
Example Prep
Navigate in the Terminal Pane to the directory/repo for the life_exp_analysis.qmd
file you created earlier in the course.
- If you do not have a repo and file, create one as in Section 2.6 and create a new GitHub repo for it.
- Render the .qmd file to html output.
- Go to your terminal window and navigate so the working directory is set to that directory and repo.
- We will use that repo for the rest of this section.
9.3.1 Creating a New Branch
Every repo has at least one Branch - the Main branch.
- The Main is considered sacrosanct - it is the baseline or production version of code - it is known to work.
Branching is the Git functionality which allows you to add new branches (for additional analysis or code features) and then eventually merge them into your Main branch to create a new baseline version in the Main branch.
- The repo is a collection of snapshots of your code that allow you to go back in time
- Creating a branch creates a new, additional, set of snapshots that does not affect the Main branch snapshots.
Before you create a new branch:
- Ensure your Main branch is up to date with the remote repo Main branch using `git pull`.
- You may have made changes to the Main from a different branch.
- `git pull` runs `git fetch` followed by `git merge`.
- If there are conflicts, you will have to resolve them and try again until you get a clean pull.
- GitHub documentation and Google/Stack Overflow can be your friend.
- Check for existing branches with `git branch -a`. It may return something like:
* main
remotes/origin/main
- The `* main` means the main branch is the current working branch. `remotes/origin/main` means the remote repo on GitHub also has one branch and it is called Main as well.
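To make the "pull is fetch followed by merge" point concrete, here is a minimal sketch that runs entirely in a throwaway directory. It assumes a local bare repository can stand in for GitHub; the directory names, file name, and user identities are placeholders, not part of the course repo.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory, not your real repo
cd "$sandbox"

# A bare repo stands in for the GitHub remote (assumption for this sketch).
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# "Your" repo: one commit on main, pushed to the stand-in remote.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git remote add origin "$sandbox/remote.git"
git push -q --set-upstream origin main

# A teammate's clone adds a commit to the remote main.
cd "$sandbox"
git clone -q remote.git teammate
cd teammate
git config user.name "Teammate"           # placeholder identity
git config user.email "teammate@example.com"
echo "teammate edit" >> notes.txt
git commit -q -am "Teammate update"
git push -q origin main

# Back in your repo: the two steps that `git pull` combines.
cd "$sandbox/mine"
git fetch -q origin                       # download new commits into origin/main
git merge -q origin/main                  # integrate them into the local main
git branch -a                             # list local and remote-tracking branches
last_subject=$(git log -1 --format=%s)
echo "latest commit on main: $last_subject"
```

Running `git pull` in place of the fetch and merge lines should leave the local main in the same state.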
Once you are ready to create a new branch:
- Create a new local branch with `git branch new_branch_name`; for this exercise, call it `testname`.
- The new_branch_name should be short but descriptive of the contents of the branch
- Local guidelines may ask you to connect the name to a code issue, a specification, or bug report ID.
- Make sure you have committed any changes before switching to a new branch so your Main is up to date.
- Close any open files before switching to a new branch.
- If you run `git branch -a` now, after creating the branch called `testname`, you will get:
* main
testname
remotes/origin/main
- Note the `*` has stayed on `main`.
When you create a new branch, Git is NOT copying and pasting all your files to a new directory.
You cannot see your branches in your file manager. They only exist in the Git history.
Git creates a branch by adding it to the tree structure Git maintains for all the files that comprise the Git history.
It uses the Git history to create the files as needed in a branch, based on the bookkeeping that tracks all the changes in the Git history across all branches.
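The following sketch illustrates that a new branch is only a new pointer into the existing history, not a copy of the files. It runs in a temporary directory; the file content and user identity are placeholders.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "baseline analysis" > life_exp_analysis.qmd   # stand-in content
git add life_exp_analysis.qmd
git commit -q -m "Baseline"

git branch testname                       # create the branch; you stay on main
git branch                                # the * marks the current branch

current=$(git rev-parse --abbrev-ref HEAD)
main_commit=$(git rev-parse main)
branch_commit=$(git rev-parse testname)
echo "current branch: $current"
echo "main     -> $main_commit"
echo "testname -> $branch_commit"
```

Both names resolve to the same commit until you switch to `testname` and commit something there.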
9.3.2 Working within a Branch
- Change to any branch with `git switch branch_name`.
- This moves you to the new branch (or an existing branch of that name).
- It also changes the structure in the RStudio Files pane since you are now working on a new branch.
- If you run `git branch -a` now, after switching to the branch called `testname`, you will get:
main
* testname
remotes/origin/main
- Note the `*` has moved to `testname`, which is now your “working directory”.
- If you have an open file that exists in the branch, it will switch to that version of the file.
- If you have an open file that does not exist in the new branch, RStudio will ask you to close it since the directory “no longer exists” - you are on a different branch.
- Ensure your branch is up to date with the main with `git merge main`.
- This will update your current branch with any code changes (perhaps from other branches) from the local main branch.
- If there are no changes, it will tell you `Already up to date.`
- If you have an open file, it will update it with any changes from the Main file.
- Update your analysis/code files as desired. Remember to save, add, and commit as usual. The only difference is that your updates now go into snapshots for the branch.
For the exercise, add some additional commentary on the graph and save the file - but don’t close it.
For the exercise, delete the html file.
Running `git status` may show something like the following:
On branch testname
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: 98_github_team_workflow/98a_github_team_workflow.Rmd
For the exercise, add all (`git add -A`) and commit with a comment.
For the exercise, switch back and forth between the `main` and `testname` branches and observe the changes in the .qmd file and the RStudio Files pane.
Switch to the `testname` branch and check with `git branch -a`.
Now it’s time to upload the branch to your repo on GitHub.
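The steps in this subsection can be sketched end to end in a throwaway repo. This is an illustration only: the file content and identity are placeholders, and `git switch` assumes Git 2.23 or later.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "baseline analysis" > life_exp_analysis.qmd   # stand-in content
git add -A
git commit -q -m "Baseline"

git branch testname
git switch -q testname                    # move to the branch
git merge main                            # nothing new on main yet

echo "commentary on the graph" >> life_exp_analysis.qmd
git status --short                        # the file shows as modified
git add -A
git commit -q -m "Add commentary on the graph"

on_branch=$(cat life_exp_analysis.qmd)    # branch version of the file
git switch -q main
on_main=$(cat life_exp_analysis.qmd)      # main still has the baseline
git switch -q testname
echo "main has:     $on_main"
echo "testname has: $on_branch"
```

Switching branches rewrites the file in place, which is why the same path shows different content depending on the branch you are on.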
9.3.3 Synching your Branch with GitHub
If you go to GitHub for your repo, you can see it shows it is on Main and there is only 1 branch as in Figure 9.2.
- Create your new branch on GitHub with `git push --set-upstream origin branchname`, where `branchname` is “testname”.
- This creates the upstream (origin) branch and completes the first push to that branch.
- Refresh your GitHub page and you will see the new branch listed under branches as in Figure 9.3.
- Use the pull down to switch to branch `testname`.
- Update your files and push to GitHub as normal.
- GitHub will show how many commits the branch is ahead of the Main branch as in Figure 9.4.
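As a sketch of the first push of a branch, the script below uses a local bare repository as a stand-in for GitHub (an assumption made so the example runs offline). The identity and file content are placeholders.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q --bare remote.git             # stand-in for the GitHub repo
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "baseline analysis" > life_exp_analysis.qmd   # stand-in content
git add -A
git commit -q -m "Baseline"
git remote add origin "$sandbox/remote.git"
git push -q --set-upstream origin main

git switch -q -c testname                 # create and switch in one step
echo "commentary" >> life_exp_analysis.qmd
git commit -q -am "Add commentary"
git push -q --set-upstream origin testname   # first push creates origin/testname

git branch -a                             # now lists remotes/origin/testname too
ahead=$(git rev-list --count origin/main..testname)      # commits ahead of main
upstream=$(git rev-parse --abbrev-ref 'testname@{upstream}')
echo "testname is $ahead commit(s) ahead of main; upstream is $upstream"
```

After the first push with `--set-upstream`, a plain `git push` from the branch is enough for later commits.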
9.3.4 Merging Your Updates From a Branch into the Main
- When ready, merge the updates in the `testname` branch into the local Main branch.
- Save and close all files in the `testname` branch.
- Execute a final `git add`, `git commit`, and `git push`.
- `git status` should show:
On branch testname
Your branch is up to date with 'origin/testname'.
- Switch to the `main` branch with `git switch main`.
- Ensure your `main` is up to date with the remote `main` with `git pull`.
- This step is critical to ensure you have all changes from other people, or from other branches you already merged, to minimize the risk of merge conflicts on GitHub.
- Merge the changes from the branch into the main baseline with `git merge branchname`.
- This brings the commits from the branchname history into the local `main` branch, either by fast-forwarding `main` or by creating a merge commit.
- If there are merge conflicts they will show up at this step.
- After some messages for the merge, running `git status` shows something like:
On branch main
Your branch is ahead of 'origin/main' by 3 commits.
(use "git push" to publish your local commits)
- Update your `upstream/origin main` on GitHub with `git push`.
- No need to do `git add` and `git commit` since `git merge branchname` already recorded the commits.
- Refresh GitHub and it now shows the branch is up to date with main as in Figure 9.5.
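The merge sequence above can be sketched as follows. A local bare repository stands in for GitHub (an assumption so the example runs offline), and the identity, file content, and commit messages are placeholders.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q --bare remote.git             # stand-in for the GitHub repo
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "baseline analysis" > life_exp_analysis.qmd   # stand-in content
git add -A
git commit -q -m "Baseline"
git remote add origin "$sandbox/remote.git"
git push -q --set-upstream origin main

# Two commits on the branch, pushed to the remote branch.
git switch -q -c testname
echo "first comment" >> life_exp_analysis.qmd
git commit -q -am "First branch commit"
echo "second comment" >> life_exp_analysis.qmd
git commit -q -am "Second branch commit"
git push -q --set-upstream origin testname

# Merge the branch into main and publish the result.
git switch -q main
git pull -q                               # make sure local main matches the remote
git merge testname                        # a fast-forward here: main had no new commits
git status -sb                            # first line reports main ahead of origin/main
ahead_before=$(git rev-list --count origin/main..main)
git push -q                               # no add/commit needed after the merge
ahead_after=$(git rev-list --count origin/main..main)
echo "ahead before push: $ahead_before, after push: $ahead_after"
```

In this sketch the merge is a fast-forward because nothing else changed on main; when main has its own new commits, Git records a merge commit instead.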
9.3.5 Deleting a Branch
- Deleting the Local and Remote Branch when finished
- Once all the branch changes have been merged, the branch is usually no longer needed.
- On large scale projects where branches are tied to specific requirements or issues, it is common to delete branches that are completed from a housekeeping perspective.
- For individual work you can keep reusing a branch for different requirements or you can delete it and create a new branch as needed. Just make sure you keep your branches in sync with the Main branch.
- Make sure you are not in the branch you want to delete, i.e., get to the Main with `git switch main`.
- Delete a local branch with `git branch -d localBranchName`.
- Then delete the remote (origin) branch with `git push origin --delete remoteBranchName`, which for this exercise is `testname`.
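A minimal sketch of the clean-up, again using a local bare repository as a stand-in for GitHub (an assumption for offline use) and placeholder identity and content:

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q --bare remote.git             # stand-in for the GitHub repo
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "baseline analysis" > life_exp_analysis.qmd   # stand-in content
git add -A
git commit -q -m "Baseline"
git remote add origin "$sandbox/remote.git"
git push -q --set-upstream origin main

# A finished branch that has already been merged and pushed.
git switch -q -c testname
echo "commentary" >> life_exp_analysis.qmd
git commit -q -am "Add commentary"
git push -q --set-upstream origin testname
git switch -q main
git merge -q testname
git push -q

# Housekeeping: remove the branch locally, then on the remote.
git branch -d testname                    # -d refuses if the branch has unmerged work
git push -q origin --delete testname      # delete the branch on the remote
git branch -a                             # only main and remotes/origin/main remain
left=$(git branch -a | grep -c testname || true)
echo "references to testname remaining: $left"
```

The lowercase `-d` is the safe choice here because Git declines to delete a branch whose commits have not been merged.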
9.5 Life Cycle of a Pull Request
Pull Requests initiate discussion about proposed code changes with others on a team.
- The Pull Request workflow is tightly integrated with the underlying Git repository so anyone can see exactly what changes would be merged if they accept your request.
You can open a Pull Request at any point during the development process:
- When you have little or no code but want to share some screenshots or general ideas,
- When you’re stuck and need help or advice, or,
- When you’re ready for someone to review your work.
By using GitHub’s @mention system in your Pull Request message, you can ask for feedback from specific people.
- @GitHub_username anywhere in an issue or pull request notifies the person and subscribes them to future updates.
- e.g., @rressler - what do you think?
Once a Pull Request has been opened, the persons reviewing your changes can enter questions or comments.
- You can reply as part of the “conversation” about the Pull Request
- As you get comments you can continue to push updates resolving the comments to your GitHub branch.
- GitHub will show your new commits and any additional feedback you may receive in the unified Pull Request view.
- When done someone with write privileges can either merge your code into the Main with the final changes or you can close the pull request.
9.5.1 Pull Request Life Cycle in Pictures
If you deleted the `testname` branch, recreate it.
- Switch to the branch `testname`.
- Edit the `life_exp.qmd` file, enter some comments, and save it.
- Add, commit, and push to GitHub so the `testname` branch is again `1 commit ahead of main` as in Figure 9.4.
9.5.1.1 Create a Pull Request
Go to your branch on GitHub and click on the green button `Compare & pull request` seen in Figure 9.6.
- You can also click on `Pull requests` in the menu bar and then click on `Compare & pull request` or `New pull request`.
- Either way will bring up the `Open a pull request` page.
- Check that the pull is in the correct direction - the Main should be on the left and the branch on the right, with the arrow going from the branch to the Main.
- This page is for pulling from one branch to another within a repo. You can also do a pull request to update a fork of a repo to get the latest changes from the original repo or submit proposed changes to the original repo owners.
- GitHub automatically compares the two versions and provides its finding if they can be easily merged at the top.
- You can compare the two versions, especially if there are merge conflicts, using the diff pane at the bottom as in Figure 9.8.
- Write comments for the reviewers.
- To select reviewers, click on the gear icon to see a drop down list of team members.
- Then click on Create pull request to distribute the request as in Figure 9.9.
9.5.1.2 Act on a Pull Request
If you are a reviewer of a pull request, select `Pull requests` in the repo menu bar to see the list of open pull requests.
You now have several options as in Figure 9.10.
- Review: You can review several aspects of the pull request by using the tab panels.
- `Conversation` shows you all the comments that have been made on a pull request.
- `Commits` allows you to delve into the list of commits since the last merge and then the details of individual commits.
- `Checks` shows you the results of any GitHub actions that affect the pull request.
- `Files` opens up a diff of the changes in the files.
- When reviewing code you can comment on single or multiple lines by using the mouse to highlight them and clicking on the blue `+` sign that pops up on the left of the lines.
- That will open up the comment box where you can make suggestions and share with others.
- Merge Pull Request: Once you have reviewed the pull request and you want to merge, go to the middle of the page where GitHub provides the output of automated checks.
- Figure 9.10 shows the Pre-commit CI check has failed. IGNORE THAT. The failure is because this is a private repo and that normally requires a paid plan for pre-commit actions.
- GitHub also shows that the branch has no merge conflicts with the Main or base branch.
- When ready to merge, click on `Merge pull request`.
- There are three options, but just choose the default one for now. Other organizations may have guidance for which type of merge to use.
- You will be asked to `Confirm merge` and enter comments so everyone can see them in the history.
- Enter comments and click to confirm the merge.
- The page will update as in Figure 9.12.
- Note that GitHub also provides the option to automatically delete the branch that generated the pull request from the repo.
- If you click on it, it will delete the branch and make a commit on the GitHub repo.
- Comment on or Close the Pull Request: If you decide not to merge, at the bottom of Figure 9.10 you can enter comments and share as to why, or you can enter comments and close the pull request without merging any code.
- The Pull request will no longer show up in the open list.
9.5.2 After a Pull Request Updates the Baseline, Team Members must use Git to Pull and Merge the Updates Locally
- Once the GitHub Main branch has been updated with a new merge, tell your teammates.
- Each team member should pull the update to their local Main, and possibly to their local branch.
- If in doubt about what the changes did, use `git log` to see what has changed - the snapshots.
- By default, with no arguments, `git log` lists the commits in reverse chronological order, i.e., the most recent commits show up first.
- `git log` shows each commit with its SHA-1 checksum, the author’s name and email, the date written, and the commit message.
- To see what is in a commit, you can copy the SHA-1 checksum and use `git checkout checksum` to move to that commit.
- BE SURE TO HAVE COMMITTED ALL OF YOUR CURRENT WORK FIRST.
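A short sketch of inspecting history and visiting an earlier snapshot, run in a throwaway repo with placeholder content and identity. Moving to a commit by checksum leaves you in a "detached HEAD" state, so the sketch ends by switching back to `main`.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "baseline analysis" > life_exp_analysis.qmd   # stand-in content
git add -A
git commit -q -m "Baseline"
echo "commentary" >> life_exp_analysis.qmd
git commit -q -am "Add commentary"

git --no-pager log --oneline              # newest commit first, abbreviated checksums
newest=$(git log -1 --format=%s)          # subject of the most recent commit

first_sha=$(git rev-list --max-parents=0 HEAD)     # checksum of the first commit
git checkout -q "$first_sha"              # move to that snapshot (detached HEAD)
old_content=$(cat life_exp_analysis.qmd)
git switch -q main                        # return to the tip of main
new_content=$(cat life_exp_analysis.qmd)
echo "newest commit: $newest"
echo "file at first commit: $old_content"
```

If you only want to read what a commit changed, `git show <checksum>` displays it without moving your working files.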
9.5.3 When Things Go Wrong and There is a Merge Conflict
Merge conflicts happen when you try to merge a branch into Main (or another branch) that has competing commits.
As an example, the Main branch has changes on the same line that do not match your changes.
They may be from different branches or users (someone did not do a `git pull` before editing the same code someone else was changing).
Git will generally try to resolve changes, but when it cannot, it will force you to decide which changes to incorporate in the final merge before you can commit the merge.
A mid-merge failure will output an error message like the following:
error: Entry '<fileName>' would be overwritten by merge. Cannot merge. (Changes in staging area)
If you get the above message, Git should have added changes to your file to show the conflicts.
When you open a file, search the file for the conflict marker `<<<<<<<`.
You may get something like this:
<<<<<<< HEAD
User 1 changed code and/or text comment
User 1 changed code and/or text comment
=======
User 2 code and/or comments
User 2 code and/or comments
>>>>>>>
You will have to edit the proposed changes to select the ones to keep or decide to do more drastic changes as there could be multiple conflicts in the file.
The start line for the conflict is indicated by the line `<<<<<<< HEAD`.
The `=======` line is the “center” of the conflict.
All the content between the `<<<<<<< HEAD` line and the center `=======` line is content that exists in the current Main branch, which is where the Git HEAD reference is pointing.
All content between the center and `>>>>>>>` is content present in the (your) proposed merge branch.
The merge conflict ends with the `>>>>>>>` line.
As a drastic change, you can use Git commands to overwrite one file or the other. See How To Resolve Merge Conflicts in Git.
To resolve merge conflicts one at a time, manually remove the no-longer-wanted code or comments from the file, remove the “conflict dividers”, and save the file.
You should collaborate with the teammate who created the original change.
If you have more than one merge conflict in your file, scroll down to the next set of conflict markers and repeat the previous step to resolve the conflict.
Once all conflicts have been resolved, use the `git add` command to move the new changes to the staging area, and then `git commit -m "fixed merge conflict"` to commit the changes. Then `git push`.
If you are working on a simple merge conflict within GitHub, see Resolving a merge conflict on GitHub.
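To see the whole cycle in one place, the sketch below deliberately creates a conflict in a throwaway repo, shows the markers, and resolves it. The file name, its contents, and the choice of which side to keep are illustrative assumptions; in real work you would edit the file by hand after talking with your teammate.

```shell
set -e
sandbox=$(mktemp -d)                      # throwaway directory
cd "$sandbox"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
printf 'title\nshared line\n' > report.txt         # hypothetical file
git add -A
git commit -q -m "Baseline"

# The branch and main each change the same line differently.
git switch -q -c testname
printf 'title\nUser 2 version of the line\n' > report.txt
git commit -q -am "Branch edit"
git switch -q main
printf 'title\nUser 1 version of the line\n' > report.txt
git commit -q -am "Main edit"

git merge testname || true                # the merge stops and reports a conflict
markers=$(grep -c '^<<<<<<<' report.txt)  # count conflict start markers
cat report.txt                            # shows both versions between the markers

# Resolve by writing the version to keep (here, arbitrarily, the branch's),
# which also removes the conflict markers.
printf 'title\nUser 2 version of the line\n' > report.txt
git add report.txt                        # stage the resolved file
git commit -q -m "fixed merge conflict"   # conclude the merge
remaining=$(grep -c '^<<<<<<<' report.txt || true)
echo "markers before: $markers, after: $remaining"
```

When Git writes the markers itself, the closing `>>>>>>>` line is followed by the name of the branch being merged, which helps you tell the two sides apart.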
9.6 Group Project Workflow
- Either model will work for your group project
- There are trade-offs between them in terms of level of planning and risk to the code - just like real-world work.
- You have to decide early on as a group which approach to use so individual contributors are using the same method.
- Both require establishing operating rules for the project to protect the working code.
- Set clear ownership for each requirement and piece of the App or the code - to include the vignette
- Agree on how the sections of code will interact with each other.
- Agree on styles, shared variable/object/function names and even libraries to be used.
- Agree on a testing approach for integrated testing
- The Shared Repository model requires less collaboration on merging code but puts the app at risk of someone breaking something that was working before (if they skip the pull request).
- Have to then go back to a working version.
- No need for a final arbiter to approve merge requests - everyone has push rights.
- Can result in merge conflicts
- The Fork and Pull Model requires more collaboration for merging code, using pull requests with feedback prior to merging, but can reduce code and team conflicts.
- Tends to drive more use of branching which reduces risk to a working baseline.
9.7 Git and GitHub Team Workflow Summary
- Branching provides a more sophisticated method for using Git to evolve your analysis without putting working code at much risk.
- It follows the best practice for separating development, testing, and production.
- Branching allows you to work on multiple items at once, in different branches, without interfering with each other or the production Main branch.
- Branching also sets up the ability to do team workflows for collaborative, not colliding, work.
- Large teams use the Fork and Pull model to minimize risk to the production baseline.
- Small teams usually use a shared repository workflow model.
- Requires collaboration on assignment of elements (or branches) to be worked.
- GitHub Pull Requests enable conversations about updates across the team before the Main baseline gets updated.
- Use branches when you have a working baseline and you want to develop new code while retaining what works until the new code is tested and ready.