Slide Deck

Exercises

Initialize a repository

In this exercise, we will:

  • use git init to initialize a git repo in your project’s root directory
  • view contents of the project folder before and after git initialization

Open a terminal emulator and use cd to navigate to the root directory of the project that you would like to turn into a git repository.

You can use the chunk of code below, where <path/to/your/project> should be replaced with an appropriate absolute or relative file path to the root directory of your project.

cd <path/to/your/project>

Once you have navigated to the correct directory, use the ls command to list files in the directory. Take note of what files are included. Then use the ls -a command to list all files in the directory.

Now initialize a git repository using the following command.

git init

Again, use the ls and ls -a commands to list files in the directory. What has changed?

The output of ls should be the same as before. However, when you use ls -a you should now see a new directory called .git.

The .git directory is where all relevant information for version control is stored. There are several files and subdirectories included in this directory. In general, you should not need to worry about what is happening in this directory. However, you should note that:

  • the only thing that makes something a git repository is the presence of this .git directory
  • deleting this directory will remove the project from version control
  • if your project is also stored in a SharePoint/Dropbox folder, you need to make sure that the contents of this directory stay synchronized across all the computers you work on. Strange behavior can result from incompletely synchronized .git folders.
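
To see the first two points in action, here is a minimal sketch that runs in a throwaway directory created with mktemp (an assumption made so that no real project is touched). It also assumes that the temporary directory does not itself sit inside another git repository.

```shell
# Work in a throwaway directory so no real project is affected
demo_dir=$(mktemp -d)
cd "$demo_dir"

git init -q                 # creates the hidden .git directory
ls -a                       # .git now appears in the listing

# git rev-parse reports whether the current directory is under version control
inside_before=$(git rev-parse --is-inside-work-tree)
echo "inside a repository: $inside_before"

rm -rf .git                 # deleting .git removes version control entirely
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  inside_after=true
else
  inside_after=false
fi
echo "inside a repository after deleting .git: $inside_after"
```

Only run rm -rf .git in a scratch directory like this one: in a real project it discards the entire history.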

If you would like to read more about what is included in the .git folder, you can check out the resources below:


First commit

In this exercise, we will:

  • use git add to stage files for commit
  • use git commit to make a commit

Before you start, make sure that your terminal emulator is still in the correct directory by using pwd. If you are not in the correct folder, use cd to navigate to the root directory of your project.

Once you are in the correct directory, execute the following command:

git status

The output will look something like this:

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
  .Renviron
  .Rhistory
  ...
  ...

Note that git is telling you that there are files in this directory that are not being tracked.

To stage files for a new commit, we generally use the command

git add <file_name1> <file_name2> <...>

Exercise: Use this command to stage all of the R scripts in the project for commit.

We could use the following command:

git add *.R

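If this command is executed from the project’s root directory, the shell expands *.R to the files with the .R extension in that directory, and git stages those files. R scripts in subdirectories are not matched this way; quoting the pattern (git add '*.R') hands it to git, which then matches .R files throughout the project.

Or, if all your R scripts are contained in a directory, e.g., named source, you could add the folder: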

git add source
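
One detail about patterns may be worth checking for yourself: an unquoted *.R is expanded by the shell before git sees it, so it only matches files in the current directory. The sketch below uses a throwaway repository and hypothetical file names (top.R and source/nested.R) to compare the unquoted and quoted forms.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
mkdir source
echo "x <- 1" > top.R               # hypothetical script in the root directory
echo "y <- 2" > source/nested.R     # hypothetical script in a subdirectory

git add *.R                         # the shell expands this to: git add top.R
staged_unquoted=$(git ls-files)
echo "unquoted pattern staged: $staged_unquoted"

git add '*.R'                       # quoted: git matches the pattern in subdirectories too
staged_quoted=$(git ls-files)
echo "quoted pattern staged:"
echo "$staged_quoted"
```

Here git ls-files lists what is currently in the staging area, which makes it easy to compare the two forms.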

Pattern matches can be useful for adding multiple files, but keep two things in mind:

First, consider avoiding commits that bundle multiple files implementing unrelated changes. It is often better to split the files across separate commits.

Second, the behavior of * is somewhat unexpected in this circumstance:

git add *

Because the shell expands * before git sees it, this will add all files in your project except dot-files in the root directory (i.e., files whose name begins with .). This may miss some important files.

To add all files correctly use:

git add --all
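
The difference between the two commands can be checked in a scratch repository. This is a small sketch with hypothetical file names (analysis.R and .Rprofile), assuming a shell with default globbing settings.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
echo "x <- 1" > analysis.R              # hypothetical regular file
echo "options(digits = 3)" > .Rprofile  # hypothetical dot-file

git add *                               # the shell's * does not match .Rprofile
staged_star=$(git ls-files)
echo "git add * staged: $staged_star"

git add --all                           # stages every untracked or modified file
staged_all=$(git ls-files)
echo "git add --all staged:"
echo "$staged_all"
```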

Before committing, stage the analysis directory for commit as well.

git add analysis

Then make the commit using the following command:

git commit -m "<add-your-message-here>"

Be sure to replace <add-your-message-here> with an informative message!

Take a final look at git status. What has changed?

Run the following command to view the project history:

git log --oneline
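
For reference, the whole status, add, commit, log cycle can be rehearsed in a scratch repository. The sketch below uses a hypothetical file (hello.R) and placeholder identity values; in your own project the identity is normally already configured.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Local identity so the commit works on a freshly configured machine (placeholder values)
git config user.name "Example User"
git config user.email "user@example.com"

echo 'print("hello")' > hello.R     # hypothetical script
git status --short                  # "?? hello.R": the file is untracked

git add hello.R
git status --short                  # "A  hello.R": the file is staged

git commit -q -m "Add hello script"
git log --oneline                   # one line per commit: short hash and message
commit_count=$(git rev-list --count HEAD)
echo "commits so far: $commit_count"
```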


Second commit

In this exercise, we will:

  • make a modification to a file
  • use git diff to see what modifications have been made to a file
  • commit a new version of the file
  • use git log to look at the history of a project

Open the source/04_data_visualization.R script in the R Studio editor. Change something about the plot code. For example, you could add a different theme to a plot. Note that theme_bw() should come before the call to theme(): a complete theme added afterwards would overwrite the legend and axis-text settings.

plot1 = covid %>%
  mutate(county = factor(county),
         county = fct_reorder(county, concentration)) %>%
  ggplot(aes(county, log(concentration), group = county, fill = county)) +
  geom_boxplot() +
  theme_bw() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1))

Save the file in R Studio. Confirm that the file has saved by ensuring that the name of the file has changed from red to black. (You would be surprised how much confusion can be caused by forgetting to save files.)

Take a look at git status. What does it have to say about the modified file?

Use the following command to view the difference between the new version of the file and the version on the current commit.

git diff source/04_data_visualization.R

If all of the changes fit on one screen, the differences will simply print to the console.

If there are A LOT of changes, git opens them in a pager, and you can scroll through the differences using enter, page up/down, or the arrow keys. When you are done browsing the changes, press q to return to the command line.
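
If you have not read diff output before, the following sketch may help. It commits a hypothetical one-line file (plot.R) in a scratch repository, changes it, and prints the diff; the --no-pager option simply prints everything at once instead of opening the scrolling view. Lines starting with - were removed and lines starting with + were added.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "theme_grey()" > plot.R             # hypothetical one-line script
git add plot.R
git commit -q -m "Add plotting script"

echo "theme_bw()" > plot.R               # modify the file after committing it
diff_output=$(git --no-pager diff plot.R)
echo "$diff_output"
```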

Next, stage the file for commit:

git add source/04_data_visualization.R

Exercise: We are now going to do something a little bit scary. We are going to run the git commit command but forget to include a message 😱

git commit

More than likely you will see the following on the screen:

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.

But if you try to type, nothing happens. What the…

The exercise is simple: escape!

What happened here? When you forget to include a commit message, git launches a text editor program on your computer. On many systems, the default text editor that git uses is vim, a powerful but painfully minimal and even more painfully complex editor.

The key part of this solution is: DON’T PANIC. We will get through this together.

Follow these precise steps:

  • type i
    • you should see -- INSERT -- at the bottom of the screen
    • now when you type you will actually see letters appear!
  • use the keyboard as normal to type your commit message
    • notice that you cannot use the mouse to select the cursor’s position, you have to use the arrow keys
  • when you have finished your message press esc
    • you should no longer see -- INSERT -- at the bottom
  • type :x (colon and the x key)
    • you will see :x appear at the bottom of the screen
  • press enter to save the commit message and quit vim

This will complete the commit. But what the heck just happened?!

In vim, we use i to enter INSERT mode, which is where we actually get to type things.

Pressing esc exited from INSERT mode (back to what is called NORMAL mode).

The colon key takes us into command-line mode. This is where we can execute a host of commands (e.g., find and replace text, save a document, etc…). Think of this like a text-based tool bar for vim.

In command-line mode, we execute the command x, which saves and closes the document.

This is exactly what git was asking us to do to complete the commit – save a message.

If all this was too scary for you, you have two options:

Option 1: abandon the commit! Before you do anything else in vim type :q!, which will quit vim without saving. This abandons the commit. Then you can use the one-liner git commit -m "<with-a-message-this-time-dummy>" to complete the commit.

Option 2: configure a different text editor for git! Read more here.
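
As a sketch of Option 2, the editor can be set with git config. The example below assumes nano is installed and sets it only for a throwaway repository; running git config --global core.editor nano instead would apply the setting to all of your repositories.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# Set the editor for this repository only (nano is an assumed example editor)
git config core.editor nano

# Read the setting back to confirm it was stored
configured_editor=$(git config core.editor)
echo "git will open: $configured_editor"
```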


Third commit

In this exercise, we will:

  • add a temporary file to a new commit
  • demonstrate renaming files using git mv in a new commit
  • delete a file using git rm in a new commit

In R Studio, open a new file, add whatever you want to the file, and save it in the source directory. Name the file foo.R.

Use the following commands to commit this new file:

git add source/foo.R
git commit -m "Add temporary file to demonstrate renaming and removing"

To rename the file use:

git mv source/foo.R source/bar.R

Check git status to see what git thinks about this change! To commit, run:

git commit -m "Rename temporary file"

To remove the file use:

git rm source/bar.R

Check git status to see what git thinks about this change! To commit, run:

git commit -m "Remove temporary file"

Exercise: Repeat the above process of creating a new file in the source directory named foo.R. Add it to a new commit using git add and git commit. Now suppose you forget to use git mv to rename the file and instead use plain old mv:

mv source/foo.R source/bar.R

Check git status to see what git thinks has happened. What do you need to do to complete the process of committing the renamed file?

Notice that git status should show source/foo.R as being deleted and show source/bar.R as an Untracked file.

To complete the commit we need to run:

git add source/bar.R
git rm source/foo.R
git commit -m "Rename temporary file"
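
To convince yourself that these three commands end up in the same place as git mv, you can replay the steps in a scratch repository. This is a sketch with placeholder identity values; the file is given some content so that git can recognize the rename.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
mkdir source
echo "x <- 1" > source/foo.R
git add source/foo.R
git commit -q -m "Add temporary file"

mv source/foo.R source/bar.R     # plain mv: git only sees a deletion and a new file
git add source/bar.R             # stage the new name
git rm -q source/foo.R           # stage the removal of the old name
rename_status=$(git status --short)
echo "$rename_status"            # git now reports the pair as a rename
git commit -q -m "Rename temporary file"
```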


Exercise: In the previous exercise, you again renamed the file source/foo.R to source/bar.R. Now create a commit that removes source/bar.R but does not delete the file from your local directory.

This can be accomplished using the following (not very intuitively named) option:

git rm --cached source/bar.R
git commit -m "Remove temporary file"

Using ls should confirm that source/bar.R has not been deleted from your directory.

Checking git status should show that source/bar.R is now treated as any other untracked file. For example, it could be added back to a new commit.
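
The same behavior can be verified in a scratch repository. In this sketch (placeholder identity values), git ls-files lists the tracked files and git ls-files --others --exclude-standard lists the untracked ones.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
mkdir source
echo "x <- 1" > source/bar.R
git add source/bar.R
git commit -q -m "Add temporary file"

git rm -q --cached source/bar.R          # remove from the index, keep the file on disk
git commit -q -m "Remove temporary file"

tracked_files=$(git ls-files)                                 # files git still tracks
untracked_files=$(git ls-files --others --exclude-standard)   # files present but untracked
echo "tracked: [$tracked_files]  untracked: [$untracked_files]"
```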


Ignoring files

In this exercise, we will:

  • create a .gitignore file to ignore certain files

After completion of the previous exercise, you should have a file named source/bar.R that is untracked by git. If you did not complete the previous exercise or do not have such a file available, then create one by running:

touch source/bar.R

Now we are ready to create our .gitignore file. In R Studio, select File: New File: Text file.

Save the file as .gitignore by clicking File: Save as: and type .gitignore.

Now add the following text to the .gitignore file

# don't want credentials in git!
.Renviron
# ignoring this file to see behavior
source/bar.R

Save the file when you are done adding the text.

Now check git status again. You should no longer see source/bar.R appear as an untracked file.
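
If a file is still showing up (or unexpectedly missing), git check-ignore can tell you which rule, if any, applies to it. The sketch below recreates the two rules from this exercise in a scratch repository; the files are empty placeholders.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
mkdir source
touch source/bar.R .Renviron

# Same rules as in the exercise, written from the command line
printf '%s\n' '.Renviron' 'source/bar.R' > .gitignore

# check-ignore -v prints the file and line number of the rule that matches each path
git check-ignore -v source/bar.R .Renviron

# Untracked files that are not ignored: only .gitignore itself remains
visible_untracked=$(git ls-files --others --exclude-standard)
echo "$visible_untracked"
```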

Exercise: You may have also noticed that git is now treating the .gitignore file itself as an untracked file. Here’s what should be an easy exercise by now (hopefully!): make a new commit that includes the .gitignore file.

We can execute the following commands:

git add .gitignore
git commit -m "Add .gitignore to repo"


Pushing files to GitHub

In this exercise, we will:

  • create an empty GitHub repository
  • use git remote add to create a link between our local repository and the GitHub repository
  • create ssh credentials to allow us to push to GitHub
  • use git push to push files to GitHub

Follow these instructions to create an empty GitHub repository:

  • From GitHub dashboard, click on the + symbol in the upper right-hand corner and select New repository
  • Give the repository a name
  • DO NOT check the Add README file box
  • DO NOT add a .gitignore
  • DO NOT choose a license
  • Click Create repository

Now go back to your R Studio terminal emulator and run the following command to add the GitHub repository as a remote named origin to your local repository:

git remote add origin git@github.com:<github_user>/<repo_name>
  • <github_user> = your GitHub username
  • <repo_name> = the name of your GitHub repository

Confirm that the remote was added as expected by listing the remotes for your repository:

git remote -v
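
Adding a remote only records a name and a URL in the local configuration; nothing is contacted until you fetch or push. A small sketch in a scratch repository, where example-user and example-repo are placeholders rather than a real repository:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# example-user and example-repo are placeholders, not a real repository
git remote add origin git@github.com:example-user/example-repo.git

git remote -v                                # lists the fetch and push URLs
origin_url=$(git remote get-url origin)      # prints just the URL of one remote
echo "origin points to: $origin_url"
```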

Now we are ready to push, except that we need to prove to GitHub that we have permission to push to a repository that we own. To do this we will use private/public keys.

First, check if any keys exist already:

ls -al ~/.ssh

Look for files named

  • id_rsa.pub
  • id_ecdsa.pub
  • id_ed25519.pub

If you do not have these files or if the .ssh directory does not exist, then you need to create a key pair.
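
The check above can also be scripted. This is a minimal sketch; list_public_keys is a hypothetical helper name, and it looks only for the three file names listed above.

```shell
# Print the name of each standard public key found in a directory.
# Defaults to ~/.ssh when no directory is given.
list_public_keys() {
  ssh_dir=${1:-"$HOME/.ssh"}
  for key in id_rsa.pub id_ecdsa.pub id_ed25519.pub; do
    if [ -f "$ssh_dir/$key" ]; then
      echo "$key"
    fi
  done
}

existing_keys=$(list_public_keys)
if [ -z "$existing_keys" ]; then
  echo "No public key found: create a key pair with ssh-keygen"
else
  echo "Existing public key(s): $existing_keys"
fi
```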

To create a key pair, run the following command:

ssh-keygen -t ed25519 -C "<github_account@email.com>"

Be sure to replace <github_account@email.com> with the email address you used to create your GitHub account.

When prompted to "Enter a file in which to save the key", press Enter to accept the default file location.

For simplicity, when prompted to enter a passphrase, just press Enter twice to use no passphrase. If you would like the additional security of a passphrase, read more about configuring authentication agents.

Re-run the ls -al ~/.ssh command to confirm the key was created successfully. You should see a file ending in .pub.

Now head back to your web browser and GitHub. Navigate to the GitHub dashboard (i.e., your landing page once you sign in).

  • Click on the icon in the top right corner of the page (this will be your picture, if you’ve uploaded one to your profile, otherwise it will be a pixelated image).

  • Select Settings from the dropdown menu.

  • When the settings menu opens, on the left side of the screen, click “SSH and GPG keys”.

  • Click the green button “New SSH Key”

  • Give the key a title (can be arbitrary)

  • Open the file ~/.ssh/id_<something>.pub in R Studio.

    • Recall that ~ means your HOME directory. To remind yourself of what this directory is you can run echo $HOME in the terminal.
    • You may need to ensure that hidden files are shown in your file finder to see this file in order to open it in R Studio.
  • Copy the contents of the file to the Key box on GitHub

  • Click Add SSH key

  • Test your connection by executing ssh -T git@github.com

You should see a message that reads:

Hi <user>! You've successfully authenticated, but GitHub does not provide shell access.

Now we are finally ready to push to GitHub. Use the command:

git push origin main

If you get an error about no branch named main, then try:

git push origin master

Once you have pushed, refresh your browser on your GitHub repository. You should be able to see your code!
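
If you want to rehearse a push without touching GitHub, a bare repository on your own disk can stand in for the remote. The sketch below makes that substitution (an assumption for illustration only) and also shows one way to look up whether your branch is called main or master before pushing.

```shell
work_dir=$(mktemp -d)
git init -q --bare "$work_dir/remote.git"    # stand-in for the empty GitHub repository
git init -q "$work_dir/local"
cd "$work_dir/local"
git config user.name "Example User"          # placeholder identity for the demo
git config user.email "user@example.com"

echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"

git remote add origin "$work_dir/remote.git"
current_branch=$(git symbolic-ref --short HEAD)   # main or master, depending on your git setup
git push -q origin "$current_branch"

# After the push, the remote branch points at the same commit as the local one
local_commit=$(git rev-parse HEAD)
remote_commit=$(git --git-dir="$work_dir/remote.git" rev-parse "$current_branch")
echo "local:  $local_commit"
echo "remote: $remote_commit"
```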


Practicing add/commit/push

In this exercise, we will:

  • modify README.md file for your GitHub repository
  • add, commit, and push README.md to your GitHub repository

Exercise: Open the README.md using R Studio.

Add the following passage somewhere in the document.

This repo houses test code from the SISMID 2024 course Hitchhikers Guide to reproducibility.

Make a new commit that includes the README.md and then push the commit to GitHub.

To add and commit the file:

git add README.md
git commit -m "Add README to repo"

To push to GitHub:

# replace main with master as appropriate
git push origin main


Creating and merging branches

In this series of exercises, we will:

  • create a new branch
  • make a commit on the new branch
  • merge the new branch with main
  • delete the new branch

Exercise: Create a branch named plot. Check out this branch and make more minor modifications to the figure generated by source/04_data_visualization.R. Create a commit with the new changes.

To create a new branch we run:

git branch plot

To move the HEAD pointer to the new branch we run:

git checkout plot

Note that there is an option that can be used as a shortcut to do both of these steps at the same time!

git checkout -b plot

Now you can modify the plotting file as you please. E.g., you might change the axis labels. Save the file when you are done.

Make a commit on the plot branch:

git add source/04_data_visualization.R
git commit -m "Update plotting code"

Use git log --oneline to confirm that the plot branch is now 1 commit ahead of main.

Now that we have made a commit on the plot branch, let’s pause to see how we would view the older version of the code that is still on the main branch.

To do this, we simply move the HEAD pointer by checking out main.

git checkout main

View the contents of source/04_data_visualization.R in R Studio. You should see the older version of the code.

Exercise: Merge plot into main.

Be sure that you have main checked out. If you are not sure, use git branch to see. The branch with the asterisk is your current branch. If you are not on main, use git checkout main to switch branches.

Once you are satisfied that you are on the main branch run:

git merge plot

Check git log --oneline to confirm that the main pointer has moved as expected.
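
The full branch life cycle, including the final step of deleting the merged branch with git branch -d, can be sketched in a scratch repository. The file plot.R and the identity values are placeholders; git branch -M main is used here only to make sure the demo branch is called main.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
echo "plot v1" > plot.R
git add plot.R
git commit -q -m "Add plotting code"
git branch -M main                       # make sure the branch is called main for this demo

git checkout -q -b plot                  # create the branch and switch to it
echo "plot v2" > plot.R
git commit -q -am "Update plotting code"

git checkout -q main                     # plot.R is back to the older version here
git merge -q plot                        # fast-forward: main moves to the commit on plot
main_commit=$(git rev-parse main)
plot_commit=$(git rev-parse plot)

git branch -q -d plot                    # safe to delete once it has been merged
remaining_branches=$(git branch --list plot)
```

After the merge, main and plot point at the same commit, so deleting plot removes only the label and none of the work.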


Resolving merge conflicts

In this exercise, we will:

  • create a new branch
  • make a commit on the new branch
  • checkout the main branch
  • make a commit on the main branch
  • attempt to merge when conflicts are present
  • resolve conflicts and complete the merge

We are going to create a new branch specifically to work on our README again. Let’s call the branch readme:

git branch readme
git checkout readme

Open README.md in R Studio. Modify the line:

This repo houses test code from the SISMID 2024 course Hitchhikers Guide to reproducibility.

to read

This repo houses test code from the *FANTASTIC* SISMID 2024 course Hitchhikers Guide to reproducibility.

Save the file and commit the changes:

git add README.md
git commit -m "Update README with thoughts on course."

Now checkout the main branch. When you open README.md you should see that the text has reverted to

This repo houses test code from the SISMID 2024 course Hitchhikers Guide to reproducibility.

Change it to

This repo houses test code from the *absolutely miserable* SISMID 2024 course Hitchhikers Guide to reproducibility.

Save the file and commit the changes:

git add README.md
git commit -m "Update README with new thoughts on course."

Exercise: Attempt to merge readme into main. Resolve the merge conflicts by selecting your true feelings for this course. Commit the result.

There will be conflicts in README.md. In the file you should see something like this:

<<<<<<< HEAD
This repo houses test code from the *absolutely miserable* SISMID 2024 course Hitchhikers Guide to reproducibility.
=======
This repo houses test code from the *FANTASTIC* SISMID 2024 course Hitchhikers Guide to reproducibility.
>>>>>>> readme

Remember that we are on the main branch, so HEAD is currently pointing to main. Thus, the text that appears first is from the main branch; the text on the bottom is from readme.

Express your true feelings by removing the text that you do not want to keep (and the extra symbols). Save the file and commit the result:

git add README.md
git commit -m "Resolve my conflict(ed feelings)"
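
The whole conflict cycle can be reproduced in a scratch repository. In this sketch the README contents are shortened to single words, the identity values are placeholders, and the conflict is resolved by overwriting the file rather than editing it by hand.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
echo "neutral" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main                       # make sure the branch is called main for this demo

git checkout -q -b readme
echo "FANTASTIC" > README.md
git commit -q -am "Update README with thoughts on course"

git checkout -q main
echo "miserable" > README.md
git commit -q -am "Update README with new thoughts on course"

# The merge stops with a conflict, so git exits with a non-zero status
git merge readme || echo "merge stopped: conflicts need resolving"
conflicted_files=$(git diff --name-only --diff-filter=U)   # files with unresolved conflicts
echo "conflicted: $conflicted_files"

echo "FANTASTIC" > README.md             # resolve by writing the text to keep (no markers)
git add README.md                        # staging marks the conflict as resolved
git commit -q -m "Resolve conflict"

# A merge commit lists its own hash followed by its two parents
parent_count=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
```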


Pull requests and upstream tracking

This is a longer exercise involving a partner. Designate one partner as User A and the other as User B.

First, User B will:

  • fork User A’s existing repository
  • clone the repository to create a local repository
  • make changes to the local repository
  • push changes to GitHub
  • submit a pull request

Next, User A will:

  • add User B’s repository as a remote
  • fetch User B’s repository
  • merge User B’s changes to their local repository
  • push their changes to GitHub

Then, we will use upstream tracking by User B of User A’s repository. User A will:

  • make changes to their local repository
  • push those changes to GitHub

Finally, User B will:

  • add User A’s repository as an upstream remote
  • fetch User A’s repository
  • merge User A’s changes into their local repository
  • push their changes to GitHub

Specific instructions are given in the subheadings below.

User A: Confirm GitHub repository is accessible

User A should have a GitHub repository associated with the exercises above. Share your GitHub user name and your GitHub repository name with User B.

User B: Fork and clone the repository

User B should now fork and clone User A’s repository on GitHub using the following steps.

To do this, User B should navigate to https://github.com/<user_a_name>/<user_a_repo> and click “Fork” to create a fork of User A’s repository.

  • replace <user_a_name> with User A’s GitHub user name
  • replace <user_a_repo> with User A’s GitHub repository name

Recall that this creates a copy of User A’s GitHub repository on User B’s GitHub. This repository is now viewable at https://github.com/<user_b_name>/<user_b_repo>.

User B is now going to clone their copy (not User A’s copy!) of the GitHub repository.

To do this, User B should use cd in their terminal to navigate to the directory where they wish to store the local copy of the repository.

User B should execute

# make sure this is User B's repo
# and NOT User A's!!!
git clone git@github.com:<user_b_name>/<user_b_repo>
  • this will clone User B’s fork of the repository
  • be sure to use the git@github.com:<user_b_name>/<user_b_repo> syntax and not https://github.com/<user_b_name>/<user_b_repo> syntax.
    • You can confirm what web address was used to add the remote by executing git remote -v.

If the output of git remote -v shows that User B accidentally used https:// syntax in the git clone command, then User B should remove the origin remote using git remote remove origin and then re-add it using “ssh”-style syntax: git remote add origin git@github.com:<user_b_name>/<user_b_repo>.
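
A possible alternative to removing and re-adding the remote is git remote set-url, which changes the URL in place. The sketch below simulates the mistake in a scratch repository; example-user and example-repo are placeholders.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# Simulate the mistake: origin added with an https:// URL (placeholder names)
git remote add origin https://github.com/example-user/example-repo.git

# Switch the existing remote to ssh-style syntax without removing it
git remote set-url origin git@github.com:example-user/example-repo.git

fixed_url=$(git remote get-url origin)
echo "origin now points to: $fixed_url"
```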

User B should confirm that a folder called <user_b_repo> was added to the current working directory of their terminal.

User B: Update the repository and submit a pull request

User B will now make a new branch and make updates to User A’s repo on that branch. User B should complete the following steps:

  • Create and checkout a new branch called feature by executing git checkout -b feature

    • recall that this simultaneously creates and checks out a new branch called feature
    • confirm that User B has switched to the new branch by executing git branch. You should see a star next to feature
  • Modify something about the analysis/final_report.Rmd script. E.g., modify section heading names or the title field in the yaml header.

  • After completing changes, appropriately use git add and git commit to make a new commit along the feature branch.

  • Push the feature branch to GitHub.

git push origin feature
  • Submit a pull request to User A’s repository.
    • the pull request should request that User B’s feature branch be merged into User A’s main branch.

User A: test out pull request code

User A will now fetch the code submitted by User B, test it out, and eventually merge it into their main branch, thereby closing the pull request. To do this User A should

  • Add User B’s repository as a remote named userb.
git remote add userb git@github.com:<user_b_name>/<user_b_repo>
    • replace <user_b_name> with User B’s GitHub user name
    • replace <user_b_repo> with User B’s GitHub repository name
  • fetch from User B’s repository.
git fetch userb
  • Recall that this command downloads the contents of User B’s repository and creates a remote tracking branch called userb/feature.

  • User A should checkout the remote tracking branch to try out the new code

git checkout userb/feature 
  • User A should test out the code on the feature branch.
    • E.g., confirm that the report builds properly with the updated code
  • When User A is satisfied that the code works as expected, they should merge the userb/feature branch into main.
git checkout main
git merge userb/feature
  • User A should push the updated main branch to GitHub.
git push origin main

Both users should now see User B’s pull request as “merged” on GitHub.
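
User A’s side of this workflow can be rehearsed locally. In the sketch below, two directories on disk stand in for the two GitHub repositories (an assumption for illustration), the file name and identity values are placeholders, and the final push is left out because there is no GitHub repository to push to.

```shell
work_dir=$(mktemp -d)

# User A's repository with one commit on main
git init -q "$work_dir/user_a"
cd "$work_dir/user_a"
git config user.name "User A"
git config user.email "user_a@example.com"
echo "report" > final_report.Rmd
git add final_report.Rmd
git commit -q -m "Add report"
git branch -M main

# Stand-in for User B's fork: a clone with one commit on a feature branch
git clone -q "$work_dir/user_a" "$work_dir/user_b"
cd "$work_dir/user_b"
git config user.name "User B"
git config user.email "user_b@example.com"
git checkout -q -b feature
echo "new heading" >> final_report.Rmd
git commit -q -am "Update report headings"

# Back in User A's repository: add, fetch, test, and merge User B's work
cd "$work_dir/user_a"
git remote add userb "$work_dir/user_b"
git fetch -q userb                   # creates the remote-tracking branch userb/feature
git checkout -q userb/feature        # inspect the proposed code (detached HEAD)
git checkout -q main
git merge -q userb/feature           # fast-forward main to User B's commit
```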

User A: New commits

User A will now create new commits on main that User B will need to track in order to keep up with the current state of the repository.

  • User A should add some additional (trivial) code to the code base.

    • If you can’t think of anything interesting to do, just add a comment to an R script.
  • User A should appropriately add and commit these changes.

  • User A should appropriately push these changes to their GitHub repository.

User B: Update local repository

Now User B wishes to merge the changes made by User A into their main branch, so that their repository stays up to date with User A’s.

  • User B should add a remote called upstream that points to User A’s GitHub repository.
git remote add upstream git@github.com:<user_a_name>/<user_a_repo>

User B should now fetch from the upstream remote and merge upstream/main into main.

  • To download the current contents of User A’s repository (i.e., the upstream remote) to User B’s computer use:
git fetch upstream
  • User B should ensure that they are on their own main branch by checking git branch. If they are not on main, use git checkout main

  • Merge User A’s changes into User B’s main branch.

git merge upstream/main
  • Confirm that User B’s local files on the main branch now look like User A’s files from GitHub.

  • Finally, User B should push their main branch to their GitHub fork to make sure the fork is up to date as well:

git push origin main
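
The upstream workflow can also be rehearsed locally. In this sketch, directories on disk stand in for User A’s GitHub repository and User B’s fork (an assumption for illustration), and the file name and identity values are placeholders.

```shell
work_dir=$(mktemp -d)

# User A's repository (stand-in for User A's GitHub repository)
git init -q "$work_dir/user_a"
cd "$work_dir/user_a"
git config user.name "User A"
git config user.email "user_a@example.com"
echo "v1" > analysis.R
git add analysis.R
git commit -q -m "Initial commit"
git branch -M main

# User B's fork (bare, like a GitHub repository) and User B's local clone of it
git clone -q --bare "$work_dir/user_a" "$work_dir/fork.git"
git clone -q "$work_dir/fork.git" "$work_dir/user_b"

# User A adds a new commit after the fork was made
echo "# a trivial comment" >> analysis.R
git commit -q -am "Add comment"

# User B: track User A's repository as upstream and catch up
cd "$work_dir/user_b"
git remote add upstream "$work_dir/user_a"
git fetch -q upstream
git checkout -q main
git merge -q upstream/main           # fast-forward to User A's latest commit
git push -q origin main              # bring the fork up to date as well
```

After the last command, the fork, User B’s local main, and User A’s main all point at the same commit.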

If you have time, reverse roles of User A and B and repeat the exercise.