Git: Basics

Learning outcomes

In the first git part you will strengthen your earlier knowledge of version control. You will learn to explain the role of version control in modern software development. In this part you will revise the basic use of version control in single programmer projects: how new commits are created, how the contents of a commit are chosen, how to investigate version history and how the local repository is synced with the remote repository.

Version Control

In software development, versioning serves two purposes: it accumulates a history of the project across its development versions, and it makes it possible to roll back changes. Versioning also allows developers to experiment without affecting the rest of the project.

In programming versioning can be used for example to:

  • Make changes to a program but still have a working version of it

  • Share and sync program code between many developers

  • Refactor code and still have the original as a backup

  • Compare performance between two versions

A version control system is a tool for managing different versions. Its tasks are:

  • Creating version keys

  • Storing metadata (who changed what)

  • Switching between versions

  • Rolling back to a specific version of a file or files

  • Implementing different work branches

Version Control in Software Development (duration 17:42)

Git

Git is widely used in the software industry. Another widely used version control tool is Mercurial. There are differences between the two, but once you know how to use Git, Mercurial is quite straightforward to learn. Git is a distributed version control system. In a distributed version control system (DVCS) you clone a remote repository to your computer. Cloning the repository means that you’re creating a local copy of the repository. Often there are several local repositories in use simultaneously, possibly by different people. These local copies contain all of the metadata in the repository, and they are kept in sync by pushing changes to the remote repository and pulling changes from it.

The other type of version control is centralized, e.g. Subversion. This means that all users are connected to the same server, which contains the project. When you make a change in a centralized system, you commit it directly to the repository. Other users can then fetch your changes to their workstations, but everyone shares the one centralized repository: there are no local copies of the repository itself.

How does Git make versions?

Versions in Git are based on the contents of the worktree. The worktree consists of the files and directories under the project’s root directory, with the exception of the .git directory, which contains the data of the repository. A version in Git is known as a commit. Conceptually, a commit is a snapshot of the selected files: the content of each file is stored as an object, and a file that has not changed since the previous commit is not stored again but is referenced from the earlier object. Each commit also refers to its predecessor, so the commits form a chain that makes up the history.

In practice the situation is a bit more complex.

  1. The entire contents of the worktree are not stored, only the parts selected.

  2. A commit does not duplicate unchanged content: only files that are new or changed since the last commit add new data to the repository.

  3. Some metadata and a reference to the previous commit are added to the stored snapshot.

  4. When a version is checked out, the state of the worktree is recreated from the file contents that the commit refers to.

Other noteworthy things:

  • Everything in the repository is stored in a directory called .git, which is typically located at the root of the worktree.

  • The contents of the .git directory are what is moved between repositories.

  • To save space, Git can compress stored objects into pack files, where similar contents are stored as differences from each other. The changes between two versions, shown as added and deleted lines, are computed by comparing the snapshots.
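How file contents are recorded can be observed with a small experiment. The sketch below is only an illustration: the temporary repository, the file names and the identity "Example Dev" are made up for the example. It creates two commits and prints the ids Git has assigned to the contents of each file in both commits.

```shell
# Throwaway repository; name and email are placeholders for this sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "stable content" > stable.txt
echo "version 1" > changing.txt
git add stable.txt changing.txt
git commit -q -m "First snapshot"

# Only one of the two files changes before the second commit.
echo "version 2" > changing.txt
git add changing.txt
git commit -q -m "Second snapshot"

# Ids of the file contents as recorded in the previous and current commit.
stable_old="$(git rev-parse HEAD~1:stable.txt)"
stable_new="$(git rev-parse HEAD:stable.txt)"
changing_old="$(git rev-parse HEAD~1:changing.txt)"
changing_new="$(git rev-parse HEAD:changing.txt)"

echo "stable.txt:   $stable_old -> $stable_new"
echo "changing.txt: $changing_old -> $changing_new"
```

The id of stable.txt is the same in both commits, so its content exists in the repository only once, while changing.txt has a new id in the second commit.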

Note!

The worktree is the directory being versioned. All files under the project’s root directory belong to the worktree, with the exception of the .git directory.

Commit

A commit is a version of the contents of the repository. It consists of the following data:

Who made the commit

User specified name and email

Timestamp

Date and time when the commit was created.

Message

The user’s description of the contents of the commit.

Reference to the previous commit

Reference to the parent commit, i.e. the version of the worktree that the current commit is based on.

Blob- and tree-objects

The contents of the files (blobs) and the directory structure that names and locates them (trees).

SHA-1 hash

The unique identifier of the commit, computed from its contents, which can be used to refer to the commit.
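The pieces of data listed above can be viewed directly. The following sketch, run in a throwaway repository with a placeholder identity and file name (all assumptions of this example), prints the raw contents of the latest commit with git cat-file and its identifier with git rev-parse.

```shell
# Throwaway repository; name and email are placeholders for this sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"

echo "more text" >> readme.txt
git add readme.txt
git commit -q -m "Extend readme"

# Raw commit object: tree, parent, author, committer and the message.
commit_data="$(git cat-file -p HEAD)"
echo "$commit_data"

# The identifier (hash) of the commit itself.
commit_id="$(git rev-parse HEAD)"
echo "id: $commit_id"
```

The author and committer lines contain the name, the email address and the timestamp. The first commit of a repository has no parent line, which is why the sketch inspects the second one.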

The commit’s creation consists of two phases:

  1. Selecting changes to be stored in the commit

  2. Finalizing the commit and adding a description

Commits are prepared in a staging area, also known as the index. The changes added to it should work as intended. In a good commit the changes form a small set that is easy to describe.
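The two phases can be tried out as follows. This is a minimal sketch in a throwaway repository; the file names and the identity are invented for the example. Two files are changed, but only one of them is selected for the commit.

```shell
# Throwaway repository; name and email are placeholders for this sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "feature" > feature.txt
echo "notes" > notes.txt
git add feature.txt notes.txt
git commit -q -m "Initial files"

# Change both files.
echo "feature, improved" > feature.txt
echo "unfinished notes" > notes.txt

# Phase 1: select the changes that go into the commit.
git add feature.txt
git diff --cached --name-only    # lists the staged file: feature.txt

# Phase 2: finalize the commit and describe it.
git commit -q -m "Improve feature"

# notes.txt is still modified and waits for a later commit.
remaining="$(git status --porcelain)"
echo "$remaining"
```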

Using Git

Git has more functions than commands. The commands are overloaded, which means that their behavior can be altered with different parameters.

GUIs separate these functions into their own UI elements, whereas the command line uses switches.

Here are some examples:

  • git add <path> adds the changes in path to the index

  • git add --all adds all changes in the worktree to the index

  • git add -p <path> starts an interactive session for choosing which of the changes in path are added to the index

Git has many safety features that may block the user from running some commands. These features are in place to protect the version history and local changes from accidental destruction. Never use --force to get around these safety features.

For example: you can’t check out an older commit if the worktree has uncommitted changes that the checkout would overwrite. However, you can ask Git to restore a certain version of a single file even though the file has uncommitted changes.

The main difference between these two cases is that in the latter the user explicitly names the modified file that is to be replaced. In the former the user can’t be sure which files would be overwritten, so the change to the modified file might be accidental.
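The difference between the two cases can be seen in a small experiment. The sketch below uses a throwaway repository, a made-up file config.txt and a placeholder identity. It first tries to check out the older commit while the file has an uncommitted edit, and then restores only that file from the older commit.

```shell
# Throwaway repository; name and email are placeholders for this sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > config.txt
git add config.txt
git commit -q -m "Version 1"

echo "v2" > config.txt
git add config.txt
git commit -q -m "Version 2"

# An uncommitted local edit.
echo "local edit" > config.txt

# Checking out the whole older commit would overwrite the edit,
# so Git refuses and leaves the worktree untouched.
if git checkout -q HEAD~1 2>/dev/null; then
    whole_checkout="allowed"
else
    whole_checkout="refused"
fi
echo "checkout of the old commit: $whole_checkout"

# Naming the file explicitly tells Git that the overwrite is intended.
git checkout HEAD~1 -- config.txt
cat config.txt
```

Note that the second command discards the local edit without asking, because the file was named explicitly.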

Note!

If you can’t use some command in Git, always figure out why. Don’t try to go around the safety mechanisms unless you’re absolutely sure what it would destroy and you want to do it.

Note!

Files have four main states: committed, modified, staged and untracked.

  • Committed - The current state of the file is stored in a previous commit.

  • Modified - The file has changes that have not been staged or committed.

  • Staged - The file has changes and the changes are staged to the index.

  • Untracked - The file has been created but it is not versioned or staged.

Git might prevent executing some commands based on the main state of the file(s).
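The states can be inspected with git status. The sketch below is an illustration in a throwaway repository; the file names, which are chosen to match the states, and the identity are assumptions of the example.

```shell
# Throwaway repository; name and email are placeholders for this sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "a" > committed.txt
echo "b" > modified.txt
echo "c" > staged.txt
git add committed.txt modified.txt staged.txt
git commit -q -m "Three files"

echo "edit" >> modified.txt     # changed, not staged
echo "edit" >> staged.txt
git add staged.txt              # changed and staged
echo "new" > untracked.txt      # never added to Git

# Short status: the first column is the index, the second the worktree.
states="$(git status --short)"
echo "$states"
```

In the short format " M" marks a modified file, "M " a staged change and "??" an untracked file. Files in the committed state, like committed.txt, are not listed at all.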

Data preservation in Git

Almost every operation in Git adds data to the version history. If you remove a file, Git still stores the previous versions of it and only marks that the file isn’t present in the current version of the worktree. This means that once you store something in Git, it stays there.
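A minimal sketch of this behavior, using a throwaway repository with a made-up file report.txt and a placeholder identity: the file is removed in a second commit, but its contents can still be read from the earlier commit.

```shell
# Throwaway repository; name and email are placeholders for this sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "important data" > report.txt
git add report.txt
git commit -q -m "Add report"

# Remove the file and commit the removal.
git rm -q report.txt
git commit -q -m "Remove report"

# The file is gone from the worktree ...
ls -A

# ... but the version stored in the previous commit is still available.
recovered="$(git show HEAD~1:report.txt)"
echo "$recovered"
```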

It is possible to remove all traces of a file from Git, but this requires rewriting the history starting from the version in which the file was first added. DO NOT DO THIS LIGHTLY! This isn’t an easy task and it is prone to mistakes. There are even situations where rewriting is practically impossible, so be sure never to store anything unwanted in Git, such as passwords, private keys or other sensitive information, especially if the repository is open to the public.

To prevent unintentionally adding files to the version history, it is important to have a properly configured .gitignore file in your repository. The role of .gitignore is to make sure that files that are commonly found in the working tree but are irrelevant from the point of view of the project do not end up versioned.

Examples of things to add to .gitignore:

build-*

Filters out all paths that start with “build-”

*.o

Filters out all paths that end with “.o”

local_configurations.py

Filters out a file called local_configurations.py

html/

Filters out the folder called html

!build*.py

Re-includes all paths starting with “build” and ending with “.py”, even if these paths were filtered out by previous rules

Usually you’ll create a single .gitignore in the project root folder and add all your rules to this single file. The file should be committed to your repository so that it is shared with other developers.
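The effect of the example rules can be checked by staging everything and listing what ended up in the index. The sketch below is an illustration only: the files and directories it creates (build-debug, main.o, build-tools.py and so on) are invented to trigger each rule.

```shell
# Throwaway repository with the example rules from above.
repo="$(mktemp -d)"
cd "$repo"
git init -q
printf '%s\n' 'build-*' '*.o' 'local_configurations.py' 'html/' '!build*.py' > .gitignore

# Hypothetical project contents: some should be ignored, some not.
mkdir build-debug html src
touch build-debug/app main.o local_configurations.py html/index.html
touch build-tools.py src/main.cpp

# Stage everything that is not ignored and list the result.
git add --all
tracked="$(git ls-files)"
echo "$tracked"
```

Only .gitignore, build-tools.py and src/main.cpp are staged. build-tools.py matches the rule build-* but is re-included by the last rule, which has to come after the rule it overrides.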

What should or shouldn’t be stored in Git?

Git can store any kind of file, but its tools for comparing and merging versions work on lines of text. This means that text-based files, like source code, can be tracked easily. Binary files like pictures or videos are problematic: even the smallest change may look like a completely different file to Git, the change cannot be reviewed or merged in a meaningful way, and the whole file is stored again, which takes more space. This can quickly bloat a simple project to over 100 MB.

Neither should you store any files that are generated directly from the stored files. If they’re generated, you are basically storing the same information twice, which forces you to maintain both versions. For example, binary, moc and object files are generated from the source code.

Generated files may also differ between machines and tools, which can cause needless conflicts and make version control a nightmare.

There are some special exceptions which make it necessary to store a version of a generated file, for example release documentation or patch files.

What you should store:

  • Working source-code

  • Platform independent configurations and settings

  • Documentation

  • Required static material

What you shouldn’t store:

  • Broken source-code

  • Generated content

  • Passwords and keys

  • Sensitive information

  • Videos and graphics*

* Except in the case of small pictures that are required in the project. You should use some other system to distribute large and non-text-based files.

In the context of this course it is sometimes necessary to push non-working code into the repository when asking for help with the coding exercises.

Git Commands for Basic Use

Important note: HEAD is a pointer that points to the commit which your current worktree is based upon.

  • You add changes to the index with git add. The file paths which will be added are given as parameters.

    • git add <path> Adds changes from path to the index

  • You can also add file removals to the index using git rm.

    • git rm <path> Removes path from the worktree and adds the removal to the index

  • After you’ve added all of the necessary changes to the index you can finalize them with git commit.

    • git commit Creates a new commit from the index. Opens up a text editor for writing a commit message

    • git commit -m "<message>" Creates a new commit from the index. Uses the given message as the commit message

  • You can recover files and versions with git checkout. It is also used to switch branches.

    • git checkout HEAD -- <path> Recovers the version of path stored in HEAD

    • git checkout <commit> -- <path> Recovers path from a commit and adds it to the index

    • git checkout <commit> Moves HEAD to point to commit and updates the worktree. After this operation Git is in the detached HEAD mode, and you need to check out a branch (or create a new one) before making new commits

    • git checkout <branch> Moves HEAD to point to branch and updates the worktree. In basic use you only need to remember git checkout main

  • The most straightforward way to fetch changes from the remote repository into the local one is to use git pull.

    • git pull <remote repository> If the branch that HEAD points to tracks a branch in the remote repository, Git fetches the changes from the remote repository and merges them. Example: git pull origin

    • git pull <remote repository> <branch> Git fetches the changes from a branch in the remote repository and merges them. Example: git pull origin main

  • You can update the remote repository with git push. The command will move changes from your local repository to the remote repository. This enables other developers to use your changes.

    • git push <remote repository> Pushes the branch that HEAD points to into the remote repository. Example: git push origin

    • git push <remote repository> <branch> Pushes branch to the remote repository. Example: git push origin main
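The commands above can be combined into a complete round trip. The sketch below is a hedged illustration, not a recommended setup: instead of a real server it uses a bare repository in a temporary directory as the remote, and the developers "alice" and "bob", the file notes.txt and the identity are all made up for the example.

```shell
# A bare repository in a temporary directory plays the role of the remote.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
# Make the remote's default branch "main" regardless of local Git settings.
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main

# First developer: clone (Git warns that the repository is empty),
# commit and push.
git clone -q "$work/remote.git" "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin main

# Second developer: the clone contains the pushed commit.
git clone -q "$work/remote.git" "$work/bob"

# The first developer continues working and pushes again.
echo "second line" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes"
git push -q origin main

# The second developer brings the local repository up to date.
cd "$work/bob"
git pull -q origin main
cat notes.txt
```

After the pull both local repositories and the remote contain the same two commits.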

Demo on Git use (duration 12:57)

Version Control on Your Own Computer

The examples given in the course are done on the command line. There are several reasons for this, even though graphical alternatives are available: IDEs support Git either directly or through a plugin, and there are separate graphical Git tools such as TortoiseGit and GitGUI. The main reason for using Git on the command line is that Git is a command line tool. Under the hood, all GUI alternatives still use the same commands that are written on the command line. If you know how to use Git on the command line, you understand the principles behind its functionality best and can most likely use it anywhere. All in all, learning to use Git from the command line:

  • Helps you understand Git’s functionality better

  • Allows the creation of your own tools and scripts that work with Git

  • Allows you to solve more difficult problems

  • Is mandatory for many roles in IT

In order to use Git on your own computer you’ll of course need Git itself. Instructions on how to install Git on different platforms can be found in the Git book, Pro Git. The book is in general a good reference on how to use Git. For Windows machines we recommend Git for Windows or installing PowerShell in addition to Git.

Command line

  • - Hard for beginners

  • + Problems are often easier to solve, because the internet is full of instructions for command line usage

  • + You can find all the functions in Git’s manual directly with commands

  • + Very fast to use once you learn the routine

  • - Going through the history is more difficult

Generic GUI

  • + Easier for those who haven’t used the command line before

  • - It’s harder to figure out what the commands are in a GUI

  • + GUIs usually give better feedback on what you’re doing

  • * Navigating for commands might be slow, but some operations are easier

  • + Going through the history is easy and fast
