Git: Basics

Learning Outcomes

In the first Git part you will strengthen your earlier knowledge of version control. You will learn to explain the role of version control in modern software development. In this part you will revise the basic use of version control in single-developer projects: how new commits are created, how the contents of a commit are chosen, how to investigate the version history and how the local repository is synced with the remote repository.

Version Control

In software development the role of versioning is twofold: to accumulate a project history over the development versions, and to make it possible to roll back changes. Versioning allows developers to experiment without affecting the rest of the project.

In programming, versioning can be used, for example, to:

  • Make changes to a program but still have a working version of it

  • Share and sync program code between many developers

  • Refactor code and still have the original as a backup

  • Compare performance between two versions

A version control system is a tool for managing different versions. Its tasks are:

  • Creating version keys

  • Storing metadata (who changed what)

  • Switching between versions

  • Rolling back to a specific version of a file or files

  • Implementing separate work branches

Version Control in Software Development (duration 17:42)

Git

Git is widely used in the software industry. With knowledge of Git, another widely used version control tool, Mercurial, is quite straightforward to use, although there are differences between the two. Git is a distributed version control system. In distributed version control (DVCS) you clone the remote repository to your computer. Cloning the repository means that you create a local copy of it. This local copy contains all of the metadata in the repository. There are several local repositories in use simultaneously, and they are synced together by pushing changes to a remote repository.

The other type of version control is centralized, for example Subversion. This means that all users are connected to the same server, which contains the project. When you make a change, you commit it directly to that repository. Other users can then pull your changes to their workstations, but all share the one centralized repository.

How does Git make versions?

Versions in Git are based on the contents of the worktree. The worktree consists of the files and directories under the project’s root directory, with the exception of the .git directory, which contains the data of the repository. A version in Git is known as a commit. Conceptually, a commit records the complete state of the versioned files at one point in time. To save space, Git stores each distinct file content only once: a file that has not changed since the previous commit is not copied again, and the new commit simply refers to the content that is already stored. When Git packs a repository it may additionally store similar versions of a file as differences from each other. Each commit also refers to the commit before it, so the history can be followed as a chain of commits.

In practice the situation is a bit more complex.

  1. The entire contents of the worktree are not stored, only the parts selected.

  2. Unchanged files are not stored again in a new commit; the commit reuses the contents stored by earlier commits.

  3. Metadata and a reference to an earlier commit are added to the stored contents.

  4. When fetching a version, the state of the worktree is rebuilt from the contents that the commit refers to.
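One simplified way to picture a chain of commits is the toy model below. It is only a sketch and not Git's real storage format: the class `ToyRepo` and its methods are hypothetical names made up for this illustration, and it assumes that every distinct file content is stored once under a hash and that each commit remembers its parent.

```python
import hashlib


class ToyRepo:
    """Toy model of a commit chain; not Git's real on-disk format."""

    def __init__(self):
        self.objects = {}  # content hash -> file content, stored only once
        self.commits = []  # each commit: {"parent", "message", "files"}

    def commit(self, selected_files, message):
        """Record the selected files on top of the previous commit."""
        parent = len(self.commits) - 1 if self.commits else None
        # Start from the parent's file list so unchanged files are reused.
        files = dict(self.commits[parent]["files"]) if parent is not None else {}
        for path, content in selected_files.items():
            key = hashlib.sha1(content.encode()).hexdigest()
            self.objects[key] = content
            files[path] = key
        self.commits.append({"parent": parent, "message": message, "files": files})
        return len(self.commits) - 1

    def checkout(self, commit_id):
        """Rebuild the file contents that belong to a commit."""
        files = self.commits[commit_id]["files"]
        return {path: self.objects[key] for path, key in files.items()}


repo = ToyRepo()
first = repo.commit({"main.py": "print('v1')\n", "README": "demo\n"}, "Initial version")
second = repo.commit({"main.py": "print('v2')\n"}, "Update main.py")
print(repo.checkout(second))
```

In this sketch the second commit only selects main.py, yet checking it out still yields README, because the commit reuses the content stored by the first one.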

Other noteworthy things:

  • Everything in the repository is stored in a directory called .git, which is typically located at the root of the worktree

  • The contents of .git are what is moved between repositories

  • Differences between versions are presented as additions and deletions of lines

Note!

The worktree is the directory that is versioned. All files stored under the repository directory belong to the worktree. Typically all files are part of the worktree except .git.

Commit

A commit is a version of the contents of the repository. It consists of the following data:

Who made the commit

User specified name and email

Timestamp

Date and time when the commit is created.

Message

User’s description of its contents

Reference to previous commit

Reference to the version of the worktree on which the current commit is based.

Blob- and tree-objects

The contents of the files (blob) and where the files are located (tree).

SHA-1 hash

The unique identifier of the commit that can be referred to.
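To make the idea of a hash identifier concrete, the sketch below computes the identifier of a single file's content (a blob) the way Git does in a repository that uses the default SHA-1 object format: the content is prefixed with a short header and then hashed. Commit identifiers are built on the same principle from the commit's own data. The function name `git_blob_hash` is a hypothetical helper for this illustration, not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 identifier of a blob with the given content."""
    # Git hashes "blob <size in bytes>" + a NUL byte + the content itself.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"hello world\n"))
```

The result can be compared with the output of git hash-object for a file with the same content. Because the identifier depends only on the content, two files with identical content always get the same identifier.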

Creating a commit consists of two phases:

  1. Selecting changes to be stored in commit

  2. Finalizing the commit and adding a description for it

Commits are prepared in the staging area, also known as the index. The things added must work as intended. In a good commit, the changes form a small and easily describable set.

Using Git

Git has more functionality than it has commands. Commands are overloaded, which means that their behavior can be altered with different parameters.

GUIs separate this functionality into their own UI elements, but the command line uses switches.

Here are some examples:

  • git add <path> adds the path’s content to the index

  • git add --all adds all content to the index

  • git add -p <path> starts an interactive addition to the index based on the path’s content

Git has many safety features that may block the user from using some of the commands. These features are in place to protect the version history and local changes from accidental destruction. Never use --force to get around these safety features.

For example, you can’t return to an older commit if unsaved changes in the worktree would be overwritten. However, you can ask Git to restore a certain version of a single file even though it has unsaved changes.

The main difference between these two cases is that in the latter the user explicitly says that they want to replace the modified file with something else. In the former the user can’t be sure which files would be overwritten, so the change to a modified file might be accidental.

Note!

If you can’t use some command in Git, always figure out why. Don’t try to work around the safety mechanisms unless you’re absolutely sure what that would destroy and that you want to destroy it.

Note!

Files have four main states: committed, modified, staged and untracked.

* Committed - The file's current state is stored in the previous commit.
* Modified - The file has changes that aren't stored.
* Staged - The file has changes and the changes are staged to the index.
* Untracked - The file has been created but it's not versioned or staged.

Git might prevent some commands based on the main state.
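The four states can be thought of as the result of comparing three places: the previous commit, the index and the worktree. The sketch below is a simplification under that assumption. The function `file_state` and the three dictionaries (path mapped to content) are hypothetical and only model the idea. In real Git a file can also be staged and modified at the same time; this sketch then reports only "modified".

```python
def file_state(path, head, index, worktree):
    """Classify a worktree file by comparing commit, index and worktree."""
    if path not in index:
        return "untracked"   # Git does not know the file yet
    if worktree.get(path) != index[path]:
        return "modified"    # worktree has changes that are not staged
    if head.get(path) != index[path]:
        return "staged"      # index differs from the previous commit
    return "committed"       # all three places agree


head = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
index = {"a.txt": "1", "b.txt": "2", "c.txt": "1"}
worktree = {"a.txt": "1", "b.txt": "2", "c.txt": "3", "d.txt": "new"}

for name in sorted(worktree):
    print(name, file_state(name, head, index, worktree))
```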

Data preservation in Git

Almost every operation in Git adds data to the version history. If you remove a file, Git still stores the previous versions of it and only marks that the file isn’t required in the current version of the worktree. This means that once you store something in Git, it stays there.

It is possible to remove all traces of a file from Git, but this requires rewriting the history starting from the version in which the file was first created. DO NOT DO THIS LIGHTLY. This isn’t an easy task and it is prone to mistakes. There are even situations where rewriting is practically impossible. So be sure you never store anything unwanted in Git, like passwords, private keys or sensitive information. For this reason it is important to have the .gitignore file of the repository in place. The role of .gitignore is to make sure that files that are commonly in the working tree but irrelevant to the project do not end up versioned.

Examples of .gitignore rules:

build-*

Filters out all paths that start with “build-”

*.o

Filters out all paths that end with “.o”

local_configurations.py

Filters out a file called local_configurations.py

html/

Filters out the html folder

!build*.py

Allows all paths that start with “build” and end with “.py”, even if these paths were filtered out by previous rules

Usually you’ll create a single .gitignore in the project root folder and add all your rules to that single file. You’ll commit this file to your repository so that it is shared with other developers.
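How the example rules combine can be sketched in Python. This is an approximation, not Git's own matcher: it uses `fnmatchcase` from the standard library as a stand-in, handles only the rule shapes shown above (plain names, `*` wildcards, a trailing `/` for folders and a leading `!`), and assumes that the given path is a file. The function name `is_ignored` is hypothetical. The point it illustrates is that rules are read from top to bottom and the last matching rule wins.

```python
from fnmatch import fnmatchcase


def is_ignored(path, rules):
    """Approximate check of a file path against simple .gitignore rules."""
    parts = path.split("/")
    ignored = False
    for rule in rules:
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        dir_only = pattern.endswith("/")
        pattern = pattern.rstrip("/")
        # A folder rule is compared only with the folder names of the path.
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, pattern) for part in candidates):
            ignored = not negate  # a later matching rule overrides earlier ones
    return ignored


rules = ["build-*", "*.o", "local_configurations.py", "html/", "!build*.py"]

for path in ["build-output", "main.o", "html/index.html", "build-tools.py", "src/main.c"]:
    print(path, is_ignored(path, rules))
```

In a real repository, git check-ignore -v <path> reports which rule, if any, causes a path to be ignored.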

What should or shouldn’t be stored in Git?

Git’s comparison and merge tools work on text files line by line. This means that text-based files, like source code, can be easily tracked. Binary files like pictures or videos are problematic. Even the smallest change can make the file look completely different to Git, so Git ends up storing what amounts to a whole new copy of the file, which takes more space. This can quickly bloat a simple project to over 100 MB.

Neither should you store any files that are generated directly from the stored files. If they’re generated you are basically storing the same information twice. This forces you to maintain both versions. For example binary-, moc- and object files are generated from the source-code.

It’s even possible that generated files are in conflict with other systems and can make version control a nightmare.

There are some special exceptions where a version of a generated file needs to be saved (for example release documentation or patch files).

What you should store:

  • Working source-code

  • Platform independent configurations and settings

  • Documentation

  • Required static material

What you shouldn’t store:

  • Broken source-code

  • Generated content

  • Passwords and keys

  • Sensitive information

  • Videos and graphics*

* Small pictures that are required in the project are an exception. You should use some other system to distribute large and non-text-based files.

Git Commands for Basic Use

Important note: HEAD is a pointer to the commit on which your worktree is based.

  • You add changes to index with git add. You give file paths to add to index as parameters.

    • git add <path> Adds changes from path to index

  • You can also add file removals to index using git rm.

    • git rm <path> Removes path from worktree and adds the removal to index

  • After you’ve added all of the necessary changes to the index you can finalize them with git commit.

    • git commit Creates a new commit from index. Opens up a text editor for writing a commit message

    • git commit -m "<message>" Creates a new commit from index. Uses given message as commit message

  • You can recover files and versions with git checkout. Usually it is used to switch branches.

    • git checkout HEAD -- <path> Restores the HEAD version of the path

    • git checkout <commit> -- <path> Restores the path from a commit and adds it to the index

    • git checkout <commit> Moves HEAD to point to a commit and updates the worktree. After this operation Git is in detached HEAD mode.

    • git checkout <branch> Moves HEAD to point to a branch and updates the worktree. In basic use you only need to remember git checkout main

  • The most straightforward way to get changes from the remote repository into the local one is git pull.

    • git pull <remote repository> If HEAD points to a branch that tracks a branch in the remote repository, Git fetches the changes from the remote repository and merges them into your branch. Example: git pull origin

    • git pull <remote repository> <branch> Git fetches the changes for the given branch from the remote repository and merges them. Example: git pull origin main

  • You can update the remote repository with git push. The command sends your local commits to the remote repository. This enables other developers to use your changes.

    • git push <remote repository> Pushes the branch that HEAD points to to the remote repository. Example: git push origin

    • git push <remote repository> <branch> Pushes the given branch to the remote repository. Example: git push origin main
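The difference between checking out a branch and checking out a single commit can be pictured with the toy model below. It is a sketch only: the class `ToyHead`, its methods and the commit names c1, c2 and c3 are hypothetical, and nothing here calls Git itself. The model assumes that HEAD either follows a branch or points directly at a commit (detached HEAD).

```python
class ToyHead:
    """Toy model of HEAD and branches; commit ids are plain strings."""

    def __init__(self):
        self.branches = {"main": "c1"}   # branch name -> commit id
        self.head = ("branch", "main")   # HEAD is attached to main

    def checkout(self, target):
        if target in self.branches:
            self.head = ("branch", target)  # attached: HEAD follows the branch
        else:
            self.head = ("commit", target)  # detached HEAD: points at a commit

    def current_commit(self):
        kind, value = self.head
        return self.branches[value] if kind == "branch" else value

    def commit(self, new_id):
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id   # the branch moves forward
        else:
            self.head = ("commit", new_id)  # only HEAD moves, no branch does


repo = ToyHead()
repo.commit("c2")      # main now points to c2
repo.checkout("c1")    # detached HEAD at c1
repo.commit("c3")      # main still points to c2
repo.checkout("main")  # back on the branch
print(repo.head, repo.current_commit())
```

In the model, the commit made in detached HEAD mode does not move main, so it is no longer reachable from any branch once main is checked out again. This is one reason to return to a branch with git checkout main before continuing work.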

Demo on Git use (duration 12:57)

Version Control on Your Own Computer

The examples given on the course are done on the command line. There are several reasons for this, even though graphical alternatives are available. These include IDEs, either directly or through a plugin, and separate graphical Git tools such as TortoiseGit and GitGUI. The main reason for using Git on the command line is that Git is a command line tool. Under the hood, all GUI alternatives still use the same commands that are written on the command line. So if you know how to use Git on the command line, you understand the principles behind its functionality best and can most likely use it through any other interface. All in all, it is worth noting that learning to use Git from the command line:

  • Helps you understand Git’s functionality best

  • Allows creation of own tools and scripts that work with Git

  • Allows you to solve more difficult problems

  • Is mandatory for many roles in IT

In order to use Git on your own computer you’ll of course need Git itself. Instructions on how to install Git on different platforms can be found in the Git book: Pro Git. The book is all in all a good source on how to use Git. For Windows machines we recommend Git for Windows, or installing PowerShell in addition to Git.

Command line | Generic GUI

- Hard for beginners | + Easier for those who haven’t used the command line before

+ Problems are often easier to solve, because the internet is full of instructions for command line usage | - It’s harder to figure out what the commands are in a GUI

+ You can find all the functions in Git’s manual directly with commands | + GUIs usually give better feedback on what you’re doing

+ Very fast to use once you learn the routine | * Navigating for commands might be slow, but some operations are easier

- Going through history is more difficult | + Going through history is easy and fast