Thursday, February 17, 2022

Git Internals

Git is a content-addressable filesystem in which objects are stored as simple key-value pairs: you can hand Git any kind of content and it gives you back a key that you can later use to retrieve that content.

The key of an object is a 40-character SHA-1 checksum, computed over the content plus a short header. You can compute it for any content yourself with the git hash-object command.
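To make the key computation concrete, here is a minimal Python sketch of how a blob's key can be derived. It assumes the header layout described in the Pro Git chapter linked at the end of this post (the word "blob", a space, the content length in bytes, and a null byte). The function name git_blob_hash is made up for illustration and is not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 key Git would assign to `content` stored as a blob.

    Assumption: the object header is b"blob <size>\\0", where <size> is the
    content length in bytes, and the hash covers header + content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Compare with: echo 'test content' | git hash-object --stdin
    print(git_blob_hash(b"test content\n"))
```

Running git hash-object on the same bytes should print the same 40 characters, which is a quick way to check the sketch against a real Git installation.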

The 3 main types of Git objects are:

1 - Blob - Zlib Compressed Header and File Content

2 - Tree - One or more entries, each of which is the SHA-1 hash of a blob or subtree with its associated mode, type, and filename.

3 - Commit - Top Level Tree, Parent Commit, Author and Commit Description

Git stores objects inside .git/objects, in subdirectories named after the first 2 characters of the SHA-1; the filename is the remaining 38 characters.
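The directory layout can be sketched in a few lines of Python. This is only an illustration of the 2 + 38 split for loose objects, written into a throwaway directory rather than a real repository. The helper names loose_object_path and write_loose_blob are hypothetical, and packed objects (which Git also uses) are not covered.

```python
import hashlib
import os
import tempfile
import zlib


def loose_object_path(git_dir: str, sha: str) -> str:
    """Map a 40-character hash to <git_dir>/objects/<first 2>/<remaining 38>."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def write_loose_blob(git_dir: str, content: bytes) -> str:
    """Store `content` as a zlib-compressed loose blob and return its hash.

    Assumption: the stored bytes are b"blob <size>\\0" followed by the content.
    """
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(store).hexdigest()
    path = loose_object_path(git_dir, sha)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(zlib.compress(store))
    return sha


if __name__ == "__main__":
    # A temporary directory stands in for .git so no real repository is touched.
    fake_git_dir = tempfile.mkdtemp()
    sha = write_loose_blob(fake_git_dir, b"test content\n")
    print(loose_object_path(fake_git_dir, sha))
```

Using a temporary directory keeps the experiment safe; pointing the same code at a real .git directory is possible but not something this sketch has been hardened for.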

The git cat-file command is a sort of Swiss army knife for inspecting Git objects.

git cat-file -t [hash] gives the type of the object.

git cat-file -p [hash] prints the content of the object.
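For intuition, the type and content that cat-file reports can be recovered from a loose object's bytes by hand. The following Python sketch assumes the object is zlib-compressed and starts with a "<type> <size>" header terminated by a null byte. The function name parse_object is made up for illustration, and this is not how Git itself is implemented.

```python
import zlib


def parse_object(compressed: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content).

    Assumption: the decompressed bytes look like b"<type> <size>\\0<content>".
    """
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), body


if __name__ == "__main__":
    # Build a sample object in memory instead of reading from .git/objects.
    sample = zlib.compress(b"blob 13\0test content\n")
    obj_type, body = parse_object(sample)
    print(obj_type)
    print(body.decode("utf-8"), end="")
```

For blobs the content is the file itself. Tree contents are binary, so printing them this way would not give the readable listing that cat-file produces.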

When working with Git, [hash] can be abbreviated to its first n characters (how many you need depends on the size of the project) instead of copying all 40 characters.

"Generally, eight to ten characters are more than enough to be unique within a project. One of the largest Git projects, the Linux kernel, is beginning to need 12 characters out of the possible 40 to stay unique." - Source
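The idea behind abbreviated hashes is simply that a prefix is usable as long as it matches exactly one object. A small Python sketch of that lookup follows. The function name resolve_prefix and the in-memory list of hashes are assumptions for illustration (the third hash is invented), and Git's real lookup works against its object database rather than a list.

```python
def resolve_prefix(prefix: str, known_hashes: list[str]) -> str:
    """Return the single hash in `known_hashes` that starts with `prefix`.

    Raises KeyError when nothing matches and ValueError when the prefix is
    ambiguous, i.e. too short to identify one object.
    """
    prefix = prefix.lower()
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


if __name__ == "__main__":
    hashes = [
        "d670460b4b4aece5915caf5c68d12f560a9fe3e4",
        "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
        "d67" + "0" * 37,  # invented hash sharing a 3-character prefix
    ]
    print(resolve_prefix("d6704", hashes))
```

The more objects a project has, the more likely two hashes share a short prefix, which is why larger projects need longer abbreviations.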

Source: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects