Git Internals: Implementing HEAD


Implementing HEAD lookup

We now know how .git/HEAD and the files under .git/refs/heads work together. Let's start implementing this, beginning with a few type definitions:

const HASH_BYTES: usize = 20;

// A (commit) hash is a 20-byte identifier.
// We will see that git also gives hashes to other things.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Hash([u8; HASH_BYTES]);

// The head is either at a specific commit or a named branch
enum Head {
  Commit(Hash),
  Branch(String),
}

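Next, we want to be able to convert hashes back and forth between the 40-character hexadecimal representation and the compact 20-byte representation.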

use std::fmt::{self, Display, Formatter};
// `io` itself must be in scope because the code below uses `io::Result`
use std::io::{self, Error};
use std::str::FromStr;

impl FromStr for Hash {
  type Err = Error;

  fn from_str(hex_hash: &str) -> io::Result<Self> {
    // Parse a hexadecimal string like "af64eba00e3cfccc058403c4a110bb49b938af2f"
    // into  [0xaf, 0x64, ..., 0x2f]. Returns an error if the string is invalid.
    // ...
  }
}

impl Display for Hash {
  fn fmt(&self, f: &mut Formatter) -> fmt::Result {
    // Turn the hash back into a hexadecimal string
    for byte in self.0 {
      write!(f, "{:02x}", byte)?;
    }
    Ok(())
  }
}
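
The body of from_str is elided above. As a minimal sketch of one way it could be filled in (not necessarily the author's implementation), the version below checks the length, rejects non-hexadecimal characters, and decodes two digits per byte. The choice of ErrorKind::InvalidData and the error message are assumptions made for illustration.

```rust
use std::io::{self, Error, ErrorKind};
use std::str::FromStr;

const HASH_BYTES: usize = 20;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Hash([u8; HASH_BYTES]);

impl FromStr for Hash {
    type Err = Error;

    fn from_str(hex_hash: &str) -> io::Result<Self> {
        // Assumption: report malformed input as InvalidData.
        let invalid = || {
            Error::new(
                ErrorKind::InvalidData,
                format!("invalid hash: {:?}", hex_hash),
            )
        };
        // Two hex digits per byte. Requiring ASCII hex digits also
        // guarantees the slices below fall on character boundaries.
        if hex_hash.len() != HASH_BYTES * 2
            || !hex_hash.bytes().all(|b| b.is_ascii_hexdigit())
        {
            return Err(invalid());
        }
        let mut bytes = [0u8; HASH_BYTES];
        for (i, byte) in bytes.iter_mut().enumerate() {
            let pair = &hex_hash[i * 2..i * 2 + 2];
            *byte = u8::from_str_radix(pair, 16).map_err(|_| invalid())?;
        }
        Ok(Hash(bytes))
    }
}
```

Parsing "af64eba00e3cfccc058403c4a110bb49b938af2f" with this sketch yields a Hash whose first byte is 0xaf and whose last byte is 0x2f, while a string of the wrong length or with non-hex characters returns an error.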

Now we can write the core logic: read the .git/HEAD file and determine the commit hash it corresponds to.

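use std::fs;
use std::path::Path;

// Assumed to match the definitions from the earlier parts of this series.
const HEAD_FILE: &str = ".git/HEAD";
const BRANCH_REFS_DIRECTORY: &str = ".git/refs/heads";
const REF_PREFIX: &str = "ref: refs/heads/";

fn get_head() -> io::Result<Head> {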
  use Head::*;

  let hash_contents = fs::read_to_string(HEAD_FILE)?;
  // Remove trailing newline
  let hash_contents = hash_contents.trim_end();
  // If .git/HEAD starts with `ref: refs/heads/`, it's a branch name.
  // Otherwise, it should be a commit hash.
  Ok(match hash_contents.strip_prefix(REF_PREFIX) {
    Some(branch) => Branch(branch.to_string()),
    _ => {
      let hash = Hash::from_str(hash_contents)?;
      Commit(hash)
    }
  })
}

impl Head {
  fn get_hash(&self) -> io::Result<Hash> {
    use Head::*;

    match self {
      Commit(hash) => Ok(*hash),
      Branch(branch) => {
        // Copied from get_branch_head()
        let ref_file = Path::new(BRANCH_REFS_DIRECTORY).join(branch);
        let hash_contents = fs::read_to_string(ref_file)?;
        Hash::from_str(hash_contents.trim_end())
      }
    }
  }
}

fn main() -> io::Result<()> {
  let head = get_head()?;
  let head_hash = head.get_hash()?;
  println!("Head hash: {}", head_hash);
  Ok(())
}
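
To see the two possible shapes of .git/HEAD without touching a real repository, the parsing step can be pulled out into a pure function. This is only an illustrative sketch: ParsedHead and parse_head are hypothetical names, and the commit case keeps the hash as a plain string instead of converting it to a Hash, to keep the example short.

```rust
// Prefix that marks .git/HEAD as pointing at a branch.
const REF_PREFIX: &str = "ref: refs/heads/";

// Hypothetical, simplified stand-in for the Head enum:
// the commit hash is kept as text rather than parsed into bytes.
#[derive(Debug, PartialEq, Eq)]
enum ParsedHead {
    Commit(String),
    Branch(String),
}

// Classify the raw contents of .git/HEAD.
fn parse_head(contents: &str) -> ParsedHead {
    // Remove the trailing newline
    let contents = contents.trim_end();
    match contents.strip_prefix(REF_PREFIX) {
        Some(branch) => ParsedHead::Branch(branch.to_string()),
        None => ParsedHead::Commit(contents.to_string()),
    }
}
```

With this sketch, "ref: refs/heads/main\n" is classified as the branch "main", and a bare 40-character hash followed by a newline is classified as a commit.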

Now, whether we have the main branch checked out or a commit hash directly, the program prints:

Head hash: af64eba00e3cfccc058403c4a110bb49b938af2f

We have successfully determined the hash of the current commit. Now, how do we find out what information that commit stores?

What's in a commit?

When you view a commit in a web interface like GitHub or through a command like git show, you see the changes the commit introduced (the "diff").

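So you might assume that git stores each commit as a diff. Alternatively, it could store each commit like a backup, containing the contents of every file at that commit.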

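Either approach would work: you can compute a diff from two copies of the files, and you can compute the files' contents by applying each diff in order (starting from the empty repository or from a recent commit). Which one to use depends on what you are optimizing for.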
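
The equivalence of the two representations can be sketched in a few lines. The example below is a deliberately simplified model, not how git works: a snapshot is a map from path to file contents, and a "diff" works at whole-file granularity (a file is written or deleted) rather than line by line. Snapshot, Change, diff, and apply are hypothetical names.

```rust
use std::collections::BTreeMap;

// Simplified snapshot: path -> full file contents.
type Snapshot = BTreeMap<String, String>;

// Whole-file changes; real diffs work on lines within files.
#[derive(Debug, PartialEq, Eq)]
enum Change {
    Write { path: String, contents: String }, // added or modified
    Delete { path: String },
}

// Compute the changes that turn `old` into `new`.
fn diff(old: &Snapshot, new: &Snapshot) -> Vec<Change> {
    let mut changes = Vec::new();
    for (path, contents) in new {
        if old.get(path) != Some(contents) {
            changes.push(Change::Write {
                path: path.clone(),
                contents: contents.clone(),
            });
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            changes.push(Change::Delete { path: path.clone() });
        }
    }
    changes
}

// Rebuild a snapshot by applying changes on top of `base`.
fn apply(base: &Snapshot, changes: &[Change]) -> Snapshot {
    let mut result = base.clone();
    for change in changes {
        match change {
            Change::Write { path, contents } => {
                result.insert(path.clone(), contents.clone());
            }
            Change::Delete { path } => {
                result.remove(path);
            }
        }
    }
    result
}
```

Under this model, apply(&old, &diff(&old, &new)) always reproduces new, so either form can be derived from the other; the trade-off is only about which one is stored and which one is computed on demand.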

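The diff-based approach takes up less storage; it minimizes the amount of duplicated information because it stores only what changed. However, storing contents makes it much faster to check out the code at a specific commit, because we don't need to apply potentially thousands of diffs. (It also makes git clone --depth 1 easy to implement, which speeds up cloning by downloading only the most recent commit.)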

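And when the changes are small, computing the diff from the contents of two commits is not too time-consuming either: diff algorithms are quite fast, and git can automatically skip directories and files that haven't changed, as we will see later.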

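For these reasons, git takes the "store the contents of every file" approach. Git's implementation manages to store only one copy of identical files, which saves a lot of storage compared with the naive solution.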
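
One general way to keep a single copy of identical files is to store contents under a key derived from the contents themselves. The sketch below illustrates that idea only; it is not git's actual object format. ObjectStore and ObjectId are hypothetical names, and the 64-bit value from the standard library's DefaultHasher is a stand-in for the 20-byte hashes used elsewhere in this article.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

// Stand-in identifier: NOT the hash git uses, just a 64-bit
// value from the standard library's hasher for illustration.
type ObjectId = u64;

#[derive(Default)]
struct ObjectStore {
    objects: HashMap<ObjectId, Vec<u8>>,
}

impl ObjectStore {
    // Store `contents` and return its id. Identical contents map to
    // the same id, so a second copy is never stored.
    fn put(&mut self, contents: &[u8]) -> ObjectId {
        let mut hasher = DefaultHasher::new();
        hasher.write(contents);
        let id = hasher.finish();
        self.objects.entry(id).or_insert_with(|| contents.to_vec());
        id
    }

    // Look up previously stored contents by id.
    fn get(&self, id: ObjectId) -> Option<&[u8]> {
        self.objects.get(&id).map(|v| v.as_slice())
    }
}
```

If two commits each contain an unchanged file plus a modified one, storing all four files through put leaves only three entries in the store. Equal ids also give a cheap signal that a file is unchanged between two commits, so its contents need not be compared.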

Reposted from: Juejin (掘金)
