
Git blame commit statistics


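How can I "abuse" blame (or some better-suited function, and/or combine it with shell commands) to get a statistic of how many lines (of code) currently in the repository originate from each committer?

Example output: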

Committer 1: 8046 Lines
Committer 2: 4378 Lines

This pipeline counts the blamed lines per author:

git ls-tree -r HEAD|sed -re 's/^.{53}//'|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'|while read filename; do git blame -w "$filename"; done|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'|sort|uniq -c

Step-by-step explanation:

List all files under version control

git ls-tree -r HEAD|sed -re 's/^.{53}//'
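The 53 characters removed by that sed expression are the fixed-width prefix of each `git ls-tree` line for a regular file: a 6-digit mode, the type `blob`, a 40-character SHA-1, and the separators between them. A minimal sketch of the same step as functions follows. The function names are made up for illustration, and the second variant, which cuts at the tab instead of counting characters, is an alternative that is not part of the original answer.

```shell
# Strip the "<mode> <type> <hash><TAB>" prefix from `git ls-tree -r HEAD` lines.

# Variant 1: fixed width, as in the answer (6 + 1 + 4 + 1 + 40 + 1 = 53 chars).
strip_prefix_fixed() {
    sed -E 's/^.{53}//'
}

# Variant 2 (assumption, not from the answer): keep everything after the
# first tab, so the width of the type or hash field does not matter.
strip_prefix_tab() {
    cut -f2-
}

# Usage inside a repository:
#   git ls-tree -r HEAD | strip_prefix_tab
```

The tab-based variant should keep working for entries whose type is not `blob`, where the prefix is longer than 53 characters.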

Prune the list down to text files only

|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'
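To make the filtering step easier to follow, here is a small sketch that isolates it. It assumes the usual `name: description` output of `file`; the function name `keep_text_files` and the sample descriptions in the usage comment are illustrative only.

```shell
# Reads `file` output ("name: description") on stdin and prints only the
# names whose description mentions "text".
keep_text_files() {
    grep -E ': .*text' | sed -E 's/: .*//'
}

# Usage with canned input (descriptions are examples, not real output):
#   printf 'main.c: C source, ASCII text\nlogo.png: PNG image data\n' | keep_text_files
```

A file name that itself contains a colon followed by a space would be cut short by the second sed, so this is only a sketch for ordinary names.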

Git blame all the text files, ignoring whitespace changes

|while read filename; do git blame -w "$filename"; done

Pull out the author names

|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'
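The extraction step can be tried on a single blame line without a repository. The sketch below wraps the same two sed expressions in a function (the name `blame_author` and the sample line are invented for illustration) and uses `-E`, which both GNU and BSD sed accept.

```shell
# Reads default `git blame` output on stdin and prints one author per line.
blame_author() {
    # 1st expression: keep what sits between "(" and the YYYY-MM-DD date.
    # 2nd expression: drop the padding spaces after the name.
    sed -E -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'
}

# Usage with a made-up blame line:
#   printf '%s\n' 'a1b2c3d4 (Anna Lambda 2012-03-04 10:11:12 +0100 42) int x = 0;' | blame_author
```

Because the first `.*` is greedy, a source line that itself contains an opening parenthesis followed by a date can be picked up instead of the author column, so the expression is best treated as a heuristic.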

Sort the list of authors, and have uniq count the number of consecutively repeating lines

|sort|uniq -c

Example output:

   1334 Maneater
   1924 Another guy
  37195 Brian Ruby
   1482 Anna Lambda
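The sort in front of uniq matters because `uniq -c` only merges adjacent duplicates. A short sketch of the counting step, with a hypothetical function name and an extra numeric sort added so the largest count comes first:

```shell
# Count identical lines on stdin and list the most frequent first.
count_lines() {
    sort | uniq -c | sort -nr
}

# Without the first sort, the three "Anna" lines below would be reported
# as two separate groups because "Brian" sits between them:
#   printf 'Anna\nBrian\nAnna\nAnna\n' | uniq -c
# With it, each name gets a single total:
#   printf 'Anna\nBrian\nAnna\nAnna\n' | count_lines
```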

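Erik's solution is awesome, but I had some problems with diacritics (even though my LC_* environment variables were ostensibly set correctly), and noise leaked through on lines of code that actually contained dates. My sed-fu is poor, so I ended up with this Frankenstein snippet with Ruby in it, but it works flawlessly for me on 200,000+ LOC, and it sorts the results: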

git ls-tree -r HEAD | gsed -re 's/^.{53}//' | \
while read filename; do file "$filename"; done | \
grep -E ': .*text' | gsed -r -e 's/: .*//' | \
while read filename; do git blame "$filename"; done | \
ruby -ne 'puts $1.strip if $_ =~ /^.{8} \((.*?)\s*\d{4}-\d{2}-\d{2}/' | \
sort | uniq -c | sort -rg

Also note gsed instead of sed: that is the binary Homebrew installs, leaving the system sed intact.

I wrote a gem called git_fame that might be useful.

Installation and usage:

$ gem install git_fame

$ cd /path/to/gitdir

$ git fame

Output:

Statistics based on master
Active files: 21
Active lines: 967
Total commits: 109

Note: Files matching MIME type image, binary has been ignored

+----------------+-----+---------+-------+---------------------+
| name           | loc | commits | files | distribution (%)    |
+----------------+-----+---------+-------+---------------------+
| Linus Oleander | 914 | 106     | 21    | 94.5 / 97.2 / 100.0 |
| f1yegor        | 47  | 2       | 7     |  4.9 /  1.8 / 33.3  |
| David Selassie | 6   | 1       | 2     |  0.6 /  0.9 /  9.5  |
+----------------+-----+---------+-------+---------------------+

Update: I have updated a couple of things along the way.

For convenience, you can also put this into its own command:

#!/bin/bash

# Save as e.g. git-authors and set the executable flag.
# sed -z (GNU sed) is needed so that every NUL-separated name gets the ./ prefix.
git ls-tree -r -z --name-only HEAD -- $1 | sed -z 's/^/.\//' | xargs -0 -n1 git blame \
 --line-porcelain HEAD |grep -ae "^author "|sort|uniq -c|sort -nr

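Store this somewhere in your PATH, or modify your PATH, and use it like this:

git authors '*/*.c' # look for all files recursively ending in .c

git authors '*/*.[ch]' # look for all files recursively ending in .c or .h

git authors 'Makefile' # just count the lines per author in the Makefile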

Original answer: while the accepted answer does the job, it is very slow.

$ git ls-tree --name-only -z -r HEAD|egrep -z -Z -E '\.(cc|h|cpp|hpp|c|txt)$' \
  |xargs -0 -n1 git blame --line-porcelain|grep "^author "|sort|uniq -c|sort -nr

This is almost instantaneous.

To get a list of the files currently tracked, you can use

git ls-tree --name-only -r HEAD

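This solution avoids calling file to determine the file type and, for performance reasons, uses grep to match the wanted extensions. If all files should be included, just remove the grep from the pipeline.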

grep -E '\.(cc|h|cpp|hpp|c)$' # for C/C++ files
grep -E '\.py$'               # for Python files
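If the extension list changes often, the grep can be wrapped in a small helper. This is only a sketch: `filter_ext` is a hypothetical name, and it takes the extensions as an alternation such as `c|h`.

```shell
# Keep only the paths on stdin whose extension is in the given
# alternation, e.g. 'c|h' or 'py'.
filter_ext() {
    grep -E "\\.($1)\$"
}

# Usage inside a repository:
#   git ls-tree --name-only -r HEAD | filter_ext 'cc|h|cpp|hpp|c'
```

The trailing `$` anchor is what keeps names like `notes.c.bak` out of the result.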

If file names can contain spaces, which shells handle badly, you can use:

git ls-tree -z --name-only -r HEAD | egrep -Z -z '\.py'|xargs -0 ... # passes newlines as '\0'

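Given a list of files (through a pipe), xargs can call a command and distribute the arguments to it. For commands that accept several files per invocation, omit -n1. In this case we call git blame --line-porcelain with exactly one argument per call: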

xargs -n1 git blame --line-porcelain

We then filter the output for occurrences of "author", sort the list, and count the duplicate lines:

grep "^author "|sort|uniq -c|sort -nr
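The reason `^author ` is a safe pattern is that `--line-porcelain` repeats the full header for every blamed line and prefixes the actual file content with a tab, so source text can never start a line with `author `. A small sketch that shows this on canned input (the function name and the sample records are made up):

```shell
# Reads `git blame --line-porcelain` output on stdin and prints the
# author of every blamed line, one per line.
porcelain_authors() {
    # "author-mail ..." does not match because of the space in the pattern;
    # content lines do not match because they start with a tab.
    grep -a '^author ' | sed 's/^author //'
}

# Usage inside a repository:
#   git blame --line-porcelain path/to/file.txt | porcelain_authors | sort | uniq -c | sort -nr
```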

Note: other answers actually filter out lines that contain only whitespace.

grep -Pzo "author [^\n]*\n([^\n]*\n){10}[\w]*[^\w]"|grep "author "

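The command above prints the authors of lines containing at least one non-whitespace character. You can also match \w*[^\w#], which additionally excludes lines whose first non-whitespace character is a # (a comment in many scripting languages).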
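As an alternative to the multi-line grep, the same idea (count only lines with at least one non-whitespace character) can be sketched in awk. This swaps in a different technique from the one in the answer; the function name is hypothetical, and it relies on each content line following its own `author` header, as `--line-porcelain` output does.

```shell
# Reads `git blame --line-porcelain` output on stdin and counts, per author,
# the blamed lines that contain at least one non-whitespace character.
count_nonblank_per_author() {
    awk '
        /^author / { author = substr($0, 8) }          # text after "author "
        /^\t/      { if ($0 ~ /[^ \t]/) count[author]++ }  # tab-prefixed content
        END        { for (a in count) printf "%d %s\n", count[a], a }
    ' | sort -nr
}

# Usage inside a repository:
#   git blame --line-porcelain path/to/file.txt | count_nonblank_per_author
```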

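Check out the gitstats command.

Here is the primary snippet from @Alex's answer, the part that actually aggregates the blame lines. I have cut it down to operate on a single file rather than a set of files: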

git blame --line-porcelain path/to/file.txt | grep  "^author " | sort | uniq -c | sort -nr

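I am posting this here because I keep coming back to this answer, re-reading the post and re-digesting the examples to extract the part I need, which is taxing. It is also not generic enough for my use case; its scope is a whole C project.

I like to list the stats per file, using a bash for loop instead of xargs, because I find xargs less readable and hard to use and memorize. The advantages and disadvantages of xargs versus for should be discussed elsewhere.

Here is a practical snippet that shows the results for each file individually: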

for file in $(git ls-files); do \
    echo "$file"; \
    git blame --line-porcelain "$file" \
        | grep  "^author " | sort | uniq -c | sort -nr; \
    echo; \
done
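A `for` loop over `$(git ls-files)` splits file names at spaces. A possible variation, sketched below under the assumption that names contain no newlines, reads the list line by line instead. `per_file` is a hypothetical helper that prints each name and then runs whatever command it is given on that file.

```shell
# For every file name read from stdin (one per line), print the name,
# run the given command with the name appended, then print a blank line.
per_file() {
    while IFS= read -r file; do
        printf '%s\n' "$file"
        # </dev/null keeps the command from eating the rest of the list.
        "$@" "$file" </dev/null
        echo
    done
}

# Usage inside a repository (the same per-file listing as above, with the
# aggregation applied to the combined output in this simplified form):
#   git ls-files | per_file git blame --line-porcelain | grep "^author " | sort | uniq -c | sort -nr
```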

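I tested this: running it straight in a bash shell is Ctrl+C safe. If you put it inside a bash script and want the user to be able to break out of the for loop, you may need to handle that explicitly, for example by trapping SIGINT.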

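I have a solution that counts the blamed lines in all text files (excluding binary files, even the versioned ones).

The git summary command, provided by a separate package, does exactly what you need. Check out its documentation.

It gives output that looks like this: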

project  : TestProject
lines    : 13397
authors  :
8927 John Doe            66.6%
4447 Jane Smith          33.2%
  23 Not Committed Yet   0.2%
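The percentage column is simply each author's line count divided by the total. A sketch that adds such a column to any `count name` listing, such as the output of `uniq -c`, is shown below; `add_percentages` is a hypothetical helper and not part of the tool above.

```shell
# Reads "count name" lines on stdin and appends each count's share of the
# total, rounded to one decimal place.
add_percentages() {
    awk '
        {
            count[NR] = $1
            $1 = ""                      # drop the count, keep the name
            name[NR] = substr($0, 2)     # skip the separator left behind
            total += count[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                printf "%d %s %.1f%%\n", count[i], name[i], 100 * count[i] / total
        }
    '
}

# Usage inside a repository:
#   git blame --line-porcelain path/to/file.txt | grep "^author " | sed 's/^author //' \
#     | sort | uniq -c | sort -nr | add_percentages
```

Fed the three counts from the sample output above, it produces the same percentages.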

I made my own script, which is a combination of @nilbus's and @Alex's:

#!/bin/sh

for f in $(git ls-tree -r  --name-only HEAD --);
do
    j=$(file "$f" | grep -E ': .*text'| sed -r -e 's/: .*//');
    if [ "$f" != "$j" ]; then
        continue;
    fi
    git blame -w --line-porcelain HEAD "$f" | grep  "^author " | sed 's/author //'
done | sort | uniq -c | sort -nr

A Bash function that targets a single source file, run on macOS:

function glac {
    # git_line_author_counts
    git blame -w "$1" | sed -E "s/.*\((.*) +[0-9]{4}-[0-9]{2}.*/\1/g" | sort | uniq -c | sort -nr
}

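I adapted this to PowerShell:

(git ls-tree -rz --name-only HEAD).Split(0x00) | where {$_ -Match '.*\.py'} | %{git blame -w --line-porcelain HEAD $_} | Select-String -Pattern '^author ' | Group-Object | Select-Object -Property Count, Name | Sort-Object -Property Count -Descending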

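Whether you run git blame with the -w switch is optional; I added it because it ignores whitespace changes.

In my setup, performance on my machine favored PowerShell over the Bash solution (~50s vs ~65s for the same repo).

The following can be run from any directory in the repo's source tree, in case you want to inspect a particular source module: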

find . -name '*.c' | xargs -n1 git blame --line-porcelain | grep "^author "|sort|uniq -c|sort -nr

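This returns the number of commits per author, not the number of lines.

This is very useful for determining the main contributors to a project, directory, or file.

If I have a different version of sed, I don't understand <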
