Implementing a C Code Statistics Tool in Python (Part 1)

Tags: Python, code statistics

Implementing a C Code Statistics Tool in Python (Part 1)
Preface
1. Problem Statement
2. Implementation
3. Verification
4. Afterword

Preface

This article implements a simple C code statistics tool as a Python 2.7 script.

It is also published on 作业部落 (Zybuluo), where the rendering differs slightly.

1. Problem Statement

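A large codebase is hard to assess accurately by eye. A code statistics tool can analyze a software project automatically and count, for each file, the total lines, effective code lines, comment lines, and blank lines, producing an accurate and readable report of code size. Such a quantitative report makes targeted quality improvements possible: for example, splitting and regrouping code to eliminate oversized files, or adding comments to files whose comment ratio is too low.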

For simplicity, this article counts only C code, that is, files with the extension .c or .h. It also adopts the following counting rules:

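When code and a comment share a line, both the code line count and the comment line count are incremented. Because trailing line comments are common, the sum of code, comment, and blank lines is usually greater than the total number of lines in the file.
Blank lines, including those inside a block comment, are counted as blank lines.
"#if 0 ... #endif" blocks are counted as code lines.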

2. Implementation

First, define two lists that hold the results:

rawCountInfo = [0, 0, 0, 0, 0]

detailCountInfo = []

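rawCountInfo holds the coarse totals: its elements are, in order, the total number of file lines, code lines, comment lines, and blank lines, followed by the number of files. detailCountInfo holds the detailed statistics: each entry pairs a file name with that file's line counts, and the totals over all files are derived from these entries when the report is printed. It is a nested list, for example: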

[['line.c', [33, 19, 15, 4]], ['test.c', [44, 34, 3, 7]]]
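As a small illustration of this layout, the sketch below sums the per-file counters of the sample list column by column. The helper name sum_columns is hypothetical and not part of the tool, and the code is written so that it also runs on Python 3.

```python
# Sample data copied from the example above: [file name, [file, code, comment, blank]]
detail_count_info = [['line.c', [33, 19, 15, 4]], ['test.c', [44, 34, 3, 7]]]

def sum_columns(detail):
    """Add up the per-file counters column by column (hypothetical helper)."""
    totals = [0, 0, 0, 0]
    for _name, counts in detail:
        totals = [t + c for t, c in zip(totals, counts)]
    return totals

totals = sum_columns(detail_count_info)
print(totals + [len(detail_count_info)])  # prints [77, 53, 18, 11, 2]
```

Here the code, comment, and blank totals add up to 82, more than the 77 file lines, because lines that mix code and a comment are counted in both columns.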

The implementation follows. To avoid pasting one long listing, it is presented function by function with a brief description of each.

import re

def CalcLines(lineList):
    lineNo, totalLines = 0, len(lineList)
    codeLines, commentLines, emptyLines = 0, 0, 0
    while lineNo < totalLines:
        if lineList[lineNo].isspace():  # blank line
            emptyLines += 1; lineNo += 1; continue

        regMatch = re.match(r'^([^/]*)/(/|\*)+(.*)$', lineList[lineNo].strip())
        if regMatch is not None:  # line contains a comment
            commentLines += 1
            # Code and comment on the same line
            if regMatch.group(1) != '':
                codeLines += 1
            elif regMatch.group(2) == '*' \
                and re.match(r'^.*\*/.+$', regMatch.group(3)) is not None:
                codeLines += 1

            # Line comment, or block comment closed on the same line
            if '/*' not in lineList[lineNo] or '*/' in lineList[lineNo]:
                lineNo += 1; continue

            # Block comment spanning several lines
            lineNo += 1
            # The bounds check avoids an IndexError on an unterminated comment
            while lineNo < totalLines and '*/' not in lineList[lineNo]:
                if lineList[lineNo].isspace():
                    emptyLines += 1
                else:
                    commentLines += 1
                lineNo += 1
            if lineNo < totalLines:
                commentLines += 1  # the line holding '*/'
        else:  # code line
            codeLines += 1

        lineNo += 1

    return [totalLines, codeLines, commentLines, emptyLines]
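To see how the regular expression in CalcLines() separates code from comments, the following sketch applies the same two patterns to a few single lines. The helper classify and the sample lines are illustrative assumptions, not part of the tool, and blank lines are assumed to have been filtered out beforehand.

```python
import re

# Same pattern as in CalcLines(): group(1) is the text before the first '/',
# group(2) the last comment-opener character, group(3) the rest of the line.
COMMENT_RE = re.compile(r'^([^/]*)/(/|\*)+(.*)$')

def classify(line):
    """Return (has_comment, has_code) for one non-blank line (hypothetical helper)."""
    match = COMMENT_RE.match(line.strip())
    if match is None:
        return (False, True)            # no comment found: plain code line
    has_code = match.group(1) != ''     # code before the comment
    if not has_code and match.group(2) == '*':
        # code after a block comment that is closed on the same line
        has_code = re.match(r'^.*\*/.+$', match.group(3)) is not None
    return (True, has_code)

samples = [
    'a2 = 1;',
    '//comment',
    'int a2; int b2;  //comment',
    '/*FALSE*/ #if M_DEFINED',
    'x = a / b; // ratio',
]
for text in samples:
    print('%-30s %s' % (text, classify(text)))
```

The last sample is reported as code only: because [^/]* cannot cross the division operator, a comment that follows a '/' earlier on the line goes undetected. This appears to be a limitation of the simple pattern rather than of the counting rules.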

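CalcLines() classifies each line according to C syntax and counts code, comment, and blank lines separately. Its lineList argument comes from readlines(), so every line (except possibly the last) still ends with a newline character. strip() removes leading and trailing whitespace, including that newline. When echoing a line with print, either of the following avoids a doubled newline: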

print '%s' %(line),  # note the trailing comma

print '%s' %(line.strip())

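The trailing newline also appears with readline() and read(), and with the for line in file idiom. For read(), calling split('\n') on the result gives a list of lines without newline characters. Note that readlines() and read() consume the whole file and leave the file position at the end. To read again from the same file object, first call file.seek(0) to return to the beginning; otherwise the next read returns nothing.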
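The behavior described here can be checked with a short experiment. The sketch below writes a three-line temporary file (an assumption made purely for illustration), reads it twice through the same file object, then rewinds and reads it a third time. It is written to run on Python 3 as well as Python 2.7.

```python
import os
import tempfile

def read_three_times(path):
    """Read one file object three times, rewinding only before the last read."""
    with open(path, 'r') as file_obj:
        first = file_obj.readlines()          # consumes the file; position is now at the end
        second = file_obj.readlines()         # nothing left to read
        file_obj.seek(0)                      # rewind to the beginning
        third = file_obj.read().split('\n')   # lines without newline characters
    return first, second, third

# Create a small throwaway file to read from
fd, tmp_path = tempfile.mkstemp(suffix='.c')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('int a;\n\nint b;\n')

first, second, third = read_three_times(tmp_path)
os.remove(tmp_path)
print(first)    # ['int a;\n', '\n', 'int b;\n']
print(second)   # []
print(third)    # ['int a;', '', 'int b;', '']
```

Note the extra empty string at the end of the split('\n') result when the file ends with a newline; str.splitlines() does not produce that trailing element, which may matter when the list length is used as the line count.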

import os

def CountFileLines(filePath, isRaw=True):
    fileExt = os.path.splitext(filePath)
    if fileExt[1] != '.c' and fileExt[1] != '.h':  # only C source and header files
        return

    try:
        fileObj = open(filePath, 'r')
    except IOError:
        print 'Cannot open file (%s) for reading!' % filePath
        return  # nothing to count; lineList would be undefined below
    else:
        lineList = fileObj.readlines()
        fileObj.close()

    if isRaw:
        global rawCountInfo
        # Add the per-file counters to the running totals, then bump the file count
        rawCountInfo[:-1] = [x+y for x,y in zip(rawCountInfo[:-1], CalcLines(lineList))]
        rawCountInfo[-1] += 1
    else:
        detailCountInfo.append([filePath, CalcLines(lineList)])

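CountFileLines() counts the lines of a single file. Its isRaw parameter selects between the raw and the detailed report. For a detailed report, the result for each file is appended to detailCountInfo; for a raw report, it is enough to accumulate the counters in rawCountInfo correctly.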

def ReportCounterInfo(isRaw=True):
    # Python 2.5 introduced the conditional expression (x if cond else y) as a ternary
    # operator; older versions can rely on the short-circuit behavior of and/or
    #print 'FileLines  CodeLines  CommentLines  EmptyLines  %s' %('' if isRaw else 'FileName')
    print 'FileLines  CodeLines  CommentLines  EmptyLines  %s' %(not isRaw and 'FileName' or '')

    if isRaw:
        print '%-11d%-11d%-14d%-12d<Total:%d Files>' %(rawCountInfo[0], rawCountInfo[1],
              rawCountInfo[2], rawCountInfo[3], rawCountInfo[4])
        return

    total = [0, 0, 0, 0]
    # Sort detailCountInfo by its first column (the file name) for readable output
    #import operator; detailCountInfo.sort(key=operator.itemgetter(0))
    detailCountInfo.sort(key=lambda x:x[0])  # concise and flexible, but slower than operator.itemgetter
    for item in detailCountInfo:
        print '%-11d%-11d%-14d%-12d%s' %(item[1][0], item[1][1], item[1][2], item[1][3], item[0])
        total[0] += item[1][0]; total[1] += item[1][1]
        total[2] += item[1][2]; total[3] += item[1][3]
    print '%-11d%-11d%-14d%-12d<Total:%d Files>' %(total[0], total[1], total[2], total[3], len(detailCountInfo))

ReportCounterInfo() prints the report. Note that the detailed report is sorted by file name before it is printed.
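Two idioms mentioned in the comments of ReportCounterInfo() deserve a closer look: the and/or substitute for a conditional expression, and the two ways of building a sort key. The sketch below is illustrative only; the function names pick_and_or and pick_conditional are hypothetical, and the code also runs on Python 3.

```python
import operator

def pick_and_or(cond, if_true, if_false):
    """Pre-2.5 idiom: only safe when if_true is truthy."""
    return cond and if_true or if_false

def pick_conditional(cond, if_true, if_false):
    """Conditional expression, available since Python 2.5."""
    return if_true if cond else if_false

print(repr(pick_and_or(True, 'FileName', '')))       # 'FileName'
print(repr(pick_and_or(True, '', 'fallback')))       # 'fallback', not ''
print(repr(pick_conditional(True, '', 'fallback')))  # ''

# Both key functions order the entries by file name
detail = [['test.c', [44, 34, 3, 7]], ['line.c', [33, 19, 15, 4]]]
by_lambda = sorted(detail, key=lambda item: item[0])
by_getter = sorted(detail, key=operator.itemgetter(0))
print([item[0] for item in by_getter])               # ['line.c', 'test.c']
```

The and/or form works in ReportCounterInfo() because the value selected on the true branch, 'FileName', is a non-empty string. If the two strings were swapped so that the empty string sat in the middle position, the idiom would silently return the wrong branch.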

def CountDirLines(dirPath, isRawReport=True):
    if not os.path.exists(dirPath):
        print dirPath + ' is non-existent!'
        return

    if not os.path.isdir(dirPath):
        print dirPath + ' is not a directory!'
        return

    for root, dirs, files in os.walk(dirPath):
        for fileName in files:
            CountFileLines(os.path.join(root, fileName), isRawReport)

    ReportCounterInfo(isRawReport)

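CountDirLines() counts the lines of every file in the given directory and its subdirectories, then prints the report. Note that os.walk() does not guarantee alphabetical order; it follows the order of the underlying directory listing. On the author's Windows XP host, os.walk() visited files in name order, whereas on a Red Hat Linux host the order appeared arbitrary.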
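If a platform-independent traversal order is wanted, one option is to sort the names that os.walk() yields. The sketch below is an assumption about how that could be done, not part of the tool: the helper walk_sorted is hypothetical, and the temporary directory tree exists only for the demonstration.

```python
import os
import shutil
import tempfile

def walk_sorted(dir_path):
    """Yield file paths under dir_path in name order (hypothetical helper)."""
    for root, dirs, files in os.walk(dir_path):
        dirs.sort()  # sorting in place fixes the order in which subdirectories are visited
        for name in sorted(files):
            yield os.path.join(root, name)

# Build a throwaway tree: b.c, a.h, and sub/c.c
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'sub'))
for rel in ('b.c', 'a.h', os.path.join('sub', 'c.c')):
    open(os.path.join(base, rel), 'w').close()

found = [os.path.relpath(path, base) for path in walk_sorted(base)]
shutil.rmtree(base)
print(found)
```

Sorting dirs in place works because os.walk() walks top-down by default and descends into whatever remains in that list. The default string sort is case-sensitive, so names starting with uppercase letters come before lowercase ones.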

Finally, add simple command-line handling:

import sys

if __name__ == '__main__':
    DIR_PATH = r'E:\PyTest\lctest'
    if len(sys.argv) == 1:  # only the script name, no arguments
        CountDirLines(DIR_PATH)
        sys.exit()

    if len(sys.argv) >= 2:
        if int(sys.argv[1]):  # non-zero selects the detailed report
            CountDirLines(DIR_PATH, False)
        else:
            CountDirLines(DIR_PATH)
        sys.exit()
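One caveat of this handling is that int(sys.argv[1]) raises ValueError when the argument is not an integer, for example CLineCounter.py abc. A minimal sketch of a more tolerant check follows; the helper name wants_detail and the choice to fall back to the raw report on invalid input are assumptions, not part of the original script.

```python
def wants_detail(argv):
    """Return True when the first argument is a non-zero integer (hypothetical helper)."""
    if len(argv) < 2:
        return False           # no argument: raw report
    try:
        return int(argv[1]) != 0
    except ValueError:
        return False           # assumption: treat a non-numeric argument as "raw"

print(wants_detail(['CLineCounter.py']))          # False
print(wants_detail(['CLineCounter.py', '1']))     # True
print(wants_detail(['CLineCounter.py', 'abc']))   # False
```

With such a helper, the main block could reduce to a single call along the lines of CountDirLines(DIR_PATH, not wants_detail(sys.argv)).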

3. Verification

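To verify the implementation from the previous section, create a test directory named lctest. It contains line.c and the test.c file from the article 《为C函数自动添加跟踪语句》 (Automatically Adding Trace Statements to C Functions). The content of line.c is as follows: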

#include <stdio.h>

 /* {{{ comment */

/***********

  Multiline

  Comment

***********/

int test(int a/*comment*/, int b)

{

    int a2; int b2;  //comment

    a2 = 1;

    b2 = 2;

}

/* {{{ test3 */

int test3(int a,

          int b) /*test2 has been deleted,

so this is test3. */

{int a3 = 1; int b3 = 2;

    if(a3)

    {/*comment*/

        a3 = 0;

    }

//comment

    b3 = 0;

}

/* }}} */

//comment //comment

/*FALSE*/ #if M_DEFINED

#error Defination!

#endif

Running CLineCounter.py with different command-line arguments produces the following output:

E:\PyTest>CLineCounter.py

FileLines  CodeLines  CommentLines  EmptyLines

77         53         18            11          <Total:2 Files>

E:\PyTest>CLineCounter.py 0

FileLines  CodeLines  CommentLines  EmptyLines

77         53         18            11          <Total:2 Files>

E:\PyTest>CLineCounter.py 1

FileLines  CodeLines  CommentLines  EmptyLines  FileName

33         19         15            4           E:\PyTest\lctest\line.c

44         34         3             7           E:\PyTest\lctest\test.c

77         53         18            11          <Total:2 Files>

A manual check confirms that these figures are correct.

Next, run python CLineCounter.py 1 on a real project. An excerpt of the output follows:

[wangxiaoyuan_@localhost ~]$ python CLineCounter.py 1

FileLines  CodeLines  CommentLines  EmptyLines  FileName

99         21         58            24          /sdb1/wangxiaoyuan/include/Dsl_Alloc.h

120        79         28            24          /sdb1/wangxiaoyuan/include/Dsl_Backtrace.h

... ... ... ... ... ... ... ...

139        89         24            26          /sdb1/wangxiaoyuan/source/Dsl_Tbl_Map.c

617        481        64            78          /sdb1/wangxiaoyuan/source/Dsl_Test_Suite.c

797        569        169           82          /sdb1/wangxiaoyuan/source/xDSL_Common.c

15450      10437      3250          2538        <Total:40 Files>

4. Afterword

The C code statistics tool implemented here is fairly crude. A follow-up article will refactor the code and add control options.