Technique to Read Source Code

Excerpted from http://ruby-hacking-guide.github.io/intro.htm

Every programmer has to read source code at some point, but I suspect there are few occasions on which someone teaches you concretely how to read it. Why? Does it mean that if you can write a program, you can naturally read one?

But I don’t think reading programs written by other people is that easy. Just as with writing programs, there must be techniques and theories for reading them, and they are necessary. So I’d like to give a general summary of the approach to take when reading source code.

Principles

First, the principles.

Decide a goal

An important key to reading the source code is to set a concrete goal.

These are the words of Matsumoto, the author of Ruby, and I find them very convincing. When the motivation is a spontaneous idea such as “Maybe I should read a kernel, at least…”, you unpack the source code and put explanatory books ready on the desk. But, not knowing what to do, you leave the study untouched. Hasn’t that happened to you? On the other hand, when you have in mind “I’m sure there is a bug somewhere in this tool. I need to fix it quickly and make it work, otherwise I won’t make the deadline…”, you will probably be able to fix the code in a blink, even though someone else wrote it. Hasn’t that happened to you too?

The difference between these two cases is the motivation you have. In order to know something, you at least have to know what you want to know. Therefore, the very first step is to put what you want to know into explicit words.

Of course, this alone is not enough to make it your own “technique”, because a “technique” has to be a common method that anybody can use by following it. In the following sections, I will explain how to get from this first step to the point where you finally achieve the goal.

Visualising the goal

Now let us suppose that our final goal is “Understand everything about ruby”. This certainly counts as a set goal, but it is clearly of no use for actually reading the source code. It will not trigger any concrete action. Therefore, your first job is to drag the vague goal down to the level of something concrete.

How can we do that? The first way is to think as if you were the person who wrote the program. You can draw on your knowledge of writing programs here. For example, when you read a traditional “structured” program written by somebody else, you analyze it using the strategies of structured programming too. That is, you divide the target into pieces, little by little. If it is something that circulates in an event loop, such as a GUI program, first browse the event loop roughly, then try to find out the role of each event handler. Or try to investigate the “M” of MVC (Model View Controller) first.

Second, it’s good to be aware of your method of analysis. Everybody has certain analysis methods, but they are often applied relying on experience or intuition. In what way can we read source code well? Thinking about the way itself and being aware of it are crucially important.

Well, what are such methods like? I will explain them in the next section.

Analysis methods

The methods for reading source code can be roughly divided into two: static and dynamic. The static method is to read and analyze the source code without running the program. The dynamic method is to watch the actual behavior using tools such as a debugger.

It’s better to start studying a program with dynamic analysis, because what you see there is “fact”. The results of static analysis, because the program is not actually run, are to a greater or lesser extent “prediction”. If you want to know the truth, you should start by watching the facts.

Of course, you can’t be sure that the results of dynamic analysis really are fact. The debugger could have a bug, or the CPU may be misbehaving due to overheating. Your configuration could be wrong. However, the results of dynamic analysis should at least be closer to fact than those of static analysis.

Dynamic analysis

Using the target program

You can’t start without the target program. First of all, use it, so that you know in advance what kind of program it is and how it is expected to behave.

Following the behavior using the debugger

If you want to see the paths the code takes and the data structures produced as a result, it’s quicker to run the program and look at the result than to emulate the behavior in your head. A debugger makes this easy.

I would be happier if the data structures at runtime could be seen as a picture, but unfortunately there are hardly any tools for that purpose (and especially few that are free). For a snapshot of a comparatively simple structure, we might be able to write it out as text and convert it to a picture using a tool like graphviz (see doc/graphviz.html on the attached CD-ROM). But it’s very difficult to find a way to do this for general purposes and in real time.

Tracer

You can use a tracer if you want to trace the procedures that the code goes through. For C, there is a tool named ctrace. For tracing system calls, you can use tools like strace, truss, and ktrace.

Print everywhere

There is a term, “printf debugging”. The method also works for analysis other than debugging. If you are watching the history of one variable, for example, it may be easier to look at the dump produced by embedded print statements than to track the variable with a debugger.
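As a minimal sketch of this idea (written in Python rather than C, and using a hypothetical function collatz_steps invented purely for illustration), the embedded print statement below dumps the history of one variable, n, on every pass through the loop:

```python
import sys


def collatz_steps(n, trace=sys.stderr):
    """Count the Collatz steps from n down to 1 (hypothetical example)."""
    steps = 0
    while n != 1:
        # The "printf": dump the variable being watched on every iteration.
        print(f"step={steps} n={n}", file=trace)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# The trace goes to stderr, the return value to stdout.
print(collatz_steps(6))
```

Sending the trace to a separate stream is a design choice assumed here: it keeps the dump apart from the program’s normal output, so it can be redirected to a file and read afterwards.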

Modifying the code and running it

For example, where the behavior is hard to understand, make a small change to some part of the code or to a particular parameter and re-run the program. The behavior will naturally change, and you can infer the meaning of the code from the change.

It goes without saying that you should also keep an original binary and do the same thing with both of them.

Static analysis

The importance of names

Static analysis is simply source code analysis, and source code analysis is really an analysis of names. File names, function names, variable names, type names, member names: a program is a bunch of names.

This may seem obvious, since naming is one of the most powerful tools for creating abstractions in programming, but keeping it in mind makes reading much more efficient.

We’d also like to know the coding rules beforehand to some extent. For example, in C, extern functions often use a prefix to distinguish the type of function. And in object-oriented programs, function names sometimes carry information about where they belong in a prefix, which is valuable information (e.g. rb_str_length).
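To illustrate how much a prefix convention tells you, here is a small Python sketch that groups identifiers by their leading underscore-separated parts. The helper group_by_prefix and the sample name list are assumptions made for this example; only rb_str_length comes from the text above.

```python
from collections import defaultdict


def group_by_prefix(names, depth=2):
    """Group identifiers by their first `depth` underscore-separated parts."""
    groups = defaultdict(list)
    for name in names:
        parts = name.split("_")
        # A name with too few parts has no prefix to speak of.
        prefix = "_".join(parts[:depth]) if len(parts) > depth else ""
        groups[prefix].append(name)
    return dict(groups)


# Sample names chosen for illustration.
names = ["rb_str_length", "rb_str_new", "rb_ary_push", "main"]
for prefix, members in sorted(group_by_prefix(names).items()):
    print(prefix or "(no prefix)", members)
```

Run over a real list of function names, a grouping like this gives a rough first map of which functions belong together.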

Reading documents

Sometimes a document describing the internal structure is included. Look out especially for files with names like HACKING.

Reading the directory structure

Look at the policy by which the directories are divided, and grasp the overview: how the program is structured and what its parts are.

Reading the file structure

While browsing (the names of) the functions, also look at the policy by which the files are divided. Pay attention to file names, because they are like comments with a very long lifetime.

Additionally, if a file contains several modules, the functions that make up each module should be grouped together, so you can work out the module structure from the order of the functions.

Investigating abbreviations

When you encounter ambiguous abbreviations, make a list of them and investigate each one as early as possible. For example, when you see “GC”, things are very different depending on whether it means “Garbage Collection” or “Graphic Context”.

Abbreviations in a program are generally made by taking initial letters or by dropping vowels. In particular, abbreviations that are popular in the field of the target program are used without explanation, so you should become familiar with them at an early stage.
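One way to start such a list is to collect short all-uppercase words from comments and documents. The Python sketch below does only that; the helper name candidate_abbreviations, the length limits, and the sample text are assumptions for illustration, and it will miss abbreviations made by dropping vowels.

```python
import re
from collections import Counter


def candidate_abbreviations(text, min_len=2, max_len=5):
    """Count short all-uppercase words, which are often abbreviations."""
    pattern = rf"\b[A-Z]{{{min_len},{max_len}}}\b"
    return Counter(re.findall(pattern, text))


# Hypothetical comment text, invented for this example.
sample = """
/* mark phase of the GC */
/* the GC frees unmarked objects; see the API docs */
"""
for word, count in candidate_abbreviations(sample).most_common():
    print(word, count)
```

The result is only a list of candidates; finding out what each one means is still up to the reader.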

Understanding data structure

If you find both data and code, investigate the data structures first. In other words, when exploring C code, it’s better to start with the header files. And make the most of your imagination from their file names. For example, if you find frame.h, it is probably the stack frame definition.

You can also learn a lot from the member names of a struct and their types. For example, if you find a member next that points to its own type, it is probably a linked list. Similarly, if you find members such as parent, children, and sibling, it must be a tree structure. If you find prev, it is probably a stack.
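The first of these hints can be sketched as follows. This is a Python analogue of a C struct, not code from ruby, and the names Node and to_list are invented for the example: a member next that refers to its own type is all that is needed to form a linked list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    # A member that points to its own type: the mark of a linked list.
    next: Optional["Node"] = None


def to_list(head):
    """Follow the `next` members from head and collect the values."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node(2, Node(3)))
print(to_list(head))
```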

Understanding the calling relationship between functions

After names, the next most important thing to understand is the relationships between functions. A diagram of the calling relationships is called a “call graph”, and it is very useful. This is a job for tools.

A text-based tool is sufficient, but it’s even better if the tool can generate diagrams. However, such tools are seldom available (and especially few are free). When I analyzed ruby to write this book, I wrote a small command language and a parser in Ruby and generated diagrams semi-automatically by passing the results to graphviz.
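The last step, handing calling relationships to graphviz, can be sketched in a few lines of Python. This is not the author’s command language or parser; the function to_dot and the caller/callee pairs are assumptions for illustration, and the pairs would have to be collected by hand or by another tool.

```python
def to_dot(calls):
    """Render (caller, callee) pairs as Graphviz DOT text."""
    lines = ["digraph callgraph {"]
    # Drop duplicate edges and sort them so the output is stable.
    for caller, callee in sorted(set(calls)):
        lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical calling relationships.
calls = [("main", "parse"), ("main", "eval"), ("eval", "call"), ("main", "parse")]
print(to_dot(calls))
```

If the output is saved as callgraph.dot, a command such as dot -Tpng callgraph.dot -o callgraph.png turns it into a picture.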

Reading functions

Read how the function works until you can explain concisely what it does. It’s good to read it part by part while looking at the diagram of the function relationships.

What is important when reading functions is not “what to read” but “what not to read”. The ease of reading is decided by how much code we can cut out. What exactly should be cut out? That is hard to understand without seeing an actual example, so it will be explained in the main part.

Additionally, if you don’t like the coding style, you can convert it with a tool like indent.

Experimenting by modifying it as you like

It’s one of the mysteries of the human body that something done using many parts of your body stays in your memory easily. I think the reason quite a few people prefer manuscript paper to a keyboard is not only nostalgia but also this fact.

Merely reading on a monitor is a very ineffective way to remember with our bodies, so rewrite the code while reading it. This often helps our bodies get used to the code relatively quickly. If there are names or code you don’t like, rewrite them. If there’s a cryptic abbreviation, replace it with the unabbreviated form.

However, it goes without saying that you should also keep the original source to one side and check it when something stops making sense along the way. Otherwise, you may puzzle for hours over a simple mistake of your own. And since the purpose of rewriting is to get used to the code, not the rewriting itself, be careful not to get too enthusiastic.

Reading the history

A program often comes with a document describing the history of changes. For example, GNU software always has a file named ChangeLog. This is the best resource for learning “why the program is the way it is”.

Alternatively, when a version control system such as CVS or SCCS is used and you have access to it, it is more useful than a ChangeLog. Taking CVS as an example, cvs annotate, which shows which revision last modified a particular line, and cvs diff, which shows the differences from a specified version, are convenient.

Moreover, when there is a mailing list or a newsgroup for developers, you should get the archives so that you can search them at any time, because they often contain the exact reason for a certain change. Of course, if you can search them online, that is sufficient too.

The tools for static analysis

Since various tools are available for various purposes, I can’t describe them all. But if I had to choose only one, I’d recommend global. Its most attractive point is that its structure makes it easy to use for other purposes. For instance, gctags, which comes with it, is actually a tool for creating tag files, but you can also use it to list the function names contained in a file.

~/src/ruby % gctags class.c | awk '{print $1}'
SPECIAL_SINGLETON
SPECIAL_SINGLETON
clone_method
include_class_new
ins_methods_i
ins_methods_priv_i
ins_methods_prot_i
method_list

That said, this is just this author’s recommendation; you as a reader can use whichever tool you like. But in that case, you should choose a tool with at least the following features.

  • list the function names contained in a file
  • find the location of a function or variable from its name (preferably with the ability to jump to that location)
  • function cross-reference
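To make the first feature concrete, here is a rough Python sketch that lists the functions defined in a piece of C source. It is a regular-expression heuristic under stated assumptions (definitions start in column 0, parameter lists contain no nested parentheses), not a replacement for a real tag generator such as gctags; the names DEFINITION, KEYWORDS, list_functions, and the sample source are all invented for the example.

```python
import re

# Heuristic only: a definition is a line starting in column 0 that ends with
# "name(params)" and an optional "{", with no ";" (which marks a prototype).
DEFINITION = re.compile(
    r"^(?:[A-Za-z_][\w \t*]*[ \t*])?"   # optional return type on the same line
    r"([A-Za-z_]\w*)[ \t]*"             # the function name
    r"\([^;()\n]*\)[ \t]*\{?[ \t]*$",   # parameter list, optional opening brace
    re.MULTILINE,
)
KEYWORDS = {"if", "for", "while", "switch"}


def list_functions(source):
    """Return the names of functions that appear to be defined in C source."""
    return [m.group(1) for m in DEFINITION.finditer(source)
            if m.group(1) not in KEYWORDS]


# Hypothetical C source: one prototype and two definitions.
SAMPLE = """\
int add(int a, int b);

static int
add(int a, int b)
{
    return a + b;
}

int main(void) {
    return add(1, 2);
}
"""
print(list_functions(SAMPLE))
```

The prototype is skipped because it ends in a semicolon, and the call inside main is skipped because it is indented. Code that breaks these assumptions, such as function-pointer parameters or macros, will confuse the sketch.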
