by Tom Copeland

04/09/2003

A Review of PMD

A few weeks ago, O'Reilly Network ran an article on PMD, an open source Java static-analysis tool sponsored under the umbrella of the Defense Advanced Research
Projects Agency (DARPA) project "Cougaar." That article covered some of the basics of PMD--it's built on an Extended Backus-Naur Form (EBNF) grammar, from which JavaCC generates a parser
and JJTree generates a Java Abstract Syntax Tree (AST), and it comes with a number of ready-to-run rules that you can run on your own source code. You can also write your own rules
to enforce coding practices specific to your organization.

In this article, we'll take a closer look at the AST, how it is generated, and some of its complexities. Then we'll write a custom PMD rule to find the creation of Thread objects. We'll write this custom rule two ways,
first in the form of a Java class, and then in the form of an XPath expression.

The AST

Recall from the first article that the Java AST is a tree structure that represents a chunk of Java source code. For example, here's a simple code snippet and the corresponding AST:

Source Code:

Thread t = new Thread();

Abstract Syntax Tree:

FieldDeclaration
  Type
    Name
  VariableDeclarator
    VariableDeclaratorId
    VariableInitializer
      Expression
        PrimaryExpression
          PrimaryPrefix
            AllocationExpression
              Name
              Arguments

Here we can see that the AST is a standard tree structure: a hierarchy of nodes of various types. All of the node types and their valid children are defined in the EBNF grammar file. For example, here's the definition of a FieldDeclaration:

void FieldDeclaration() :
{
}
{
( "public" { ((AccessNode) jjtThis).setPublic( true ); }
| "protected" { ((AccessNode) jjtThis).setProtected( true ); }
| "private" { ((AccessNode) jjtThis).setPrivate( true ); }
| "static" { ((AccessNode) jjtThis).setStatic( true ); }
| "final" { ((AccessNode) jjtThis).setFinal( true ); }
| "transient" { ((AccessNode) jjtThis).setTransient( true ); }
| "volatile" { ((AccessNode) jjtThis).setVolatile( true ); } )* Type() VariableDeclarator() ( "," VariableDeclarator() )* ";"
}

FieldDeclaration is composed of a Type followed by at least one VariableDeclarator, as in int x,y,z = 0; (three declarators). A FieldDeclaration may also be preceded by several different modifiers, that is, Java keywords like transient or private.
Since these modifiers are separated by a pipe symbol and followed by an asterisk, any number can appear in any order. All of these grammar rules eventually can be traced back to the Java Language Specification (JLS) (see the References section
below).

The grammar doesn't enforce nuances like "a field can't be both public and private". That's the job of a semantic layer that would be built into a full compiler such as javac or Jikes.
PMD avoids the job of validating modifiers--and the myriad other tasks a compiler must perform--by assuming the code is compilable. If it's not, PMD will report an error, skip that source file, and move on. After all, if a source file can't even be compiled,
there's not much use in trying to check it for unused code.

Looking closer at the grammar snippet above, we can also see some custom actions that occur when a particular token is found. For example, when the keyword public is found at the start of a FieldDeclaration,
the parser that JavaCC generates will call the method setPublic(true) on the current node. The PMD grammar is full of this sort of thing, and new actions are continually being added. By the time a source code file makes
it through the parser, a lot of work has been done that makes rule writing much easier.
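
To make the idea of grammar actions concrete, here is a minimal hand-written sketch of what the generated parser does for a field declaration. It is not PMD or JavaCC output: FieldNode, FieldDeclarationSketch, and the reduced set of four modifiers are hypothetical names and simplifications chosen for illustration, and initializers are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AST node; not PMD's real field declaration class.
class FieldNode {
    boolean isPublic, isPrivate, isStatic, isFinal;
    String type;
    final List<String> names = new ArrayList<>();
}

class FieldDeclarationSketch {

    // Recognizes ( modifier )* Type Name ( "," Name )* ";" for a reduced modifier set.
    static FieldNode parse(String source) {
        String[] tokens = source.replace(",", " , ").replace(";", " ; ").trim().split("\\s+");
        FieldNode node = new FieldNode();
        int i = 0;
        // The ( ... )* loop: any number of modifiers, in any order.
        while (i < tokens.length && applyModifier(node, tokens[i])) {
            i++;
        }
        node.type = next(tokens, i++);
        node.names.add(next(tokens, i++));
        while (",".equals(next(tokens, i))) {
            node.names.add(next(tokens, i + 1));
            i += 2;
        }
        if (!";".equals(next(tokens, i))) {
            throw new IllegalArgumentException("expected ';' but found " + tokens[i]);
        }
        return node;
    }

    // The "action": consuming a modifier token sets a flag on the current node.
    static boolean applyModifier(FieldNode node, String token) {
        switch (token) {
            case "public":  node.isPublic = true;  return true;
            case "private": node.isPrivate = true; return true;
            case "static":  node.isStatic = true;  return true;
            case "final":   node.isFinal = true;   return true;
            default:        return false;
        }
    }

    static String next(String[] tokens, int i) {
        if (i >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens[i];
    }

    public static void main(String[] args) {
        FieldNode ok = parse("private static final int x, y;");
        System.out.println(ok.type + " " + ok.names + " static=" + ok.isStatic);
        // Purely syntactic: contradictory modifiers are accepted without complaint.
        FieldNode odd = parse("public private int z;");
        System.out.println("public=" + odd.isPublic + " private=" + odd.isPrivate);
    }
}
```

Note that the second declaration parses with both flags set, which mirrors the point above: rejecting "public private" is a job for a semantic layer, not for the grammar.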

A Custom Rule

Now that we've reviewed the AST a bit more, let's write a custom PMD rule. As mentioned before, we'll assume we're writing Enterprise Java Beans, so we shouldn't be using some of the standard Java library classes. We shouldn't open a FileInputStream,
start a ServerSocket, or instantiate a new Thread. To make sure our code is safe for use inside of an EJB container, let's write a rule that checks for Thread creation.

Writing a Custom Rule as a Java Class

Let's start by writing a Java class that traverses the AST. From the first article, recall that JJTree generates AST classes that support the Visitor pattern. Our class will register for callbacks when it hits a certain
type of AST node, then poke around the surrounding nodes to see if it's found something interesting. Here's some boilerplate code:

// Extend AbstractRule to enable the Visitor pattern
// and get some handy utility methods
public class DontCreateThreadsRule extends AbstractRule {
}

If you look back up at the AST for that initial code snippet--Thread t = new Thread();--you will find an AST type called an AllocationExpression. Yup, that sounds like what we're
looking for: allocation of new Thread objects. Let's add in a hook to notify us when the Visitor hits a new [something] node:

public class DontCreateThreadsRule extends AbstractRule {
    // make sure we get a callback for any object creation expressions
    public Object visit(ASTAllocationExpression node, Object data) {
        return super.visit(node, data);
    }
}

We've put a super.visit(node,data) in there so the Visitor will continue to visit children of this node. This lets us catch allocations within allocations, e.g., new Foo(new Thread()). Let's add in an if statement to exclude array allocations:

public class DontCreateThreadsRule extends AbstractRule {
    public Object visit(ASTAllocationExpression node, Object data) {
        // skip allocations of arrays and primitive types:
        // new int[], new byte[], new Object[]
        if (node.jjtGetChild(0) instanceof ASTName) {
            // a named type is being allocated; we'll check which one next
        }
        // always keep visiting children so nested allocations are seen
        return super.visit(node, data);
    }
}

We're not concerned about array allocations, not even Thread-related allocations like Thread[] threads = new Thread[10]; Why not? Because instantiating an array of Thread object
references doesn't really create any new Thread objects. It just creates the object references. We'll focus on catching the actual creation of the Thread objects. Finally, let's
add in a check for the Thread name:

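package org.mycompany.util.pmd;

// imports assume the PMD 1.x package layout
import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.RuleContext;
import net.sourceforge.pmd.ast.ASTAllocationExpression;
import net.sourceforge.pmd.ast.ASTName;

public class DontCreateThreadsRule extends AbstractRule {
    public Object visit(ASTAllocationExpression node, Object data) {
        if (node.jjtGetChild(0) instanceof ASTName &&
                ((ASTName) node.jjtGetChild(0)).getImage().equals("Thread")) {
            // we've found one! Now we'll record a RuleViolation and move on
            RuleContext ctx = (RuleContext) data;
            ctx.getReport().addRuleViolation(
                    createRuleViolation(ctx, node.getBeginLine()));
        }
        return super.visit(node, data);
    }
}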
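
The traversal logic can also be tried without a PMD installation. The following is a self-contained sketch of the same Visitor idea; AstNode, AstName, AstArguments, AstAllocation, AstVisitor, and ThreadFinder are hypothetical stand-ins invented here, not the classes that JJTree generates for PMD, and a simple counter takes the place of PMD's report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for JJTree-generated node classes.
class AstNode {
    final String image;                          // e.g. the type name on a Name node
    final List<AstNode> children = new ArrayList<>();

    AstNode(String image, AstNode... kids) {
        this.image = image;
        for (AstNode kid : kids) {
            children.add(kid);
        }
    }

    Object accept(AstVisitor visitor, Object data) {
        return visitor.visit(this, data);
    }
}

class AstName extends AstNode {
    AstName(String image) { super(image); }
}

class AstArguments extends AstNode {
    AstArguments(AstNode... kids) { super(null, kids); }
}

class AstAllocation extends AstNode {
    AstAllocation(AstNode... kids) { super(null, kids); }

    @Override
    Object accept(AstVisitor visitor, Object data) {
        return visitor.visit(this, data);        // dispatches to the Allocation overload
    }
}

class AstVisitor {
    // Default behaviour: keep walking down the tree.
    Object visit(AstNode node, Object data) {
        for (AstNode child : node.children) {
            child.accept(this, data);
        }
        return data;
    }

    Object visit(AstAllocation node, Object data) {
        return visit((AstNode) node, data);
    }
}

class ThreadFinder extends AstVisitor {
    @Override
    Object visit(AstAllocation node, Object data) {
        AstNode first = node.children.isEmpty() ? null : node.children.get(0);
        if (first instanceof AstName && "Thread".equals(first.image)) {
            ((int[]) data)[0]++;                 // record a "violation"
        }
        return super.visit(node, data);          // continue into nested allocations
    }
}

class VisitorSketch {
    static int countThreadAllocations(AstNode root) {
        int[] counter = new int[1];
        root.accept(new ThreadFinder(), counter);
        return counter[0];
    }

    public static void main(String[] args) {
        // new Foo(new Thread())
        AstNode nested = new AstAllocation(new AstName("Foo"),
                new AstArguments(new AstAllocation(new AstName("Thread"), new AstArguments())));
        System.out.println(countThreadAllocations(nested));
    }
}
```

The call to super.visit at the end of ThreadFinder plays the same role as in the rule above: without it, the inner allocation in new Foo(new Thread()) would never be reached.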

That about wraps up the Java code. Back in the first article, we described a PMD ruleset and the XML rule definition. Here's a possible ruleset definition containing the rule we just wrote:

<?xml version="1.0"?>
<ruleset name="My company's EJB checker rules">
  <description>
    The Design Ruleset contains a collection of rules that find questionable designs.
  </description>
  <rule name="DontCreateThreadsRule"
        message="Don't create threads, use the MyCompanyThreadService instead"
        class="org.mycompany.util.pmd.DontCreateThreadsRule">
    <description>
      Don't create Threads, use the MyCompanyThreadService instead.
    </description>
    <example>
<![CDATA[
Thread t = new Thread(); // don't do this!
]]>
    </example>
  </rule>
</ruleset>

You can put this ruleset on your CLASSPATH or refer to it directly, like this:

java net.sourceforge.pmd.PMD /path/to/src xml /path/to/ejbrules.xml

Writing a Custom Rule as an XPath Expression

Recently Daniel Sheppard enhanced PMD to allow rules to be written using XPath. We won't explain XPath completely here--it would require a large book--but generally speaking, XPath is a way of querying an XML document. You can write an XPath query to get a
list of nodes that fit a certain pattern. For example, if you have an XML document with a list of departments and employees, you could write a simple XPath query that returns all the employees in a given department, and you wouldn't need to write DOM-traversal
or SAX-listener code.
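
As a rough illustration of the departments-and-employees case, here is a short sketch that uses the XPath support bundled with the JDK (javax.xml.xpath). The XML document, the element and attribute names, and the EmployeeQuery class are all made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EmployeeQuery {
    // A made-up document; any XML with a similar shape would do.
    static final String XML =
            "<company>"
            + "<department name='Engineering'>"
            + "<employee>Ann</employee><employee>Bob</employee>"
            + "</department>"
            + "<department name='Sales'>"
            + "<employee>Cal</employee>"
            + "</department>"
            + "</company>";

    // Evaluates an XPath query against XML and returns the text of each matching node.
    static List<String> select(String query) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate: " + query, e);
        }
    }

    public static void main(String[] args) {
        // All employees in one department, with no DOM-walking or SAX code.
        System.out.println(select("//department[@name='Engineering']/employee"));
    }
}
```

The query string does all of the selection work; the surrounding Java only feeds in the document and collects the matches.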

That's all well and good, but how does querying XML documents relate to PMD? Daniel noticed that an AST is a tree, just like an XML document. He downloaded the Jaxen XPath engine and wrote a class called a DocumentNavigator that
allows Jaxen to traverse the AST. Jaxen gets the XPath expression, evaluates it, applies it to the AST, and returns a list of matching nodes to PMD. PMD creates RuleViolation objects from the matching nodes and moves
along to the next source file.

XPath is a new language, though, so why write PMD rules using XPath when you're already a whiz-bang Java programmer? The reason is that it's a whole lot easier to write simple rules using XPath. To illustrate, here's the "DontCreateThreadsRule" written as an
XPath expression:

//AllocationExpression[Name/@Image='Thread'][not(ArrayDimsAndInits)]

Concise, eh? There's no Java class to track--you don't have to compile anything or put anything else on your CLASSPATH. Just add the XPath expression to your rule definition like this:

<?xml version="1.0"?>
<ruleset name="My company's EJB checker rules">
  <description>
    The Design Ruleset contains a collection of rules that find questionable designs.
  </description>
  <rule name="DontCreateThreadsRule"
        message="Don't create threads, use the MyCompanyThreadService instead"
        class="net.sourceforge.pmd.rules.XPathRule">
    <description>
      Don't create Threads, use the MyCompanyThreadService instead.
    </description>
    <properties>
      <property name="xpath">
        <value>
<![CDATA[
//AllocationExpression[Name/@Image='Thread'][not(ArrayDimsAndInits)]
]]>
        </value>
      </property>
    </properties>
    <example>
<![CDATA[
Thread t = new Thread(); // don't do this!
]]>
    </example>
  </rule>
</ruleset>

Refer to the rule as usual to run it on your source code.
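
To get a feel for what the expression matches, the AST can be approximated as an XML document and queried with the JDK's own XPath engine. This is only a sketch under loud assumptions: PMD does not produce XML (Jaxen walks the real AST through the DocumentNavigator), and the XML below is a hand-written rendering of three allocations in which each element is named after its node type and carries an Image attribute. AstXPathSketch is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AstXPathSketch {
    // Hand-written XML approximation of the AST for:
    //   new Thread();  new Thread[5];  new Foo(new Thread());
    static final String AST_XML =
            "<CompilationUnit>"
            + "<AllocationExpression><Name Image='Thread'/><Arguments/></AllocationExpression>"
            + "<AllocationExpression><Name Image='Thread'/><ArrayDimsAndInits/></AllocationExpression>"
            + "<AllocationExpression><Name Image='Foo'/><Arguments>"
            + "<AllocationExpression><Name Image='Thread'/><Arguments/></AllocationExpression>"
            + "</Arguments></AllocationExpression>"
            + "</CompilationUnit>";

    // The expression from the article, unchanged.
    static final String RULE =
            "//AllocationExpression[Name/@Image='Thread'][not(ArrayDimsAndInits)]";

    // Counts the nodes in astXml that the XPath expression selects.
    static int countMatches(String astXml, String expression) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(astXml)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with array check:    " + countMatches(AST_XML, RULE));
        System.out.println("without array check: "
                + countMatches(AST_XML, "//AllocationExpression[Name/@Image='Thread']"));
    }
}
```

The first predicate picks out allocations whose Name child is Thread, the second drops the array allocation, and the leading // is what reaches the allocation nested inside new Foo(...).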

You can learn a lot about XPath by looking at how the built-in PMD rules identify nodes, and you can also try out new XPath expressions using a PMD utility called the ASTViewer. Run this utility by executing the astviewer.bat or astviewer.sh scripts
in the etc/ directory of the PMD distribution. It will bring up a window that looks like Figure 1. Type some code into the left-hand panel, put an XPath expression in the text field, click the "Go" button at the bottom of the window, and the other
panels will be populated with the AST and the results of the XPath query.


Figure 1. Screenshot of ASTViewer

When should you use XPath to write a PMD rule? My initial thought is, "Anytime you can." I think that you'll find that many simple rules can be written using XPath, especially those that are checking for braces or a particular name. For example, almost all
of the rules in the PMD basic ruleset and braces ruleset are now written as very short, concise XPath expressions. The more complicated rules--primarily those dealing with the symbol table--are probably still easiest to write in Java. We'll see, though. At
some point we may even wrap the symbol table in a DocumentNavigator.

Future Plans

There's still a lot of work to do on PMD. Now that this XPath infrastructure is in place, it might be possible to write an interactive rule editor. Ideally, you could open a GUI, type in a code snippet, select certain AST nodes, and an XPath expression that
finds those nodes would be generated for you. PMD can always use more rules, of course. Currently, there are over 40 feature requests on the web site just waiting for someone to implement them. Also, PMD has a pretty weak symbol table, so it occasionally picks
up a false positive. There's plenty of room for contributors to jump in and improve the code.

Conclusion

This article has presented a more in-depth look at the Java AST and how it's defined. We've written a PMD rule that checks for Thread creation using two techniques--a Java class and an XPath query. Give PMD a try and
see what it finds in your code today!

Credits

Thanks to the Cougaar program and DARPA for supporting PMD. Thanks to Dan Sheppard for writing the XPath integration. Thanks also to the many other contributors without whom PMD would be a much less useful utility.

References

Tom Copeland started programming on a TRS-80 Model III, but demand for that skill has waned and he now programs mostly in Java and Ruby.