第2篇里,介绍了jenaThe general purpose rule engine(通用规则引擎)及其使用,本篇继续探究,如何自定义builtin。

builtin介绍

先回顾builtin为何物,官方叫Builtin primitives,可以理解为内置函数、内置指令,可以返回true或者false用来检验rule是否匹配,官方包含如下的primitives

Builtin Operations

isLiteral(?x) notLiteral(?x)

isFunctor(?x) notFunctor(?x)

isBNode(?x) notBNode(?x)

Test whether the single argument is or is not a literal, a functor-valued
literal or a blank-node, respectively.
bound(?x...) unbound(?x..)
Test if all of the arguments are bound (not bound) variables
equal(?x,?y) notEqual(?x,?y)
Test if x=y (or x != y). The equality test is semantic equality so that,
for example, the xsd:int 1 and the xsd:decimal 1 would test equal.

lessThan(?x, ?y), greaterThan(?x, ?y)

le(?x, ?y), ge(?x, ?y)

Test if x is <, >, <= or >= y. Only passes if both x and y
are numbers or time instants (can be integer or floating point or XSDDateTime).

sum(?a, ?b, ?c)

addOne(?a, ?c)

difference(?a, ?b, ?c)

min(?a, ?b, ?c)

max(?a, ?b, ?c)

product(?a, ?b, ?c)

quotient(?a, ?b, ?c)

Sets c to be (a+b), (a+1) (a-b), min(a,b), max(a,b), (a
b), (a/b). Note that these do not run backwards, if in
 
sum
 a and c are bound and b is unbound then the test will
fail rather than bind b to (c-a). This could be fixed.

strConcat(?a1, .. ?an, ?t)

uriConcat(?a1, .. ?an, ?t)

Concatenates the lexical form of all the arguments except the last, then
binds the last argument to a plain literal (strConcat) or a URI node
(uriConcat) with that lexical form. In both cases if an argument node
is a URI node the URI will be used as the lexical form.

regex(?t, ?p)

regex(?t, ?p, ?m1, .. ?mn)

Matches the lexical form of a literal (?t) against a regular expression
pattern given by another literal (?p). If the match succeeds, and if
there are any additional arguments then it will bind the first n capture
groups to the arguments ?m1 to ?mn. The regular expression pattern syntax
is that provided by java.util.regex. Note that the capture groups are
numbered from 1 and the first capture group will be bound to ?m1, we
ignore the implicit capture group 0 which corresponds to the entire matched
string. So for example

regexp('foo bar', '(.) (.
)', ?m1, ?m2)

will bind
 
m1
 to
 
"foo"
 and
 
m2
 to
 
"bar".

now(?x)
Binds ?x to an xsd:dateTime value corresponding to the current time.
makeTemp(?x)
Binds ?x to a newly created blank node.
makeInstance(?x, ?p, ?v)

makeInstance(?x, ?p, ?t, ?v)
Binds ?v to be a blank node which is asserted as the value of the ?p property
on resource ?x and optionally has type ?t. Multiple calls with the same
arguments will return the same blank node each time - thus allowing this
call to be used in backward rules.
makeSkolem(?x, ?v1, ... ?vn)
Binds ?x to be a blank node. The blank node is generated based on the values
of the remain ?vi arguments, so the same combination of arguments will
generate the same bNode.
noValue(?x, ?p)

noValue(?x ?p ?v)
True if there is no known triple (x, p, ) or (x, p, v) in the model or
the explicit forward deductions so far.
remove(n, ...)

drop(n, ...)
Remove the statement (triple) which caused the n'th body term of this (forward-only)
rule to match. Remove will propagate the change to other consequent rules
including the firing rule (which must thus be guarded by some other clauses).
In particular, if the removed statement (triple) appears in the body
of a rule that has already fired, the consequences of such rule are retracted
from the deducted model. Drop will silently remove the triple(s) from
the graph but not fire any rules as a consequence. These are clearly
non-monotonic operations and, in particular, the behaviour of a rule
set in which different rules both drop and create the same triple(s)
is undefined.
isDType(?l, ?t) notDType(?l, ?t)
Tests if literal ?l is (or is not) an instance of the datatype defined
by resource ?t.
print(?x, ...)
Print (to standard out) a representation of each argument. This is useful
for debugging rather than serious IO work.
listContains(?l, ?x)
 

listNotContains(?l, ?x)
Passes if ?l is a list which contains (does not contain) the element ?x,
both arguments must be ground, can not be used as a generator.
listEntry(?list, ?index, ?val)
Binds ?val to the ?index'th entry in the RDF list ?list. If there is no
such entry the variable will be unbound and the call will fail. Only
usable in rule bodies.
listLength(?l, ?len)
Binds ?len to the length of the list ?l.
listEqual(?la, ?lb)
 

listNotEqual(?la, ?lb)
listEqual tests if the two arguments are both lists and contain the same
elements. The equality test is semantic equality on literals (sameValueAs)
but will not take into account owl:sameAs aliases. listNotEqual is the
negation of this (passes if listEqual fails).
listMapAsObject(?s, ?p ?l)
 

listMapAsSubject(?l, ?p, ?o)
These can only be used as actions in the head of a rule. They deduce a
set of triples derived from the list argument ?l : listMapAsObject asserts
triples (?s ?p ?x) for each ?x in the list ?l, listMapAsSubject asserts
triples (?x ?p ?o).
table(?p) tableAll()
Declare that all goals involving property ?p (or all goals) should be tabled
by the backward engine.
hide(p)
Declares that statements involving the predicate p should be hidden. Queries
to the model will not report such statements. This is useful to enable
non-monotonic forward rules to define flag predicates which are only
used for inference control and do not "pollute" the inference results.

builtin 自定义

自定义很简单,实现Builtin接口, 然后使用BuiltinRegistry.theRegistry.register注册即可。

Builtin接口定义如下:

public interface Builtin {

    /**
* Return a convenient name for this builtin, normally this will be the name of the
* functor that will be used to invoke it and will often be the final component of the
* URI.
*/
public String getName(); /**
* Return the full URI which identifies this built in.
*/
public String getURI(); /**
* Return the expected number of arguments for this functor or 0 if the number is flexible.
*/
public int getArgLength(); /**
* This method is invoked when the builtin is called in a rule body.
* @param args the array of argument values for the builtin, this is an array
* of Nodes, some of which may be Node_RuleVariables.
* @param length the length of the argument list, may be less than the length of the args array
* for some rule engines
* @param context an execution context giving access to other relevant data
* @return return true if the buildin predicate is deemed to have succeeded in
* the current environment
*/
public boolean bodyCall(Node[] args, int length, RuleContext context); /**
* This method is invoked when the builtin is called in a rule head.
* Such a use is only valid in a forward rule.
* @param args the array of argument values for the builtin, this is an array
* of Nodes.
* @param length the length of the argument list, may be less than the length of the args array
* for some rule engines
* @param context an execution context giving access to other relevant data
*/
public void headAction(Node[] args, int length, RuleContext context); /**
* Returns false if this builtin has side effects when run in a body clause,
* other than the binding of environment variables.
*/
public boolean isSafe(); /**
* Returns false if this builtin is non-monotonic. This includes non-monotonic checks like noValue
* and non-monotonic actions like remove/drop. A non-monotonic call in a head is assumed to
* be an action and makes the overall rule and ruleset non-monotonic.
* Most JenaRules are monotonic deductive closure rules in which this should be false.
*/
public boolean isMonotonic();
}

一般我们不用直接实现该接口,可以继承默认的实现BaseBuiltin, 一般只需要Override 下getName提供指令名称,实现bodyCall,提供函数调用即可。

    @Override
public String getName() {
return "semsim";
}

比如,我们来自定义一个指令,用来计算两两语义相似度:

public class SemanticSimilarityBuiltin extends BaseBuiltin {
/**
* Return a convenient name for this builtin, normally this will be the name of the
* functor that will be used to invoke it and will often be the final component of the
* URI.
*/
@Override
public String getName() {
return "semsim";
} @Override
public int getArgLength() {
return 3;
} /**
* This method is invoked when the builtin is called in a rule body.
*
* @param args the array of argument values for the builtin, this is an array
* of Nodes, some of which may be Node_RuleVariables.
* @param context an execution context giving access to other relevant data
* @return return true if the buildin predicate is deemed to have succeeded in
* the current environment
*/
@Override
public boolean bodyCall(Node[] args, int length, RuleContext context) {
checkArgs(length, context);
Node n1 = getArg(0, args, context);
Node n2 = getArg(1, args, context);
Node score = getArg(2,args,context); if(!score.isLiteral() || score.getLiteral().getValue()==null){
return false;
}
String value;
Double hold = Double.parseDouble(score.getLiteralValue().toString()); // n.isLiteral() && n.getLiteralValue() instanceof Number if (n1.isLiteral() && n2.isLiteral()) {
String v1 = n1.getLiteralValue().toString();
String v2 = n2.getLiteralValue().toString(); // 调用服务计算相似度
String requestUrl = "http://API-URL:5101/similarity/cosine?s1="+v1+"&s2="+v2;
String result = HttpClientUtil.doGet(requestUrl);
JSONObject json = JSON.parseObject(result);
if(json.getDouble("similarity") >= hold){
return true;
} return true;
}
return false;
}
}
  • 这里有个getArgLength和checkArgs(length, context),可以用来限制参数长度,检验必须符合该长度。
  • 可以通过getArg(idx, args, context)来获取待计算的参数
  • 上面的计算相似度,主要是调用外度的服务来计算两两的语义向量的cosine得分,如果满足阈值,我们就认为规则匹配

测试

我们来测试上面的定义的计算语义相似度的指令semsim,还是用第2篇里的例子:

我们新增加两个属性主要业务竞争对手,我们定义,如果两个公司的主要业务语义上相似,我们就认为两家公司是竞争对手。

        Property 主要业务 = myMod.createProperty(finance + "主要业务");
Property 竞争对手 = myMod.createProperty(finance + "竞争对手"); // 加入三元组 myMod.add(万达集团, 主要业务, "房地产,文娱");
myMod.add(融创中国, 主要业务, "房地产");

然后定义规则:

[ruleCompetitor: (?c1 :主要业务 ?b1) (?c2 :主要业务 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6)  -> (?c1 :竞争对手 ?c2)]

规则意思是,公司C1 主要业务是 b1,c2 主要业务是b2,并且c1和c2不是同一家公司,如果b1,b2的相似度大于0.6,那么C1和c2是竞争对手。

完整测试代码:

       // 注册自定义builtin
BuiltinRegistry.theRegistry.register(new SemanticSimilarityBuiltin()); Model myMod = ModelFactory.createDefaultModel();
String finance = "http://www.example.org/kse/finance#";
Resource 孙宏斌 = myMod.createResource(finance + "孙宏斌");
Resource 融创中国 = myMod.createResource(finance + "融创中国");
Resource 乐视网 = myMod.createResource(finance + "乐视网");
Property 执掌 = myMod.createProperty(finance + "执掌");
Resource 贾跃亭 = myMod.createResource(finance + "贾跃亭");
Resource 地产公司 = myMod.createResource(finance + "地产公司");
Resource 公司 = myMod.createResource(finance + "公司");
Resource 法人实体 = myMod.createResource(finance + "法人实体");
Resource 人 = myMod.createResource(finance + "人");
Property 主要收入 = myMod.createProperty(finance + "主要收入");
Resource 地产事业 = myMod.createResource(finance + "地产事业");
Resource 王健林 = myMod.createResource(finance + "王健林");
Resource 万达集团 = myMod.createResource(finance + "万达集团");
Property 主要资产 = myMod.createProperty(finance + "主要资产"); Property 股东 = myMod.createProperty(finance + "股东");
Property 关联交易 = myMod.createProperty(finance + "关联交易");
Property 收购 = myMod.createProperty(finance + "收购"); Property 主要业务 = myMod.createProperty(finance + "主要业务");
Property 竞争对手 = myMod.createProperty(finance + "竞争对手"); // 加入三元组
myMod.add(孙宏斌, 执掌, 融创中国);
myMod.add(贾跃亭, 执掌, 乐视网);
myMod.add(王健林, 执掌, 万达集团);
myMod.add(乐视网, RDF.type, 公司);
myMod.add(万达集团, RDF.type, 公司);
myMod.add(融创中国, RDF.type, 地产公司);
myMod.add(地产公司, RDFS.subClassOf, 公司);
myMod.add(公司, RDFS.subClassOf, 法人实体);
myMod.add(孙宏斌, RDF.type, 人);
myMod.add(贾跃亭, RDF.type, 人);
myMod.add(王健林, RDF.type, 人);
myMod.add(万达集团, 主要资产, 地产事业);
myMod.add(万达集团, 主要业务, "房地产,文娱");
myMod.add(融创中国, 主要收入, 地产事业);
myMod.add(融创中国, 主要业务, "房地产");
myMod.add(孙宏斌, 股东, 乐视网);
myMod.add(孙宏斌, 收购, 万达集团); PrintUtil.registerPrefix("", finance); // 输出当前模型
StmtIterator i = myMod.listStatements(null, null, (RDFNode) null);
while (i.hasNext()) {
System.out.println(" - " + PrintUtil.print(i.nextStatement()));
} GenericRuleReasoner reasoner = (GenericRuleReasoner) GenericRuleReasonerFactory.theInstance().create(null);
reasoner.setRules(Rule.parseRules(
"[ruleHoldShare: (?p :执掌 ?c) -> (?p :股东 ?c)] \n"
+ "[ruleConnTrans: (?p :收购 ?c) -> (?p :股东 ?c)] \n"
+ "[ruleConnTrans: (?p :股东 ?c) (?p :股东 ?c2) -> (?c :关联交易 ?c2)] \n"
+ "[ruleCompetitor:: (?c1 :主要业务 ?b1) (?c2 :主要业务 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6) -> (?c1 :竞争对手 ?c2)] \n"
+ "-> tableAll()."));
reasoner.setMode(GenericRuleReasoner.HYBRID); InfGraph infgraph = reasoner.bind(myMod.getGraph());
infgraph.setDerivationLogging(true); System.out.println("推理后...\n"); Iterator<Triple> tripleIterator = infgraph.find(null, null, null);
while (tripleIterator.hasNext()) {
System.out.println(" - " + PrintUtil.print(tripleIterator.next()));
}

运行结果:

 - (:万达集团 :关联交易 :乐视网)
- (:万达集团 :关联交易 :融创中国)
- (:万达集团 :竞争对手 :融创中国)
- (:万达集团 :关联交易 :万达集团)
- (:孙宏斌 :股东 :万达集团)
- (:孙宏斌 :股东 :融创中国)
- (:融创中国 :关联交易 :万达集团)
- (:融创中国 :竞争对手 :万达集团)
- (:融创中国 :关联交易 :乐视网)
- (:融创中国 :关联交易 :融创中国)
- (:乐视网 :关联交易 :万达集团)
- (:乐视网 :关联交易 :融创中国)
- (:乐视网 :关联交易 :乐视网)
- (:贾跃亭 :股东 :乐视网)
- (:王健林 :股东 :万达集团)
- (:公司 rdfs:subClassOf :法人实体)
- (:万达集团 :主要业务 '房地产,文娱')
- (:万达集团 :主要资产 :地产事业)
- (:万达集团 rdf:type :公司)
- (:地产公司 rdfs:subClassOf :公司)
- (:融创中国 :主要业务 '房地产')
- (:融创中国 :主要收入 :地产事业)
- (:融创中国 rdf:type :地产公司)
- (:孙宏斌 :收购 :万达集团)
- (:孙宏斌 :股东 :乐视网)
- (:孙宏斌 rdf:type :人)
- (:孙宏斌 :执掌 :融创中国)
- (:乐视网 rdf:type :公司)
- (:贾跃亭 rdf:type :人)
- (:贾跃亭 :执掌 :乐视网)
- (:王健林 rdf:type :人)
- (:王健林 :执掌 :万达集团)

可以根据需要,扩展更多的builtin,比如运行js,比如http请求。。。


作者:Jadepeng

出处:jqpeng的技术记事本--http://www.cnblogs.com/xiaoqi

您的支持是对博主最大的鼓励,感谢您的认真阅读。

本文版权归作者所有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出原文连接,否则保留追究法律责任的权利。

知识图谱推理与实践(3) -- jena自定义builtin的更多相关文章

  1. 知识图谱推理与实践 (2) -- 基于jena实现规则推理

    本章,介绍 基于jena的规则引擎实现推理,并通过两个例子介绍如何coding实现. 规则引擎概述 jena包含了一个通用的规则推理机,可以在RDFS和OWL推理机使用,也可以单独使用. 推理机支持在 ...

  2. 知识图谱学习与实践(4)——通过例句介绍Sparql的使用

    通过例句介绍Sparql的使用 1 简介 SPARQL的定义,是一个递归的定义,为SPARQL Protocal and RDF Query Language,是W3C制定的RDF知识图谱标准查询语言 ...

  3. 知识图谱学习与实践(4)——Protégé使用入门

    1 Protégé简介 Protégé是一个本体建模工具软件,由斯坦福大学基于java语言开发的,属于开放源代码软件.软件主要用于语义网中本体的构建和基于本体的知识应用,是本体构建的核心开发工具,最新 ...

  4. 知识图谱学习与实践(6)——从结构化数据进行知识抽取(D2RQ介绍)

    1 概述 D2RQ,含义是把关系型数据库当作虚拟的RDF图数据库进行访问.D2RQ平台是一个将关系型数据库当作虚拟的.只读的RDF图数据库进行访问的系统.提供了基于RDF访问关系数据库的内容,而无需复 ...

  5. 知识图谱顶会论文(IJCAI-2022) TEMP:多跳推理的类型感知嵌入

    IJCAI-TEMP:知识图谱上多跳推理的类型感知嵌入 论文地址: Type-aware Embeddings for Multi-Hop Reasoning over Knowledge Graph ...

  6. 简单构建基于RDF和SPARQL的KBQA(知识图谱问答系统)

    本文主要通过python实例讲解基于RDF和SPARQL的KBQA系统的构建.该项目可在python2和python3上运行通过. 注:KBQA即是我们通常所说的基于知识图谱的问答系统.这里简单构建的 ...

  7. 知识图谱基础之RDF,RDFS与OWL

    https://blog.csdn.net/u011801161/article/details/78833958 https://blog.csdn.net/baidu_15113429/artic ...

  8. 知识图谱基础之RDF,RDFS与OWL 2

    https://zhuanlan.zhihu.com/p/32122644 看过之前两篇文章([1](为什么需要知识图谱?什么是知识图谱?——KG的前世今生), [2](语义网络,语义网,链接数据和知 ...

  9. K8s 学习者绝对不能错过的最全知识图谱(内含 58个知识点链接)

    作者 | 平名 阿里服务端开发技术专家 导读:Kubernetes 作为云原生时代的“操作系统”,熟悉和使用它是每名用户的必备技能.本篇文章概述了容器服务 Kubernetes 的知识图谱,部分内容参 ...

随机推荐

  1. nginx支持wss配置

    nginx证书 nginx.conf配置

  2. SpringBoot第一次案例

    一.Spring Boot 入门 1.Spring Boot 简介 简化Spring应用开发的一个框架: 整个Spring技术栈的一个大整合: J2EE开发的一站式解决方案: 2.微服务 2014,m ...

  3. 【JavaEE】之MyBatis的ParameterType的使用

    在MyBatis的Mapper.xml文件中,参数的表示方法有两种:一种是使用 “#{XXX}” 的方式表示的,另一种是使用 “${XXX}” 的方式表示的.今天来介绍以下这两种方式的不同之处. 1. ...

  4. CCNA 之 九 STP生成树协议

    STP生成树 在上一次实验中,使用了单臂路由是两个不同的VLAN之间进行通信,而单臂路由的这种网络拓扑,当一条链路或者路由设备出现故障的时候,整个网络就会瘫痪. 称此网络为:不健壮的,无冗余的网络环境 ...

  5. Day01-初识 Python

    1.CPU/内存/硬盘/操作系统 CPU :计算机的运算和处理中心,相当于人类的大脑. 内存 :暂时存储数据,临时加载数据应用程序. 硬盘 :长期存储数据. 操作系统:一个软件,连接计算机的硬件与所有 ...

  6. HOOK的类型

  7. Python报错ERROR: Command errored out with exit status 1:

    解决方法: 1.以管理员身份打开cmd 2.pip install robotframework-AutoItLibrary (本次安装时Python基于3.7.3,pip为最新版本) 3.安装成功

  8. 解决 Docker Hadoop ssh "Connection to * closed".问题

    Docker 最近很火, 可以快速轻量级地虚拟出多个node,所以打算在Docker中跑Hadoop伪分布式应用. 其实要做出个简单的版本倒是不难,主要在 建立ssh无密码登录本机时,出现刚登录上去, ...

  9. Xcode编译引用Framework

    需要两步配置 1.在xcode工程的search path下设置要引用的Framework所在路径 2.将Framewoek拖入工程中时 不要选择copy,而选择引用模式.

  10. js练习- 给你一个对象,求有几层

    // 比如这个a中,就有四层.如何算出这四层 const a = { b: 1, c() {}, d: { e: 2, f: { g: 3, h: { i: 4, }, }, j: { k: 5, } ...