Changing the Elasticsearch score

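  In some cases we need to customize the score in order to personalize search results. For example, machine learning can give us a feature vector for every user and a feature vector for every product. How do we compute the similarity between the two? The more similar the two vectors are, the higher the score should be, so that the products most similar to the user are recommended first.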
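The similarity used by the scripts in this article is cosine similarity. For a user vector u and a product vector v of the same dimension, it can be written as follows (the symbols u, v and the index i are notation introduced here for illustration):

```latex
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
           = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2} \, \sqrt{\sum_i v_i^2}}
```

The value lies between -1 and 1, and it is undefined when either vector is all zeros, which is why the scripts return 0 in that case.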

Reading the plugin source

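  According to the official documentation, a script must be run through a "ScriptEngine". To develop a custom plugin, we implement the "ScriptEngine" interface and load it through the plugin's getScriptEngine() method. The ScriptEngine interface is described in reference [1]. The official documentation gives the following example: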

  private static class MyExpertScriptEngine implements ScriptEngine {
// The name used in the script API to refer to this script engine.
@Override
public String getType() {
return "expert_scripts";
}

 

  // The core method, implemented below with a Java lambda expression
@Override
public <T> T compile(String scriptName, String scriptSource, ScriptContext<T> context, Map<String, String> params) {
if (context.equals(SearchScript.CONTEXT) == false) {
throw new IllegalArgumentException(getType() + " scripts cannot be used for context [" + context.name + "]");
}
// we use the script "source" as the script identifier
if ("pure_df".equals(scriptSource)) {
// p gives access to the values in params; lookup gives access to the values in the document
SearchScript.Factory factory = (p, lookup) -> new SearchScript.LeafFactory() {
final String field;
final String term;
{
if (p.containsKey("field") == false) {
throw new IllegalArgumentException("Missing parameter [field]");
}
if (p.containsKey("term") == false) {
throw new IllegalArgumentException("Missing parameter [term]");
}
field = p.get("field").toString();
term = p.get("term").toString();
}
@Override
public SearchScript newInstance(LeafReaderContext context) throws IOException {
PostingsEnum postings = context.reader().postings(new Term(field, term));
if (postings == null) {
// the field and/or term don't exist in this segment, so always return 0
return new SearchScript(p, lookup, context) {
@Override
public double runAsDouble() {
return 0.0d;
}
};
}
return new SearchScript(p, lookup, context) {
int currentDocid = -1;
@Override
public void setDocument(int docid) {
// advance has undefined behavior calling with a docid <= its current docid
if (postings.docID() < docid) {
try {
postings.advance(docid);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
currentDocid = docid;
}
@Override
public double runAsDouble() {
if (postings.docID() != currentDocid) {
// advance moved past the current doc, so this doc has no occurrences of the term
return 0.0d;
}
try {
return postings.freq();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
};
}
@Override
public boolean needs_score() {
return false;
}
};
return context.factoryClazz.cast(factory);
}
throw new IllegalArgumentException("Unknown script name " + scriptSource);
}
@Override
public void close() {
// optionally close resources
}
}

Based on the code above and our business requirements, we wrote the following scripts:

Script 1

    package com;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.index.LeafReaderContext;
import org.elasticsearch.script.ScriptContext;
import org.elasticsearch.script.ScriptEngine;
import org.elasticsearch.script.SearchScript;

import java.io.IOException;
import java.util.*;

/**
* Created with IntelliJ IDEA.
* User: 0.0
* Date: 18-8-9
* Time: 2:32 PM
* Description: To personalize search recommendations, we compute the similarity between the user vector
*          and each product's feature vector. The higher the similarity, the higher the final score and the rank.
*/
public class FeatureVectorScoreSearchScript implements ScriptEngine {
private final static Logger logger = LogManager.getLogger(FeatureVectorScoreSearchScript.class);
@Override
public String getType() {
return "feature_vector_scoring_script";
}
@Override
public <T> T compile(String scriptName, String scriptSource, ScriptContext<T> context, Map<String, String> params) {
logger.info("The feature_vector_scoring_script is calculating the similarity of users and commodities");
if (!context.equals(SearchScript.CONTEXT)) {
throw new IllegalArgumentException(getType() + " scripts cannot be used for context [" + context.name + "]");
}
if("whb_fvs".equals(scriptSource)) {
SearchScript.Factory factory = (p, lookup) -> new SearchScript.LeafFactory() {
// Validate the input parameters
final Map<String, Object> inputFeatureVector;
final String field;
{
if (p.containsKey("field") == false) {
throw new IllegalArgumentException("Missing parameter [field]");
}
if(p.containsKey("inputFeatureVector") == false){
throw new IllegalArgumentException("Missing parameter [inputFeatureVector]");
}
field = p.get("field").toString();
inputFeatureVector = (Map<String,Object>) p.get("inputFeatureVector");
}
@Override
public SearchScript newInstance(LeafReaderContext context) throws IOException {
return new SearchScript(p, lookup, context) {
@Override
public double runAsDouble() {
if(lookup.source().containsKey(field)==true){
final Map<String, Double> productFeatureVector = (Map<String, Double>) lookup.source().get(field);
return calculateVectorSimilarity(inputFeatureVector, productFeatureVector);
}else {
logger.info("The " + field + " is not exist in the product");
return 0.0D;
}
}
};
}
@Override
public boolean needs_score() {
return false;
}
};
return context.factoryClazz.cast(factory);
}
throw new IllegalArgumentException("Unknown script name " + scriptSource);
}
@Override
public void close() {
}
// Compute the cosine similarity of two vectors
public double calculateVectorSimilarity(Map<String, Object> inputFeatureVector, Map<String, Double> productFeatureVector) {
double dotProduct = 0.0D;
double sumOfUserSquares = 0.0D;
double sumOfProductSquares = 0.0D;
if (inputFeatureVector != null && productFeatureVector != null) {
for (Map.Entry<String, Object> entry : inputFeatureVector.entrySet()) {
String dimName = entry.getKey();
double dimScore = Double.parseDouble(entry.getValue().toString());
// A dimension missing from the product vector counts as 0 rather than causing a NullPointerException
Double itemDim = productFeatureVector.get(dimName);
double itemDimScore = itemDim == null ? 0.0D : itemDim;
sumOfUserSquares += dimScore * dimScore;
sumOfProductSquares += itemDimScore * itemDimScore;
dotProduct += dimScore * itemDimScore;
}
if (sumOfUserSquares * sumOfProductSquares == 0.0D) {
return 0.0D;
}
return dotProduct / (Math.sqrt(sumOfUserSquares) * Math.sqrt(sumOfProductSquares));
} else {
return 0.0D;
}
}
}
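The similarity calculation in Script 1 can be tried outside Elasticsearch. The following is a minimal standalone sketch, not the plugin's code: the class name `CosineSimilaritySketch` and the sample dimensions `price` and `brand` are hypothetical. It treats a dimension missing from the product as 0 and computes the product norm over all of the product's dimensions, which are choices made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class CosineSimilaritySketch {

    // Cosine similarity over sparse vectors keyed by dimension name.
    // Assumption of this sketch: a dimension absent from a map has value 0.
    static double cosine(Map<String, Double> user, Map<String, Double> product) {
        if (user == null || product == null) {
            return 0.0;
        }
        double dot = 0.0;
        double userNormSq = 0.0;
        double productNormSq = 0.0;
        for (Map.Entry<String, Double> e : user.entrySet()) {
            double u = e.getValue();
            userNormSq += u * u;
            Double p = product.get(e.getKey());
            if (p != null) {
                dot += u * p;
            }
        }
        for (double p : product.values()) {
            productNormSq += p * p;
        }
        // Cosine is undefined for a zero vector, so return 0 as the scripts do.
        if (userNormSq == 0.0 || productNormSq == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(userNormSq) * Math.sqrt(productNormSq));
    }

    public static void main(String[] args) {
        Map<String, Double> user = new HashMap<>();
        user.put("price", 1.0);
        user.put("brand", 2.0);
        Map<String, Double> product = new HashMap<>();
        product.put("price", 2.0);
        product.put("brand", 4.0);
        // The two vectors are parallel, so the result is close to 1.0.
        System.out.println(cosine(user, product));
    }
}
```

Because the result depends only on direction, scaling every product value by the same positive factor leaves the score unchanged.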

Script 2 (fast-vector-distance)


/**
* Created with IntelliJ IDEA.
* User: 王火斌
* Date: 18-8-9
* Time: 2:32 PM
* Description: To personalize search recommendations, we compute the similarity between the user vector
*          and each product's feature vector. The higher the similarity, the higher the final score and the rank.
*/
package com;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.index.LeafReaderContext;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.plugins.ScriptPlugin;
import org.elasticsearch.script.ScriptContext;
import org.elasticsearch.script.ScriptEngine;
import org.elasticsearch.script.SearchScript;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.store.ByteArrayDataInput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.util.*;

/**
* This class is instantiated when Elasticsearch loads the plugin for the
* first time. If you change the name of this plugin, make sure to update
* src/main/resources/es-plugin.properties file that points to this class.
*/
public final class FastVectorDistance extends Plugin implements ScriptPlugin {
@Override
public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) {
return new FastVectorDistanceEngine();
}
private static class FastVectorDistanceEngine implements ScriptEngine {
private final static Logger logger = LogManager.getLogger(FastVectorDistance.class);
private static final int DOUBLE_SIZE = 8;
double queryVectorNorm;
@Override
public String getType() {
return "feature_vector_scoring_script";
}
@Override
public <T> T compile(String scriptName, String scriptSource, ScriptContext<T> context, Map<String, String> params) {
logger.info("The feature_vector_scoring_script is calculating the similarity of users and commodities");
if (!context.equals(SearchScript.CONTEXT)) {
throw new IllegalArgumentException(getType() + " scripts cannot be used for context [" + context.name + "]");
}
if ("whb_fvd".equals(scriptSource)) {
SearchScript.Factory factory = (p, lookup) -> new SearchScript.LeafFactory() {
// The field to compare against
final String field;
//Whether this search should be cosine or dot product
final Boolean cosine;
//The query embedded vector
final Object vector;
Boolean exclude;
//The final comma delimited vector representation of the query vector
double[] inputVector;
{
if (p.containsKey("field") == false) {
throw new IllegalArgumentException("Missing parameter [field]");
}
// Determine if cosine
final Object cosineBool = p.get("cosine");
cosine = cosineBool != null ? (boolean) cosineBool : true;
// Get the field value from the query
field = p.get("field").toString();
final Object excludeBool = p.get("exclude");
exclude = excludeBool != null ? (boolean) excludeBool : true;
// Get the query vector embedding
vector = p.get("vector");
// Determine if a raw vector or a Base64-encoded vector was passed
if (vector != null) {
final ArrayList<Double> tmp = (ArrayList<Double>) vector;
inputVector = new double[tmp.size()];
for (int i = 0; i < inputVector.length; i++) {
inputVector[i] = tmp.get(i);
}
} else {
final Object encodedVector = p.get("encoded_vector");
if (encodedVector == null) {
throw new IllegalArgumentException("Must have 'vector' or 'encoded_vector' as a parameter");
}
inputVector = Util.convertBase64ToArray((String) encodedVector);
}
// If cosine, calculate the squared norm of the query vector
if (cosine) {
queryVectorNorm = 0d;
// compute query inputVector norm once
for (double v : inputVector) {
queryVectorNorm += Math.pow(v, 2.0);
}
}
}
@Override
public SearchScript newInstance(LeafReaderContext context) throws IOException {
return new SearchScript(p, lookup, context) {
Boolean is_value = false;
// Use Lucene LeafReaderContext to access binary values directly.
BinaryDocValues accessor = context.reader().getBinaryDocValues(field);
@Override
public void setDocument(int docId) {
try {
// advanceExact returns false when this document has no value for the field;
// accessor is null when the segment has no doc values for the field at all
is_value = accessor != null && accessor.advanceExact(docId);
} catch (IOException e) {
is_value = false;
}
}
@Override
public double runAsDouble() {
// If there is no field value return 0 rather than fail.
if (!is_value) return 0.0d;
final int inputVectorSize = inputVector.length;
final byte[] bytes;
try {
bytes = accessor.binaryValue().bytes;
} catch (IOException e) {
return 0d;
}
final ByteArrayDataInput byteDocVector = new ByteArrayDataInput(bytes);
byteDocVector.readVInt();
// returns the number of bytes to read
final int docVectorLength = byteDocVector.readVInt();
if (docVectorLength != inputVectorSize * DOUBLE_SIZE) {
return 0d;
}
final int position = byteDocVector.getPosition();
final DoubleBuffer doubleBuffer = ByteBuffer.wrap(bytes, position, docVectorLength).asDoubleBuffer();
final double[] docVector = new double[inputVectorSize];
doubleBuffer.get(docVector);
double docVectorNorm = 0d;
double score = 0d;
// calculate dot product of document vector and query vector
for (int i = 0; i < inputVectorSize; i++) {
score += docVector[i] * inputVector[i];
if (cosine) {
docVectorNorm += Math.pow(docVector[i], 2.0);
}
}
// If cosine, calculate cosine score
if (cosine) {
if (docVectorNorm == 0 || queryVectorNorm == 0) return 0d;
score = score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
}
return score;
}
};
}
@Override
public boolean needs_score() {
return false;
}
};
return context.factoryClazz.cast(factory);
}
throw new IllegalArgumentException("Unknown script name " + scriptSource);
}
@Override
public void close() {}
}
}

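Deployment

The plugin is built and packaged with Maven. The steps are as follows:

  1. Configure the pom file

    Declare the dependencies and set the build output directory.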

    <modelVersion>4.0.0</modelVersion>
    <groupId>es-plugin</groupId>
    <artifactId>elasticsearch-plugin</artifactId>
    <version>1.0-SNAPSHOT</version>

     <dependencies>
    <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.1.1</version>
    </dependency>
    <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
    </dependency>
    </dependencies>
    <build>
    <plugins>
    <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.3</version>
    <configuration>
    <appendAssemblyId>false</appendAssemblyId>
    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
    <descriptors>
    <descriptor>${basedir}/src/assembly/plugin.xml</descriptor>
    </descriptors>
    </configuration>
    <executions>
    <execution>
    <phase>package</phase>
    <goals>
    <goal>single</goal>
    </goals>
    </execution>
    </executions>
    </plugin>
    <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
    <source>1.8</source>
    <target>1.8</target>
    </configuration>
    </plugin>
    </plugins>
    </build>

  2. Create the assembly XML file (src/assembly/plugin.xml, the descriptor referenced in the pom)

<?xml version="1.0"?>
<assembly>
<id>plugin</id>
<formats>
<format>zip</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<fileSet>
<directory>${project.basedir}/src/main/resources</directory>
<outputDirectory>feature-vector-score</outputDirectory>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<outputDirectory>feature-vector-score</outputDirectory>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<excludes>
<exclude>org.elasticsearch:elasticsearch</exclude>
<exclude>org.apache.logging.log4j:log4j-api</exclude>
</excludes>
</dependencySet>
</dependencySets>
</assembly>

  3. Create the plugin-descriptor.properties file

description=feature-vector-similarity
version=1.0
name=feature-vector-score
site=${elasticsearch.plugin.site}
jvm=true
classname=com.FeatureVectorScoreSearchPlugin
java.version=1.8
elasticsearch.version=6.1.1
isolated=${elasticsearch.plugin.isolated}

description:simple summary of the plugin

version(String):plugin’s version

name(String):the plugin name

classname(String):the name of the class to load, fully-qualified.

java.version(String):version of java the code is built against. Use the system property java.specification.version. Version string must be a sequence of nonnegative decimal integers separated by "."'s and may have leading zeros.

Testing

Create the index

create_index = {
"settings": {
"analysis": {
"analyzer": {
# this configures the custom analyzer we need to parse vectors such that the scoring
# plugin will work correctly
"payload_analyzer": {
"type": "custom",
"tokenizer":"whitespace",
"filter":"delimited_payload_filter"
}
}
}
},
"mappings": {
"movies": {
# this mapping definition sets up the metadata fields for the movies
"properties": {
"movieId": {
"type": "integer"
},
"tmdbId": {
"type": "keyword"
},
"genres": {
"type": "keyword"
},
"release_date": {
"type": "date",
"format": "year"
},
"@model": {
# this mapping definition sets up the fields for movie factor vectors of our model
"properties": {
"factor": {
"type": "binary",
"doc_values": True
},
"version": {
"type": "keyword"
},
"timestamp": {
"type": "date"
}
}
}
}}
}}

Query

You can execute the script by specifying its lang as feature_vector_scoring_script, and the name of the script, whb_fvd, as the script source:

{
"query": { "function_score": {
"query": {
"match_all": {
}
},
"functions": [
{
"script_score": {
"script": {
"source": "whb_fvd",
"lang" : "feature_vector_scoring_script",
"params": {
"field": "@model.factor",
"cosine": true,
"encoded_vector" :"v9EUmGAAAAC/6f9VAAAAAL/j+OOgAAAAv+m6+oAAAAA/lTSDIAAAAL/FdkTAAAAAv7rKHKAAAAA/0iyEYAAAAD/ZUY6gAAAAP7TzYoAAAAA/1K4IAAAAAD+yH9XgAAAAv6QRBSAAAAA/vRiiwAAAAL/mRhzgAAAAv9WxpiAAAAC/8YD+QAAAAL/jpbtgAAAAv+zmD+AAAAC/1eqtIAAAAA=="
}
}
}
}
]
}
}
}
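The `encoded_vector` parameter carries the query vector as a Base64 string. The leading bytes of the sample value above are consistent with 8-byte doubles in big-endian order, so a client could build such a string roughly as in the sketch below. This is an assumption about the format, not the plugin's `Util.convertBase64ToArray` implementation, which is not shown in this article; the class name `VectorCodecSketch` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;

final class VectorCodecSketch {

    // Pack each double as 8 big-endian bytes (ByteBuffer's default order),
    // then Base64-encode the whole byte array.
    static String encode(double[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Double.BYTES);
        for (double value : vector) {
            buffer.putDouble(value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    // Reverse of encode: Base64-decode, then read the bytes back as doubles.
    static double[] decode(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        double[] vector = new double[bytes.length / Double.BYTES];
        ByteBuffer.wrap(bytes).asDoubleBuffer().get(vector);
        return vector;
    }

    public static void main(String[] args) {
        double[] query = {0.25, -0.5, 1.0};
        String encoded = encode(query);
        System.out.println(encoded);
        System.out.println(Arrays.toString(decode(encoded)));
    }
}
```

Sending the vector this way avoids parsing a long JSON array of numbers on every request; the plain `vector` parameter remains available when readability matters more.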

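Version notes

Elasticsearch has iterated quickly over the past year. The plugin above relies on the SearchScript class, which applies to v5.4 through v6.4. Versions below v5.4 mainly use the ExecutableScript class. Versions above 6.4 introduce a new class, ScoreScript, for implementing custom scoring scripts.

See GitHub for the full project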

https://github.com/SnailWhb/elasticsearch_pulgine_fast-vector-distance

References

[1]https://static.javadoc.io/org.elasticsearch/elasticsearch/6.0.1/org/elasticsearch/script/ScriptEngine.html

[2]https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-engine.html

[3]https://github.com/jiashiwen/elasticsearchpluginsample

[4]https://www.elastic.co/guide/en/elasticsearch/plugins/6.3/plugin-authors.html
