编辑距离算法-DP问题
Levenshtein Distance
The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
Example
For example, the Levenshtein distance between kitten and sitting is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten → sitten (substitution of "s" for "k")
- sitten → sittin (substitution of "i" for "e")
- sittin → sitting (insertion of "g" at the end)
思路:
Let’s take a simple example of finding minimum edit distance between strings ME and MY. Intuitively you already know that minimum edit distance here is 1 operation and this operation. And it is replacing E with Y. But let’s try to formalize it in a form of the algorithm in order to be able to do more complex examples like transforming Saturday into Sunday.
To apply the mathematical formula mentioned above to ME → MY transformation we need to know minimum edit distances of ME → M, M → MY and M → M transformations in prior. Then we will need to pick the minimum one and add one operation to transform last letters E → Y. So minimum edit distance of ME → MY transformation is being calculated based on three previously possible transformations.
To explain this further let’s draw the following matrix:
- Cell
(0:1)contains red number 1. It means that we need 1 operation to transformMto an empty string. And it is by deletingM. This is why this number is red. - Cell
(0:2)contains red number 2. It means that we need 2 operations to transformMEto an empty string. And it is by deletingEandM. - Cell
(1:0)contains green number 1. It means that we need 1 operation to transform an empty string toM. And it is by insertingM. This is why this number is green. - Cell
(2:0)contains green number 2. It means that we need 2 operations to transform an empty string toMY. And it is by insertingYandM. - Cell
(1:1)contains number 0. It means that it costs nothing to transformMintoM. - Cell
(1:2)contains red number 1. It means that we need 1 operation to transformMEtoM. And it is by deletingE. - And so on...
This looks easy for such small matrix as ours (it is only 3x3). But here you may find basic concepts that may be applied to calculate all those numbers for bigger matrices (let’s say a 9x7 matrix for Saturday → Sunday transformation).
According to the formula you only need three adjacent cells (i-1:j), (i-1:j-1), and (i:j-1) to calculate the number for current cell (i:j). All we need to do is to find the minimum of those three cells and then add 1 in case if we have different letters in i's row and j's column.如果等的话,找最小就好。
代码如下:
/**
* @param {string} a
* @param {string} b
* @return {number}
*/
export default function levenshteinDistance(a, b) {
// Create empty edit distance matrix for all possible modifications of
// substrings of a to substrings of b.
const distanceMatrix = Array(b.length + ).fill(null).map(() => Array(a.length + ).fill(null)); // Fill the first row of the matrix.
// If this is first row then we're transforming empty string to a.
// In this case the number of transformations equals to size of a substring.
for (let i = ; i <= a.length; i += ) {
distanceMatrix[][i] = i;
} // Fill the first column of the matrix.
// If this is first column then we're transforming empty string to b.
// In this case the number of transformations equals to size of b substring.
for (let j = ; j <= b.length; j += ) {
distanceMatrix[j][] = j;
} for (let j = ; j <= b.length; j += ) {
for (let i = ; i <= a.length; i += ) {
const indicator = a[i - ] === b[j - ] ? : ;
distanceMatrix[j][i] = Math.min(
distanceMatrix[j][i - ] + , // deletion
distanceMatrix[j - ][i] + , // insertion
distanceMatrix[j - ][i - ] + indicator, // substitution
);
}
} return distanceMatrix[b.length][a.length];
}
编辑距离算法-DP问题的更多相关文章
- 用C#实现字符串相似度算法(编辑距离算法 Levenshtein Distance)
在搞验证码识别的时候需要比较字符代码的相似度用到"编辑距离算法",关于原理和C#实现做个记录. 据百度百科介绍: 编辑距离,又称Levenshtein距离(也叫做Edit Dist ...
- [转]字符串相似度算法(编辑距离算法 Levenshtein Distance)
转自:http://www.sigvc.org/bbs/forum.php?mod=viewthread&tid=981 http://www.cnblogs.com/ivanyb/archi ...
- Levenshtein Distance算法(编辑距离算法)
编辑距离 编辑距离(Edit Distance),又称Levenshtein距离,是指两个字串之间,由一个转成另一个所需的最少编辑操作次数.许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符, ...
- 字符串相似度算法(编辑距离算法 Levenshtein Distance)(转)
在搞验证码识别的时候需要比较字符代码的相似度用到“编辑距离算法”,关于原理和C#实现做个记录. 据百度百科介绍: 编辑距离,又称Levenshtein距离(也叫做Edit Distance),是指两个 ...
- 字符串相似度算法(编辑距离算法 Levenshtein Distance)
在搞验证码识别的时候需要比较字符代码的相似度用到“编辑距离算法”,关于原理和C#实现做个记录.据百度百科介绍:编辑距离,又称Levenshtein距离(也叫做Edit Distance),是指两个字串 ...
- 自然语言处理(5)之Levenshtein最小编辑距离算法
自然语言处理(5)之Levenshtein最小编辑距离算法 题记:之前在公司使用Levenshtein最小编辑距离算法来实现相似车牌的计算的特性开发,正好本节来总结下Levenshtein最小编辑距离 ...
- 编辑距离算法(Levenshtein)
编辑距离定义: 编辑距离,又称Levenshtein距离,是指两个字串之间,由一个转成另一个所需的最少编辑操作次数. 许可的编辑操作包括:将一个字符替换成另一个字符,插入一个字符,删除一个字符. 例如 ...
- Java实现编辑距离算法
Java实现编辑距离算法 编辑距离,又称Levenshtein距离(莱文斯坦距离也叫做Edit Distance),是指两个字串之间,由一个转成另一个所需的最少编辑操作次数,如果它们的距离越大,说明它 ...
- Levenshtein distance 编辑距离算法
这几天再看 virtrual-dom,关于两个列表的对比,讲到了 Levenshtein distance 距离,周末抽空做一下总结. Levenshtein Distance 介绍 在信息理论和计算 ...
随机推荐
- leetcode中的sql
1 组合两张表 组合两张表, 题目很简单, 主要考察JOIN语法的使用.唯一需要注意的一点, 是题目中的这句话, "无论 person 是否有地址信息".说明即使Person表, ...
- PAT Basic 1083 是否存在相等的差 (20) [hash映射,map STL]
题目 给定 N 张卡⽚,正⾯分别写上 1.2.--.N,然后全部翻⾯,洗牌,在背⾯分别写上 1.2.--. N.将每张牌的正反两⾯数字相减(⼤减⼩),得到 N 个⾮负差值,其中是否存在相等的差? 输⼊ ...
- 【模式分解】无损连接&保持函数依赖
首先引入定义 无损分解指的是对关系模式分解时,原关系模型下任一合法的关系值在分解之后应能通过自然联接运算恢复起来.反之,则称为有损分解. 保持函数依赖的分解指的是对关系分解时,原关系的闭包与分解后关系 ...
- 上传本地项目到GIT码云
1.下载GIT 下载地址:https://git-scm.com/downloads 我这里下载的64位 2.安装GIT 双击下载的Git-2.18.0-64-bit.exe文件,选择自己的安装目录, ...
- Long型转ZonedDateTime型
/** * 将Long类型转化成0 * @author yk * @param time * @return */public static ZonedDateTime toZonedDateTime ...
- goweb-文本处理
文本处理 Web开发中对于文本处理是非常重要的一部分,我们往往需要对输出或者输入的内容进行处理,这里的文本包括字符串.数字.Json.XML等等.Go语言作为一门高性能的语言,对这些文本的处理都有官方 ...
- [原]win10拖拽贴靠功能注册表项调查记录
win10的拖拽贴靠功能被禁用了,偶然的机会,在设置中看到了相关的设置项,如下图 直觉告诉我一定是设置注册表中的某一项,于是决定调查下具体的注册表位置.请出procmon.exe,然后关闭贴靠功能,停 ...
- ZJNU 1129 - The sum problem——中级
枚举区间可能的长度len,将m减去1~len构成的序列和后如果结果是len的倍数,则可以构成答案区间. /* Written By. StelaYuri */ #include<stdio.h& ...
- Metasploit详解
title date tags layout Metasploit 详解 2018-09-25 Metasploit post 一.名词解释 exploit 测试者利用它来攻击一个系统,程序,或服务, ...
- 使用java读取解析txt文本数据,管理简单的数据
在实际开发中会经常碰到使用编程语言读取文本文件的内容,这内容可以是各种各样的一下本人写出我自己做的一个读取文本文件的例子,文件中存储的是我的个人网站 www.yzcopen.com 导航栏目因为懒得使 ...