Edit Distance Algorithm: a DP Problem
Levenshtein Distance
The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
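To make the definition concrete, here is a minimal sketch (not from the original post) that computes the distance directly from the definition by trying all three kinds of edit recursively. It runs in exponential time, which is why the dynamic-programming matrix described later is used instead. The function name editDistanceNaive is hypothetical.

```javascript
// Naive recursive Levenshtein distance that mirrors the definition directly.
// editDistanceNaive is a hypothetical helper used only for illustration.
function editDistanceNaive(a, b) {
  // Turning a string into (or out of) the empty string needs one edit per character.
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  const restA = a.slice(1);
  const restB = b.slice(1);

  // Matching first letters cost nothing; otherwise a substitution is needed.
  const substitutionCost = a[0] === b[0] ? 0 : 1;

  return Math.min(
    editDistanceNaive(restA, b) + 1, // delete a[0]
    editDistanceNaive(a, restB) + 1, // insert b[0]
    editDistanceNaive(restA, restB) + substitutionCost, // keep or substitute a[0]
  );
}

console.log(editDistanceNaive("kitten", "sitting")); // 3
```

Every call branches three ways, so the same subproblems are solved over and over; storing each subproblem's answer in a table is exactly the DP idea this post builds up to.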
Example
For example, the Levenshtein distance between kitten and sitting is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten → sitten (substitution of "s" for "k")
- sitten → sittin (substitution of "i" for "e")
- sittin → sitting (insertion of "g" at the end)
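The three steps above can be replayed mechanically. This short sketch uses two hypothetical helpers, substituteAt and insertAt, to apply each single-character edit in turn. It only shows that three edits suffice; showing that no shorter sequence exists is what the distance computation itself is for.

```javascript
// Hypothetical helpers that apply one single-character edit to a string.
function substituteAt(s, index, ch) {
  return s.slice(0, index) + ch + s.slice(index + 1);
}

function insertAt(s, index, ch) {
  return s.slice(0, index) + ch + s.slice(index);
}

// Replay the three edits from the kitten -> sitting example.
const step1 = substituteAt("kitten", 0, "s"); // substitute "s" for "k"
const step2 = substituteAt(step1, 4, "i"); // substitute "i" for "e"
const step3 = insertAt(step2, 6, "g"); // insert "g" at the end

console.log([step1, step2, step3].join(" -> ")); // sitten -> sittin -> sitting
```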
Approach:
Let's take a simple example: finding the minimum edit distance between the strings ME and MY. Intuitively, you already know that the minimum edit distance here is 1 operation, namely replacing E with Y. But let's formalize this as an algorithm so that we can handle more complex examples, such as transforming Saturday into Sunday.
To apply the Levenshtein recurrence to the ME → MY transformation, we first need the minimum edit distances of the ME → M, M → MY and M → M transformations. We then pick the smallest of these and add one operation to handle the last letters (E → Y). In other words, the minimum edit distance of ME → MY is calculated from three previously computed transformations.
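For reference, the standard Levenshtein recurrence can be written as follows, in notation introduced here rather than taken from the original: let D(i, j) be the edit distance between the first i characters of a and the first j characters of b, and let [a_i ≠ b_j] be 1 when the i-th letter of a differs from the j-th letter of b and 0 otherwise.

$$
D(i, j) =
\begin{cases}
\max(i, j), & \text{if } \min(i, j) = 0 \\
\min\bigl(D(i-1, j) + 1,\; D(i, j-1) + 1,\; D(i-1, j-1) + [a_i \ne b_j]\bigr), & \text{otherwise}
\end{cases}
$$

For a = ME and b = MY, the three earlier transformations are D(1, 2) (M → MY), D(2, 1) (ME → M) and D(1, 1) (M → M), and since E ≠ Y:

$$
D(2, 2) = \min\bigl(D(1, 2) + 1,\; D(2, 1) + 1,\; D(1, 1) + 1\bigr) = \min(1 + 1,\; 1 + 1,\; 0 + 1) = 1
$$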
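To explain this further, let's draw a matrix with the letters of ME along the columns and the letters of MY along the rows. Cells are labelled (row:column), and row 0 and column 0 stand for the empty string: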
- Cell (0:1) contains the red number 1: we need 1 operation to transform M into an empty string, namely deleting M. Red marks deletion costs.
- Cell (0:2) contains the red number 2: we need 2 operations to transform ME into an empty string, namely deleting E and M.
- Cell (1:0) contains the green number 1: we need 1 operation to transform an empty string into M, namely inserting M. Green marks insertion costs.
- Cell (2:0) contains the green number 2: we need 2 operations to transform an empty string into MY, namely inserting Y and M.
- Cell (1:1) contains the number 0: it costs nothing to transform M into M.
- Cell (1:2) contains the red number 1: we need 1 operation to transform ME into M, namely deleting E.
- And so on...
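To check the cell values listed above, this sketch fills the whole 3x3 matrix for ME → MY, with rows following MY and columns following ME so that matrix[row][column] matches the (row:column) labels. The helper name buildDistanceMatrix is hypothetical; it applies the same update rule as the code later in the post.

```javascript
// Fill the full DP matrix for a -> b. Rows follow b and columns follow a,
// so matrix[row][column] matches the (row:column) labels used in the text.
function buildDistanceMatrix(a, b) {
  const matrix = Array.from({ length: b.length + 1 }, () =>
    new Array(a.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i += 1) matrix[0][i] = i; // deletions only
  for (let j = 0; j <= b.length; j += 1) matrix[j][0] = j; // insertions only
  for (let j = 1; j <= b.length; j += 1) {
    for (let i = 1; i <= a.length; i += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      matrix[j][i] = Math.min(
        matrix[j][i - 1] + 1, // deletion
        matrix[j - 1][i] + 1, // insertion
        matrix[j - 1][i - 1] + cost, // substitution or match
      );
    }
  }
  return matrix;
}

const meToMy = buildDistanceMatrix("ME", "MY");
meToMy.forEach((row) => console.log(row.join(" ")));
// 0 1 2
// 1 0 1
// 2 1 1
```

The bottom-right cell holds the answer for the full strings: 1.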
This looks easy for a matrix as small as ours (only 3x3). But the same basic concepts apply when calculating all the numbers for bigger matrices (say, a 9x7 matrix for the Saturday → Sunday transformation).
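As a sketch of the bigger case, the code below fills the Saturday → Sunday matrix with Saturday along the columns and Sunday along the rows, so it has 9 columns and 7 rows once the empty prefixes are included. The helper name distanceTable and the printing layout are illustrative choices, not part of the original.

```javascript
// Fill the DP matrix for source -> target.
// Rows follow target, columns follow source; row/column 0 is the empty prefix.
function distanceTable(source, target) {
  const grid = Array.from({ length: target.length + 1 }, (_, j) =>
    Array.from({ length: source.length + 1 }, (_, i) => (j === 0 ? i : i === 0 ? j : 0)),
  );
  for (let j = 1; j <= target.length; j += 1) {
    for (let i = 1; i <= source.length; i += 1) {
      const cost = source[i - 1] === target[j - 1] ? 0 : 1;
      grid[j][i] = Math.min(
        grid[j][i - 1] + 1, // deletion
        grid[j - 1][i] + 1, // insertion
        grid[j - 1][i - 1] + cost, // substitution or match
      );
    }
  }
  return grid;
}

const source = "Saturday";
const target = "Sunday";
const table = distanceTable(source, target);

// Print the table with each word's letters as column and row headers.
const pad = (cell) => String(cell).padStart(2);
console.log(["", "", ...source].map(pad).join(" "));
table.forEach((row, j) => {
  const label = j === 0 ? "" : target[j - 1];
  console.log([label, ...row].map(pad).join(" "));
});

console.log(`${table.length} rows x ${table[0].length} columns`); // 7 rows x 9 columns
console.log(`distance = ${table[target.length][source.length]}`); // distance = 3
```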
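According to the recurrence, you only need the three adjacent cells (i-1:j), (i-1:j-1) and (i:j-1) to calculate the number for the current cell (i:j). If the letter of row i differs from the letter of column j, find the minimum of those three cells and add 1. If the letters are equal, no edit is needed at this position, so the current cell simply takes the diagonal value (i-1:j-1); the plain minimum of all three cells is not enough here, because the cell above or to the left can be smaller than the diagonal.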
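The update rule for a single cell can be written as a small function. This sketch uses a hypothetical helper, nextCell, that takes the three neighbouring values and the two letters being compared. It also illustrates why equal letters should copy the diagonal: for a = "A" and b = "AA", the last cell has neighbours left = 2, above = 0 and diagonal = 1, so the smallest neighbour is 0 while the true distance is 1.

```javascript
// One cell update, given the three neighbouring cells and the two letters.
// nextCell is a hypothetical helper used only to illustrate the rule.
function nextCell(left, up, diagonal, letterA, letterB) {
  if (letterA === letterB) {
    // Equal letters: no edit at this position, carry the diagonal value over.
    return diagonal;
  }
  // Different letters: one more edit on top of the cheapest neighbour.
  return Math.min(left, up, diagonal) + 1;
}

// Last cell for a = "A", b = "AA": left = 2, up = 0, diagonal = 1.
console.log(nextCell(2, 0, 1, "A", "A")); // 1, the true distance from "A" to "AA"
console.log(Math.min(2, 0, 1)); // 0, what a plain minimum would give
```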
The code is as follows:
/**
 * Compute the Levenshtein distance between strings a and b.
 *
 * @param {string} a
 * @param {string} b
 * @return {number}
 */
export default function levenshteinDistance(a, b) {
  // Create an empty edit distance matrix for all possible modifications of
  // substrings of a to substrings of b. Rows follow b, columns follow a.
  const distanceMatrix = Array(b.length + 1)
    .fill(null)
    .map(() => Array(a.length + 1).fill(null));

  // Fill the first row of the matrix.
  // The first row transforms prefixes of a into the empty string, so the
  // number of operations equals the length of the a prefix (deletions).
  for (let i = 0; i <= a.length; i += 1) {
    distanceMatrix[0][i] = i;
  }

  // Fill the first column of the matrix.
  // The first column transforms the empty string into prefixes of b, so the
  // number of operations equals the length of the b prefix (insertions).
  for (let j = 0; j <= b.length; j += 1) {
    distanceMatrix[j][0] = j;
  }

  for (let j = 1; j <= b.length; j += 1) {
    for (let i = 1; i <= a.length; i += 1) {
      // 0 if the current letters match, 1 if a substitution is needed.
      const indicator = a[i - 1] === b[j - 1] ? 0 : 1;
      distanceMatrix[j][i] = Math.min(
        distanceMatrix[j][i - 1] + 1, // deletion
        distanceMatrix[j - 1][i] + 1, // insertion
        distanceMatrix[j - 1][i - 1] + indicator, // substitution
      );
    }
  }

  return distanceMatrix[b.length][a.length];
}
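Each row of distanceMatrix only reads the row directly above it, so the full table can be reduced to two rows. The sketch below is a common space optimisation, not part of the original code; it keeps O(a.length) extra memory instead of O(a.length × b.length) and returns the same distances. The name levenshteinTwoRows is hypothetical.

```javascript
// Space-saving variant: keep only the previous and current rows of the matrix.
// levenshteinTwoRows is a hypothetical name; it is not part of the original code.
function levenshteinTwoRows(a, b) {
  // At the start of iteration j, prev[i] holds the distance from the first i
  // characters of a to the first (j - 1) characters of b. Row 0 is 0..a.length.
  let prev = Array.from({ length: a.length + 1 }, (_, i) => i);
  let curr = new Array(a.length + 1).fill(0);

  for (let j = 1; j <= b.length; j += 1) {
    curr[0] = j; // first column: j insertions
    for (let i = 1; i <= a.length; i += 1) {
      const indicator = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[i] = Math.min(
        curr[i - 1] + 1, // deletion
        prev[i] + 1, // insertion
        prev[i - 1] + indicator, // substitution
      );
    }
    [prev, curr] = [curr, prev]; // the row just filled becomes "previous"
  }
  return prev[a.length];
}

console.log(levenshteinTwoRows("kitten", "sitting")); // 3
console.log(levenshteinTwoRows("Saturday", "Sunday")); // 3
```

The trade-off is that only the final distance survives; recovering the actual sequence of edits needs the full matrix.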