My Favorite Algorithm(s) - Levenshtein distance
String Matching: Levenshtein distance
- Purpose: to find the minimum number of single-character edits needed to convert one string into the other
- Intuition behind the method: each edit is a replacement, an insertion, or a deletion of one character in a string
- Steps
| Step | Description |
|---|---|
| 1 | Set n to be the length of s. Set m to be the length of t. If n = 0, return m and exit. If m = 0, return n and exit. Construct a matrix containing 0..m rows and 0..n columns (column i corresponds to s[i], row j to t[j]). |
| 2 | Initialize the first row to 0..n. Initialize the first column to 0..m. |
| 3 | Examine each character of s (i from 1 to n). |
| 4 | Examine each character of t (j from 1 to m). |
| 5 | If s[i] equals t[j], the cost is 0. If s[i] doesn't equal t[j], the cost is 1. |
| 6 | Set cell d[i,j] of the matrix equal to the minimum of: a. The cell immediately above plus 1: d[i,j-1] + 1. b. The cell immediately to the left plus 1: d[i-1,j] + 1. c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost. |
| 7 | After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m]. |
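The steps above can be sketched in Python as follows. This is a minimal illustration, assuming plain Python strings and 0-based indexing (so the text's 1-based s[i] and t[j] become `s[i - 1]` and `t[j - 1]` in code); the name `levenshtein` is our own choice, not a library API, and the matrix is stored as `d[i][j]` rather than `d[i,j]`.

```python
def levenshtein(s: str, t: str) -> int:
    """Levenshtein distance between s and t, following steps 1-7."""
    n, m = len(s), len(t)
    # Step 1: if either string is empty, the distance is the other's length.
    if n == 0:
        return m
    if m == 0:
        return n
    # Steps 1-2: d[i][j] is the distance between the first i characters
    # of s and the first j characters of t; row/column 0 count from empty.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Steps 3-4: visit every (i, j); Python strings are 0-based,
    # so the text's s[i] and t[j] are s[i - 1] and t[j - 1] here.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Step 5: cost of aligning s[i] with t[j].
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # Step 6: cheapest of delete, insert, substitute/keep.
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete s[i]
                d[i][j - 1] + 1,         # insert t[j]
                d[i - 1][j - 1] + cost,  # substitute s[i] by t[j] (free if equal)
            )
    # Step 7: the distance is in the last cell.
    return d[n][m]


print(levenshtein("GUMBO", "GAMBOL"))  # 2
```

Both time and memory in this version grow with len(s) * len(t), since every cell of the matrix is filled and kept.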
Example
This section shows how the Levenshtein distance is computed when the source string is "GUMBO" and the target string is "GAMBOL".
Steps 1 and 2
|   |   | G | U | M | B | O |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| G | 1 |   |   |   |   |   |
| A | 2 |   |   |   |   |   |
| M | 3 |   |   |   |   |   |
| B | 4 |   |   |   |   |   |
| O | 5 |   |   |   |   |   |
| L | 6 |   |   |   |   |   |
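To make steps 1 and 2 concrete, here is a small sketch that allocates the matrix, fills only the first row and column, and prints it in the same layout (s across the top, t down the side). The helper names `init_matrix` and `show` are hypothetical, and `None` marks cells that have not been computed yet.

```python
def init_matrix(s, t):
    """Steps 1-2: build d[i][j] for 0 <= i <= len(s), 0 <= j <= len(t).
    Row/column 0 get 0..n and 0..m; other cells stay None (not computed yet)."""
    n, m = len(s), len(t)
    d = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # top row of the table: 0..n
    for j in range(m + 1):
        d[0][j] = j  # left column of the table: 0..m
    return d


def show(d, s, t):
    """Print d with s across the top and t down the side, like the tables here."""
    def cell(v):
        return " ." if v is None else f"{v:>2}"

    print("      " + " ".join(f"{c:>2}" for c in s))
    for j in range(len(t) + 1):
        label = "" if j == 0 else t[j - 1]
        print(f"{label:>2} " + " ".join(cell(d[i][j]) for i in range(len(s) + 1)))


show(init_matrix("GUMBO", "GAMBOL"), "GUMBO", "GAMBOL")
```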
Steps 3 to 6 When i = 1
|   |   | G | U | M | B | O |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| G | 1 | 0 |   |   |   |   |
| A | 2 | 1 |   |   |   |   |
| M | 3 | 2 |   |   |   |   |
| B | 4 | 3 |   |   |   |   |
| O | 5 | 4 |   |   |   |   |
| L | 6 | 5 |   |   |   |   |
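For a fixed i, steps 3 to 6 fill exactly one column, and every cell in it reads only the previous column and the cell just computed in the same column. A sketch of that single-column update, using a hypothetical helper `next_column` where columns are Python lists indexed by j (entry 0 is d[i,0] = i):

```python
def next_column(prev, i, s_char, t):
    """Steps 3-6 for one fixed i: build column i of the matrix (d[i][0..m])
    from column i - 1 (prev). s_char is the i-th character of s."""
    col = [i]  # step 2: d[i][0] = i
    for j in range(1, len(t) + 1):                # step 4
        cost = 0 if s_char == t[j - 1] else 1     # step 5
        col.append(min(                           # step 6
            prev[j] + 1,          # d[i-1][j] + 1
            col[j - 1] + 1,       # d[i][j-1] + 1
            prev[j - 1] + cost,   # d[i-1][j-1] + cost
        ))
    return col


column0 = list(range(7))                       # step 2: 0..6 for "GAMBOL"
print(next_column(column0, 1, "G", "GAMBOL"))  # [1, 0, 1, 2, 3, 4, 5]
```

Reading the result top to bottom gives the G column of the i = 1 table, including the 1 in the header row.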
Steps 3 to 6 When i = 2
|   |   | G | U | M | B | O |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| G | 1 | 0 | 1 |   |   |   |
| A | 2 | 1 | 1 |   |   |   |
| M | 3 | 2 | 2 |   |   |   |
| B | 4 | 3 | 3 |   |   |   |
| O | 5 | 4 | 4 |   |   |   |
| L | 6 | 5 | 5 |   |   |   |
Steps 3 to 6 When i = 3
|   |   | G | U | M | B | O |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| G | 1 | 0 | 1 | 2 |   |   |
| A | 2 | 1 | 1 | 2 |   |   |
| M | 3 | 2 | 2 | 1 |   |   |
| B | 4 | 3 | 3 | 2 |   |   |
| O | 5 | 4 | 4 | 3 |   |   |
| L | 6 | 5 | 5 | 4 |   |   |
Steps 3 to 6 When i = 4
|   |   | G | U | M | B | O |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| G | 1 | 0 | 1 | 2 | 3 |   |
| A | 2 | 1 | 1 | 2 | 3 |   |
| M | 3 | 2 | 2 | 1 | 2 |   |
| B | 4 | 3 | 3 | 2 | 1 |   |
| O | 5 | 4 | 4 | 3 | 2 |   |
| L | 6 | 5 | 5 | 4 | 3 |   |
Steps 3 to 6 When i = 5
|   |   | G | U | M | B | O |
|---|---|---|---|---|---|---|
|   | 0 | 1 | 2 | 3 | 4 | 5 |
| G | 1 | 0 | 1 | 2 | 3 | 4 |
| A | 2 | 1 | 1 | 2 | 3 | 4 |
| M | 3 | 2 | 2 | 1 | 2 | 3 |
| B | 4 | 3 | 3 | 2 | 1 | 2 |
| O | 5 | 4 | 4 | 3 | 2 | 1 |
| L | 6 | 5 | 5 | 4 | 3 | 2 |
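The tables are filled one column at a time, and each new column only needs the one before it. Under that observation, a common variant keeps just two columns, so memory grows with len(t) instead of len(s) * len(t). This is a sketch with a hypothetical function name `levenshtein_two_columns`; it computes the same recurrence as steps 3 to 6, and the empty-string cases of step 1 fall out of the initialization without special handling.

```python
def levenshtein_two_columns(s, t):
    """Same recurrence as steps 3-6, keeping only columns i - 1 and i."""
    prev = list(range(len(t) + 1))          # column 0 from step 2: 0..m
    for i in range(1, len(s) + 1):          # step 3
        cur = [i] + [0] * len(t)            # d[i][0] = i
        for j in range(1, len(t) + 1):      # step 4
            cost = 0 if s[i - 1] == t[j - 1] else 1        # step 5
            cur[j] = min(prev[j] + 1,                      # step 6: delete
                         cur[j - 1] + 1,                   #         insert
                         prev[j - 1] + cost)               #         substitute/keep
        prev = cur                          # column i becomes the "previous" one
    return prev[len(t)]                     # step 7: d[n][m]


print(levenshtein_two_columns("GUMBO", "GAMBOL"))  # 2
```

The trade-off is that the full matrix is gone, so this version returns only the distance and cannot recover which edits were made.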
Step 7
The distance is in the lower right-hand corner of the matrix, i.e. 2. This corresponds to our intuitive realization that "GUMBO" can be transformed into "GAMBOL" by substituting "A" for "U" and adding "L" (one substitution and one insertion = 2 changes).
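The edits themselves can be read back out of the filled matrix by walking from d[n,m] toward d[0,0] and, at each cell, moving to a neighbor that could have produced its value. A sketch, assuming a hypothetical function `edit_script` and a tie-breaking order of our own choosing (diagonal first, then insertion, then deletion); other orders can yield different but equally cheap edit sequences.

```python
def edit_script(s, t):
    """Fill the matrix (steps 1-6), then walk back from d[n][m] to d[0][0]
    to list one cheapest sequence of edits turning s into t."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1 if i > 0 and j > 0 and s[i - 1] != t[j - 1] else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            if cost:  # a match costs nothing and is not recorded
                ops.append(f"substitute {t[j - 1]} for {s[i - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(f"insert {t[j - 1]}")
            j -= 1
        else:  # only a deletion can explain d[i][j] here
            ops.append(f"delete {s[i - 1]}")
            i -= 1
    ops.reverse()  # collected from the end, so restore left-to-right order
    return d[n][m], ops


print(edit_script("GUMBO", "GAMBOL"))
# (2, ['substitute A for U', 'insert L'])
```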