[LeetCode] Short Encoding of Words

Given a list of words, we may encode it by writing a reference string S and a list of indexes A.

For example, if the list of words is ["time", "me", "bell"], we can write it as S = "time#bell#" and indexes = [0, 2, 5].

Then for each index, we will recover the word by reading from the reference string from that index until we reach a "#" character.

What is the length of the shortest reference string S possible that encodes the given words?

Example:

Input: words = ["time", "me", "bell"]

Output: 10

Explanation: S = "time#bell#" and indexes = [0, 2, 5].

Note:

1 <= words.length <= 2000.
1 <= words[i].length <= 7.
Each word has only lowercase letters.
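
To make the encoding concrete, here is a minimal sketch of the decoding step described in the statement: each word is read from its start index up to the next "#". The function name decode is a hypothetical helper chosen for illustration and is not part of the problem's interface; the sketch assumes every index lies inside S.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the problem's API): recover each word by
// reading from its start index up to, but not including, the next '#'.
// Assumes every index is a valid position inside s.
std::vector<std::string> decode(const std::string& s, const std::vector<int>& indexes) {
    std::vector<std::string> words;
    for (int start : indexes) {
        std::size_t end = s.find('#', start);           // position of the terminating '#'
        words.push_back(s.substr(start, end - start));  // characters in [start, end)
    }
    return words;
}
```

With S = "time#bell#" and indexes = [0, 2, 5], this reads back time, me and bell, which is why me costs no extra characters.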

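This problem gives us an array of words to encode. Words are separated by a "#", the start of each word is recorded in an index array, and each word ends at the next "#". Words that can be merged should be merged, and we are asked for the shortest possible length of the encoded string. The statement is easy to follow; the hard part is deciding how to merge words. In the example, me and time can be merged as long as their start positions are recorded: time starts at 0 and me starts at 2, and because both end at the same "#", each can be read back correctly. Note that if me were replaced by im or tim, no merge would be possible, because decoding reads every character from the start position up to the "#". In other words, a word can be merged only when it is a suffix of another word.

With that settled, look at the order of processing. Since me is contained in time, we should produce time# first and then check whether it already covers me, rather than producing me# first and then handling time. So longer words must be processed first: sort the array by length with a custom comparator. Then iterate over the array and, for each word, look for it in the encoded string. If it does not occur, append the word followed by a "#". If it does occur, we get its position, for example looking for me in time# gives found = 2. An occurrence only counts if the word is immediately followed by a "#", that is, if the character at position found + word.size() is "#". If no occurrence ends at a "#", the word cannot be merged and we still append it with its own "#". Finally, return the length of the encoded string. See the code below: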

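Solution 1:

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        string str;
        // Process longer words first so that shorter suffixes can reuse them.
        sort(words.begin(), words.end(), [](const string& a, const string& b) {
            return a.size() > b.size();
        });
        for (const string& word : words) {
            // Search for the word together with its terminating '#'. A plain
            // find(word) returns only the first occurrence, which may not be
            // the one that ends at a '#'.
            if (str.find(word + "#") == string::npos) {
                str += word + "#";
            }
        }
        return str.size();
    }
};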
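
The merge condition from the analysis above (a word can be absorbed only when it is a suffix of another word) can also be checked directly. A minimal sketch follows; isSuffix is a hypothetical helper name introduced here for illustration and is not used by the solutions in this post.

```cpp
#include <string>

// Hypothetical helper: returns true when `shorter` is a suffix of `longer`.
bool isSuffix(const std::string& shorter, const std::string& longer) {
    if (shorter.size() > longer.size()) return false;
    // Compare the last shorter.size() characters of `longer` with `shorter`.
    return longer.compare(longer.size() - shorter.size(), shorter.size(), shorter) == 0;
}
```

For the words discussed above, me is a suffix of time, while im and tim are not, so only me can share time's "#".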

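Next, let's look at a method that needs no custom comparator. From the analysis above, what we are really looking for is suffixes, for example me is a suffix of time. We would like mergeable words to sit next to each other so they are easy to process, but sorting by suffix is awkward. So we turn suffixes into prefixes by reversing every word: time becomes emit and me becomes em. Sorting in the default lexicographic order then gives em, emit, so mergeable words end up adjacent, and a word always merges into the one right after it, because if a word is a prefix of any later word in sorted order it is also a prefix of its immediate successor. That makes the rest simple. We only check whether the current word is a prefix of the next word: if it is, add 0; if not, add the current word's length plus 1, where the extra 1 is the "#". The prefix check is straightforward: take a prefix of the same length from the next word and compare. Since each step reads the next word, we stop at the second-to-last word to avoid going out of bounds, and then add the last word's length plus 1 to the result res. See the code below: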

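Solution 2:

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0, n = words.size();
        // Reverse every word so that suffixes become prefixes.
        for (int i = 0; i < n; ++i) reverse(words[i].begin(), words[i].end());
        sort(words.begin(), words.end());
        for (int i = 0; i < n - 1; ++i) {
            // A word that is a prefix of its successor is absorbed and costs 0;
            // otherwise it costs its length plus 1 for the '#'.
            res += (words[i] == words[i + 1].substr(0, words[i].size())) ? 0 : words[i].size() + 1;
        }
        // The last word in sorted order is never absorbed.
        return res + words.back().size() + 1;
    }
};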
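
To see why reversing and sorting puts mergeable words side by side, the intermediate order can be inspected on its own. This is a small sketch; reversedSorted is a hypothetical helper name used only for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: reverse each word, then sort lexicographically,
// which is the order Solution 2 walks through.
std::vector<std::string> reversedSorted(std::vector<std::string> words) {
    for (std::string& w : words) std::reverse(w.begin(), w.end());
    std::sort(words.begin(), words.end());
    return words;
}
```

For ["time", "me", "bell"] the result is em, emit, lleb. Here em is a prefix of emit and contributes 0, emit contributes 4 + 1 = 5, and the last word lleb contributes 4 + 1 = 5, for a total of 10.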

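The next method is also clever. It uses a HashSet and first puts all the words into it. The idea is that for each word we go through all of its proper suffixes, for time these are ime, me and e, and check whether the HashSet contains any of them; those that are present get deleted. So the me in the HashSet is removed, which guarantees that the words left over cannot be merged any further. Finally we add the length of each remaining word to the result res, plus 1 for its "#". See the code below: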

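Solution 3:

#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0;
        unordered_set<string> st(words.begin(), words.end());
        // Iterate over the input vector rather than the set, so the set is
        // never modified while it is being traversed.
        for (const string& word : words) {
            // Erase every proper suffix of word (start at 1 to keep word itself).
            for (int i = 1; i < word.size(); ++i) {
                st.erase(word.substr(i));
            }
        }
        // Each surviving word costs its length plus 1 for the '#'.
        for (const string& word : st) res += word.size() + 1;
        return res;
    }
};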
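
The effect of the erase step is easier to see by listing which words survive it. A minimal sketch, assuming a hypothetical helper named survivors that returns the remaining words in sorted order so the output is deterministic:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: the words left in the set after every proper suffix
// of every input word has been erased, returned in sorted order.
std::vector<std::string> survivors(const std::vector<std::string>& words) {
    std::unordered_set<std::string> st(words.begin(), words.end());
    for (const std::string& word : words) {
        for (std::size_t i = 1; i < word.size(); ++i) {
            st.erase(word.substr(i));
        }
    }
    std::vector<std::string> out(st.begin(), st.end());
    std::sort(out.begin(), out.end());
    return out;
}
```

For ["time", "me", "bell"] only bell and time remain, and (4 + 1) + (4 + 1) = 10 matches the expected output.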

References:

https://leetcode.com/problems/short-encoding-of-words/

https://leetcode.com/problems/short-encoding-of-words/discuss/125825/Easy-to-understand-Java-solution

https://leetcode.com/problems/short-encoding-of-words/discuss/125822/C%2B%2B-4-lines-reverse-and-sort

https://leetcode.com/problems/short-encoding-of-words/discuss/125811/C%2B%2BJavaPython-Easy-Understood-Solution-with-Explanation
