Moderate: Insert Spaces to Maximize the Number of Recognizable Words @CareerCup
A recursion problem; note how it combines memoization with a trie.
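The solution below calls `dictionary.contains(word, exact)` on a `CtCILibrary.Trie`, a class that is not reproduced in this post. As a rough sketch of what such a lookup presumably does, judging only from how the flag is used (`false` for "valid prefix", `true` for "whole word"), here is a tiny trie. The `SimpleTrie` name and its internals are assumptions for illustration, not the library's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CtCILibrary.Trie; the real class may differ.
final class SimpleTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminates; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    /** Inserts a word, creating one node per character as needed. */
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminates = true;
    }

    /**
     * exact == true:  s must be a complete dictionary word.
     * exact == false: s only has to be a prefix of some dictionary word.
     */
    boolean contains(String s, boolean exact) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false; // no word continues with this character
            }
        }
        return !exact || node.terminates;
    }
}
```

The prefix query is what lets the recursion stop extending a candidate word early: once no dictionary word starts with the current substring, growing it further cannot produce a match.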
package Moderate;

import java.util.Hashtable;

import CtCILibrary.AssortedMethods;
import CtCILibrary.Trie;

/**
 * Oh, no! You have just completed a lengthy document when you have an unfortunate
 * Find/Replace mishap. You have accidentally removed all spaces, punctuation,
 * and capitalization in the document. A sentence like "I reset the computer. It still
 * didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you
 * can add back in the punctuation and capitalization later, once you get the individual
 * words properly separated. Most of the words will be in a dictionary, but some strings,
 * like proper names, will not.
 *
 * Given a dictionary (a list of words), design an algorithm to find the optimal way of
 * "unconcatenating" a sequence of words. In this case, "optimal" is defined to be the
 * parsing which minimizes the number of unrecognized sequences of characters.
 *
 * For example, the string "jesslookedjustliketimherbrother" would be optimally parsed
 * as "JESS looked just like TIM her brother". This parsing has seven unrecognized
 * characters, which we have capitalized for clarity.
 *
 * In short: given a string with all punctuation and spaces removed, find the way to put
 * the spaces back that minimizes the number of characters left in unrecognized words
 * (a supplied dictionary decides whether a word is recognized).
 */
public class S17_14 {

    public static String sentence;
    public static Trie dictionary;

    /* incomplete code */
    public static Result parse(int wordStart, int wordEnd, Hashtable<Integer, Result> cache) {
        if (wordEnd >= sentence.length()) {
            return new Result(wordEnd - wordStart, sentence.substring(wordStart).toUpperCase());
        }
        if (cache.containsKey(wordStart)) {
            return cache.get(wordStart).clone();
        }
        String currentWord = sentence.substring(wordStart, wordEnd + 1);
        boolean validPartial = dictionary.contains(currentWord, false);
        boolean validExact = validPartial && dictionary.contains(currentWord, true);

        /* break current word */
        Result bestExact = parse(wordEnd + 1, wordEnd + 1, cache);
        if (validExact) {
            bestExact.parsed = currentWord + " " + bestExact.parsed;
        } else {
            bestExact.invalid += currentWord.length();
            bestExact.parsed = currentWord.toUpperCase() + " " + bestExact.parsed;
        }

        /* extend current word */
        Result bestExtend = null;
        if (validPartial) {
            bestExtend = parse(wordStart, wordEnd + 1, cache);
        }

        /* find best */
        Result best = Result.min(bestExact, bestExtend);
        cache.put(wordStart, best.clone());
        return best;
    }

    public static int parseOptimized(int wordStart, int wordEnd, Hashtable<Integer, Integer> cache) {
        if (wordEnd >= sentence.length()) {
            return wordEnd - wordStart;
        }
        if (cache.containsKey(wordStart)) {
            return cache.get(wordStart);
        }

        String currentWord = sentence.substring(wordStart, wordEnd + 1);
        boolean validPartial = dictionary.contains(currentWord, false);

        /* break current word */
        int bestExact = parseOptimized(wordEnd + 1, wordEnd + 1, cache);
        if (!validPartial || !dictionary.contains(currentWord, true)) {
            bestExact += currentWord.length();
        }

        /* extend current word */
        int bestExtend = Integer.MAX_VALUE;
        if (validPartial) {
            bestExtend = parseOptimized(wordStart, wordEnd + 1, cache);
        }

        /* find best */
        int min = Math.min(bestExact, bestExtend);
        cache.put(wordStart, min);
        return min;
    }

    public static int parseSimple(int wordStart, int wordEnd) {
        if (wordEnd >= sentence.length()) {
            return wordEnd - wordStart;
        }

        String word = sentence.substring(wordStart, wordEnd + 1);

        /* break current word */
        int bestExact = parseSimple(wordEnd + 1, wordEnd + 1);
        if (!dictionary.contains(word, true)) {
            bestExact += word.length();
        }

        /* extend current word */
        int bestExtend = parseSimple(wordStart, wordEnd + 1);

        /* find best */
        return Math.min(bestExact, bestExtend);
    }

    public static String clean(String str) {
        /* strip punctuation and spaces, then lowercase (the duplicate ',' entry was dropped) */
        char[] punctuation = {',', '"', '!', '.', '\'', '?'};
        for (char c : punctuation) {
            str = str.replace(c, ' ');
        }
        return str.replace(" ", "").toLowerCase();
    }

    public static void main(String[] args) {
        dictionary = AssortedMethods.getTrieDictionary();
        sentence = "As one of the top companies in the world, Google will surely attract the attention of computer gurus. This does not, however, mean the company is for everyone.";
        sentence = clean(sentence);
        System.out.println(sentence);
        //Result v = parse(0, 0, new Hashtable<Integer, Result>());
        //System.out.println(v.parsed);
        int v = parseOptimized(0, 0, new Hashtable<Integer, Integer>());
        System.out.println(v);
    }

    static class Result {
        public int invalid = Integer.MAX_VALUE;
        public String parsed = "";

        public Result(int inv, String p) {
            invalid = inv;
            parsed = p;
        }

        public Result clone() {
            return new Result(this.invalid, this.parsed);
        }

        public static Result min(Result r1, Result r2) {
            if (r1 == null) {
                return r2;
            } else if (r2 == null) {
                return r1;
            }
            return r2.invalid < r1.invalid ? r2 : r1;
        }
    }
}
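
The listing above needs `CtCILibrary` to run. For experimenting without it, here is a minimal self-contained sketch of the same memoized recursion. It is not the book's code: it assumes a plain `HashSet` as the dictionary instead of the trie (so there is no prefix pruning), and the names `UnconcatSketch` and `minUnrecognized` are made up for illustration. As in `parseOptimized`, the memo is keyed by the start index and stores the fewest unrecognized characters from that index to the end.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of CtCILibrary.
final class UnconcatSketch {

    /** Returns the minimum number of characters that fall outside dictionary words. */
    static int minUnrecognized(String s, Set<String> dict) {
        return solve(s, 0, dict, new HashMap<>());
    }

    private static int solve(String s, int start, Set<String> dict, Map<Integer, Integer> memo) {
        if (start >= s.length()) {
            return 0; // nothing left to parse
        }
        Integer cached = memo.get(start);
        if (cached != null) {
            return cached;
        }
        int best = Integer.MAX_VALUE;
        // Try every possible first word s[start, end) and recurse on the remainder.
        for (int end = start + 1; end <= s.length(); end++) {
            String word = s.substring(start, end);
            int cost = dict.contains(word) ? 0 : word.length();
            best = Math.min(best, cost + solve(s, end, dict, memo));
        }
        memo.put(start, best);
        return best;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(
                Arrays.asList("looked", "just", "like", "her", "brother"));
        // "jess" (4) and "tim" (3) are not in the dictionary.
        System.out.println(minUnrecognized("jesslookedjustliketimherbrother", dict)); // prints 7
    }
}
```

With the memo, each start index is solved once, so the sketch examines O(n^2) substrings for a string of length n; the trie in the original additionally lets the recursion stop extending a word as soon as no dictionary word shares its prefix.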