Moderate 加入空格使得可辨别单词数量最多 @CareerCup
递归题目,注意结合了memo的方法和trie的应用
package Moderate; import java.util.Hashtable; import CtCILibrary.AssortedMethods;
import CtCILibrary.Trie; /**
* Oh, no! You have just completed a lengthy document when you have an unfortu-
nate Find/Replace mishap. You have accidentally removed all spaces, punctuation,
and capitalization in the document. A sentence like "I reset the computer. It still
didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you
can add back in the punctation and capitalization later, once you get the individual
words properly separated. Most of the words will be in a dictionary, but some strings,
like proper names, will not.
Given a dictionary (a list of words), design an algorithm to find the optimal way of
"unconcatenating" a sequence of words. In this case, "optimal" is defined to be the
parsing which minimizes the number of unrecognized sequences of characters.
For example, the string "jesslookedjustliketimherbrother" would be optimally parsed
as "JESS looked just like TIM her brother". This parsing has seven unrecognized char-
acters, which we have capitalized for clarity. 给一个string,把string内的所有标点,空格都去掉。然后要求找到把空格加回去使得不可辨别的
单词数量达到最少的方法(判断是否可以辨别是通过提供一个字典来判断) *
*/
public class S17_14 { public static String sentence;
public static Trie dictionary; /* incomplete code */
public static Result parse(int wordStart, int wordEnd, Hashtable<Integer, Result> cache) {
if (wordEnd >= sentence.length()) {
return new Result(wordEnd - wordStart, sentence.substring(wordStart).toUpperCase());
}
if (cache.containsKey(wordStart)) {
return cache.get(wordStart).clone();
}
String currentWord = sentence.substring(wordStart, wordEnd + 1);
boolean validPartial = dictionary.contains(currentWord, false);
boolean validExact = validPartial && dictionary.contains(currentWord, true); /* break current word */
Result bestExact = parse(wordEnd + 1, wordEnd + 1, cache);
if (validExact) {
bestExact.parsed = currentWord + " " + bestExact.parsed;
} else {
bestExact.invalid += currentWord.length();
bestExact.parsed = currentWord.toUpperCase() + " " + bestExact.parsed;
} /* extend current word */
Result bestExtend = null;
if (validPartial) {
bestExtend = parse(wordStart, wordEnd + 1, cache);
} /* find best */
Result best = Result.min(bestExact, bestExtend);
cache.put(wordStart, best.clone());
return best;
} public static int parseOptimized(int wordStart, int wordEnd, Hashtable<Integer, Integer> cache) {
if (wordEnd >= sentence.length()) {
return wordEnd - wordStart;
}
if (cache.containsKey(wordStart)) {
return cache.get(wordStart);
} String currentWord = sentence.substring(wordStart, wordEnd + 1);
boolean validPartial = dictionary.contains(currentWord, false); /* break current word */
int bestExact = parseOptimized(wordEnd + 1, wordEnd + 1, cache);
if (!validPartial || !dictionary.contains(currentWord, true)) {
bestExact += currentWord.length();
} /* extend current word */
int bestExtend = Integer.MAX_VALUE;
if (validPartial) {
bestExtend = parseOptimized(wordStart, wordEnd + 1, cache);
} /* find best */
int min = Math.min(bestExact, bestExtend);
cache.put(wordStart, min);
return min;
} public static int parseSimple(int wordStart, int wordEnd) {
if (wordEnd >= sentence.length()) {
return wordEnd - wordStart;
} String word = sentence.substring(wordStart, wordEnd + 1); /* break current word */
int bestExact = parseSimple(wordEnd + 1, wordEnd + 1);
if (!dictionary.contains(word, true)) {
bestExact += word.length();
} /* extend current word */
int bestExtend = parseSimple(wordStart, wordEnd + 1); /* find best */
return Math.min(bestExact, bestExtend);
} public static String clean(String str) {
char[] punctuation = {',', '"', '!', '.', '\'', '?', ','};
for (char c : punctuation) {
str = str.replace(c, ' ');
}
return str.replace(" ", "").toLowerCase();
} public static void main(String[] args) {
dictionary = AssortedMethods.getTrieDictionary();
sentence = "As one of the top companies in the world, Google will surely attract the attention of computer gurus. This does not, however, mean the company is for everyone.";
sentence = clean(sentence);
System.out.println(sentence);
//Result v = parse(0, 0, new Hashtable<Integer, Result>());
//System.out.println(v.parsed);
int v = parseOptimized(0, 0, new Hashtable<Integer, Integer>());
System.out.println(v);
} static class Result {
public int invalid = Integer.MAX_VALUE;
public String parsed = "";
public Result(int inv, String p) {
invalid = inv;
parsed = p;
} public Result clone() {
return new Result(this.invalid, this.parsed);
} public static Result min(Result r1, Result r2) {
if (r1 == null) {
return r2;
} else if (r2 == null) {
return r1;
} return r2.invalid < r1.invalid ? r2 : r1;
}
} }
Moderate 加入空格使得可辨别单词数量最多 @CareerCup的更多相关文章
- Storm监控文件夹变化 统计文件单词数量
监控指定文件夹,读取文件(新文件动态读取)里的内容,统计单词的数量. FileSpout.java,监控文件夹,读取新文件内容 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
- python核心编程正则表达式练习题1-2匹配由单个空格分隔的任意单词对,也就是性和名
# 匹配由单个空格分隔的任意单词对,也就是姓和名 import re patt = '[A-Za-z]+ [A-Za-z]+' # 方法一 +加号操作符匹配它左边的正则表达式至少出现一次的情况 # p ...
- Python GitHub上星星数量最多的项目
GitHub上星星数量最多的项目 """ most_popular.py 查看GitHub上获得星星最多的项目都是用什么语言写的 """ i ...
- Java基础IO类之字符串流(查字符串中的单词数量)与管道流
一.字符串流 定义:字符串流(StringReader),以一个字符为数据源,来构造一个字符流. 作用:在Web开发中,我们经常要从服务器上获取数据,数据返回的格式通常一个字符串(XML.JSON), ...
- go语言小练习——给定英语文章统计单词数量
给定一篇英语文章,要求统计出所有单词的个数,并按一定次序输出.思路是利用go语言的map类型,以每个单词作为关键字存储数量信息,代码实现如下: package main import ( " ...
- 练习1-21:编写程序entab,将空格串替换为最少数量的制表符和空格。。。(C程序设计语言 第2版)
#include <stdio.h> #define N 5 main() { int i, j, c, lastc; lastc = 'a'; i = j = ; while ((c=g ...
- hadoop-mapreduce-(1)-统计单词数量
编写map程序 package com.cvicse.ump.hadoop.mapreduce.map; import java.io.IOException; import org.apache.h ...
- 在Linux系统下有一个目录/usr/share/dict/ 这个目录里包含了一个词典的文本文件,我们可以利用这个文件来辨别单词是否为词典中的单词。
#!/bin/bash s=`cat /usr/share/dict/linux.words` for i in $s; do if [ $1 = $i ];then echo "$1 在字 ...
- Python的 counter内置函数,统计文本中的单词数量
counter是 colletions内的一个类 可以理解为一个简单的计数 import collections str1=['a','a','b','d'] m=collections.Counte ...
随机推荐
- C#winform程序自定义鼠标样式
public void SetCursor(Bitmap cursor, Point hotPoint) { int hotX = hotPoint.X; int hotY = hotPoint.Y; ...
- vijos P1055奶牛浴场&& Winter Camp2002
这道题是我在寒假的模拟赛里碰到的,现在想起来仍觉得余味无穷.题目大意大致如下:给你一个矩形并在其中划出一个最大的子矩形,当然,在这个矩形里有些地方是取不到的,也就是说我们划的这个子矩形不能包含这些点( ...
- ng的数据绑定
ng创建了一个自己的事件循环,当浏览器事件(常用的dom事件,xhr事件等)发生时,对DOM对应的数据进行检查,若更改了,则标记为脏值,并进入更新循环,修改对应的(可能是多个) DOM的参数.这样就实 ...
- JDK与Tomcat的联系
如果服务器没有安装JDK或没有配置JDK环境变量,则Tomcat启动出错 报错:需要JAVA_HOME 或JRE_HOME环境变量 所以必须首先安装JDK 配置环境变量 web服务器Tomcat才能运 ...
- python json string和dict的转化
__author__ = 'SRC_TJ_XiaoqingZhang' import json data1 = {'b': 789, 'c': 456, 'a': 123} encode_json = ...
- python django 自定义 装饰器
# -*-coding:utf-8-*- __author__ = "GILANG (pleasurelong@foxmail.com)" """ d ...
- Obj-C的hello,world 1
不得不说,Obj-C所谓的中缀表达式真的蛮奇怪的,当无参或者只有一个参数时看起来还不错: //无参数的方法 -(void) say; [employee say]; //只有一个参数的方法 -(voi ...
- 转:Java架构师与开发者提高效率的10个工具
原文来自于:http://www.importnew.com/14624.html Java受到全球百万计开发者的追捧,已经演变为一门出色的编程语言.最终,这门语言随着技术的变化,不断的被改善以迎合变 ...
- Microsfot SQL Server 2012 日志收缩
//Microsfot SQL Server 2012 日志收缩 USE DataBaseName;GO ALTER DATABASE DataBaseNameSET RECOVERY SIMPLE; ...
- Unity OF 3DMax毛坯房制作标准
Unity OF 3DMax毛坯房制作标准 1.模型 2.贴图 3.模型塌陷展UV 4.灯光 5.Radiosity 6.Render To Texture 7.烘焙 8.导出 1.模型回目录 ...