Moderate: Insert spaces to maximize the number of recognizable words @CareerCup

A recursion problem. Note how the solution combines memoization with a trie.
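The listing below imports `CtCILibrary.Trie`, which is not shown. Judging from how the listing names the results (`validPartial`, `validExact`), it only needs two queries from it: `contains(word, false)` meaning "is this a prefix of some dictionary word?" and `contains(word, true)` meaning "is this a complete dictionary word?". The class below is a minimal stand-in sketch under that assumption; the name `SimpleTrie` and its constructor are hypothetical, not the library's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal stand-in for CtCILibrary.Trie (prefix and exact-word lookups only). */
public class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminates; // true if some dictionary word ends at this node
    }

    private final Node root = new Node();

    public SimpleTrie(Iterable<String> words) {
        for (String w : words) {
            add(w);
        }
    }

    public void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminates = true;
    }

    /**
     * exact == false: is s a prefix of some word (a whole word also counts)?
     * exact == true:  is s itself a complete word?
     */
    public boolean contains(String s, boolean exact) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false; // no dictionary word continues this way
            }
        }
        return !exact || node.terminates;
    }

    public static void main(String[] args) {
        SimpleTrie t = new SimpleTrie(Arrays.asList("look", "looked", "just"));
        System.out.println(t.contains("loo", false));   // true: prefix of "look"
        System.out.println(t.contains("loo", true));    // false: not a whole word
        System.out.println(t.contains("looked", true)); // true
    }
}
```

The prefix query is what makes the trie worthwhile here: the recursion can stop extending the current word as soon as it is no longer a prefix of any dictionary word.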

package Moderate;

import java.util.Hashtable;

import CtCILibrary.AssortedMethods;
import CtCILibrary.Trie;

/**
 * Oh, no! You have just completed a lengthy document when you have an unfortunate
 * Find/Replace mishap. You have accidentally removed all spaces, punctuation, and
 * capitalization in the document. A sentence like "I reset the computer. It still
 * didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you
 * can add back in the punctuation and capitalization later, once you get the individual
 * words properly separated. Most of the words will be in a dictionary, but some strings,
 * like proper names, will not.
 *
 * Given a dictionary (a list of words), design an algorithm to find the optimal way of
 * "unconcatenating" a sequence of words. In this case, "optimal" is defined to be the
 * parsing which minimizes the number of unrecognized sequences of characters.
 *
 * For example, the string "jesslookedjustliketimherbrother" would be optimally parsed
 * as "JESS looked just like TIM her brother". This parsing has seven unrecognized
 * characters, which we have capitalized for clarity.
 *
 * Summary: given a string from which all punctuation and spaces have been removed, find
 * the way of putting the spaces back that minimizes the unrecognizable words (a word is
 * recognizable if it appears in the supplied dictionary). The code below measures this
 * as the number of unrecognized characters.
 */

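public class S17_14 {

    public static String sentence;
    public static Trie dictionary;

    /* incomplete code */

    /**
     * Memoized recursion. The current candidate word is sentence[wordStart..wordEnd].
     * Returns the minimum number of unrecognized characters in sentence[wordStart..]
     * together with the parsed text (unrecognized pieces upper-cased).
     * The cache is keyed on wordStart alone: calls with wordEnd > wordStart are only
     * reached through the "extend" chain of parse(wordStart, wordStart), which is
     * the call whose result ends up stored.
     */
    public static Result parse(int wordStart, int wordEnd, Hashtable<Integer, Result> cache) {
        // Ran off the end: whatever remains of the current word is unrecognized.
        if (wordEnd >= sentence.length()) {
            return new Result(wordEnd - wordStart, sentence.substring(wordStart).toUpperCase());
        }
        // Return a copy, because callers modify the result they get back.
        if (cache.containsKey(wordStart)) {
            return cache.get(wordStart).clone();
        }
        String currentWord = sentence.substring(wordStart, wordEnd + 1);
        boolean validPartial = dictionary.contains(currentWord, false);                // prefix of some word?
        boolean validExact = validPartial && dictionary.contains(currentWord, true); // a whole word?

        /* break current word: end it here and parse the rest from wordEnd + 1 */
        Result bestExact = parse(wordEnd + 1, wordEnd + 1, cache);
        if (validExact) {
            bestExact.parsed = currentWord + " " + bestExact.parsed;
        } else {
            bestExact.invalid += currentWord.length();
            bestExact.parsed = currentWord.toUpperCase() + " " + bestExact.parsed;
        }

        /* extend current word: only worth trying while it is still a dictionary prefix */
        Result bestExtend = null;
        if (validPartial) {
            bestExtend = parse(wordStart, wordEnd + 1, cache);
        }

        /* find best */
        Result best = Result.min(bestExact, bestExtend);
        cache.put(wordStart, best.clone());
        return best;
    }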

    /** Same recursion as parse(), but only tracks the count of unrecognized characters. */
    public static int parseOptimized(int wordStart, int wordEnd, Hashtable<Integer, Integer> cache) {
        if (wordEnd >= sentence.length()) {
            return wordEnd - wordStart;
        }
        if (cache.containsKey(wordStart)) {
            return cache.get(wordStart);
        }
        String currentWord = sentence.substring(wordStart, wordEnd + 1);
        boolean validPartial = dictionary.contains(currentWord, false);

        /* break current word: it costs its length unless it is a whole dictionary word */
        int bestExact = parseOptimized(wordEnd + 1, wordEnd + 1, cache);
        if (!validPartial || !dictionary.contains(currentWord, true)) {
            bestExact += currentWord.length();
        }

        /* extend current word */
        int bestExtend = Integer.MAX_VALUE;
        if (validPartial) {
            bestExtend = parseOptimized(wordStart, wordEnd + 1, cache);
        }

        /* find best */
        int min = Math.min(bestExact, bestExtend);
        cache.put(wordStart, min);
        return min;
    }

    /** Brute force with no memoization and no prefix pruning; exponential time, for reference. */
    public static int parseSimple(int wordStart, int wordEnd) {
        if (wordEnd >= sentence.length()) {
            return wordEnd - wordStart;
        }
        String word = sentence.substring(wordStart, wordEnd + 1);

        /* break current word */
        int bestExact = parseSimple(wordEnd + 1, wordEnd + 1);
        if (!dictionary.contains(word, true)) {
            bestExact += word.length();
        }

        /* extend current word */
        int bestExtend = parseSimple(wordStart, wordEnd + 1);

        /* find best */
        return Math.min(bestExact, bestExtend);
    }

    /** Strips the listed punctuation and all spaces, then lower-cases the text. */
    public static String clean(String str) {
        char[] punctuation = {',', '"', '!', '.', '\'', '?'};
        for (char c : punctuation) {
            str = str.replace(c, ' ');
        }
        return str.replace(" ", "").toLowerCase();
    }

    public static void main(String[] args) {
        dictionary = AssortedMethods.getTrieDictionary();
        sentence = "As one of the top companies in the world, Google will surely attract the attention of computer gurus. This does not, however, mean the company is for everyone.";
        sentence = clean(sentence);
        System.out.println(sentence);

        // Full parse, including the reconstructed text:
        //Result v = parse(0, 0, new Hashtable<Integer, Result>());
        //System.out.println(v.parsed);

        // Number of unrecognized characters only:
        int v = parseOptimized(0, 0, new Hashtable<Integer, Integer>());
        System.out.println(v);
    }

    /** Number of unrecognized characters plus the corresponding parsed text. */
    static class Result {
        public int invalid = Integer.MAX_VALUE;
        public String parsed = "";

        public Result(int inv, String p) {
            invalid = inv;
            parsed = p;
        }

        @Override
        public Result clone() {
            return new Result(this.invalid, this.parsed);
        }

        /** Returns the result with fewer unrecognized characters; null means "option not available". */
        public static Result min(Result r1, Result r2) {
            if (r1 == null) {
                return r2;
            } else if (r2 == null) {
                return r1;
            }
            return r2.invalid < r1.invalid ? r2 : r1;
        }
    }
}
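For comparison, here is a sketch of the same objective (minimize the number of unrecognized characters) computed bottom-up with plain dynamic programming over suffixes. This is a swapped-in technique, not the recursive memo-plus-trie solution above: it assumes the dictionary is a `Set<String>`, needs no CtCILibrary classes, and avoids the recursion depth that the listing can reach on long inputs. The names `UnconcatenateDP`, `Parse` and `appendToken` are hypothetical. `best[i]` is the minimum cost of `s[i..n)`: either `s[i]` is an unrecognized character (cost 1 + `best[i+1]`) or some dictionary word `s[i..j)` starts there (cost `best[j]`). Because an unrecognized run of length L costs L either way, treating one character at a time loses nothing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bottom-up DP alternative; not the recursive solution from the listing. */
public class UnconcatenateDP {

    /** Minimum number of unrecognized characters plus one parse that achieves it. */
    public static final class Parse {
        public final int invalid;
        public final String text;

        Parse(int invalid, String text) {
            this.invalid = invalid;
            this.text = text;
        }
    }

    public static Parse parse(String s, Set<String> dict) {
        int n = s.length();
        int[] best = new int[n + 1];           // best[i]: min unrecognized chars in s[i..n); best[n] = 0
        int[] next = new int[n + 1];           // end (exclusive) of the token chosen at position i
        boolean[] isWord = new boolean[n + 1]; // whether that token is a dictionary word

        for (int i = n - 1; i >= 0; i--) {
            // Default choice: s[i] is an unrecognized character, costing 1.
            best[i] = 1 + best[i + 1];
            next[i] = i + 1;
            // A dictionary word s[i..j) costs 0; "<=" prefers words (and longer words) on ties.
            for (int j = i + 1; j <= n; j++) {
                if (best[j] <= best[i] && dict.contains(s.substring(i, j))) {
                    best[i] = best[j];
                    next[i] = j;
                    isWord[i] = true;
                }
            }
        }

        // Walk the choices forward, merging runs of unrecognized characters into one upper-case token.
        StringBuilder out = new StringBuilder();
        StringBuilder unknown = new StringBuilder();
        for (int i = 0; i < n; i = next[i]) {
            if (isWord[i]) {
                appendToken(out, unknown);
                unknown.setLength(0);
                appendToken(out, s.substring(i, next[i]));
            } else {
                unknown.append(Character.toUpperCase(s.charAt(i)));
            }
        }
        appendToken(out, unknown);
        return new Parse(best[0], out.toString());
    }

    // Appends a non-empty token, separated from earlier tokens by a single space.
    private static void appendToken(StringBuilder out, CharSequence token) {
        if (token.length() == 0) {
            return;
        }
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(token);
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("looked", "just", "like", "her", "brother"));
        Parse p = parse("jesslookedjustliketimherbrother", dict);
        System.out.println(p.invalid + " " + p.text); // 7 JESS looked just like TIM her brother
    }
}
```

With the small example dictionary above, this reproduces the parse from the problem statement. The inner loop tries every end position, so it does O(n²) dictionary lookups; using a trie's prefix query, as the listing does, would let it stop as soon as `s[i..j)` is no longer a prefix of any word.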
