Oh, no! You have just completed a lengthy document when you have an unfortunate Find/Replace mishap. You have accidentally removed all spaces, punctuation, and capitalization in the document. A sentence like "I reset the computer. It still didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you can add back in the punctuation and capitalization later, once you get the individual words properly separated. Most of the words will be in a dictionary, but some strings, like proper names, will not.

Given a dictionary (a list of words), design an algorithm to find the optimal way of "unconcatenating" a sequence of words. In this case, "optimal" is defined to be the parsing which minimizes the number of unrecognized sequences of characters.

For example, the string "jesslookedjustliketimherbrother" would be optimally parsed as "JESS looked just like TIM her brother". This parsing has seven unrecognized characters, which we have capitalized for clarity.

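This is problem 14 in Chapter 17 of CareerCup. I did not look closely at the CareerCup solution, but the problem feels very similar to Word Break and Palindrome Partitioning II: like Word Break it comes with a dictionary, and like both it can be solved with one-dimensional DP. Use an array int[] res = new int[len+1], where res[i] is the minimum number of unrecognized chars in the first i chars. Then res[0] = 0, and res[len] is the answer.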

With the DP state defined, the next step is the transition. For each i, try every split point j with 0 <= j < i:

// Check whether the substring covering indices j..i-1 is in the dictionary;
// if it is not, all i-j of its chars count as unrecognized.
int unrecogNum = dict.contains(s.substring(j, i)) ? 0 : i - j;
res[i] = Math.min(res[i], res[j] + unrecogNum);
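To make the transition concrete, here is a minimal sketch that fills the table for a tiny input and prints it. The class name TransitionDemo, the method fillTable, and the input "xcat" with the one-word dictionary {"cat"} are made up for illustration and are not part of the original solution.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TransitionDemo {
    // Hypothetical helper: applies the transition above and returns the whole table,
    // where res[i] is the minimum number of unrecognized chars in the first i chars.
    public static int[] fillTable(String s, Set<String> dict) {
        int len = s.length();
        int[] res = new int[len + 1]; // res[0] stays 0
        for (int i = 1; i <= len; i++) {
            res[i] = Integer.MAX_VALUE;
            for (int j = 0; j < i; j++) {
                // The last piece is s[j..i-1]; it costs 0 if it is a dictionary word.
                int unrecogNum = dict.contains(s.substring(j, i)) ? 0 : i - j;
                res[i] = Math.min(res[i], res[j] + unrecogNum);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("cat"));
        // prints [0, 1, 2, 3, 1]
        System.out.println(Arrays.toString(fillTable("xcat", dict)));
    }
}
```

The last entry drops back to 1 because at i = 4 the split j = 1 leaves "cat" as a recognized word, so only the leading "x" is counted.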

I tested it myself and every case I tried passed, though I am not sure whether some corner case fails:

package fib;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class unconcatenating {
    public int optway(String s, Set<String> dict) {
        if (s == null || s.length() == 0) return 0;
        int len = s.length();
        if (dict.isEmpty()) return len;
        int[] res = new int[len + 1]; // res[i] refers to minimized # of unrecognized chars in first i chars
        Arrays.fill(res, Integer.MAX_VALUE);
        res[0] = 0;
        for (int i = 1; i <= len; i++) {
            for (int j = 0; j < i; j++) {
                String str = s.substring(j, i);
                int unrecogNum = dict.contains(str) ? 0 : i - j;
                res[i] = Math.min(res[i], res[j] + unrecogNum);
            }
        }
        return res[len];
    }

    public static void main(String[] args) {
        unconcatenating example = new unconcatenating();
        Set<String> dict = new HashSet<String>();
        dict.add("reset");
        dict.add("the");
        dict.add("computer");
        dict.add("it");
        dict.add("still");
        dict.add("didnt");
        dict.add("boot");
        int result = example.optway("johnresetthecomputeritdamnstilldidntboot", dict);
        System.out.print("opt # of unrecognized chars is ");
        System.out.println(result);
    }
}

Output: opt # of unrecognized chars is 8
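The code above only returns the count, while the problem statement asks for the parsing itself. One possible way to recover it, sketched here as an assumption rather than as the book's solution, is to remember for each i which j gave the minimum and then walk those back pointers from the end. The class name UnconcatenateWithParse, the method parse, and the prev array are names introduced for this sketch. Unrecognized pieces are upper-cased, as in the problem statement.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class UnconcatenateWithParse {
    // Hypothetical extension of the DP: returns one optimal parsing,
    // with unrecognized pieces upper-cased and pieces separated by spaces.
    public static String parse(String s, Set<String> dict) {
        int len = s.length();
        int[] res = new int[len + 1];  // res[i]: min # of unrecognized chars in first i chars
        int[] prev = new int[len + 1]; // prev[i]: start index of the last piece in that best parse
        for (int i = 1; i <= len; i++) {
            res[i] = Integer.MAX_VALUE;
            for (int j = 0; j < i; j++) {
                int cost = dict.contains(s.substring(j, i)) ? 0 : i - j;
                if (res[j] + cost < res[i]) { // strict '<' keeps the smallest j on ties
                    res[i] = res[j] + cost;
                    prev[i] = j;
                }
            }
        }
        // Walk the back pointers from the end, collecting pieces front to back.
        Deque<String> pieces = new ArrayDeque<>();
        for (int i = len; i > 0; i = prev[i]) {
            String piece = s.substring(prev[i], i);
            pieces.addFirst(dict.contains(piece) ? piece : piece.toUpperCase(Locale.ROOT));
        }
        return String.join(" ", pieces);
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
                "reset", "the", "computer", "it", "still", "didnt", "boot"));
        // prints: JOHN reset the computer it DAMN still didnt boot
        System.out.println(parse("johnresetthecomputeritdamnstilldidntboot", dict));
    }
}
```

Keeping the smallest j on ties merges adjacent unrecognized characters into one piece (for example "JOHN" rather than "J O H N"), which matches how the problem statement displays unrecognized runs.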
