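Programming Assignment 1

My code: WordNet.java & SAP.java & Outcast.java

This is the programming assignment for Part II of the course. Since Part II has already started, the blog posts for Part I are on hold for now.

Problem overview

WordNet is, as the name suggests, a network of words. It is a directed graph in which each vertex holds a set of synonymous words (a synset) and each edge points to a vertex with a more abstract meaning (a hypernym). For example, the vertex {AND circuit, AND gate}, meaning a logic gate whose output is 1 only when all of its inputs are 1, points to the more abstract vertex {gate, logic gate}. Note also that the graph is a rooted DAG: a directed acyclic graph with a single root. Here is a sample figure.

Task summary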

WordNet Data Type

Implement an immutable data type WordNet with the following API:

public class WordNet {
  // constructor takes the name of the two input files
  public WordNet(String synsets, String hypernyms)

  // returns all WordNet nouns
  public Iterable<String> nouns()

  // is the word a WordNet noun?
  public boolean isNoun(String word)

  // distance between nounA and nounB (defined below)
  public int distance(String nounA, String nounB)

  // a synset (second field of synsets.txt) that is the common ancestor of nounA and nounB
  // in a shortest ancestral path (defined below)
  public String sap(String nounA, String nounB)

  // do unit testing of this class
  public static void main(String[] args)
}

SAP Data Type

Implement an immutable data type SAP with the following API:

public class SAP {
  // constructor takes a digraph (not necessarily a DAG)
  public SAP(Digraph G)

  // length of shortest ancestral path between v and w; -1 if no such path
  public int length(int v, int w)

  // a common ancestor of v and w that participates in a shortest ancestral path; -1 if no such path
  public int ancestor(int v, int w)

  // length of shortest ancestral path between any vertex in v and any vertex in w; -1 if no such path
  public int length(Iterable<Integer> v, Iterable<Integer> w)

  // a common ancestor that participates in shortest ancestral path; -1 if no such path
  public int ancestor(Iterable<Integer> v, Iterable<Integer> w)

  // do unit testing of this class
  public static void main(String[] args)
}

Outcast Detection

Implement an immutable data type Outcast with the following API:

public class Outcast {
  public Outcast(WordNet wordnet)         // constructor takes a WordNet object
  public String outcast(String[] nouns)   // given an array of WordNet nouns, return an outcast
  public static void main(String[] args)  // see test client below
}

For the detailed requirements, see: WordNet.

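Analysis

SAP

Following the order of steps suggested in the Checklist, I implemented the SAP data type first. SAP stands for shortest ancestral path, as illustrated in the figure below.

SAP works on a digraph whose vertices have already been mapped to the integers 0 to N-1; doing that mapping is a job for WordNet later. The digraph is the course's Digraph.java. It is the argument of the SAP constructor, so it must not be modified. Download algs4.jar and it can be imported directly.

SAP has to compute the shortest ancestral path and the nearest common ancestor not only for two single vertices v and w, but also for two sets of vertices.

The reason is that SAP exists to support WordNet, and the same word can appear in several vertices of the network (a word with multiple senses, for instance). Computing the "distance" between two words in the network therefore means working with sets of vertices.

Now for the implementation. Since the path has to be shortest, and given this week's lectures, BFS is the natural search strategy. Start with two single vertices v and w. My rough idea was to keep two queues and alternate BFS expansions between them; the first vertex reached from both sides should, I assumed, be the nearest common ancestor, and adding the two distances gives the length.

Reading the Checklist, I found a "brute-force" approach: call the course's BreadthFirstDirectedPaths.java to run one complete BFS from v and another from w. That tells you, for every vertex in the graph, whether it is reachable from v and at what distance, and likewise for w. Then iterate over all vertices, and for each one reachable from both v and w, add the two distances while keeping track of the smallest sum and the corresponding ancestor.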
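The brute-force approach can be sketched as follows. This is a minimal illustration, not the post's SAP.java: to stay self-contained it uses a plain `List<List<Integer>>` adjacency list as a stand-in for algs4's `Digraph`, and a hand-written `bfs` in place of `BreadthFirstDirectedPaths`. The class name `BruteForceSap` and the `{length, ancestor}` return convention are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Brute-force SAP sketch. adj.get(x) lists the vertices that x points to
// (an assumed stand-in for algs4's Digraph).
final class BruteForceSap {

    // Distance from source to every vertex; -1 marks "not reachable".
    static int[] bfs(List<List<Integer>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        Queue<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int x = queue.remove();
            for (int y : adj.get(x)) {
                if (dist[y] == -1) {
                    dist[y] = dist[x] + 1;
                    queue.add(y);
                }
            }
        }
        return dist;
    }

    // Returns {length, ancestor}, or {-1, -1} if v and w share no ancestor.
    static int[] sap(List<List<Integer>> adj, int v, int w) {
        int[] distV = bfs(adj, v);
        int[] distW = bfs(adj, w);
        int bestLength = -1;
        int bestAncestor = -1;
        for (int x = 0; x < adj.size(); x++) {
            if (distV[x] == -1 || distW[x] == -1) continue; // not a common ancestor
            int length = distV[x] + distW[x];
            if (bestLength == -1 || length < bestLength) {
                bestLength = length;
                bestAncestor = x;
            }
        }
        return new int[] {bestLength, bestAncestor};
    }
}
```

Each query costs two full traversals plus a scan over all vertices, which is what the optional optimizations try to avoid.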

My original idea matches the second item of the Checklist's Optional Optimizations:

If you run the two breadth-first searches from v and w in lockstep (alternating back and forth between exploring vertices in each of the two searches), then you can terminate the BFS from v (or w) as soon as the distance exceeds the length of the best ancestral path found so far.

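The final search looks roughly like this in pseudocode:

while (qv or qw is not empty) {
    if (qv is not empty) {
        x = dequeue from qv
        if (x has been visited from w) {
            compute the path length; update sap if it is shorter
        }
        if (distance from v to x is less than sap) {
            keep expanding: enqueue the unvisited neighbors of x into qv
        }
    }
    if (qw is not empty) {
        same as above with the roles of v and w swapped
    }
}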

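Once a vertex's distance from its start vertex is no shorter than the best path found so far, there is no point in expanding it further, so a complete BFS is unnecessary. Note also that both searches run on the same graph throughout.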
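For concreteness, here is one way the pseudocode above could be turned into Java. It is a sketch under assumptions rather than the submitted code: the digraph is again a plain adjacency list instead of algs4's `Digraph`, and the names `LockstepSap`, `step` and `UNSEEN` are invented for this example. The check for a common ancestor happens when a vertex is dequeued, and a vertex is expanded only while its distance is below the best length so far.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Lockstep (alternating) BFS sketch with early termination.
final class LockstepSap {
    private static final int UNSEEN = -1;

    // Returns {length, ancestor}, or {-1, -1} if there is no common ancestor.
    static int[] sap(List<List<Integer>> adj, int v, int w) {
        int[] distV = new int[adj.size()];
        int[] distW = new int[adj.size()];
        Arrays.fill(distV, UNSEEN);
        Arrays.fill(distW, UNSEEN);
        Queue<Integer> qv = new ArrayDeque<>();
        Queue<Integer> qw = new ArrayDeque<>();
        distV[v] = 0;
        qv.add(v);
        distW[w] = 0;
        qw.add(w);

        int[] best = {Integer.MAX_VALUE, -1}; // {best length so far, its ancestor}
        while (!qv.isEmpty() || !qw.isEmpty()) {
            if (!qv.isEmpty()) step(adj, qv, distV, distW, best);
            if (!qw.isEmpty()) step(adj, qw, distW, distV, best);
        }
        if (best[1] == -1) return new int[] {-1, -1};
        return best;
    }

    // Processes one vertex of one search; "mine" is this side's distance
    // array and "other" belongs to the opposite search.
    private static void step(List<List<Integer>> adj, Queue<Integer> queue,
                             int[] mine, int[] other, int[] best) {
        int x = queue.remove();
        // x is a common ancestor if the other search has already reached it.
        if (other[x] != UNSEEN && mine[x] + other[x] < best[0]) {
            best[0] = mine[x] + other[x];
            best[1] = x;
        }
        // Anything beyond x is at least this far away, so it cannot improve.
        if (mine[x] >= best[0]) return;
        for (int y : adj.get(x)) {
            if (mine[y] == UNSEEN) {
                mine[y] = mine[x] + 1;
                queue.add(y);
            }
        }
    }
}
```

Because every enqueued vertex is still dequeued and checked, a common ancestor is examined by whichever search reaches it second, even when expansion has been cut off.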

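Then there is the first optional optimization. It is described at some length, and I only understood it once I was actually implementing the alternating BFS. The suggestion is to keep track of the vertices visited during a search, so that the auxiliary arrays BFS needs (such as the boolean array marked that records whether a vertex has been visited) can be re-initialized selectively. That avoids wasting a lot of time on re-initialization, since in many cases only a small part of the arrays has changed. To implement it I keep a stack: when a vertex is visited its id is pushed, and before the next alternating BFS I pop the ids and reset only the corresponding entries of the auxiliary arrays.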
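A small sketch of that bookkeeping, assuming a single distance array where -1 means "not visited". The class `BfsScratch` and its method names are hypothetical and only illustrate the stack-of-touched-ids idea, not the actual fields in SAP.java.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Reusable BFS scratch space: remembers which entries were written so that
// reset() costs time proportional to the vertices touched, not to V.
final class BfsScratch {
    private final int[] dist;                                  // -1 means "not visited"
    private final Deque<Integer> touched = new ArrayDeque<>(); // ids written since last reset

    BfsScratch(int vertexCount) {
        dist = new int[vertexCount];
        Arrays.fill(dist, -1); // the only full initialization, done once
    }

    boolean isVisited(int x) {
        return dist[x] != -1;
    }

    int distance(int x) {
        return dist[x];
    }

    // Records distance d (expected to be >= 0) for vertex x.
    void visit(int x, int d) {
        if (dist[x] == -1) touched.push(x); // remember x the first time it is written
        dist[x] = d;
    }

    int touchedCount() {
        return touched.size();
    }

    // Undo only the entries that were actually written.
    void reset() {
        while (!touched.isEmpty()) dist[touched.pop()] = -1;
    }
}
```

On a graph with many vertices where a query touches only a handful, `reset()` does a handful of writes instead of refilling the whole array.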

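The third optional optimization suggests implementing a software cache that stores the most recent length() and ancestor() results. Both the shortest path length and the nearest common ancestor come out of a single alternating BFS, so a call to length(v, w) followed by ancestor(v, w) can save one search. I first tried this for the single-vertex versions of length() and ancestor(): before searching, compare the arguments with recentV and recentW and, if they match, return the result saved from the previous call. It looked fine to me and local testing turned up nothing, but after submitting, one SAP test kept failing.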

Test 19: random calls to both version of length() and ancestor(),
         with probabilities p1 and p2, respectively
  * random calls in a random rooted DAG (20 vertices, 100 edges)
    (p1 = 0.5, p2 = 0.5)
    - no path from v or w to ancestor
    - failed on call 34 to ancestor()
    - v = 1, w = 6
    - reference length   = 1
    - student   ancestor = 8
    - reference ancestor = 1
  * random calls in a random digraph (20 vertices, 100 edges)
    (p1 = 0.5, p2 = 0.5)
    - ancestor() is not ancestor on shortest ancestral path
    - failed on call 67 to ancestor()
    - v = 8, w = 5
    - student   ancestor = 0
    - distance from 8 to 0 = 2
    - distance from 5 to 0 = 1
    - reference ancestor = 5
    - reference length   = 2
==> FAILED

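With the cache removed the test passes, and I cannot work out what was wrong. Logically, a shortest distance of 0 means both arguments are the same vertex, in which case recentV and recentW should be equal; so how could a query on two different vertices end up returning the previous result?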
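The post leaves the cause open. One possibility, offered purely as a guess that the post does not confirm, is that the failing test mixes the single-vertex and Iterable versions, and a call to the Iterable version overwrote the saved result while recentV and recentW still described an earlier single-vertex query. A minimal sketch of a one-entry cache that rules out that kind of mismatch by always writing key and result together; `PairCache` and its methods are hypothetical names, not part of the assignment API.

```java
// Hypothetical one-entry cache for single-vertex SAP queries.
final class PairCache {
    private boolean valid = false;
    private int v;
    private int w;
    private int length;
    private int ancestor;

    // Record the answer for (v, w); key and result are written in one place.
    void store(int v, int w, int length, int ancestor) {
        this.v = v;
        this.w = w;
        this.length = length;
        this.ancestor = ancestor;
        valid = true;
    }

    // Call after any search whose result is not stored through store(),
    // so a stale entry can never be returned.
    void invalidate() {
        valid = false;
    }

    // Returns {length, ancestor} on a hit, or null on a miss.
    int[] lookup(int v, int w) {
        if (valid && this.v == v && this.w == w) {
            return new int[] {length, ancestor};
        }
        return null;
    }
}
```

In this sketch a single-vertex query would call `lookup` first and `store` after searching, while the Iterable versions would call `invalidate`.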

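Still, probably because the other two optimizations were in place, the submission earned the extra credit in the end.

One last point: for the version that handles sets of vertices, it is enough to put all the vertices of each set into the corresponding queue at the start. Full code: SAP.java.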
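Seeding the queue with a whole set looks roughly like this. The sketch shows only one side of the search (distances from a set of sources), again over an assumed adjacency-list digraph; `MultiSourceBfs` is an illustrative name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Multi-source BFS sketch: every source starts in the queue at distance 0,
// so dist[x] becomes the distance from the nearest source to x.
final class MultiSourceBfs {

    static int[] distances(List<List<Integer>> adj, Iterable<Integer> sources) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1); // -1 marks "not reachable from any source"
        Queue<Integer> queue = new ArrayDeque<>();
        for (int s : sources) {
            if (dist[s] == -1) { // skip duplicate sources
                dist[s] = 0;
                queue.add(s);
            }
        }
        while (!queue.isEmpty()) {
            int x = queue.remove();
            for (int y : adj.get(x)) {
                if (dist[y] == -1) {
                    dist[y] = dist[x] + 1;
                    queue.add(y);
                }
            }
        }
        return dist;
    }
}
```

The rest of the search is unchanged, which is why the set versions of length() and ancestor() need no separate algorithm.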

WordNet

Next, the Checklist suggests first working out how to read the data correctly from the CSV files. This uses In.readLine() and String.split(); the former is in the course's algs4.jar, and for the latter the sample Domain.java is provided. Overall this caused no real difficulty.
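As a rough illustration of the String.split() part, the sketch below parses one line of each file. It assumes the line formats as I understand them from the assignment description: a synsets line is `id,noun1 noun2 ...,gloss`, and a hypernyms line is `id,hypernymId1,hypernymId2,...`. The class `WordNetLines` and its methods are made up for this example, and reading the lines themselves (with In.readLine() in the post) is left out.

```java
// Parsing sketch for single lines of synsets.txt and hypernyms.txt
// (formats assumed as described in the lead-in).
final class WordNetLines {

    // "id,noun1 noun2 ...,gloss" -> id
    static int synsetId(String line) {
        return Integer.parseInt(line.split(",")[0]);
    }

    // "id,noun1 noun2 ...,gloss" -> the nouns in the second field.
    // Commas inside the gloss only add trailing fields, which are ignored.
    static String[] synsetNouns(String line) {
        return line.split(",")[1].split(" ");
    }

    // "id,h1,h2,..." -> {h1, h2, ...}; a line holding only an id gives an empty array.
    static int[] hypernymIds(String line) {
        String[] fields = line.split(",");
        int[] ids = new int[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            ids[i - 1] = Integer.parseInt(fields[i]);
        }
        return ids;
    }
}
```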

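Then comes building the WordNet from the data read from the files; the suggestion is to split this into at least two subtasks. I split it into readSynsets(synsets) and readHypernyms(hypernyms). The former stores the mapping between words and the ids of the vertices that contain them; the latter builds the digraph that is passed to the SAP instance.

I use ST<String, SET<Integer>> synsets (ST and SET are also in algs4.jar) to store each word together with the ids of the vertices it appears in, because one word may appear in several vertices. Querying the distance between two words is then easy: pass synsets.get(nounA) and synsets.get(nounB) to the corresponding method of the SAP instance. The other methods are simple too and can be implemented with ST.keys() and ST.contains().

I implemented everything down to the last method, "String sap(String nounA, String nounB)", and then noticed a problem. The nearest-common-ancestor method in SAP returns a vertex id, whereas this method has to return the words in that vertex. My only option was to iterate over all words, test each one with SET.contains(id), and append the word to the result string whenever its set contained the id. Oh, and at first I built the result with String objects; the grader reminded me to use StringBuilder instead. Even so, some WordNet tests still timed out after submission.

So I added a second table, ST<Integer, String> id_nouns, which stores each vertex id with all the words it contains. Looking up the words for an id became much faster. After resubmitting, the timeouts were gone and the memory limit was not exceeded.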
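The two lookup tables can be sketched like this. To keep the example self-contained it uses `java.util` collections in place of algs4's `ST` and `SET`, which is a substitution on my part; `SynsetIndex` and its method names are likewise illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Two-way index sketch: noun -> ids of the synsets containing it,
// and id -> the synset string, so both directions are a single lookup.
final class SynsetIndex {
    private final Map<String, Set<Integer>> nounToIds = new HashMap<>();
    private final Map<Integer, String> idToSynset = new HashMap<>();

    // synset is the space-separated noun list of one vertex.
    void add(int id, String synset) {
        idToSynset.put(id, synset);
        for (String noun : synset.split(" ")) {
            nounToIds.computeIfAbsent(noun, k -> new TreeSet<>()).add(id);
        }
    }

    boolean isNoun(String word) {
        return nounToIds.containsKey(word);
    }

    // Ids to hand to the set versions of SAP's length() and ancestor().
    Set<Integer> idsOf(String noun) {
        return nounToIds.get(noun);
    }

    // Turns an ancestor id back into its synset without scanning every noun.
    String synsetOf(int id) {
        return idToSynset.get(id);
    }
}
```

The second map trades some memory for replacing the scan over all nouns with one lookup per query.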

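The second task is building the digraph from the file. The construction itself is straightforward, but the assignment requires checking that the graph is a rooted DAG, and I did not know how to do that. Looking through the assignment's discussion page, I found that course code can once again be called directly. The lecture on topological sort pointed out that only a DAG has a topological order, so Topological.java includes a check for cycles, and DirectedCycle.java, which it builds on, works as well. As for the single root, I count the vertices with out-degree zero while building the graph; exactly one such vertex means the graph has a single root. Full code: WordNet.java.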
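A self-contained sketch of the same two checks. Note the swap: instead of algs4's `DirectedCycle`, which the post uses, this example detects cycles with Kahn's algorithm (repeatedly removing vertices of in-degree zero), simply because that needs nothing beyond `java.util`. The adjacency-list digraph and the name `RootedDagCheck` are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Rooted-DAG check sketch: exactly one vertex with out-degree zero,
// and no directed cycle (tested here with Kahn's algorithm).
final class RootedDagCheck {

    static boolean isRootedDag(List<List<Integer>> adj) {
        int n = adj.size();
        int roots = 0;
        int[] inDegree = new int[n];
        for (int x = 0; x < n; x++) {
            if (adj.get(x).isEmpty()) roots++; // out-degree zero: a root candidate
            for (int y : adj.get(x)) inDegree[y]++;
        }
        if (roots != 1) return false;

        // Kahn's algorithm: a digraph is acyclic exactly when every vertex
        // can be removed in some order of "currently has in-degree zero".
        Queue<Integer> ready = new ArrayDeque<>();
        for (int x = 0; x < n; x++) {
            if (inDegree[x] == 0) ready.add(x);
        }
        int removed = 0;
        while (!ready.isEmpty()) {
            int x = ready.remove();
            removed++;
            for (int y : adj.get(x)) {
                if (--inDegree[y] == 0) ready.add(y);
            }
        }
        return removed == n;
    }
}
```

In an acyclic digraph, following outgoing edges from any vertex must end at a vertex with out-degree zero, so a single such vertex is reachable from everywhere and is the root.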

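Outcast

This part posed no real difficulty: given a list of words, pick the one least related to the others. Concretely, compute each word's distance to every other word by calling WordNet; the word with the largest total distance is the answer. Full code: Outcast.java.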
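The selection step can be sketched as below. So that the example runs on its own, the distance is passed in as a function, standing in for WordNet.distance(); `OutcastSketch` is an illustrative name, and the sketch assumes a word's distance to itself is 0, so including it in the sum changes nothing.

```java
import java.util.function.ToIntBiFunction;

// Outcast sketch: the noun whose summed distance to all nouns is largest.
final class OutcastSketch {

    // distance stands in for WordNet.distance(nounA, nounB).
    static String outcast(String[] nouns, ToIntBiFunction<String, String> distance) {
        String outcast = null;
        int largestSum = -1;
        for (String a : nouns) {
            int sum = 0;
            for (String b : nouns) {
                sum += distance.applyAsInt(a, b);
            }
            if (sum > largestSum) { // on ties the earliest noun is kept
                largestSum = sum;
                outcast = a;
            }
        }
        return outcast;
    }
}
```

With a WordNet instance at hand, the call would presumably look like `OutcastSketch.outcast(nouns, wordnet::distance)`.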

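Test results

With two of the optional optimizations implemented, the submission passed 6 additional bonus tests in the timing section.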
