School of Computer Science

The University of Adelaide

Artificial Intelligence

Assignment 2

Semester 1, 2018

due 11:55pm, Thursday 14th May 2018

Introduction

In this assignment, you will develop several classification models that classify noisy input images into two classes, square or circle, as shown in Fig. 1.

Figure 1: Samples of noisy images labelled as square (left) and circle (right).

Your classification models will use the training and testing sets (provided with this assignment), which contain many image samples labelled as square or circle. Your task is to write Python code that runs in a Jupyter Notebook session and trains and validates the following classification models:

1) K-nearest neighbour (KNN) classifier [35 marks]. For the KNN classifier, you may only use standard Python libraries (e.g., numpy) to implement all aspects of the training and testing algorithms. You will need to implement two functions: a) one that builds a K-d tree from the training set (this function takes the training samples and labels as its parameters), and b) another that tests the KNN classifier and computes the classification accuracy, taking K and the test images and labels as its parameters. Using matplotlib, plot the classification accuracy on the training and testing sets as a function of K, for K = 1 to 10. Clearly identify the value of K at which generalisation is best.

2) Decision tree classifier [35 marks]. For the decision tree classifier, you may only use standard Python libraries (e.g., numpy) to implement all aspects of the training and testing algorithms. Essentially, you will need to implement two functions: a) one that trains the decision tree using the training samples and labels, plus a pre-pruning parameter indicating the minimum information content below which splitting stops, and b) another that tests the decision tree and computes the classification accuracy (as with the KNN classifier, the test function takes the test images and labels among its parameters and returns the classification accuracy). Using matplotlib, plot the classification accuracy on the training and testing sets as a function of the information content, for information content = 0 to 0.5 bits. Clearly identify the value of information content at which generalisation is best.
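
A possible numpy-only sketch of the decision tree follows. It assumes the pre-pruning parameter means that a node whose label entropy (in bits) is at or below `min_info` becomes a leaf predicting its majority label. Splits are binary thresholds on a single feature chosen by maximum information gain, and, to keep training tractable on image data, only `n_thresholds` interior quantiles of each feature are tried as candidate thresholds. All function and parameter names are hypothetical.

```python
import numpy as np


def entropy(y):
    """Information content of the label distribution y, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def _best_split(X, y, n_thresholds):
    """Return the (feature, threshold) pair with the highest information gain, or None."""
    parent, n = entropy(y), len(y)
    best, best_gain = None, 0.0
    qs = np.linspace(0.0, 1.0, n_thresholds + 2)[1:-1]  # interior quantiles only
    for f in range(X.shape[1]):
        col = X[:, f]
        for t in np.unique(np.quantile(col, qs)):
            mask = col <= t
            n_left = int(mask.sum())
            if n_left == 0 or n_left == n:
                continue  # skip splits that leave one side empty
            children = (n_left * entropy(y[mask]) + (n - n_left) * entropy(y[~mask])) / n
            if parent - children > best_gain:
                best, best_gain = (f, float(t)), parent - children
    return best


def train_tree(X, y, min_info=0.0, n_thresholds=16):
    """Grow a tree; a node becomes a leaf once its entropy is <= min_info bits."""
    labels, counts = np.unique(y, return_counts=True)
    majority = labels[np.argmax(counts)]
    if entropy(y) <= min_info:
        return {"leaf": majority}
    split = _best_split(X, y, n_thresholds)
    if split is None:
        return {"leaf": majority}
    f, t = split
    mask = X[:, f] <= t
    return {
        "feature": f,
        "threshold": t,
        "left": train_tree(X[mask], y[mask], min_info, n_thresholds),
        "right": train_tree(X[~mask], y[~mask], min_info, n_thresholds),
    }


def tree_accuracy(tree, X_test, y_test):
    """Route each test vector to a leaf and return the fraction classified correctly."""
    preds = []
    for x in X_test:
        node = tree
        while "leaf" not in node:
            node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        preds.append(node["leaf"])
    return float(np.mean(np.array(preds) == y_test))
```

Sweeping `min_info` between 0 and 0.5 and recording `tree_accuracy` on both sets gives the requested curves. With `min_info=0`, the tree keeps splitting until every leaf is pure or no split with positive gain remains.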

3) Convolutional neural network (CNN) classifier [20 marks]. For the convolutional neural network, you may use Keras with the TensorFlow backend, similar to the example in the provided code. The CNN structure is the LeNet structure used in the lectures. Using matplotlib, plot the accuracy on the training and testing sets as a function of the number of epochs, up to a maximum of 200 epochs. Clearly identify the number of epochs at which generalisation is best.
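
In Keras, `model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch metrics. Assuming the model is compiled with `metrics=['accuracy']` and the testing set is passed as `validation_data` (for example `model.fit(x_train, y_train, epochs=200, validation_data=(x_test, y_test))`), that dict holds both curves. The hypothetical helper below, a sketch rather than part of the provided code, picks the epoch with the highest testing accuracy. Older Keras releases name the keys `acc`/`val_acc`, newer ones `accuracy`/`val_accuracy`, so both spellings are handled.

```python
def best_epoch(history):
    """Return (epoch, train_acc, test_acc) at the highest test accuracy.

    `history` is the dict from keras' History.history. Epochs are 1-indexed and
    ties are resolved in favour of the earliest epoch.
    """
    train_key = "accuracy" if "accuracy" in history else "acc"
    test_key = "val_" + train_key
    test_acc = history[test_key]
    i = max(range(len(test_acc)), key=lambda j: test_acc[j])
    return i + 1, history[train_key][i], test_acc[i]
```

For example, `best_epoch(hist.history)` after the `fit` call above gives the operating point to report in the results table.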

Sample code that trains and tests a multi-layer perceptron classifier in a Jupyter Notebook session is provided, and the submitted code is expected to run in a Jupyter Notebook session in the same way. A held-out test set will be used to test the generalisation of the implemented classification models, but it will only be made available after the assignment deadline. Note that this held-out set will contain samples drawn from the same distributions used to generate the training and testing sets.

You must write the program yourself in Python, and the code must be a single file that can run in a Jupyter Notebook session (file type .ipynb). You will only get marks for the parts that you implement yourself. If you use a library package or language function call to train or test a KNN or decision tree classifier, you will be limited to 50% of the available marks (note that this assignment is a hurdle for the course). If there is evidence that you have simply copied code from the web, you will be awarded no marks and referred for plagiarism.

Submission

You must submit, by the due date, two files:

1. An .ipynb file containing your code for the three classifiers and all the implementations described above.

2. A PDF file containing a short written report (no more than 1 page) that details your implementation and presents the following results:

a) The training and testing accuracies at the best generalisation operating point for each type of classifier, in a table [5 marks]:

Classifier            Training Accuracy    Testing Accuracy
K=1 NN
…
K=10 NN
DT (IC = 0 bits)
…
DT (IC = 0.5 bits)
CNN

b) The running times of the training and testing algorithms for each type of classifier, in a table [5 marks]:

Classifier            Training Time        Testing Time
K=1 NN
…
K=10 NN
DT (IC = 0 bits)
…
DT (IC = 0.5 bits)
CNN
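
One simple way to fill in the timing table is to wrap each training and testing call with `time.perf_counter`. The `timed` helper below is a hypothetical sketch; it measures wall-clock time, so the numbers depend on the machine and on whatever else is running in the notebook.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `model, train_time = timed(train_fn, X_train, y_train)`, where `train_fn` is a placeholder for your own training function. For KNN, most of the cost should appear at test time, since training only builds the tree.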

c) Bonus question: How can the classification accuracy of the decision tree classifier be improved? Please implement your idea (hint: dimensionality reduction) [10 marks].
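
The hint points to dimensionality reduction. One common option, used here purely as an illustration, is principal component analysis (PCA): fit it on the training images only, then apply the same projection to both sets before training the tree. This numpy sketch computes the components with an SVD of the mean-centred data. The function names and the choice of `n_components` are hypothetical, and `n_components` would need tuning.

```python
import numpy as np


def fit_pca(X, n_components):
    """Learn a PCA projection from training data X (n_samples, n_features)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def apply_pca(X, mean, components):
    """Project X onto the learned principal components."""
    return (X - mean) @ components.T
```

The tree can then be trained on `apply_pca(X_train, mean, comps)` instead of raw pixels, so each split thresholds a summary of many pixels rather than a single noisy pixel. Whether this improves accuracy on this data has to be checked on the testing set.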

Total number of marks: 100 + 10 bonus marks

This assignment is due 11.55pm on Thursday 14th May, 2018. If your submission is late, the maximum mark you can obtain will be reduced by 25% per day (or part thereof) past the due date or any extension you are granted.
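
A small worked example of the late-penalty rule follows. It assumes that "25% per day" means the cap drops by 25% of the full mark for each started day (not compounded), counted from the due date or the granted extension. The function name is hypothetical.

```python
import math


def max_mark_after_late(hours_late, full_mark=100.0):
    """Cap on the obtainable mark: 25% of full_mark lost per day or part thereof."""
    if hours_late <= 0:
        return full_mark
    days = math.ceil(hours_late / 24)  # any part of a day counts as a whole day
    return max(0.0, full_mark * (1 - 0.25 * days))
```

Under this reading, a submission 1 hour late is capped at 75, one 25 hours late at 50, and anything more than 3 days late at 0.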

This assignment relates to the following ACS CBOK areas: abstraction, design, hardware and software, data and information, HCI and programming.
