NEURAL NETWORKS, PART 3: THE NETWORK

We have learned about individual neurons in the previous section, now it’s time to put them together to form an actual neural network.

The idea is quite simple – we line multiple neurons up to form a layer, and connect the output of the first layer to the input of the next layer. Here is an illustration:

Figure 1: Neural network with two hidden layers.

Each red circle in the diagram represents a neuron, and  the blue circles represent fixed values. From left to right, there are four columns: the input layer, two hidden layers, and an output layer. The output from neurons in the previous layer is directed into the input of each of the neurons in the next layer.

We have 3 features (vector space dimensions) in the input layer that we use for learning: x1, x2 and x3. The first hidden layer has 3 neurons, the second one has 2 neurons, and the output layer has 2 output values. The size of these layers is up to you – on complex real-world problems we would use hundreds or thousands of neurons in each layer.

The number of neurons in the output layer depends on the task. For example, if we have a binary classification task (something is true or false), we would only have one neuron. But if we have a large number of possible classes to choose from, our network can have a separate output neuron for each class.

The network in Figure 1 is a deep neural network, meaning that it has two or more hidden layers, allowing the network to learn more complicated patterns. Each neuron in the first hidden layer receives the input signals and learns some pattern or regularity. The second hidden layer, in turn, receives input from these patterns from the first layer, allowing it to learn “patterns of patterns” and higher-level regularities. However, the cost of adding more layers is increased complexity and possibly lower generalisation capability, so finding the right network structure is important.

Implementation

I have implemented a very simple neural network for demonstration. You can find the code here: SimpleNeuralNetwork.java

The first important method is initialiseNetwork(), which sets up the necessary structures:

1
2
3
4
5
6
7
8
9
public void initialiseNetwork(){
    input = new double[1 + M]; // 1 is for the bias
    hidden = new double[1 + H];
    weights1 = new double[1 + M][H];
    weights2 = new double[1 + H];
 
    input[0] = 1.0; // Setting the bias
    hidden[0] = 1.0;
}

M is the number of features in the feature vectors, H is the number of neurons in the hidden layer. We add 1 to these, since we also use the bias constants.

We represent the input and hidden layer as arrays of doubles. For example, hidden[i] stores the current output value of the i-th neuron in the hidden layer.

The first set of weights, between the input and hidden layer, are stored as a matrix. Each of the (1+M) neurons in the input layer connects to H neurons in the hidden layer, leading to a total of (1+M)×H weights. We only have one output neuron, so the second set of weights between hidden and output layers is technically a (1+H)×1 matrix, but we can just represent that as a vector.

The second important function is forwardPass(), which takes an input vector and performs the computation to reach an output value.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public void forwardPass(){
    for(int j = 1; j < hidden.length; j++){
        hidden[j] = 0.0;
        for(int i = 0; i < input.length; i++){
            hidden[j] += input[i] * weights1[i][j-1];
        }
        hidden[j] = sigmoid(hidden[j]);
    }
 
    output = 0.0;
    for(int i = 0; i < hidden.length; i++){
        output += hidden[i] * weights2[i];
    }
    output = sigmoid(output);
}

The first for-loop calculates the values in the hidden layer, by multiplying the input vector with the weight vector and applying the sigmoid function. The last part calculates the output value by multiplying the hidden values with the second set of weights, and also applying the sigmoid.

Evaluation

To test out this network, I have created a sample dataset using the database at quandl.com. This dataset contains sociodemographic statistics for 141 countries:

  • Population density (per suqare km)
  • Population growth rate (%)
  • Urban population (%)
  • Life expectancy at birth (years)
  • Fertility rate (births per woman)
  • Infant mortality (deaths per 1000 births)
  • Enrolment in tertiary education (%)
  • Unemployment (%)
  • Estimated control of corruption (score)
  • Estimated government effectiveness (score)
  • Internet users (per 100)

Based on this information, we want to train a neural network that can predict whether the GDP per capita is more than average for that country (label 1 if it is, 0 if it’s not).

I’ve separated the dataset for training (121 countries) and testing (40 countries). The values have been normalised, by subtracting the mean and dividing by the standard deviation, using a script from a previous article. I’ve also pre-trained a model that we can load into this network and evaluate. You can download these from here: original datatraining datatest data,pretrained model.

You can then execute the neural network (remember to compile and link the binaries):

1
java neuralnet.SimpleNeuralNetwork data/model.txt data/countries-classify-gdp-normalised.test.txt

The output should be something like this:

1
2
3
4
5
6
7
8
9
10
Label: 0    Prediction: 0.01
Label: 0    Prediction: 0.00
Label: 1    Prediction: 0.99
Label: 0    Prediction: 0.00
...
Label: 0    Prediction: 0.20
Label: 0    Prediction: 0.01
Label: 1    Prediction: 0.99
Label: 0    Prediction: 0.00
Accuracy: 0.9

The network is in verbose mode, so it prints out the labels and predictions for each test item. At the end, it also prints out the overall accuracy. The test data contains 14 positive and 26 negative examples; a random system would have had accuracy 50%, whereas a biased system would have accuracy 65%. Our network managed 90%, which means it has learned some useful patterns in the data.

In this case we simply loaded a pre-trained model. In the next section, I will describe how to learn this model from some training data.

NEURAL NETWORKS, PART 3: THE NETWORK的更多相关文章

  1. 神经网络第三部分:网络Neural Networks, Part 3: The Network

    NEURAL NETWORKS, PART 3: THE NETWORK We have learned about individual neurons in the previous sectio ...

  2. [C3] Andrew Ng - Neural Networks and Deep Learning

    About this Course If you want to break into cutting-edge AI, this course will help you do so. Deep l ...

  3. 课程五(Sequence Models),第一 周(Recurrent Neural Networks) —— 1.Programming assignments:Building a recurrent neural network - step by step

    Building your Recurrent Neural Network - Step by Step Welcome to Course 5's first assignment! In thi ...

  4. 课程一(Neural Networks and Deep Learning),第四周(Deep Neural Networks) —— 3.Programming Assignments: Deep Neural Network - Application

    Deep Neural Network - Application Congratulations! Welcome to the fourth programming exercise of the ...

  5. 课程一(Neural Networks and Deep Learning),第二周(Basics of Neural Network programming)—— 4、Logistic Regression with a Neural Network mindset

    Logistic Regression with a Neural Network mindset Welcome to the first (required) programming exerci ...

  6. 【转】Artificial Neurons and Single-Layer Neural Networks

    原文:written by Sebastian Raschka on March 14, 2015 中文版译文:伯乐在线 - atmanic 翻译,toolate 校稿 This article of ...

  7. Deep Learning 23:dropout理解_之读论文“Improving neural networks by preventing co-adaptation of feature detectors”

    理论知识:Deep learning:四十一(Dropout简单理解).深度学习(二十二)Dropout浅层理解与实现.“Improving neural networks by preventing ...

  8. 一天一经典Reducing the Dimensionality of Data with Neural Networks [Science2006]

    别看本文没有几页纸,本着把经典的文多读几遍的想法,把它彩印出来看,没想到效果很好,比在屏幕上看着舒服.若用蓝色的笔圈出重点,这篇文章中几乎要全蓝.字字珠玑. Reducing the Dimensio ...

  9. Stanford机器学习笔记-5.神经网络Neural Networks (part two)

    5 Neural Networks (part two) content: 5 Neural Networks (part two) 5.1 cost function 5.2 Back Propag ...

随机推荐

  1. [TypeScript ] What Happens to Compiled Interfaces

    This lesson covers using your first TypeScript Interface and what happens to the Interface when it i ...

  2. JAVA对象之生

    https://yq.aliyun.com/users/1556056716932876?spm=5176.100239.blogrightarea55811.3.6cvyJd

  3. 怎样通过ajax提交数据

    ajax的出现彻底改变了javascript命运,通过ajax可以直接向服务器提交数据,有两种方式: get方式,数据直接拼接在地址中 post方式,数据由data字段携带 post方式,data中是 ...

  4. Java基础知识强化之集合框架笔记54:Map集合之HashMap集合(HashMap<String,String>)的案例

    1. HashMap集合 HashMap集合(HashMap<String,String>)的案例 2. 代码示例: package cn.itcast_02; import java.u ...

  5. MAC OS X API知识摘抄

    本文为信息为网上各个地方收集整理Carbon和Cocoa,Toolbox,POSIX,JAVA并列成为Mac OS X五个主要的API.与Cocoa相较之下,Carbon是非物件导向(Procedur ...

  6. sql if

    SELECT a.id, a.EduSiteNo, a.EduSiteName, a.SchoolId, a.LinkMan, a.Tel, a.Mobile, a.Fax, a.Address, C ...

  7. Binding 中 Elementname,Source,RelativeSource 三种绑定的方式

    在WPF应用的开发过程中Binding是一个非常重要的部分. 在实际开发过程中Binding的不同种写法达到的效果相同但事实是存在很大区别的. 这里将实际中碰到过的问题做下汇总记录和理解. 1. so ...

  8. XPath操作XML文档

    NET框架下的Sytem.Xml.XPath命名空间提供了一系列的类,允许应用XPath数据模式查询和展示XML文档数据. 3.1XPath介绍 主要的目的是在xml1.0和1.1文档节点树种定位节点 ...

  9. Big Data 應用:第二季(4~6月)台湾地区Game APP 变动分布趋势图

    图表简介: 该示意图表示了台湾地区第二季内所有Game APP类别的分布情形,经由该图表我们可以快速的了解到在这三个月内,哪类型的APP是很稳定:抑或者哪类型的APP是非常不稳定的. 名词解释: 类别 ...

  10. WCF学习笔记一(概述)

    WCF  Windows Communication Foundation 分布式通信框架.WCF是对现有分布式通信技术的整合.是各种分布式计算的集大成者.主要整合技术如下图: WCF的服务不能孤立的 ...