Training (deep) Neural Networks Part: 1
Introduction to Neural Network
Neural networks are built from logistic units, and different arrangements of those units give different neural network architectures. Figure: 1 depicts the schematic diagram of a logistic unit. You can think of it as a function that takes an input vector (X) and produces a real-valued output. As shown in Figure: 1, the logistic unit also contains a vector of weights (W) that controls the relative importance of the elements of the input vector.
A logistic unit on its own works as a binary classifier known as logistic regression. For instance, with an appropriate weight vector W, it can be used to classify emails as spam or non-spam, or to check whether a given credit card transaction is fraudulent.
Logistic and tanh are two popular non-linear functions which can be used in logistic units. Figure: 2 shows graphs of these two functions for x∈R.
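As a minimal sketch of the idea above, a logistic unit can be written as a weighted sum followed by one of these two non-linear functions. The bias term b, the input values and the weights below are illustrative assumptions, not values taken from the figures.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b=0.0, activation=logistic):
    """Weighted sum of the inputs followed by a non-linear function.

    The bias b is an assumption added for generality (default 0).
    """
    return activation(np.dot(x, w) + b)

x = np.array([1.0, 2.0, 3.0])     # hypothetical input vector
w = np.array([0.5, -0.25, 0.0])   # hypothetical weights; the third input is ignored

print(logistic_unit(x, w))                      # logistic output, always in (0, 1)
print(logistic_unit(x, w, activation=np.tanh))  # tanh output, always in (-1, 1)
```

Swapping the activation changes only the output range: the logistic function maps to (0, 1), which is convenient for probabilities, while tanh maps to (-1, 1).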
A single logistic unit works well on datasets that are linearly separable. However, it does not perform well on datasets that are not linearly separable. This can be easily demonstrated with the two synthetically generated datasets shown in Figure: 3.
So logistic regression is capable of classifying linearly separable datasets. Unfortunately, most of the datasets we come across in practical machine learning problems are not linearly separable. Hence, we need better classifiers than logistic regression.
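The same limitation can be reproduced without the datasets of Figure: 3. The sketch below is an assumption-laden stand-in: it uses the four points of the logical AND function (linearly separable) and of XOR (not linearly separable), and trains a single logistic unit on each with plain gradient descent. The function names and hyper-parameter values are illustrative.

```python
import numpy as np

def train_logistic_unit(X, y, eta=1.0, num_iterations=5000):
    """Fit a single logistic unit with plain gradient descent on the mean log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(num_iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= eta * X.T @ (p - y) / len(y)        # gradient w.r.t. the weights
        b -= eta * np.mean(p - y)                # gradient w.r.t. the bias
    return w, b

def accuracy(X, y, w, b):
    """Fraction of examples whose thresholded prediction matches the label."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.mean((p >= 0.5) == y)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

acc_and = accuracy(X, y_and, *train_logistic_unit(X, y_and))
acc_xor = accuracy(X, y_xor, *train_logistic_unit(X, y_xor))
print('AND accuracy:', acc_and)
print('XOR accuracy:', acc_xor)
```

The unit separates AND perfectly, but no single straight line can classify more than three of the four XOR points correctly, so its XOR accuracy can never exceed 0.75 regardless of how long it is trained.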
Building classifiers with more logistic units is an obvious and natural approach to overcoming the limitations of a single logistic unit. The classifier that addresses these limitations is known as the neural network. Usually, neural networks arrange logistic units into layers, and depending on how these layers are arranged, we get different neural network architectures. Figure: 4 shows a typical neural network with 4 layers. The first layer (denoted by L1) is known as the input layer and the last layer (denoted by L4) is known as the output layer. The layers between the input and output layers are known as hidden layers, and neural networks with more than one hidden layer are usually known as deep neural networks.
In Figure: 4, W1, W2, W3, b1, b2 and b3 denote the parameters of the model that maps a given input vector X to the output vector y. Finding proper values for those parameters is crucial for the predictive performance of neural networks. We use a training dataset to estimate suitable values for those weight matrices and bias vectors.
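To make the role of these parameters concrete, here is a minimal sketch of how such a 4-layer network could map X to y. The layer sizes, the random initial values and the use of the logistic function in every layer are assumptions made for illustration; Figure: 4 does not fix them.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 3 inputs (L1), two hidden layers of 4 units (L2, L3), 2 outputs (L4)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(4, 2)), np.zeros(2)

def forward(X):
    """Propagate a batch of inputs through the layers, one matrix product per layer."""
    h1 = logistic(X @ W1 + b1)      # activations of hidden layer L2
    h2 = logistic(h1 @ W2 + b2)     # activations of hidden layer L3
    return logistic(h2 @ W3 + b3)   # activations of output layer L4

X = rng.normal(size=(5, 3))         # a batch of 5 hypothetical input vectors
y = forward(X)
print(y.shape)
```

With untrained random parameters the outputs are meaningless; training is the process of adjusting W1, W2, W3, b1, b2 and b3 so that these outputs match the targets.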
Training Procedure
In this section, we introduce the key steps of the neural network training process. First, we formulate neural network training as an optimization problem. Then, gradient descent is introduced as a technique for solving this optimization problem. Finally, we discuss automatic differentiation as an efficient method for calculating the derivatives of a function w.r.t. its parameters.
Empirical Risk Minimization
As we pointed out above, neural network training can be considered as an optimization problem. The framework we use to formulate this optimization problem is known as empirical risk minimization. It is a generic principle and is useful in many areas of machine learning. For more details about empirical risk minimization, please read Chapter 4 of [1].
Let me explain empirical risk minimization with an example. Suppose you have developed a neural network for classifying handwritten digits. During training, you feed in training images (actually, raw pixel intensities) and the network predicts the most probable digit for each image. Suppose you also have a function (say L) that quantifies the difference between the actual digit and the predicted digit. During the training process, you try to make L as small as possible. Mathematically, this can be written as follows.
$$\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L\big(f(X^{(i)}; \theta),\, y^{(i)}\big) \qquad (1)$$
In Equation: (1), N represents the number of training examples in the training set and y(i) represents the actual digit of the ith training example. The predicted digit for the training image X(i) is represented by f(X(i);θ). So training can be considered as a process of finding a suitable function f(.), parameterized by θ, that maps input features to output labels.
In practice, we add an extra term to (1), known as the regularization term. It helps the neural network generalize, so that it performs well on new images. The complete loss function is given in Equation: (2).
$$E(\theta) = \frac{1}{N} \sum_{i=1}^{N} L\big(f(X^{(i)}; \theta),\, y^{(i)}\big) + \lambda\, \Omega(\theta) \qquad (2)$$
In Equation: (2), Ω(θ) denotes the regularization term and λ controls the relative importance of the loss term and the regularization term. λ is a hyper-parameter and its value is estimated using a cross-validation dataset.
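The regularized empirical risk can be sketched in a few lines. Everything specific in this example is an assumption chosen for illustration: a linear model stands in for f, the squared error stands in for L, and the regularization term is taken to be the squared L2 norm of the parameters.

```python
import numpy as np

def empirical_risk(theta, X, y, predict, loss, lam):
    """Average loss over the training set plus a weighted regularization term."""
    data_loss = np.mean([loss(predict(x_i, theta), y_i) for x_i, y_i in zip(X, y)])
    reg_loss = lam * np.sum(theta ** 2)   # assumed regularizer: squared L2 norm
    return data_loss + reg_loss

# Hypothetical model and loss, for illustration only
def linear_model(x, theta):
    return np.dot(x, theta)

def squared_error(prediction, target):
    return (prediction - target) ** 2

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.5, 0.0])

risk = empirical_risk(theta, X, y, linear_model, squared_error, lam=0.1)
print(risk)
```

Here both predictions miss their targets by 0.5, so the data term is 0.25, and the regularizer adds 0.1 × 0.25 = 0.025. Raising lam makes the optimizer prefer smaller parameter values at the expense of fitting the training set.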
So now we have formulated neural network training as an optimization problem. The next step is to find a suitable algorithm for minimizing the empirical loss function given in Equation: (2).
Gradient Descent
In this section, we discuss a simple yet powerful optimization algorithm called gradient descent. In practice, improved versions of it are used for training neural networks, but a good understanding of vanilla gradient descent is essential for understanding those algorithms. Therefore, this tutorial uses vanilla gradient descent, and upcoming tutorials will use a few of the improved versions.
Let's start our discussion with a simple example: f(x)=x^2. The derivative of f(x) at any given point x0 is the slope of the tangent line to the graph at x0. For instance, consider the two points p1=(1,1) and p2=(−2,4); Figure: 5 shows the tangent lines at these two points.
From Figure: 5 it is clear that if you follow the negative direction of the derivative, one small step at a time, you will move towards the minimum of f(x)=x^2. For example, if you start at p2, where the derivative is −4, and follow the negative direction of the derivative, your new x coordinate will be slightly greater than −2.0, that is, closer to zero. At this new point, you have to calculate df(x)/dx again and follow the negative direction of the derivative. So this is an iterative process, and if you perform enough iterations with a sufficiently small step, you will eventually reach the minimum point (i.e. x=0.0) of f(x)=x^2. Though we showed this iterative algorithm for a univariate function, it is applicable to functions of multiple variables as well. Mathematically, this can be written as given in Equation: (3).
$$\theta \leftarrow \theta - \eta\, \nabla_{\theta} f(\theta) \qquad (3)$$
We iteratively apply Equation: (3) in order to find the θ which minimizes f(θ). Here η is the step size, also called the learning rate. Note that f(θ) could be a function of many variables, and those variables are denoted by the vector θ. Also, ∇θ f(θ) is known as the gradient vector of f(θ) and it contains the partial derivatives of f(θ) w.r.t. the individual variables. The algorithm in (3) is known as gradient descent.
Now let's move to the implementation of the gradient descent algorithm in Python.
def get_grad(x):
    """Return the derivative of f(x) = x^2."""
    return 2 * x


# initial guess
x = 10
# learning rate
eta = 0.01

num_iterations = 500
for i in range(num_iterations):
    x = x - eta * get_grad(x)
    if i % 50 == 0:
        print('Iteration: {:3d} x: {:.3e} f(x): {:.3e}'.format(i, x, x**2))

# report the final iterate
print('Iteration: {:3d} x: {:.3e} f(x): {:.3e}'.format(i, x, x**2))
Program 1: Finding the minimum value of f(x)=x^2 using the gradient descent algorithm.
Source: https://github.com/upul/GNN/blob/master/chapter2/simple_gradient_descent.py
By just looking at Figure: 5, it is obvious that the minimum value of the function f(x)=x^2 is zero and that it is reached when x equals zero. According to the output of Program 1 (please run Program 1 and see the output), our simple gradient descent algorithm also manages to get very close to the minimum point of f(x)=x^2. Finding the minimum point of f(x)=x^2 is trivial, and we selected it only to describe the basic idea of gradient descent. Now let's move to a slightly more complicated function: f(x,y)=x e^(−(x^2+y^2)). It represents a surface in 3D space as shown in Figure: 7.
Now let's say we want to find the minimum point of the function f(x,y)=x e^(−(x^2+y^2)) in the region x∈[−1,1] and y∈[−1,1]. Since this function is not convex, gradient descent can end up in different places depending on the starting point. For instance, in Figure: 7 we have shown two such possibilities. So it should be noted that in gradient descent the starting point of the optimization process plays an important role.
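The dependence on the starting point can be checked with a short sketch. The two starting points, the learning rate and the iteration count below are assumptions chosen for illustration; they are not the points plotted in Figure: 7. The gradient is derived by hand from f(x,y)=x e^(−(x^2+y^2)).

```python
import numpy as np

def f(x, y):
    return x * np.exp(-(x**2 + y**2))

def grad_f(x, y):
    """Partial derivatives of f w.r.t. x and y, derived by hand."""
    e = np.exp(-(x**2 + y**2))
    return (1 - 2 * x**2) * e, -2 * x * y * e

def gradient_descent(x, y, eta=0.1, num_iterations=2000):
    for _ in range(num_iterations):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y

# Two hypothetical starting points
xa, ya = gradient_descent(0.5, 0.5)
xb, yb = gradient_descent(0.9, 0.5)

print('start (0.5, 0.5) -> x: {:.4f} y: {:.4f} f: {:.4f}'.format(xa, ya, f(xa, ya)))
print('start (0.9, 0.5) -> x: {:.4f} y: {:.4f} f: {:.4f}'.format(xb, yb, f(xb, yb)))
```

The first run settles at the minimum near x = −1/√2, y = 0. The second run starts to the right of x = 1/√2, where the slope in x is negative, so it keeps moving towards larger x, where the surface flattens out, and never reaches that minimum.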
Automatic Differentiation (AD)
In the previous sections, we formulated neural network training as an optimization problem. Gradient descent was also introduced for finding minimum points of the loss function given in Equation: (2). However, before applying gradient descent, we need a reliable method for calculating the derivatives of the loss function w.r.t. the model parameters (i.e. ∂E(θ)/∂θj for every parameter θj in θ). In this section, we discuss an efficient technique for calculating the derivatives of the loss function (also known as error derivatives) w.r.t. the model parameters.
Discovered independently by several different research groups in the 1970s and 1980s, the backpropagation algorithm has been used as the main tool for calculating error derivatives of the loss function w.r.t. model parameters. The key idea of the backpropagation algorithm is that error derivatives can be calculated by starting at the output layer of the network and moving towards the input layer. So the error derivatives of the ith layer are calculated from the derivatives of the (i+1)th layer with the help of the chain rule.
However, in this tutorial, instead of backpropagation we will be using a more general technique called reverse-mode automatic differentiation for calculating derivatives of the loss w.r.t. model parameters. Backpropagation is in fact a specialized version of reverse-mode automatic differentiation. Unlike backpropagation, reverse-mode automatic differentiation can be used for calculating derivatives of any computational graph. Though it is heavily underused in machine learning, automatic differentiation is a well-established technique in other scientific disciplines such as fluid dynamics and nuclear engineering.
Let's consider a simple function f(x,y)=e^x+xy, for which we would like to calculate ∂f(x,y)/∂x and ∂f(x,y)/∂y when x=2 and y=3. Figure: 8 shows this function as a directed graph. Such directed graphs are known as computational graphs in the automatic differentiation literature. Reverse-mode automatic differentiation consists of two phases. Figure: 8 shows the first phase, which is known as the forward phase. During the forward phase, we start from the inputs (i.e. x=2 and y=3) and move forward by applying an elementary operation at each node.
In Figure: 8, we have introduced three intermediate variables (v1=e^x, v2=xy, and v3=v1+v2) to decompose f(x,y)=e^x+xy into three elementary operations.
The second phase, commonly known as the backward pass, starts at the bottom of the graph and moves towards the inputs. During the backward pass, we calculate derivatives of the output w.r.t. the intermediate variables and finally w.r.t. the input variables. Figure: 9 shows the backward pass of f(x,y)=e^x+xy when x=2 and y=3.
We start the backward pass at the bottom of Figure: 9 and first calculate ∂f(x,y)/∂v3. Since f(x,y) equals v3, ∂f(x,y)/∂v3=1. Next, we move one step towards the inputs and calculate the derivatives of v3 w.r.t. v1 and v2. By looking at Figure: 8, you can see that v1 and v2 directly influence v3. Therefore, we can easily calculate ∂v3/∂v1 and ∂v3/∂v2, and both derivatives are equal to 1 (since v3=v1+v2).
The next two steps (i.e. ∂v3/∂x and ∂v3/∂y) are a little trickier than the previous ones. First, we look at ∂v3/∂y. Figures 8 and 9 tell us that y doesn't influence v3 directly, but via the intermediate term v2. Since y directly influences v2, we can easily calculate ∂v2/∂y, and it equals 2 (since v2=xy, ∂v2/∂y=x=2). Also, we have already calculated ∂v3/∂v2. Finally, we can combine these two terms using the chain rule: ∂v3/∂y=(∂v3/∂v2)(∂v2/∂y)=1∗2=2.
Calculating ∂v3/∂x involves one more step than we needed above. In the function f(x,y)=e^x+xy, x appears in both terms. Hence, x contributes to the output along two different paths, and when we do the backward pass we have to add up the contributions of both paths. In Figure: 9, you can see these two paths as two edges directed towards the node x. Therefore, ∂v3/∂x=(∂v3/∂v1)(∂v1/∂x)+(∂v3/∂v2)(∂v2/∂x)=1∗e^x+1∗y=e^2+3≈10.39.
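The two phases can be written out by hand for this one function. The sketch below follows the intermediate variables v1, v2 and v3 from the walkthrough above; the names of the derivative variables (dv1, dv2, and so on) are assumptions introduced for readability.

```python
import math

x, y = 2.0, 3.0

# Forward phase: apply one elementary operation per node
v1 = math.exp(x)   # v1 = e^x
v2 = x * y         # v2 = x * y
v3 = v1 + v2       # v3 = v1 + v2 = f(x, y)

# Backward phase: propagate derivatives of the output towards the inputs
dv3 = 1.0                         # d f / d v3
dv1 = dv3 * 1.0                   # d v3 / d v1 = 1
dv2 = dv3 * 1.0                   # d v3 / d v2 = 1
dy = dv2 * x                      # single path: y -> v2 -> v3
dx = dv1 * math.exp(x) + dv2 * y  # two paths: x -> v1 -> v3 and x -> v2 -> v3

print('f = {:.3f}  df/dx = {:.3f}  df/dy = {:.3f}'.format(v3, dx, dy))
```

The two derivatives agree with the values derived above: about 10.39 for x and exactly 2 for y.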
Now we have a good understanding of the mechanics of reverse-mode automatic differentiation and we are ready to use it for calculating the error derivatives of neural networks. It is also worth mentioning that automatic differentiation is neither a fully analytical nor a purely numerical algorithm. At the level of elementary operations (such as +, ∗ and log()), we use analytical differentiation and keep the intermediate results as numbers. We then use the chain rule to propagate derivatives from a given output towards the inputs.
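To illustrate how the same mechanics extend to arbitrary computational graphs, here is a minimal sketch of a generic reverse-mode differentiator. The Var class and the exp and backward helpers are hypothetical names invented for this example; they are not part of the GNN repository or of any library. Only addition, multiplication and exp are supported.

```python
import math

class Var:
    """A node of the computational graph (hypothetical helper class)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent node, local derivative)
        self.grad = 0.0          # derivative of the output w.r.t. this node

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def exp(v):
    out = math.exp(v.value)
    return Var(out, ((v, out),))   # d e^v / dv = e^v, known analytically

def backward(output):
    """Backward phase: visit nodes from the output towards the inputs."""
    order, seen = [], set()

    def visit(node):               # depth-first search yields a topological order
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):   # each node is handled after all of its consumers
        for parent, local in node.parents:
            parent.grad += node.grad * local   # chain rule, summed over paths

x, y = Var(2.0), Var(3.0)
f = exp(x) + x * y                 # forward phase happens while building the graph
backward(f)
print('df/dx = {:.3f}  df/dy = {:.3f}'.format(x.grad, y.grad))
```

Each operation stores its local derivative analytically but as a plain number, and the backward loop only multiplies and adds those numbers. This is the sense in which automatic differentiation sits between analytical and numerical differentiation.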
Training Shallow Networks
In the previous sections, we have discussed the tools needed for training neural networks. Now it's time to put those tools into practice. But before moving to full-fledged neural networks, we would like to start with a simple linear network called the softmax classifier.
Since we are in the classification setting, it would be very convenient to interpret the output values of the network as probabilities. However, the pre-activations of the output layer (denoted by a=XW+b) form a real-valued vector. Therefore, we need to convert the pre-activation vector into a vector of probabilities. Though there are several functions we could use for that purpose, the softmax classifier, as the name suggests, uses the softmax function (given in Equation: (4)) for calculating probabilities from the pre-activations.
$$p_i = \frac{e^{a_i}}{\sum_{j} e^{a_j}} \qquad (4)$$
Here ai is commonly known as the pre-activation of the ith output unit of the network, and the complete pre-activation vector a is equal to XW+b. The probability vector (denoted by p) can then be calculated using Equation: (4). The category associated with the highest probability in p is selected as the predicted category.
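A minimal sketch of the softmax function for a batch of pre-activation vectors (one row per example) could look as follows. Subtracting the row-wise maximum before exponentiating is a standard stability trick that does not change the result; the sample values are assumptions chosen to show why it matters.

```python
import numpy as np

def softmax(a):
    """Convert each row of pre-activations into a row of probabilities."""
    shifted = a - np.max(a, axis=1, keepdims=True)   # avoids overflow in exp
    exp_a = np.exp(shifted)
    return exp_a / np.sum(exp_a, axis=1, keepdims=True)

a = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 1000.0]])   # exp(1000) would overflow without the shift

p = softmax(a)
predicted = np.argmax(p, axis=1)           # category with the highest probability
print(p)
print(predicted)
```

Every row of p sums to one. In the first row the largest pre-activation receives the largest probability, and in the second row the three equal pre-activations share the probability equally.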
Next, in order to use gradient descent for training our network, we have to devise a suitable cost function which quantifies the discrepancy between the predicted and actual classes. In the remaining part of this section, we derive a loss function called the cross-entropy loss, which will be used in our softmax classifier.
Technically speaking, our output layer calculates conditional probabilities. For instance, if the ith element of the output vector represents class ci, then pi=P(y=ci|x). If x belongs to the true class ck, we would like to increase pk=P(y=ck|x) and decrease the probabilities of the remaining units of the output vector. In machine learning, we usually frame our problems as minimization problems rather than maximization problems. Therefore, instead of maximizing pk, we would like to minimize −pk. In practice, we use a simple yet powerful trick which further simplifies the optimization: instead of minimizing −pk, we minimize −log_e(pk). Our complete loss function is given in Equation: (5).
$$E(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log_e p^{(i)}_{k} + \frac{\lambda}{2} \sum_{j,l} W_{jl}^{2} \qquad (5)$$
Here θ={W,b}, N represents the number of training examples, p(i)k is the probability the network assigns to the true class of the ith training example, and W is the weight matrix between the input and output layers. The second term is the L2 regularization loss.
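As a small sketch of this loss, the function below computes the average of −log_e(pk) over a batch and adds an L2 penalty on W. The factor 0.5 in the penalty follows the convention used in Program 2 further below; the sample probabilities and labels are assumptions.

```python
import numpy as np

def cross_entropy_loss(p, y, W, lam):
    """Mean negative log-probability of the true classes plus an L2 penalty on W."""
    num_examples = p.shape[0]
    prob_target = p[np.arange(num_examples), y]   # probability given to each true class
    data_loss = -np.mean(np.log(prob_target))
    reg_loss = 0.5 * lam * np.sum(W * W)
    return data_loss + reg_loss

# A classifier that knows nothing: uniform probabilities over 10 digits
p = np.full((4, 10), 0.1)
y = np.array([3, 1, 4, 1])    # hypothetical true digits
W = np.zeros((784, 10))       # zero weights, so the penalty vanishes

loss = cross_entropy_loss(p, y, W, lam=0.01)
print(loss)
```

For uniform predictions over 10 classes the loss is log_e(10) ≈ 2.303, which is a useful sanity check: an untrained 10-class softmax classifier with small weights should start close to this value.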
So far we have discussed a loss function, an efficient technique for calculating the error derivatives of the loss function, and an optimization algorithm. Hence, we are now ready to train our softmax classifier. We will use it to build a handwritten digit recognizer.
MNIST Dataset
For building our handwritten digit recognizer, we are going to use the MNIST dataset [2]. It is one of the most well-known datasets in the field of machine learning. The MNIST dataset consists of 60,000 training and 10,000 testing grayscale images of 28x28 pixels. Figure: 11 shows a few sample images extracted from the MNIST dataset.
Implementing Softmax Classifier in Python/Numpy
Program: 2 shows our SoftmaxLayer implementation. It consists of three methods: forward_pass, backward_pass and update_parameters. forward_pass is easy to understand: it first calculates the pre-activations using XW+b. Next, the pre-activations are converted to probabilities using Equation: (4) and finally, the empirical risk is estimated using Equation: (5).
However, backward_pass is a little more complicated. Therefore, we use the computational flow graph shown in Figure: 12 to understand backward_pass. In order to use gradient descent, we would like to calculate ∂E(θ)/∂W and ∂E(θ)/∂b. But it is easier to consider the data loss part first and calculate ∂L(θ)/∂W and ∂L(θ)/∂b. The derivatives of the regularization loss w.r.t. the model parameters can be added later to obtain ∂E(θ)/∂W and ∂E(θ)/∂b.
Consider a training example x with associated target y, whose true class is k. For any given element i of the vector a:
$$\frac{\partial L(\theta)}{\partial a_i} = p_i - \mathbb{1}[i = k] \qquad (6)$$
Considering the complete pre-activation vector a, we can write
$$\frac{\partial L(\theta)}{\partial a} = p - e_k \qquad (7)$$
where ek is known as the one-hot vector. It contains 1 at the kth position and 0 in all other places. Now, using the chain rule, we can calculate ∂L(θ)/∂W and ∂L(θ)/∂b as given below.
$$\frac{\partial L(\theta)}{\partial W} = X^{T} (p - e_k), \qquad \frac{\partial L(\theta)}{\partial b} = p - e_k \qquad (8)$$
Since the regularization loss doesn't contain the bias term, ∂E(θ)/∂b=(p−ek). However, it does contain W. Therefore, ∂E(θ)/∂W=X^T(p−ek)+λW. In the backward pass, the most complicated task is deriving the equations for the error derivatives w.r.t. the pre-activations (i.e. a) and the model parameters (i.e. W and b). Since we now have those equations in hand, backward_pass is just a matter of converting Equation: (7) and Equation: (8) to Python/Numpy code.
import numpy as np


class SoftmaxLayer:
    """
    SoftmaxLayer class represents the Softmax layer.

    Parameters
    ----------
    W : matrix W represents the input to output connection weight
    b : bias vector
    reg_parameter : regularization parameter of the L2 regularizer
    num_unique_categories : number of output categories
    """
    def __init__(self, W, b, reg_parameter, num_unique_categories):
        self.W = W
        self.b = b
        self.reg_parameter = reg_parameter
        self.num_unique_categories = num_unique_categories

    def forward_pass(self, x_input, y_input):
        """
        Performs forward pass and returns x_out_prob and total_loss
        """
        # calculates pre-activation using XW + b
        x_hid = np.dot(x_input, self.W) + self.b

        # subtract the row-wise maximum from each row of x_hid
        # for numerical stability (the probabilities are unchanged)
        # details: http://www.iro.umontreal.ca/~bengioy/dlbook/numerical.html
        x_hid = x_hid - np.max(x_hid, axis=1, keepdims=True)
        # calculate output probabilities using Equation 4
        x_out_prob = np.exp(x_hid) / np.sum(np.exp(x_hid), axis=1, keepdims=True)

        # calculate data loss using -log_e(p_k)
        num_examples = x_input.shape[0]
        prob_target = x_out_prob[range(num_examples), y_input]
        data_loss_vector = -np.log(prob_target)
        data_loss = np.sum(data_loss_vector) / num_examples

        # L2 regularization loss
        reg_loss = self.reg_parameter * np.sum(self.W * self.W) * 0.5
        total_loss = data_loss + reg_loss

        return x_out_prob, total_loss

    def backward_pass(self, x_out_prob, x_input, y_input):
        """
        Performs backward pass and calculates error derivatives.
        """
        # calculates error derivatives w.r.t. pre-activation
        num_examples = y_input.shape[0]

        # creating one-hot encoding matrix from y_input
        one_hot = np.zeros((x_input.shape[0], self.num_unique_categories))
        one_hot[range(num_examples), y_input] = 1

        # Equation: 7 (averaged over the examples in the batch)
        grad_output = x_out_prob - one_hot
        grad_output = grad_output / num_examples

        # Equation: 8
        grad_w = np.dot(x_input.T, grad_output)

        # adding the derivative of the regularization loss
        grad_w = grad_w + self.reg_parameter * self.W

        grad_b = np.sum(grad_output, axis=0, keepdims=True)
        return grad_w, grad_b

    def update_parameters(self, W, b):
        """
        Updating parameters of the model
        """
        self.W = W
        self.b = b
Program 2: Softmax layer implemented in Python/Numpy
Source: https://github.com/upul/GNN/blob/master/lib/layers.py
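One way to gain confidence in hand-derived gradients such as ∂E(θ)/∂W=X^T(p−ek)+λW is to compare them with finite differences. The sketch below does that on a tiny random problem. It is a standalone check with its own helper function (loss_and_grads is a hypothetical name), not part of the GNN repository, and the problem sizes are arbitrary.

```python
import numpy as np

def loss_and_grads(W, b, X, y, lam):
    """Softmax cross-entropy loss with L2 penalty, plus its analytical gradients."""
    a = X @ W + b
    a = a - np.max(a, axis=1, keepdims=True)
    p = np.exp(a) / np.sum(np.exp(a), axis=1, keepdims=True)
    n = X.shape[0]
    loss = -np.mean(np.log(p[np.arange(n), y])) + 0.5 * lam * np.sum(W * W)
    d_a = p.copy()
    d_a[np.arange(n), y] -= 1.0            # p - e_k for every example
    d_a /= n
    return loss, X.T @ d_a + lam * W, d_a.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))                # 6 examples, 5 features (arbitrary sizes)
y = rng.integers(0, 3, size=6)             # 3 classes
W = 0.1 * rng.normal(size=(5, 3))
b = np.zeros((1, 3))
lam = 0.01

_, grad_W, _ = loss_and_grads(W, b, X, y, lam)

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric[i, j] = (loss_and_grads(W_plus, b, X, y, lam)[0]
                         - loss_and_grads(W_minus, b, X, y, lam)[0]) / (2 * eps)

max_diff = np.max(np.abs(numeric - grad_W))
print('largest difference: {:.2e}'.format(max_diff))
```

If the analytical and numerical gradients disagree by more than a tiny amount, the derivation or its implementation most likely contains a mistake.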
Program: 3 shows our handwritten digit classifier. We have used mini-batch gradient descent with BATCH_SIZE = 500, so each iteration updates the parameters using 500 randomly sampled training examples. After completing 1000 training iterations, we managed to get ~0.904 accuracy on our testing set. Certainly, the accuracy of our handwritten digit recognizer is not that good. In upcoming tutorials, we will implement it again using more advanced techniques such as neural networks, deep neural networks and convolutional neural networks.
import numpy as np

from layers import SoftmaxLayer
from datareader import load_mnist
from constants import *

x_train, y_train = load_mnist(MNIST_TRAINING_X, MNIST_TRAINING_y)
x_train = x_train.reshape(MNIST_NUM_TRAINING, MNIST_NUM_FEATURES)
y_train = y_train.reshape(MNIST_NUM_TRAINING)

# initialize parameters randomly
W = 0.1 * np.random.randn(MNIST_NUM_FEATURES, MNIST_NUM_OUTPUT)
b = np.zeros((1, MNIST_NUM_OUTPUT))

learning_rate = 0.1   # step size of the gradient descent algorithm
reg_parameter = 0.01  # regularization strength
softmax = SoftmaxLayer(W, b, reg_parameter, MNIST_NUM_OUTPUT)

num_iter = 1000
BATCH_SIZE = 500
for i in range(num_iter):
    # sample a random mini-batch of training examples
    idx = np.random.choice(MNIST_NUM_TRAINING, BATCH_SIZE, replace=True)
    x_batch = x_train[idx, :]
    y_batch = y_train[idx]
    output_prob, loss = softmax.forward_pass(x_batch, y_batch)
    if i % 50 == 0:
        print('iteration: {:3d} loss: {:3e}'.format(i, loss))
    gradW, gradB = softmax.backward_pass(output_prob, x_batch, y_batch)
    # gradient descent update
    W = W - learning_rate * gradW
    b = b - learning_rate * gradB
    softmax.update_parameters(W, b)

# prediction on the training set
pred_prob = np.dot(x_train, W) + b
predicted_class = np.argmax(pred_prob, axis=1)
print('-----------------------------')
print('training set accuracy: {:f}'.format(np.mean(predicted_class == y_train)))
print('-----------------------------')
print('\n')

x_test, y_test = load_mnist(MNIST_TESTING_X, MNIST_TESTING_y)
x_test = x_test.reshape(MNIST_NUM_TESTING, MNIST_NUM_FEATURES)
y_test = y_test.reshape(MNIST_NUM_TESTING)

# prediction on the testing set
pred_prob = np.dot(x_test, W) + b
predicted_class = np.argmax(pred_prob, axis=1)
print('-----------------------------')
print('testing set accuracy: {:f}'.format(np.mean(predicted_class == y_test)))
print('-----------------------------')
Program 3: MNIST handwritten digit classification program using the softmax classifier
Source: https://github.com/upul/GNN/blob/master/chapter3/softmaxclassifier.py
In Program 3, we have used magic numbers for a few variables such as learning_rate and reg_parameter. In machine learning, those parameters are known as hyper-parameters, and estimating proper values for hyper-parameters is an important step in developing machine learning applications. In upcoming tutorials, we will discuss hyper-parameter estimation techniques in great detail.
Loading Dataset
You can download the MNIST dataset from "THE MNIST DATABASE" site [2]. You need four gz files: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, and t10k-labels-idx1-ubyte.gz. Unpack the downloaded files into a convenient location and replace <PATH> in the constants.py file with the location where you have unpacked those four files.
The Python code for this and upcoming tutorials resides in the GNN GitHub repository [3]. Therefore, please clone it and add its lib folder to your PYTHONPATH environment variable.
MNIST_TRAINING_X = '<PATH>/train-images.idx3-ubyte'
MNIST_TRAINING_y = '<PATH>/train-labels.idx1-ubyte'

MNIST_TESTING_X = '<PATH>/t10k-images.idx3-ubyte'
MNIST_TESTING_y = '<PATH>/t10k-labels.idx1-ubyte'

MNIST_NUM_TRAINING = 60000
MNIST_NUM_TESTING = 10000
MNIST_NUM_FEATURES = 784
MNIST_NUM_OUTPUT = 10
Program 4: constants.py contains the parameters of our handwritten digit dataset.
Source: https://github.com/upul/GNN/blob/master/chapter3/constants.py
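The load_mnist function from the datareader module is not listed in this tutorial. For readers who want to see what reading these files involves, here is a minimal sketch of a reader for the IDX format described on the MNIST site [2]: a big-endian header (a magic number followed by the dimensions) and then the raw unsigned bytes. The function names read_idx_images and read_idx_labels are hypothetical, and this is not the repository's implementation. The round trip at the end uses a tiny synthetic file so that it runs without downloading anything.

```python
import os
import struct
import tempfile

import numpy as np

def read_idx_images(path):
    """Read an unpacked IDX3 image file into an array of shape (count, rows, cols)."""
    with open(path, 'rb') as f:
        magic, count, rows, cols = struct.unpack('>IIII', f.read(16))
        if magic != 2051:
            raise ValueError('not an IDX image file: {}'.format(path))
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(count, rows, cols)

def read_idx_labels(path):
    """Read an unpacked IDX1 label file into a vector of length count."""
    with open(path, 'rb') as f:
        magic, count = struct.unpack('>II', f.read(8))
        if magic != 2049:
            raise ValueError('not an IDX label file: {}'.format(path))
        return np.frombuffer(f.read(), dtype=np.uint8)

# Round trip on a tiny synthetic dataset: two 2x2 "images" with labels 7 and 3
with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, 'tiny-images.idx3-ubyte')
    label_path = os.path.join(tmp, 'tiny-labels.idx1-ubyte')
    with open(image_path, 'wb') as f:
        f.write(struct.pack('>IIII', 2051, 2, 2, 2) + bytes(range(8)))
    with open(label_path, 'wb') as f:
        f.write(struct.pack('>II', 2049, 2) + bytes([7, 3]))
    images = read_idx_images(image_path)
    labels = read_idx_labels(label_path)

print(images.shape, labels)
```

With the real files, the image array would then be reshaped to (MNIST_NUM_TRAINING, MNIST_NUM_FEATURES), as Program 3 does with the output of load_mnist.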
Conclusion
In this tutorial, we mainly focused on setting up the background. We started our discussion with logistic regression and then moved to neural networks. Next, we introduced "Empirical Risk Minimization", a powerful framework that we are going to use for training neural networks. "Gradient Descent" was introduced as an algorithm for minimizing the empirical risk. In order to work with gradient descent, we need the derivatives of the loss function w.r.t. the parameters of our neural network model (i.e. ∂E(θ)/∂θj for every parameter θj in θ). For that purpose, we introduced a powerful technique called "Automatic Differentiation". Finally, we trained a shallow network called the "Softmax Classifier" for recognizing handwritten digits.
In the next tutorial, we are going to extend our shallow network into a full-blown neural network. We will also discuss several new machine learning concepts and techniques.
References
[1]. K. P. Murphy, Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series), MIT Press, 2012.
[2]. http://yann.lecun.com/exdb/mnist/
[3]. https://github.com/upul/GNN
题目: http://www.cnblogs.com/ljc20020730/p/6918328.html 评测器:cena 评测记录: 1.OneMoreRectangle 一个矩形 ●如果任意枚举 ...