Comparing Differently Trained Models

At the end of the previous post, we mentioned that the solution found by L-BFGS made different errors compared to the model we trained with SGD and momentum. So, one question is what solution is better in terms of generalization and if they focus on different aspects, how do they differ for individual methods.

To make the analysis easier, but to be at least a little realistic, we train a linear SVM classifier (W, bias) for a “werewolf” theme. In other words, all movies with that theme are marked with “+1” and we sample random movies for the ‘rest’ that are marked with -1. For the features, we use the 1,500 most frequent keywords. All random seeds were fixed which means both models start at the same “point”.

In our first experiment, we only care to minimize the errors. The SGD method (I) uses standard momentum and a L1 penalty of 0.0005 in combination with mini-batches. The learning rate and momentum was kept at a fixed value. The L-BFGS method (II) minimizes the same loss function. Both methods were able to get an accuracy of 100% for the training data and the training has been stopped as soon as the error was zero.

(I) loss=1.64000 ||W||=3.56,
bias=-0.60811 (SGD)
(II) loss=0.04711 ||W||=3.75, bias=-0.58073
(L-BFGS)

As we can see, the L2 norm of the final weight vector is
similar, also the bias, but of course we do not care for absolute norms but
rather for the correlation of both solutions. For that reason, we converted both
weight vectors W to unit-norm and determined the cosine similarity: correlation
= W_sgd.T * W_lbfgs = 0.977.

Since we do not have any empirical data for such correlations, we analyzed
the magnitude of the features in the weight vectors. More precisely the top-5
most important features:

(I) werewolf=0.6652, vampire=0.2394,
creature=0.1886, forbidden-love=0.1392, teenagers=0.1372
(II)
werewolf=0.6698, vampire=0.2119, monster=0.1531, creature=0.1511,
teenagers=0.1279

If we also consider the top-12 features of both
models, which are pretty similar,

(I) werewolf, vampire, creature,
forbidden-love, teenagers, monster, pregnancy, undead, curse, supernatural,
mansion, bloodsucker
(II) werewolf, vampire, monster, creature, teenagers,
curse, forbidden-love, supernatural, pregnancy, hunting, undead,
beast

we can see some patterns here: First, a lot of the movies in
the dataset seem to combine the theme with love stories that may involve
teenagers. This makes sense because this is actually a very popular pattern
these days and second, vampires and werewolves are very likely to co-occur in
the same movie.

Those patterns were learned by both models, regardless of the actual
optimization method but with minor differences which can be seen by considering
the magnitude of the individual weights in W. However, as the correlation of the
parameters vectors confirmed, both solutions are pretty close together.

Bottom line, we should be careful with interpretations since the data at hand
was limited, but nevertheless the results confirmed that with proper
initializations and hyper-parameters, good solutions can be both achieved with
1st and 2nd order methods. Next, we will study the ability of models to
generalize for unseen data.

 

Comparing Differently Trained Models的更多相关文章

  1. 大规模视觉识别挑战赛ILSVRC2015各团队结果和方法 Large Scale Visual Recognition Challenge 2015

    Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Legend: Yellow background = winner in thi ...

  2. (转)The Evolved Transformer - Enhancing Transformer with Neural Architecture Search

    The Evolved Transformer - Enhancing Transformer with Neural Architecture Search 2019-03-26 19:14:33 ...

  3. (转) Ensemble Methods for Deep Learning Neural Networks to Reduce Variance and Improve Performance

    Ensemble Methods for Deep Learning Neural Networks to Reduce Variance and Improve Performance 2018-1 ...

  4. Keras vs. PyTorch

    We strongly recommend that you pick either Keras or PyTorch. These are powerful tools that are enjoy ...

  5. Understanding Tensorflow using Go

    原文: https://pgaleone.eu/tensorflow/go/2017/05/29/understanding-tensorflow-using-go/ Tensorflow is no ...

  6. How to handle Imbalanced Classification Problems in machine learning?

    How to handle Imbalanced Classification Problems in machine learning? from:https://www.analyticsvidh ...

  7. understanding backpropagation

    几个有助于加深对反向传播算法直观理解的网页,包括普通前向神经网络,卷积神经网络以及利用BP对一般性函数求导 A Visual Explanation of the Back Propagation A ...

  8. 论文翻译——Deep contextualized word representations

    Abstract We introduce a new type of deep contextualized word representation that models both (1) com ...

  9. 论文翻译——Character-level Convolutional Networks for Text Classification

    论文地址 Abstract Open-text semantic parsers are designed to interpret any statement in natural language ...

随机推荐

  1. 详细聊聊k8s deployment的滚动更新(二)

    一.知识准备 ● 本文详细探索deployment在滚动更新时候的行为 ● 相关的参数介绍:   livenessProbe:存活性探测.判断pod是否已经停止   readinessProbe:就绪 ...

  2. APP相关问题汇总

    APP试用过程中,我们的APP存在不少的问题,下面是一些试用者和我们自己发现的一些问题以及一些建议. 1.APP界面有些老气,界面之间过渡僵硬 2.在试用中会出现闪退情况 3.由于我们使用的是绝对布局 ...

  3. RabbitMQ-从基础到实战(3)— 消息的交换(上)

    转载请注明出处 0.目录 RabbitMQ-从基础到实战(1)— Hello RabbitMQ RabbitMQ-从基础到实战(2)— 防止消息丢失 RabbitMQ-从基础到实战(4)— 消息的交换 ...

  4. ElasticSearch 2 (13) - 深入搜索系列之结构化搜索

    ElasticSearch 2 (13) - 深入搜索系列之结构化搜索 摘要 结构化查询指的是查询那些具有内在结构的数据,比如日期.时间.数字都是结构化的.它们都有精确的格式,我们可以对这些数据进行逻 ...

  5. Linux命令(二十二) 改变文件权限 chomd

    目录 1.命令简介 2.常用参数介绍 3.实例 4.直达底部 命令简介 chmod 命令是用来改变文件权限或目录的命令,可以将指定文件的拥有着改为指定的用户或组,用户可以是用户名或用户ID,组可以是组 ...

  6. KM算法 PK 最小费用最大流

    用到了KM算法 ,发现自己没有这个模板,搜索学习一下上海大学final大神,http://www.cnblogs.com/kuangbin/p/3228861.html #include <st ...

  7. 三星a9上测试egret与pixi.js的渲染性能

    for (let i = 0; i < 500; i++) { let shape = new egret.Shape(); shape.graphics.beginFill(0xff0000) ...

  8. 【版本管理】git远程管理

    GitHub相关:       第1步:注册github账号,创建SSH Key. 在用户主目录下,看看有没有.ssh目录,如果有,再看看这个目录下有没有id_rsa和id_rsa.pub这两个文件, ...

  9. CUDA ---- GPU架构(Fermi、Kepler)

    GPU架构 SM(Streaming Multiprocessors)是GPU架构中非常重要的部分,GPU硬件的并行性就是由SM决定的. 以Fermi架构为例,其包含以下主要组成部分: CUDA co ...

  10. Several ports (8005, 8080, 8009) required by Tomcat

    转载:http://blog.csdn.net/tomoto_zh/article/details/51931945 先找到Java项目中  Servers找到Server.xml然后 把8005, ...