Comparing Differently Trained Models

At the end of the previous post, we mentioned that the solution found by L-BFGS made different errors than the model we trained with SGD and momentum. This raises two questions: which solution generalizes better, and, if the two focus on different aspects of the data, how exactly do they differ?

To keep the analysis simple but still somewhat realistic, we train a linear SVM classifier (W, bias) for a “werewolf” theme. In other words, all movies with that theme are labeled +1, and we sample random movies from the rest, which are labeled -1. As features, we use the 1,500 most frequent keywords. All random seeds were fixed, which means both models start from the same “point”.
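The data setup can be sketched in a few lines of Python. This is a minimal sketch under assumptions the post does not spell out: each movie is a hypothetical `(keywords, themes)` pair, "most frequent" means the number of movies a keyword occurs in, features are binary keyword indicators, and we sample as many negatives as there are positives. All function names are illustrative only.

```python
import random
from collections import Counter

def build_vocabulary(keyword_lists, top_k=1500):
    """Return the top_k keywords that occur in the most movies (most frequent first)."""
    # dict.fromkeys drops duplicates within one movie while keeping a deterministic order
    counts = Counter(kw for kws in keyword_lists for kw in dict.fromkeys(kws))
    return [kw for kw, _ in counts.most_common(top_k)]

def vectorize(keywords, index):
    """Binary bag-of-keywords vector over the vocabulary."""
    x = [0.0] * len(index)
    for kw in keywords:
        if kw in index:
            x[index[kw]] = 1.0
    return x

def build_dataset(movies, theme, top_k=1500, seed=0):
    """movies: list of (keywords, themes) pairs (hypothetical format).

    Movies tagged with `theme` get label +1; an equal number of randomly
    sampled other movies get label -1 (the equal count is an assumption).
    """
    rng = random.Random(seed)  # fixed seed, as in the post
    vocabulary = build_vocabulary([kws for kws, _ in movies], top_k)
    index = {kw: i for i, kw in enumerate(vocabulary)}
    positives = [kws for kws, themes in movies if theme in themes]
    rest = [kws for kws, themes in movies if theme not in themes]
    negatives = rng.sample(rest, min(len(positives), len(rest)))
    X = [vectorize(kws, index) for kws in positives + negatives]
    y = [1] * len(positives) + [-1] * len(negatives)
    return X, y, vocabulary

# Tiny hypothetical corpus
movies = [
    (["werewolf", "curse"], {"werewolf"}),
    (["vampire", "werewolf"], {"werewolf"}),
    (["car", "chase"], {"action"}),
    (["love", "paris"], {"romance"}),
    (["car", "heist"], {"crime"}),
]
X, y, vocab = build_dataset(movies, "werewolf", top_k=3)
print(vocab)  # ['werewolf', 'car', 'curse']
print(y)      # [1, 1, -1, -1]
```

Fixing the seed of the sampler is what makes the negative set, and therefore the starting conditions, identical for both optimizers.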

In our first experiment, we only aim to minimize the training error. The SGD method (I) uses standard momentum, an L1 penalty of 0.0005 and mini-batches, with the learning rate and momentum kept at fixed values. The L-BFGS method (II) minimizes the same loss function. Both methods reached 100% accuracy on the training data, and training was stopped as soon as the error was zero.
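The SGD variant (I) can be sketched in plain Python as below. The post does not name the loss, so we assume the standard SVM hinge loss max(0, 1 - y(w·x + b)) plus an L1 subgradient term; the learning rate, momentum value, batch size, toy data and all function names are hypothetical. The L-BFGS variant (II) is not reproduced here.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(v):
    return (v > 0) - (v < 0)

def count_errors(X, y, w, b):
    """Number of training examples on the wrong side of (or on) the hyperplane."""
    return sum(1 for xi, yi in zip(X, y) if yi * (dot(w, xi) + b) <= 0)

def train_sgd_momentum(X, y, lr=0.05, momentum=0.9, l1=0.0005,
                       batch_size=2, max_epochs=200, seed=0):
    """Mini-batch SGD with (heavy-ball) momentum on hinge loss + L1 penalty.

    Stops as soon as the training error is zero. Returns (w, b, epochs).
    """
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [0.0] * d, 0.0          # same starting point for every run
    vw, vb = [0.0] * d, 0.0        # momentum buffers
    order = list(range(len(X)))
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # L1 subgradient on the weights (the bias is not penalized)
            gw = [l1 * sign(wj) for wj in w]
            gb = 0.0
            for i in batch:
                # the hinge loss is active only when the margin is below 1
                if y[i] * (dot(w, X[i]) + b) < 1:
                    for j in range(d):
                        gw[j] -= y[i] * X[i][j] / len(batch)
                    gb -= y[i] / len(batch)
            # fixed learning rate and momentum
            vw = [momentum * v - lr * g for v, g in zip(vw, gw)]
            vb = momentum * vb - lr * gb
            w = [wj + v for wj, v in zip(w, vw)]
            b += vb
        if count_errors(X, y, w, b) == 0:   # stop at zero training error
            return w, b, epoch
    return w, b, max_epochs

# Toy, linearly separable data (hypothetical)
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
w, b, epochs = train_sgd_momentum(X, y)
print(count_errors(X, y, w, b), epochs)
```

Leaving the bias out of the L1 penalty is a common choice, not something the post confirms.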

(I) SGD: loss=1.64000, ||W||=3.56, bias=-0.60811
(II) L-BFGS: loss=0.04711, ||W||=3.75, bias=-0.58073

As we can see, the L2 norms of the final weight vectors are similar, and so are the biases. Of course, we do not care about absolute norms but rather about how well the two solutions agree. For that reason, we normalized both weight vectors W to unit norm and computed their cosine similarity: correlation = W_sgd^T · W_lbfgs = 0.977.
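The cosine similarity is simply the dot product of the two normalized vectors. A minimal sketch, assuming W is a plain list of floats and, as in the post, the bias is left out of the comparison:

```python
import math

def to_unit(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(w_a, w_b):
    """Dot product of the two unit-norm weight vectors (bias excluded)."""
    return sum(a * b for a, b in zip(to_unit(w_a), to_unit(w_b)))

# Hypothetical 2-d example: (3, 4) and (4, 3) have similarity 24/25
sim = cosine_similarity([3.0, 4.0], [4.0, 3.0])
print(sim)
```

Because both vectors are normalized first, the result depends only on their directions, which is why the different norms (3.56 vs. 3.75) do not affect the comparison.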

Since we have no reference values against which to judge such a correlation, we also analyzed the magnitudes of the individual feature weights, more precisely the top-5 most important features:

(I) werewolf=0.6652, vampire=0.2394, creature=0.1886, forbidden-love=0.1392, teenagers=0.1372
(II) werewolf=0.6698, vampire=0.2119, monster=0.1531, creature=0.1511, teenagers=0.1279
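Extracting such a top-k list from a weight vector can be sketched as follows. The post speaks of "magnitude", so we rank by absolute weight; this is an assumption, and with it strongly negative weights (evidence for the -1 class) would also appear. The example reuses a few of the SGD weights listed above, shuffled into arbitrary order.

```python
def top_features(weights, vocabulary, k=5):
    """Return the k (keyword, weight) pairs with the largest absolute weight."""
    pairs = zip(vocabulary, weights)
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:k]

vocab = ["teenagers", "werewolf", "creature", "vampire", "forbidden-love"]
w_sgd = [0.1372, 0.6652, 0.1886, 0.2394, 0.1392]
for name, weight in top_features(w_sgd, vocab, k=3):
    print(f"{name}={weight:.4f}")
# werewolf=0.6652
# vampire=0.2394
# creature=0.1886
```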

The top-12 features of both models are also pretty similar:

(I) werewolf, vampire, creature, forbidden-love, teenagers, monster, pregnancy, undead, curse, supernatural, mansion, bloodsucker
(II) werewolf, vampire, monster, creature, teenagers, curse, forbidden-love, supernatural, pregnancy, hunting, undead, beast

We can see some patterns here. First, many of the movies in the dataset seem to combine the theme with love stories that may involve teenagers, which makes sense because this is a very popular pattern these days. Second, vampires and werewolves are very likely to co-occur in the same movie.
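To put "pretty similar" into a number, we can measure the overlap of the two top-12 lists, for example with the Jaccard index (our choice of measure; the post does not use one). Using the lists from the post:

```python
top12_sgd = ["werewolf", "vampire", "creature", "forbidden-love", "teenagers",
             "monster", "pregnancy", "undead", "curse", "supernatural",
             "mansion", "bloodsucker"]
top12_lbfgs = ["werewolf", "vampire", "monster", "creature", "teenagers",
               "curse", "forbidden-love", "supernatural", "pregnancy",
               "hunting", "undead", "beast"]

def overlap(a, b):
    """Shared items and Jaccard index |A & B| / |A | B| of two feature lists."""
    sa, sb = set(a), set(b)
    return sa & sb, len(sa & sb) / len(sa | sb)

shared, jaccard = overlap(top12_sgd, top12_lbfgs)
print(len(shared), round(jaccard, 3))  # 10 0.714
```

Ten of the twelve features are shared; only mansion and bloodsucker (SGD) and hunting and beast (L-BFGS) differ. The Jaccard index ignores the order within the lists, so it complements rather than replaces the weight comparison.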

Both models learned these patterns, regardless of the optimization method, with minor differences that show up in the magnitudes of the individual weights in W. However, as the correlation of the parameter vectors confirmed, both solutions are pretty close together.

Bottom line: we should be careful with interpretations since the data at hand was limited, but the results nevertheless confirm that, with proper initialization and hyper-parameters, good solutions can be achieved with both first- and second-order methods. Next, we will study how well the models generalize to unseen data.
