Extending Markov to Hidden Markov


When we talked about the Markov process and training the Markov model in previous blog posts, we assumed that we knew the states of the process. That is often true, and hence the Markov model is a great tool for predicting and modeling systems where discrete events happen in discrete time-steps. There are special cases, though, where we are interested in the states underlying the observed events, and the events do not map to states in a one-to-one or one-to-many fashion as has been the requirement so far.

Consider an example we have described before – observing a customer's purchase events at a retail store. If we want to predict the next purchased product from sequences of past purchases, a Markov model may do a good job. But what if we want to remind the customer about a purchase? A reminder is only helpful if the customer is out of stock on that product. However, an observed purchase event doesn't necessarily correspond to an out-of-stock state. Sometimes the customer will buy when he is out of stock; other times he will buy because he is at the store, wants to stock up, or has a discount offer. Sometimes the customer may not buy even when he is out of stock, simply because he didn't get time to do so. In this example, the true state of interest is out-of-stock, but the observed state is purchase or no purchase.

By way of another example, we compared a customer's clicks on a bank's website to a Markov "memoryless" process. This is good enough if we want to improve the web layout, but not good enough if we want to figure out why the customer is visiting the website in the first place. The intent behind the clicks is the state we are interested in, but all we observe are webpage visits. Maybe she is interested in finding interest rates, or looking for the nearest ATM, or wants to read up about a new pension plan. Cases like these call for the Hidden Markov Model (HMM), where unknown hidden states are of interest but correspond to multiple observed states¹.

An HMM may be represented as a directed graph with hidden states².

Apart from the Transition Matrix, which governs the probability of transitioning from one hidden state to another, an HMM also involves an Emission Matrix, which governs the probability of observing an observed state given the underlying hidden state. The goal of HMM learning is the estimation of both these matrices.
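
To make the two matrices concrete, here is a minimal Python/NumPy sketch for the out-of-stock example above; all the probabilities are invented purely for illustration:

```python
import numpy as np

# Hidden states:   0 = in-stock, 1 = out-of-stock   (what we care about)
# Observed states: 0 = no purchase, 1 = purchase    (what we see)
# All numbers are made up purely for illustration.

A = np.array([[0.7, 0.3],    # A[i, j] = P(next hidden state j | hidden state i)
              [0.4, 0.6]])   # Transition Matrix

B = np.array([[0.9, 0.1],    # B[i, k] = P(observed state k | hidden state i)
              [0.5, 0.5]])   # Emission Matrix

pi = np.array([0.8, 0.2])    # initial hidden-state distribution P(H1)

# Every row is a probability distribution, so rows must sum to 1.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```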

Learning HMM

Learning an HMM isn't as simple as learning a Markov model, but we will give a schematic overview in this post.

First, as with the Markov process, we need to know the number of observed states, which is obvious from the data. However, we also need to make assumptions about the order of the Markov process and the number of hidden states – neither is directly available. Cross-validation and the Akaike Information Criterion, discussed in a previous post, come in handy here. We need to train multiple HMMs with varying numbers of hidden states and varying orders of the Markov process, and select the simplest model that explains the training data well.
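
For example, under the usual AIC formula 2k − 2·ln(L̂), the free-parameter count k of a first-order HMM grows with both matrix sizes. A small sketch of such a model sweep follows; `train_and_score` is a hypothetical helper that trains a K-state HMM and returns its log-likelihood:

```python
def hmm_aic(log_likelihood, n_hidden, n_observed):
    """AIC = 2k - 2*ln(L) for a first-order HMM; k counts free parameters
    (each row of a probability matrix sums to 1, hence the minus ones)."""
    k = (n_hidden * (n_hidden - 1)        # transition matrix
         + n_hidden * (n_observed - 1)    # emission matrix
         + n_hidden - 1)                  # initial state distribution
    return 2 * k - 2 * log_likelihood

# Hypothetical sweep: train HMMs with 2..5 hidden states, keep the lowest AIC.
# aic_by_k = {K: hmm_aic(train_and_score(K), K, n_observed) for K in range(2, 6)}
```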

We can gain an understanding of the HMM training algorithm through the following mental exercises:

Exercise 1 – If we knew the Emission and Transition Probabilities, and an observed sequence, could we compute the probability of observing that sequence?

Let's say our observed sequence is S1-S2-S3-... and the underlying hidden sequence is H1-H2-H3-...; then the probability of observing the given sequence along that particular hidden sequence is

P(H1) * B(S1|H1) * A(H2|H1) * B(S2|H2) * A(H3|H2) * B(S3|H3) * ...

where B(Si|Hj) is the Emission Probability of observing state Si when the hidden state is Hj, and A(Hi|Hj) is the Transition Probability of moving to hidden state Hi from hidden state Hj, assuming a Markov process of order one. However, since we don't know the true underlying sequence, we compute this quantity over all combinations of underlying hidden sequences³ and sum the resulting probabilities.
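
Translating the formula above directly into code (reusing A, B, and pi from the earlier sketch), a brute-force computation of this sum might look as follows; note that the forward algorithm computes the same quantity far more efficiently than this exhaustive enumeration:

```python
from itertools import product

def seq_prob(obs, A, B, pi):
    """P(observed sequence): brute-force sum over all K**L hidden sequences."""
    K, L = A.shape[0], len(obs)
    total = 0.0
    for hidden in product(range(K), repeat=L):
        p = pi[hidden[0]] * B[hidden[0], obs[0]]   # P(H1) * B(S1|H1)
        for t in range(1, L):
            # A(Ht|Ht-1) * B(St|Ht)
            p *= A[hidden[t - 1], hidden[t]] * B[hidden[t], obs[t]]
        total += p
    return total

print(seq_prob([1, 0, 1], A, B, pi))  # 0.033288 with the toy numbers above
```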

Exercise 2 – If we knew the Emission and Transition Probabilities, and an observed sequence, could we make the best guess about the underlying hidden sequence?

If we compute the probability of the given observed sequence under every possible combination of hidden state sequences, one of the hidden sequences will correspond to the highest probability of the observed sequence. In the absence of any other information, that is our best guess of the underlying sequence under the Maximum Likelihood Estimation method.
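
Continuing the brute-force approach from Exercise 1 (again reusing A, B, and pi from the earlier sketch), a minimal implementation of this best guess might look like the following; doing this search efficiently is exactly what the Viterbi algorithm is for:

```python
from itertools import product

def best_hidden_seq(obs, A, B, pi):
    """Brute-force most likely hidden sequence for an observed sequence."""
    K, L = A.shape[0], len(obs)
    best, best_p = None, -1.0
    for hidden in product(range(K), repeat=L):  # all K**L candidates
        p = pi[hidden[0]] * B[hidden[0], obs[0]]
        for t in range(1, L):
            p *= A[hidden[t - 1], hidden[t]] * B[hidden[t], obs[t]]
        if p > best_p:
            best, best_p = hidden, p
    return best, best_p

print(best_hidden_seq([1, 0, 1], A, B, pi))  # ((1, 1, 1), 0.009) with the toy numbers
```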

Exercise 3 – Given a number of observed sequences, and assumptions about the number of observed and hidden states, can we make the best estimate of the Emission and Transition Probabilities that explain our sequences?

This, of course, is HMM training. Here we build on the previous steps: starting with random probabilities, we compute the joint probability of observing all the observed sequences (again, Maximum Likelihood Estimation) and search for the set of Emission and Transition Probabilities that maximizes this joint probability.
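
As a deliberately naive illustration of this search (reusing seq_prob from Exercise 1), one might simply draw random matrices and keep the best-scoring set; real training replaces this blind search with the much smarter iterative updates mentioned below:

```python
import numpy as np

def random_stochastic(rows, cols, rng):
    """Random matrix whose rows are valid probability distributions."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=1, keepdims=True)

def train_by_random_search(sequences, K, N, n_tries=1000, seed=0):
    """Keep the random (pi, A, B) maximizing the joint log-likelihood of all
    training sequences. Purely illustrative and hopelessly inefficient."""
    rng = np.random.default_rng(seed)
    best, best_ll = None, -np.inf
    for _ in range(n_tries):
        pi = random_stochastic(1, K, rng)[0]
        A = random_stochastic(K, K, rng)
        B = random_stochastic(K, N, rng)
        ll = sum(np.log(seq_prob(s, A, B, pi)) for s in sequences)
        if ll > best_ll:
            best, best_ll = (pi, A, B), ll
    return best, best_ll
```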

We have intentionally skipped the mathematics of the training algorithm – the Baum-Welch (forward-backward) algorithm, with the related Viterbi algorithm as the efficient solution to Exercise 2 – but the interested reader is encouraged to go through the classical paper by Lawrence R. Rabiner or the slightly simpler treatment by Mark Stamp. Many software packages also provide ready-made implementations of HMM training (e.g., depmixS4 in R).
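
For readers who prefer Python over R, a minimal sketch using the third-party hmmlearn package might look like this (API names as of recent hmmlearn releases; older versions called the categorical model MultinomialHMM):

```python
import numpy as np
from hmmlearn import hmm  # third-party: pip install hmmlearn

# One toy observed sequence: 1 = purchase, 0 = no purchase.
obs = np.array([[1], [0], [1], [1], [0], [1], [0], [0], [1], [1]])

model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(obs)              # Baum-Welch (EM) estimation of both matrices

print(model.transmat_)      # learned Transition Matrix
print(model.emissionprob_)  # learned Emission Matrix
print(model.predict(obs))   # Viterbi-decoded hidden state sequence
```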

In the last three posts, we discussed practical cases where sequence modeling through a Markov process comes in handy, and provided an overview of training Markov models. Markov models are often easy to train, interpret, and implement, and, with the right design and state identification, can be relevant to many business problems.


¹ If hidden states correspond to observed states in a one-to-one mapping, what happens?
² This is not a completely faithful representation of the graph, because the underlying process graph is very simple, and making it more realistic would mean an ugly picture. The idea, however, is that K hidden states map to N observed states in a many-to-many fashion.
³ For K hidden states and an observed sequence of length L, we will have K^L combinations (e.g., K = 3 and L = 10 already gives 3^10 = 59,049).


