[转载] Calculating Entropy
From: johndcook.com/blog
For a set of positive probabilities pi summing to 1, their entropy is defined as

(For this post, log will mean log base 2, not natural log.)
This post looks at a couple questions about computing entropy. First, are there any numerical problems computing entropy directly from the equation above?
Second, imagine you don’t have the pi values directly but rather counts ni that sum to N. Then pi = ni/N. To apply the equation directly, you’d first need to compute N, then go back through the data again to compute entropy. If you have a large amount of data, could you compute the entropy in one pass?
To address the second question, note that

so you can sum ni and ni log ni in the same pass.
One of the things you learn in numerical analysis is to look carefully at subtractions. Subtracting two nearly equal numbers can result in a loss of precision. Could the numbers above be nearly equal? Maybe if the ni are ridiculously large. Not just astronomically large — astronomically large numbers like the number of particles in the universe are fine — but ridiculously large, numbers whose logarithms approach the limits of machine-representable numbers. (If we’re only talking about numbers as big as the number of particles in the universe, their logs will be at most three-digit numbers).
Now to the problem of computing the sum of ni log ni. Could the order of the terms matter? This also applies to the first question of the post if we look at summing the pi log pi. In general, you’ll get better accuracy summing a lot positive numbers by sorting them and adding from smallest to largest and worse accuracy by summing largest to smallest. If adding a sorted list gives essentially the same result when summed in either direction, summing the list in any other order should too.
To test the methods discussed here, I used two sets of count data, one on the order of a million counts and the other on the order of a billion counts. Both data sets had approximately a power law distribution with counts varying over seven or eight orders of magnitude. For each data set I computed the entropy four ways: two equations times two orders. I convert the counts to probabilities and use the counts directly, and I sum smallest to largest and largest to smallest.
For the smaller data set, all four methods produced the same answer to nine significant figures. For the larger data set, all four methods produced the same answer to seven significant figures. So at least for the kind of data I’m looking at, it doesn’t matter how you calculate entropy, and you might as well use the one-pass algorithm to get the result faster.
[转载] Calculating Entropy的更多相关文章
- 【转载】7 Steps for Calculating the Largest Lyapunov Exponent of Continuous Systems
原文地址:http://sprott.physics.wisc.edu/chaos/lyapexp.htm The usual test for chaos is calculation of the ...
- [转载] TLS协议分析 与 现代加密通信协议设计
https://blog.helong.info/blog/2015/09/06/tls-protocol-analysis-and-crypto-protocol-design/?from=time ...
- 转载:scikit-learn学习之决策树算法
版权声明:<—— 本文为作者呕心沥血打造,若要转载,请注明出处@http://blog.csdn.net/gamer_gyt <—— 目录(?)[+] ================== ...
- 【转载】计算机视觉(CV)前沿国际国内期刊与会议
计算机视觉(CV)前沿国际国内期刊与会议这里的期刊大部分都可以通过上面的专家们的主页间接找到1.国际会议 2.国际期刊 3.国内期刊 4.神经网络 5.CV 6.数字图象 7.教育资源,大学 8.常见 ...
- 【转载】nginx 并发数问题思考:worker_connections,worker_processes与 max clients
注:这个文章主要是作者一直在研究nginx作为http server和反向代理服务器时候所谓最大的max_clients和 worker_connections的计算公式, 其实最后的结论也没有卡上公 ...
- ZOJ3827 ACM-ICPC 2014 亚洲区域赛的比赛现场牡丹江I称号 Information Entropy 水的问题
Information Entropy Time Limit: 2 Seconds Memory Limit: 131072 KB Special Judge Informatio ...
- 最大似然估计 (Maximum Likelihood Estimation), 交叉熵 (Cross Entropy) 与深度神经网络
最近在看深度学习的"花书" (也就是Ian Goodfellow那本了),第五章机器学习基础部分的解释很精华,对比PRML少了很多复杂的推理,比较适合闲暇的时候翻开看看.今天准备写 ...
- 基于Spark自动扩展scikit-learn (spark-sklearn)(转载)
转载自:https://blog.csdn.net/sunbow0/article/details/50848719 1.基于Spark自动扩展scikit-learn(spark-sklearn)1 ...
- softmax,softmax loss和cross entropy的区别
版权声明:本文为博主原创文章,未经博主允许不得转载. https://blog.csdn.net/u014380165/article/details/77284921 我们知道卷积神经网络(CNN ...
随机推荐
- 关闭Ubuntu 12.04的内部错误提示
刚装完系统后,才安装一个输入法重启电脑后,竟然就提示'内部错误'需要提交报告,什么状况? 发扬'不求甚解'的光荣传统,我又不搞Linux开发,对我来说只是个工具而已,工具出问题了解决问题即可不想劳神深 ...
- azure之MSSQL服务性能测试
azure给我们提供非常多的服务而这些服务可以让企业轻而易举地在上构建一个健壮的服务体系.但在使用azure的相关产品服务时还是应该对相关的服务有一些简单的性能了解才能更好的让企业购买适合自己的服务产 ...
- MERGE 用法
1.不带输出的SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO ALTER proc [dbo].[InsertShiGongJiao] ), @com ...
- jQuery的XX如何实现?——1.框架
源码链接:内附实例代码 jQuery使用许久了,但是有一些API的实现实在想不通.于是抽空看了jQuery源码,现在把学习过程中发现的一些彩蛋介绍给大家(⊙0⊙). 下面将使用简化的代码来介绍,主要关 ...
- 使用media Queries实现一个响应式的菜单
Media queries是CSS3引入的一个特性,使用它可以方便的实现各种响应式效果.在这个示例中我们将会使用media queries实现一个响应式的菜单.这个菜单会根据当前浏览器屏幕的大小变化而 ...
- 常用js方法整理common.js
项目中常用js方法整理成了common.js var h = {}; h.get = function (url, data, ok, error) { $.ajax({ url: url, data ...
- linux(CentOS)-nodejs项目部署
系统:CentOS 64位(查看系统位数请执行命令:getconf LONG_BIT) 1.到http://nodejs.org/download/找到系统对应的安装文件 执行如下命令: wget h ...
- Atitit. .net c# web 跟客户端winform 的ui控件结构比较
Atitit. .net c# web 跟客户端winform 的ui控件结构比较 .net 4.5 webform Winform 命名空间 System.Web.UI.WebControls ...
- paip.java 调用c++ dll so总结
paip.java 调用c++ dll so总结 ///////JNA (这个ms sun 的) 我目前正做着一个相关的项目,说白了JNA就是JNI的替代品,以前用JNI需要编译一层中间库,现在JNA ...
- Sql Server2005恢复备份数据库问题-Error:3154 3219
解决办法: 1.新建一个同名数据库New_HeasySchoolDB2.执行下面的sql语句: restore database New_HeasySchoolDB from disk = 'D:/N ...