PP: UMAP: uniform manifold approximation and projection for dimension reduction
From the Tutte Institute for Mathematics and Computing
Problem: dimension reduction
Theoretical foundations:
At a high level, UMAP uses local manifold approximations and patches together their local fuzzy simplicial set representations to construct a topological representation of the high dimensional data. Given some low dimensional representation of the data, a similar process can be used to construct an equivalent topological representation. UMAP then optimizes the layout of the data representation in the low dimensional space, to minimize the cross-entropy between the two topological representations.
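Explanation: using local manifold approximations and local fuzzy simplicial set representations, UMAP builds a topological representation of the data in the high-dimensional space, and an equivalent topological representation in the low-dimensional space. It then uses cross-entropy as the objective function to measure the difference between the two topological representations, and minimizes that difference.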
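The objective can be sketched in a few lines. The function below is a minimal illustration, assuming each topological representation is reduced to a dict from an edge to a membership strength in [0, 1]; the name `fuzzy_cross_entropy` and the `eps` clamp are choices made for this sketch, not part of any library.

```python
import math

def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Cross-entropy between two fuzzy sets defined on the same edges.

    mu, nu: dicts mapping an edge to a membership strength in [0, 1].
    mu plays the role of the high-dimensional representation,
    nu the low-dimensional one (assumed encoding for this sketch).
    """
    total = 0.0
    for edge, m in mu.items():
        # Clamp nu away from 0 and 1 so the logarithms stay finite.
        n = min(max(nu.get(edge, 0.0), eps), 1.0 - eps)
        if m > 0.0:
            total += m * math.log(m / n)          # penalty for a missing edge
        if m < 1.0:
            total += (1.0 - m) * math.log((1.0 - m) / (1.0 - n))  # penalty for a spurious edge
    return total

high = {(0, 1): 0.9, (0, 2): 0.1}
good_layout = {(0, 1): 0.9, (0, 2): 0.1}
bad_layout = {(0, 1): 0.1, (0, 2): 0.9}
print(fuzzy_cross_entropy(high, good_layout))
print(fuzzy_cross_entropy(high, bad_layout))
```

A layout whose edge strengths match the high-dimensional ones scores zero, and any mismatch raises the score, which is why minimizing this quantity pulls the two representations together.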
Construction of fuzzy topological representations:
1. approximating a manifold on which the data is assumed to lie;
2. constructing a fuzzy simplicial set representation of the approximated manifold.
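Question: where does a set of high-dimensional data actually lie, and in which space should it be measured: Euclidean space, a topological space, a Riemannian space, or something else? Or do different choices of space all lead to similar results?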
1. approximating a manifold on which the data is assumed to lie:
Suppose the manifold is not known in advance and we wish to approximate geodesic distance on it. Let the input data be $X = \{X_1, \ldots, X_N\}$.
A Computational view of UMAP:
Two phases.
In the first phase, a particular weighted k-neighbour graph is constructed. In the second phase, a low dimensional layout of this graph is computed.
1. weighted k-neighbour graph construction
Use the nearest neighbor descent (NN-descent) algorithm of [1].
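The first phase can be sketched as follows. This is a toy version under stated assumptions: it uses an exact brute-force neighbour search as a stand-in for NN-descent, assumes the points are distinct, and weights each edge as exp(-max(0, d - rho) / sigma), where rho is the distance to the nearest neighbour and sigma is found by binary search so that the weights of each point sum to log2(k). All function names are hypothetical.

```python
import math

def knn_brute_force(points, k):
    """Exact k nearest neighbours by brute force (stand-in for NN-descent)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def smooth_sigma(dists, rho, k, n_iter=64):
    """Binary-search sigma so that the edge weights sum to log2(k)."""
    target = math.log2(k)
    lo, hi = 1e-6, 1e6
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        total = sum(math.exp(-max(0.0, d - rho) / mid) for d in dists)
        if total > target:
            hi = mid   # weights too large: shrink sigma
        else:
            lo = mid   # weights too small: grow sigma
    return (lo + hi) / 2.0

def fuzzy_knn_graph(points, k):
    """Directed weighted k-neighbour graph as a dict {(i, j): weight}."""
    graph = {}
    for i, neighbours in enumerate(knn_brute_force(points, k)):
        dists = [d for d, _ in neighbours]
        rho = dists[0]                      # distance to the nearest neighbour
        sigma = smooth_sigma(dists, rho, k)
        for d, j in neighbours:
            graph[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)
    return graph

points = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 5)]
graph = fuzzy_knn_graph(points, k=3)
for edge in sorted(graph):
    print(edge, round(graph[edge], 3))
```

The nearest neighbour of every point always gets weight 1, so each point is connected to at least one other point with full confidence. Brute force costs O(N²) distance evaluations, which is the cost NN-descent is meant to avoid on large data.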
2. low dimensional layout
Use a force-directed graph layout in the low dimensional space.
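A force-directed layout can be sketched as below. This is a simplified toy, not the reference implementation: it assumes the graph is given as a dict `{(i, j): weight}`, fixes the curve parameters to a = b = 1, samples one random point per edge for repulsion and treats it as a non-neighbour, and clips every update. The function names and defaults are hypothetical.

```python
import random

def clip(value, limit=4.0):
    """Bound a single coordinate update to keep the layout stable."""
    return max(-limit, min(limit, value))

def force_directed_layout(graph, n_points, dim=2, n_epochs=200, seed=0):
    """Toy layout: attraction along edges, repulsion from random points."""
    rng = random.Random(seed)
    y = [[rng.uniform(-10.0, 10.0) for _ in range(dim)] for _ in range(n_points)]
    for epoch in range(n_epochs):
        alpha = 1.0 - epoch / n_epochs            # linearly decaying step size
        for (i, j), w in graph.items():
            # Attractive force between the two endpoints of an edge.
            d2 = sum((a - b) ** 2 for a, b in zip(y[i], y[j]))
            attract = -2.0 * w / (1.0 + d2)
            for c in range(dim):
                step = alpha * clip(attract * (y[i][c] - y[j][c]))
                y[i][c] += step
                y[j][c] -= step
            # Repulsive force from one randomly sampled point.
            k = rng.randrange(n_points)
            if k == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(y[i], y[k]))
            repel = 2.0 / ((0.001 + d2) * (1.0 + d2))
            for c in range(dim):
                y[i][c] += alpha * clip(repel * (y[i][c] - y[k][c]))
    return y

toy_graph = {(0, 1): 1.0, (1, 0): 1.0, (2, 3): 1.0, (3, 2): 1.0}
embedding = force_directed_layout(toy_graph, n_points=4)
for idx, coords in enumerate(embedding):
    print(idx, [round(c, 2) for c in coords])
```

Sampling a few random points for repulsion instead of all non-neighbours keeps each epoch proportional to the number of edges rather than to N².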
Implementation and hyper-parameters:
Supplementary knowledge:
1. Simplicial sets
In mathematics, a simplicial set is an object made up of "simplices" in a specific way. Simplicial sets are higher-dimensional generalizations of directed graphs, partially ordered sets and categories.
Simplex:
In geometry, a simplex (plural: simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions.
For example,
- a 0-simplex is a point,
- a 1-simplex is a line segment,
- a 2-simplex is a triangle,
- a 3-simplex is a tetrahedron,
- a 4-simplex is a 5-cell.
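The list above follows one pattern: an n-simplex has n + 1 vertices, and each of its k-dimensional faces is a choice of k + 1 of them. A minimal sketch, with the hypothetical helper `faces` representing a simplex as a tuple of vertex labels:

```python
from itertools import combinations

def faces(vertices, k):
    """All k-dimensional faces of the simplex spanned by `vertices`."""
    return list(combinations(vertices, k + 1))

tetrahedron = (0, 1, 2, 3)   # a 3-simplex has 4 vertices
for k in range(4):
    print(k, len(faces(tetrahedron, k)))
```

For the tetrahedron this counts 4 points, 6 line segments, 4 triangles and 1 tetrahedron, i.e. the binomial coefficients C(4, k + 1).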
2. Hadamard product/ pointwise product
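The Hadamard product of two matrices of the same shape multiplies them entry by entry. In the UMAP paper it appears when the directed k-neighbour weights A are symmetrized as B = A + Aᵀ − A∘Aᵀ. The sketch below uses plain nested lists; the helper names are hypothetical.

```python
def hadamard(a, b):
    """Entrywise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def symmetrize(a):
    """B = A + A^T - A∘A^T: probability that at least one directed edge exists."""
    at = transpose(a)
    h = hadamard(a, at)
    n = len(a)
    return [[a[i][j] + at[i][j] - h[i][j] for j in range(n)] for i in range(n)]

A = [[0.0, 0.5, 0.0],
     [0.2, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
for row in symmetrize(A):
    print([round(v, 3) for v in row])
```

Reading each weight as the probability that a directed edge exists, the formula is the probability that at least one of the two directions exists, so the result is symmetric and stays in [0, 1].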
3. What is n-skeleton?
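One standard answer: the n-skeleton of a simplicial complex keeps only the simplices of dimension at most n. The weighted graph used in the computational view of UMAP then corresponds to a 1-skeleton (vertices and edges only). A small sketch, assuming a complex is stored as a set of frozensets of vertex labels; `closure` and `skeleton` are hypothetical helper names.

```python
from itertools import combinations

def closure(top_simplices):
    """Simplicial complex generated by the given simplices (all their faces)."""
    complex_ = set()
    for simplex in top_simplices:
        for size in range(1, len(simplex) + 1):
            complex_.update(frozenset(f) for f in combinations(simplex, size))
    return complex_

def skeleton(complex_, n):
    """n-skeleton: the simplices of dimension <= n (dimension = vertex count - 1)."""
    return {s for s in complex_ if len(s) - 1 <= n}

triangle = closure([(0, 1, 2)])      # 3 vertices, 3 edges, 1 filled triangle
print(len(triangle))
print(len(skeleton(triangle, 1)))    # vertices and edges only
print(len(skeleton(triangle, 0)))    # vertices only
```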
4. Mathematical concepts
Convergent sequence
The concept of a space is an extremely general and important mathematical construct. Members of the space obey certain addition properties. Spaces which have been investigated and found to be of interest are usually named after one or more of their investigators.
The everyday type of space familiar to most people is called Euclidean space. In Einstein's theory of Special Relativity, Euclidean three-space plus time (the "fourth dimension") are unified into the so-called Minkowski space. One of the most general types of mathematical spaces is the topological space.
Metric Space
A metric space is a set $S$ with a global distance function (the metric $g$) that, for every two points $x, y$ in $S$, gives the distance between them as a nonnegative real number $g(x, y)$. A metric space must also satisfy
1. $g(x, y) = 0$ iff $x = y$,
2. $g(x, y) = g(y, x)$,
3. The triangle inequality $g(x, y) + g(y, z) \ge g(x, z)$.
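The three conditions can be checked mechanically on a finite sample of points. The sketch below is only a sanity check on the given sample, not a proof; `is_metric` and `squared` are hypothetical names, and points are assumed to be tuples of numbers.

```python
import math
from itertools import product

def is_metric(points, g, tol=1e-12):
    """Check the metric conditions for g on a finite set of points."""
    for x, y in product(points, repeat=2):
        d = g(x, y)
        if d < 0:
            return False                      # distances must be nonnegative
        if (d == 0) != (x == y):
            return False                      # g(x, y) = 0 iff x = y
        if abs(d - g(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in product(points, repeat=3):
        if g(x, y) + g(y, z) < g(x, z) - tol:
            return False                      # triangle inequality
    return True

def squared(x, y):
    """Squared Euclidean distance, which is not a metric."""
    return math.dist(x, y) ** 2

sample = [(0.0,), (1.0,), (2.0,)]
print(is_metric(sample, math.dist))
print(is_metric(sample, squared))
```

The squared distance fails because going from 0 to 2 through 1 costs 1 + 1 = 2, which is less than the direct cost 4, violating the triangle inequality.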
Euclidean space:
Euclidean $n$-space, sometimes called Cartesian space or simply $n$-space, is the space of all n-tuples of real numbers, $(x_1, x_2, \ldots, x_n)$. Such $n$-tuples are sometimes called points, although other nomenclature may be used. The totality of $n$-space is commonly denoted $\mathbb{R}^n$.
Topological space:
A topological space, also called an abstract topological space, is a set $X$ together with a collection of open subsets $T$ that satisfies the four conditions:
1. The empty set $\emptyset$ is in $T$.
2. $X$ is in $T$.
3. The intersection of a finite number of sets in $T$ is also in $T$.
4. The union of an arbitrary number of sets in $T$ is also in $T$.
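For a finite set the four conditions can be verified directly, since closure under pairwise intersections and unions then implies closure under all finite ones. A minimal sketch, with the hypothetical helper `is_topology` taking the set and a list of candidate open subsets:

```python
from itertools import combinations

def is_topology(X, T):
    """Check the four topology conditions for a finite set X and collection T."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    if frozenset() not in T or X not in T:
        return False                          # conditions 1 and 2
    for a, b in combinations(T, 2):
        if a & b not in T:
            return False                      # condition 3 (finite intersections)
        if a | b not in T:
            return False                      # condition 4 (unions)
    return True

X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
broken = [set(), {1}, {2}, {1, 2, 3}]
print(is_topology(X, nested))
print(is_topology(X, broken))
```

The second collection fails because the union {1} ∪ {2} = {1, 2} is missing from it.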
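Triangle inequality
Let $x$ and $y$ be vectors. Then the triangle inequality is given by
$$|x| - |y| \le |x + y| \le |x| + |y| \qquad (1)$$
Equivalently, for complex numbers $z_1$ and $z_2$,
$$|z_1| - |z_2| \le |z_1 + z_2| \le |z_1| + |z_2|$$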
5. the difference between Euclidean space and Riemannian space

黎曼将二维曲面的球面几何、双曲几何(即罗巴切夫斯基几何)和欧氏几何统一在下述黎曼度规表达式中

这个弧长微分ds表达式中的α,是2维曲面的高斯曲率。当α=+1时,度规所描述的是三角形内角和E大于180°的球面几何;当α=-1时,所描述的是内角和E小于180°的双曲几何;当α=0,则对应于通常的欧几里德几何(图2)。黎曼引入度规的概念,将三种几何统一在一起,使得非欧几何焕发出蓬勃的生机。
References
1. Efficient k-nearest neighbor graph construction for generic similarity measures
2. Euclidean Space and Riemannian Space (欧氏空间与黎曼空间)