Dimensionality in statistics refers to how many attributes a dataset has. For example, healthcare data is notorious for having vast numbers of variables (e.g. blood pressure, weight, cholesterol level). In an ideal world, this data could be represented in a spreadsheet, with one column representing each dimension. In practice, this is difficult to do, in part because many variables are interrelated (like weight and blood pressure).
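
As a rough illustration of the spreadsheet picture, the sketch below treats each record as a row and each attribute as a column, so the dimensionality is simply the number of columns. The patient values are invented purely for illustration, and `dimensionality` and `pearson` are hypothetical helper names, not library functions.

```python
from math import sqrt

# Toy records with invented values; each key is one dimension (column).
patients = [
    {"blood_pressure": 118.0, "weight": 62.0, "cholesterol": 180.0},
    {"blood_pressure": 125.0, "weight": 71.0, "cholesterol": 195.0},
    {"blood_pressure": 135.0, "weight": 84.0, "cholesterol": 210.0},
    {"blood_pressure": 142.0, "weight": 95.0, "cholesterol": 205.0},
]

def dimensionality(rows):
    """Number of attributes (columns) in the first row of the table."""
    return len(rows[0])

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

weights = [p["weight"] for p in patients]
pressures = [p["blood_pressure"] for p in patients]

n_dims = dimensionality(patients)
r = pearson(weights, pressures)
print(n_dims, round(r, 3))
```

A correlation close to 1 between two columns, as in this toy table, is one way the inter-relation between variables shows up: the columns are not independent pieces of information.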

Note: Dimensionality means something slightly different in other areas of mathematics and science. For example, in physics, dimensionality can usually be expressed in terms of fundamental dimensions like mass, time, or length. In matrix algebra, two units of measure have the same dimensionality if both statements are true:

  1. A function exists that maps one variable onto another variable.
  2. The inverse of the function in (1) does the reverse.

High Dimensional Data

High dimensional means that the number of dimensions is staggeringly high, so high that calculations become extremely difficult. With high dimensional data, the number of features can exceed the number of observations. For example, microarrays, which measure gene expression, can contain tens to hundreds of samples. Each sample can contain tens of thousands of genes.
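
To make the "more features than observations" situation concrete, here is a minimal sketch of a microarray-shaped table. The sizes (100 samples, 20,000 genes) are assumed values in the range described above, and the expression values are random placeholders, not real measurements.

```python
import random

random.seed(0)

# Assumed, illustrative sizes in the range the text describes.
n_samples = 100      # observations (rows)
n_genes = 20_000     # features (columns)

# One row per sample, one column per gene; values are random placeholders.
expression = [[random.random() for _ in range(n_genes)]
              for _ in range(n_samples)]

n_rows, n_cols = len(expression), len(expression[0])
is_high_dimensional = n_cols > n_rows

# Number of distinct feature pairs, e.g. the entries of a full
# gene-by-gene correlation matrix above the diagonal.
n_feature_pairs = n_cols * (n_cols - 1) // 2

print(n_rows, n_cols, is_high_dimensional, n_feature_pairs)
```

The pair count grows quadratically with the number of features, which is one reason computations that are cheap on a narrow table become expensive on a wide one.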

1. What is the dimension of a time series?

Classification of time series is a somewhat tricky matter. Most classification algorithms have an implicit assumption that the data you are classifying are stationary, and they usually work in vector spaces.

So there are two "things" that can be multidimensional here: your original time series and the result of your preprocessing before feeding data to a classifier.

To answer your question straight: a time series is multidimensional if it is a measurement of more than one variable throughout time. It is not multidimensional because of its length.
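
The distinction between length and dimension can be sketched in a few lines. The representation used here (a list of time steps, each either a number or a tuple of numbers) and the helper `series_dimension` are assumptions made for illustration; the values are made up.

```python
def series_dimension(series):
    """Number of variables measured at each time step.

    `series` is a list of time steps; each step is either a single
    number (univariate) or a tuple/list of numbers (multivariate).
    """
    first = series[0]
    return len(first) if isinstance(first, (list, tuple)) else 1

# A long univariate series: 10,000 time steps, one variable.
temperature = [20.0 + 0.001 * t for t in range(10_000)]

# A short multivariate series: 5 time steps, three variables per step.
weather = [
    (20.0, 1013.0, 0.45),
    (20.5, 1012.5, 0.47),
    (21.0, 1012.0, 0.50),
    (21.2, 1011.8, 0.52),
    (21.1, 1011.9, 0.51),
]

print(len(temperature), series_dimension(temperature))
print(len(weather), series_dimension(weather))
```

The long series is still one-dimensional, while the short one is three-dimensional.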
 
How would you go about classifying time series? Well, it depends on your intent, on the nature of the process you are measuring, etc. But in general terms, you will split your time series into small fragments and construct a multi-dimensional vector that represents each fragment, or you will fit a model (autoregressive, splines, whatever) and use the obtained parameters of the model as the vector representing that fragment. Additionally, you may synthesize new time series from the first one (derivatives, integrals, filtered time series) and build a truly multi-dimensional time series, which you will still need to preprocess.
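
A minimal sketch of the fragment-and-vectorize approach, plus two derived series, might look like the following. The window size, the step, and the three summary features are arbitrary choices made for illustration, and all function names are hypothetical.

```python
from statistics import mean, pstdev

def first_difference(xs):
    """Discrete derivative: x[t] - x[t-1]."""
    return [b - a for a, b in zip(xs, xs[1:])]

def cumulative_sum(xs):
    """Discrete integral: running total of the series."""
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out

def fragments(xs, size, step):
    """Split a series into fixed-length windows (a trailing partial window is dropped)."""
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]

def fragment_features(window):
    """Encode one window as a small vector: mean, spread, and net change."""
    return [mean(window), pstdev(window), window[-1] - window[0]]

series = [float(t % 8) for t in range(32)]   # toy sawtooth signal
windows = fragments(series, size=8, step=8)
vectors = [fragment_features(w) for w in windows]

derived = first_difference(series)      # one sample shorter than the input
integrated = cumulative_sum(series)     # same length as the input
print(len(windows), vectors[0])
```

Each entry of `vectors` is a fixed-length point in an ordinary vector space, which is the kind of input most classifiers expect.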
 
The key is that classifiers will, in general, not treat time explicitly. You have to hide the temporal dimension of your time series and find a way to encode it in a single vector.
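
The model-fitting route can be sketched the same way. The example below assumes the simplest possible autoregressive model, an AR(1) without intercept, fitted by least squares; `fit_ar1` and `encode` are hypothetical names, and a real application would likely use a higher-order model from a statistics library.

```python
def fit_ar1(xs):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(curr * prev for prev, curr in zip(xs, xs[1:]))
    den = sum(prev * prev for prev in xs[:-1])
    return num / den

def encode(xs):
    """Hide time inside a fixed-length vector: [AR(1) coefficient, residual variance]."""
    phi = fit_ar1(xs)
    residuals = [curr - phi * prev for prev, curr in zip(xs, xs[1:])]
    res_var = sum(r * r for r in residuals) / len(residuals)
    return [phi, res_var]

# Toy noiseless series generated with phi = 0.5.
decaying = [8.0 * 0.5 ** t for t in range(10)]
vector = encode(decaying)
print(vector)
```

The resulting vector has the same length no matter how long the fragment is, so time no longer appears as an explicit axis.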

Supplementary knowledge:

1. Downsampling

2. Curse of dimensionality

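When the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is a problem for any method that requires statistical significance. To obtain a statistically sound and reliable result, the amount of data needed to support that result often grows exponentially with the dimensionality.

(Source: Wikipedia)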

3. Abbreviation: iid stands for independent and identically distributed random variables.
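
The sparsity behind the curse of dimensionality (item 2) can be seen numerically. The sketch below is a small simulation under assumed settings: 1,000 uniform random points in the unit hypercube, with 10 bins per axis. It measures what fraction of grid cells receive at least one point as the number of dimensions grows. `occupied_fraction` is a hypothetical helper.

```python
import random

random.seed(0)

def occupied_fraction(n_points, n_dims, bins_per_axis=10):
    """Fraction of grid cells in the unit hypercube hit by at least one random point."""
    cells = set()
    for _ in range(n_points):
        cell = tuple(int(random.random() * bins_per_axis) for _ in range(n_dims))
        cells.add(cell)
    return len(cells) / bins_per_axis ** n_dims

n_points = 1_000   # fixed sample size
fractions = {d: occupied_fraction(n_points, d) for d in (1, 2, 3, 4, 5)}

for d, frac in fractions.items():
    # dimensions, total number of cells, fraction of cells occupied
    print(d, 10 ** d, frac)
```

The number of cells is 10 to the power of the dimension, so with a fixed sample size the occupied fraction can be at most 1,000 divided by that count. Keeping the same coverage would require exponentially more data.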

