Code and tutorials: https://satijalab.org/seurat/

Underlying algorithms

CCA

Canonical Correlation Analysis | R Data Analysis Examples

MNN

The Mutual Nearest Neighbor Method in Functional Nonparametric Regression

Comprehensive Integration of Single-Cell Data

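I really did not expect the integration method from Seurat v3 to be published in Cell itself.

As expected: a leading lab + a frontier field = unlimited possibilities.

The bioRxiv preprint was posted on November 02, 2018, and the paper was formally published in Cell on June 06, 2019.

The idea behind the method was probably already in place by the end of 2017, when I had only just started working on single cell.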

Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters.

As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function.

Here, we develop a strategy to “anchor” diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.

After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations.

Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns.

Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.

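Highlight 1: an anchoring approach that integrates many kinds of data, across platforms and across modalities.

Highlight 2: it can also integrate scATAC-seq data.

Highlight 3: analysis of spatial gene expression patterns.

Major single-cell breakthroughs so far: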

  • immunophenotype (Stoeckius et al., 2017; Peterson et al., 2017),
  • genome sequence (Navin et al., 2011; Vitak et al., 2017),
  • lineage origins (Raj et al., 2018; Spanjaard et al., 2018; Alemany et al., 2018),
  • DNA methylation landscape (Luo et al., 2018; Kelsey et al., 2017),
  • chromatin accessibility (Cao et al., 2018; Lake et al., 2018; Preissl et al., 2018),
  • spatial positioning

Two major questions in single-cell data integration:

  1. how can disparate single-cell datasets, produced across individuals, technologies, and modalities be harmonized into a single reference
  2. once a reference has been constructed, how can its data and meta-data improve the analysis of new experiments?

These questions are well suited to established fields in statistical learning.

The second question resembles reference assembly (Li et al., 2010) and mapping (Langmead et al., 2009) for genomic DNA sequences.

Existing approaches that identify shared subpopulations across datasets:

  • canonical correlation analysis (CCA)
  • mutual nearest neighbors (MNNs)
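To make the MNN idea concrete, here is a minimal NumPy sketch. It assumes two datasets already embedded in the same low-dimensional space; the function name `mutual_nearest_neighbors`, the toy coordinates, and the choice of `k` are illustrative assumptions, not taken from the paper or from Seurat.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=2):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k nearest neighbors.

    a, b: arrays of shape (n_cells, n_dims) in a shared space (an assumption here).
    """
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    k_ab = min(k, b.shape[0])
    k_ba = min(k, a.shape[0])
    # nn_ab[i]: the k nearest cells in b for cell i of a; nn_ba[j]: the reverse.
    nn_ab = np.argsort(d, axis=1)[:, :k_ab]
    nn_ba = np.argsort(d.T, axis=1)[:, :k_ba]
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:  # keep the pair only if the relation is mutual
                pairs.append((i, int(j)))
    return pairs

# Toy data: two cells of `a` sit near b[0], one near b[1]; b[2] is far from everything.
a = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
b = np.array([[0.0, 0.1], [5.1, 5.0], [20.0, 20.0]])
pairs = mutual_nearest_neighbors(a, b, k=1)
print(pairs)
```

With `k=1`, the distant cell `b[2]` gets no partner: its nearest neighbor in `a` does not choose it back. This is why mutual neighbors are more robust than one-directional matching when only some cell types are shared.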

Problems with these existing integration methods:

  • only a subset of cell types are shared across datasets
  • significant technical variation masks shared biological signal.

What this paper addresses:

  • reference assembly
  • transfer learning for transcriptomic, epigenomic, proteomic, and spatially resolved single-cell data

Core idea

Through the identification of cell pairwise correspondences between single cells across datasets, termed “anchors,” we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences.

This enables the construction of harmonized atlases at the tissue or organismal scale, as well as effective transfer of discrete or continuous data from a reference onto a query dataset.

Some single-cell basics

false negatives (“drop-outs”) due to transcript abundance and protocol-specific biases

expression derived from fluorescence in situ hybridization (FISH) exhibits probe-specific noise due to sequence specificity and background binding

Results

Identifying Anchor Correspondences across Single-Cell Datasets

Basic assumption: we assume that there are correspondences between datasets and that at least a subset of cells represent a shared biological state.
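One way to put two datasets into a shared space before looking for anchors is a CCA-style decomposition. The sketch below is a simplified version under my own assumptions: both matrices are genes x cells with the same genes in the same order, each gene is standardized within its dataset, and the canonical vectors come from an SVD of the cross-dataset cell-by-cell matrix, followed by L2 normalization of each cell. The names `scale_genes` and `cca_embed` are hypothetical, and this is not Seurat's actual implementation.

```python
import numpy as np

def scale_genes(m):
    """Center and scale each gene (row) across the cells (columns) of one dataset."""
    m = m - m.mean(axis=1, keepdims=True)
    sd = m.std(axis=1, keepdims=True)
    return m / np.where(sd > 0, sd, 1.0)  # leave constant genes at zero

def cca_embed(x, y, n_components=2):
    """Embed the cells of x and y (both genes x cells) into a shared space."""
    xs, ys = scale_genes(x), scale_genes(y)
    # SVD of the cells_x by cells_y cross-product over the shared genes.
    u, s, vt = np.linalg.svd(xs.T @ ys, full_matrices=False)
    emb_x = u[:, :n_components]        # one row per cell of x
    emb_y = vt[:n_components, :].T     # one row per cell of y
    # L2-normalize each cell so that scale differences between datasets matter less.
    emb_x = emb_x / np.maximum(np.linalg.norm(emb_x, axis=1, keepdims=True), 1e-12)
    emb_y = emb_y / np.maximum(np.linalg.norm(emb_y, axis=1, keepdims=True), 1e-12)
    return emb_x, emb_y

# Random toy data: 50 shared genes, 30 cells in one dataset and 40 in the other.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 30))
y = rng.normal(size=(50, 40))
emb_x, emb_y = cca_embed(x, y)
print(emb_x.shape, emb_y.shape)
```

Mutual nearest neighbors can then be searched between `emb_x` and `emb_y`. On random data like this the embedding carries no biology; the example only shows the shapes and steps involved.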

Constructing Integrated Atlases at the Scale of Organs and Organisms

Evaluates how accurately different tools integrate data from different platforms and different subtypes.

Leveraging Anchor Correspondences to Classify Cell States

Starts integrating case and control samples (cell state).

Projecting Cellular States across Modalities

Integrating scATAC-seq.

Transferring Continuous and Multimodal Data across Experiments

Predicting Protein Expression in Human Bone Marrow Cells

CITE-seq; predicting protein expression.
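The idea of transferring a continuous measurement (for example a protein level) from a reference onto query cells can be sketched as a weighted average over anchors. This is a deliberately simplified illustration: the function `transfer_values`, the Gaussian kernel, and the parameters `k` and `sigma` are my own assumptions, and Seurat's real weighting scheme also uses anchor scores and differs in detail.

```python
import numpy as np

def transfer_values(query_emb, anchors, ref_values, k=2, sigma=1.0):
    """Predict a continuous value for every query cell from anchored reference cells.

    query_emb:  (n_query, n_dims) embedding of the query cells.
    anchors:    list of (reference_index, query_index) pairs.
    ref_values: (n_ref,) measured values in the reference, e.g. one protein.
    """
    anchor_ref = np.array([r for r, _ in anchors])
    anchor_query = np.array([q for _, q in anchors])
    # Distance from every query cell to the query-side cell of each anchor.
    diff = query_emb[:, None, :] - query_emb[anchor_query][None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    k = min(k, len(anchors))
    nearest = np.argsort(dist, axis=1)[:, :k]          # k closest anchors per query cell
    rows = np.arange(query_emb.shape[0])[:, None]
    w = np.exp(-(dist[rows, nearest] ** 2) / (2 * sigma ** 2))  # closer anchors weigh more
    w = w / w.sum(axis=1, keepdims=True)
    # Weighted average of the reference values attached to those anchors.
    return (w * ref_values[anchor_ref][nearest]).sum(axis=1)

# Toy example: a 1-D embedding, two anchors, and a third query cell halfway between them.
query_emb = np.array([[0.0], [10.0], [5.0]])
anchors = [(0, 0), (1, 1)]
ref_values = np.array([1.0, 3.0])
pred = transfer_values(query_emb, anchors, ref_values, k=2, sigma=1.0)
print(pred)
```

The two anchored query cells essentially inherit the value of their own reference partner, while the cell halfway between them gets the average of the two.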

Spatial Mapping of Single-Cell Sequencing Data in the Mouse Cortex

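Spatial mapping of the mouse cerebral cortex.

What's my problem?

I realized long ago that this was an important and valuable problem, but I was working alone. I never really distilled the problem, never thought about it or understood it deeply, and never tried to solve it with statistical thinking.

The leading lab clearly saw the value of this problem early on, gathered people to discuss and think it through, proposed a systematic solution built on statistical methods, and in the end published the results in the top journal on the strength of their ability and reputation.

What is holding me back and keeping me going in circles?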
