Types of Graph Analytics

  

  node type (labels)

  

Node schema: the attributes of a node make up its schema.

  

Likewise, an edge has an edge type and an edge schema.

  

A fully specified graph carries the following information:

  

Weight: a value attached to an edge, for example the distance between two nodes.
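As a rough illustration of node types, schemas, edge types, and weights, here is a minimal sketch. The class names `Node` and `Edge`, the `City`/`ROAD` labels, and the attribute values are all made up for the example; they do not come from the course.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    node_type: str                                   # the label, e.g. "City"
    attributes: dict = field(default_factory=dict)   # values for the node schema

@dataclass
class Edge:
    source: str
    target: str
    edge_type: str                                   # e.g. "ROAD"
    weight: float = 1.0                              # e.g. distance between the two nodes
    attributes: dict = field(default_factory=dict)   # values for the edge schema

# Hypothetical sample data.
nodes = [
    Node("A", "City", {"population": 120_000}),
    Node("B", "City", {"population": 45_000}),
]
edges = [Edge("A", "B", "ROAD", weight=12.5, attributes={"lanes": 2})]

# The schema of a node type is the set of attribute names its nodes carry.
city_schema = sorted(
    {key for n in nodes if n.node_type == "City" for key in n.attributes}
)
print(city_schema)
```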

  

1. Path Analytics

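Some definitions:

  walk - any route traced along the edges of the graph

  path - a walk with no repeated intermediate node. For example, A-B-C-D-B-E is a walk but not a path (B repeats); a path is a constrained walk

  cycle - a closed path through 3 or more nodes

  acyclic - a graph with no cycles; the most common example is the DAG (directed acyclic graph)

  trail - a walk with no repeated edge

  diameter of a graph - take the shortest path between every pair of nodes; the largest hop count among them is the diameter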
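The definitions can be checked mechanically. The sketch below assumes an undirected, connected graph stored as an adjacency dict, and it builds the graph from just the edges that the A-B-C-D-B-E example uses (that edge set is an assumption). For simplicity `is_path` rejects any repeated node, so it does not treat closed cycles as paths.

```python
from collections import deque

def is_walk(adj, seq):
    # Every consecutive pair must be joined by an edge.
    return all(b in adj[a] for a, b in zip(seq, seq[1:]))

def is_path(adj, seq):
    # A walk in which no node appears twice.
    return is_walk(adj, seq) and len(set(seq)) == len(seq)

def is_trail(adj, seq):
    # A walk in which no (undirected) edge is used twice.
    used = [frozenset(pair) for pair in zip(seq, seq[1:])]
    return is_walk(adj, seq) and len(set(used)) == len(used)

def hops_from(adj, start):
    # Breadth-first search: fewest hops from start to every reachable node.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    # Largest shortest-path hop count over all pairs (graph assumed connected).
    return max(max(hops_from(adj, s).values()) for s in adj)

adj = {
    "A": {"B"},
    "B": {"A", "C", "D", "E"},
    "C": {"B", "D"},
    "D": {"B", "C"},
    "E": {"B"},
}
walk = ["A", "B", "C", "D", "B", "E"]
print(is_walk(adj, walk), is_path(adj, walk), is_trail(adj, walk))  # True False True
print(diameter(adj))  # 2
```

The example walk repeats node B, so it is not a path, but it never reuses an edge, so it still counts as a trail.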

  

  

  The shortest path problem

  Dijkstra's Algorithm
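  For reference, a minimal sketch of Dijkstra's algorithm using a binary heap. It assumes non-negative edge weights and a graph stored as a dict of dicts; the sample graph and its weights are invented for the example.

```python
import heapq

def dijkstra(graph, source):
    """Return the cheapest known cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical weighted, undirected graph.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
print(dijkstra(graph, "A"))  # {'A': 0, 'C': 1, 'B': 3, 'D': 8}
```

  The direct edge A-B costs 4, but going through C costs 1 + 2 = 3, which is why B ends up at 3.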

  

  

  The above is the simplified version; now consider adding constraints.
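  These notes do not record which constraints the lecture used. As one assumed example, the sketch below handles the constraint "avoid these nodes" by skipping forbidden nodes during the search. The function name `shortest_path_avoiding` and the sample graph are hypothetical.

```python
import heapq

def shortest_path_avoiding(graph, source, target, forbidden=frozenset()):
    """Cheapest (cost, path) from source to target that never visits a forbidden node."""
    if source in forbidden or target in forbidden:
        return None
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v in forbidden:
                continue  # the constraint: never step onto a forbidden node
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical weighted, undirected graph.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
print(shortest_path_avoiding(graph, "A", "D"))                   # (8, ['A', 'C', 'B', 'D'])
print(shortest_path_avoiding(graph, "A", "D", forbidden={"B"}))  # (9, ['A', 'C', 'D'])
```

  Other constraints (required nodes, excluded edge types, and so on) would need a different treatment; this only covers node exclusion.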

  

  

Dijkstra's algorithm does not perform well on very large graphs; the amount of computation becomes very large.

2. Connectivity Analytics

  Term: the degree of a node is the number of edges connected to it

  The main concerns are the robustness of a graph and the similarity between graphs

  

  

  Connectivity

  The figure for node-based connectivity: removing three nodes (the minimum needed) disconnects the graph, so the connectivity is 3

  

  The figure for edge-based connectivity: removing 4 edges (the minimum needed) disconnects the graph, so the edge connectivity is 4.
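  Both numbers can be found by brute force on a tiny graph: try removing 0, 1, 2, ... nodes (or edges) and stop at the first removal that disconnects what is left. The sketch below is exponential and only meant to make the definitions concrete; the "bowtie" graph is an invented example, not the one from the lecture figures.

```python
from itertools import combinations

def is_connected(nodes, edges):
    # Depth-first search over the nodes that remain.
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return seen == nodes

def node_connectivity(nodes, edges):
    nodes = list(nodes)
    for k in range(len(nodes) - 1):
        for removed in combinations(nodes, k):
            if not is_connected(set(nodes) - set(removed), edges):
                return k
    return len(nodes) - 1  # usual convention for a complete graph

def edge_connectivity(nodes, edges):
    edges = list(edges)
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if not is_connected(nodes, kept):
                return k
    return len(edges)

# Two triangles sharing node C (a "bowtie").
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("C", "E")]
print(node_connectivity(nodes, edges))  # 1: removing C splits the graph
print(edge_connectivity(nodes, edges))  # 2: e.g. removing A-B and A-C isolates A
```

  The example also shows that the two measures can differ for the same graph.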

  

Node degree: the number of edges connected to a node

    

  • Define a node's in-degree (# of incoming edges) and out-degree (# of outgoing edges)
  • Describe a process for calculating the similarity of two graphs by finding the following for each graph and comparing them: find the degree of each node in a graph, then create a histogram of how many nodes have a certain degree.
  • Define listener nodes (greater in-degree than out-degree), talker nodes (greater out-degree than in-degree) and communicator nodes (high in-degree and high out-degree).

Node degree is further divided into in-degree and out-degree
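A small sketch of the degree computations from the bullet list above: in-degree and out-degree per node, the degree histogram, and the listener/talker/communicator labels. The threshold `high=2` for "high" degree, the extra `"balanced"` label, and the edge list are assumptions made for the example.

```python
from collections import Counter

def in_out_degrees(edges):
    """Map each node to (in-degree, out-degree) for a list of directed edges."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return {n: (indeg[n], outdeg[n]) for n in set(indeg) | set(outdeg)}

def degree_histogram(degrees):
    """Count how many nodes have each total degree (in + out)."""
    return Counter(i + o for i, o in degrees.values())

def classify(indeg, outdeg, high=2):
    # `high` is an arbitrary cut-off chosen for this example.
    if indeg >= high and outdeg >= high:
        return "communicator"
    if indeg > outdeg:
        return "listener"
    if outdeg > indeg:
        return "talker"
    return "balanced"

# Hypothetical directed edges (source, destination).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("d", "c"), ("c", "b"), ("c", "d")]
degrees = in_out_degrees(edges)
for node in sorted(degrees):
    print(node, degrees[node], classify(*degrees[node]))
print(degree_histogram(degrees))
```

To compare two graphs, build one histogram per graph and compare the histograms; how to score the difference is left open here.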

  

  

3. Community Analytics

  • List the three types of analytics questions asked about communities (static, evolution, and predictions) and identify a real world example of a question in those categories.
  • Calculate the internal degree and external degree of nodes in a community.
  • Recognize that, for a community to really be a community, the relative number of total edges within that community (cluster) should be high relative to the total number of edges from nodes in the community to nodes outside it.
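  A minimal sketch of the internal/external degree calculation, assuming an undirected graph stored as an adjacency dict. The graph and the community `{a, b, c}` are invented for the example.

```python
def internal_external_degree(adj, community):
    """Map each community member to (internal degree, external degree)."""
    community = set(community)
    result = {}
    for node in community:
        internal = sum(1 for nb in adj[node] if nb in community)
        result[node] = (internal, len(adj[node]) - internal)
    return result

# Triangle a-b-c with a tail c-d-e hanging off it.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
degrees = internal_external_degree(adj, {"a", "b", "c"})
print(degrees)

# Summing per-node internal degrees counts every internal edge twice.
internal_edges = sum(i for i, _ in degrees.values()) // 2
external_edges = sum(e for _, e in degrees.values())
print(internal_edges, external_edges)  # 3 1
```

  With 3 edges inside and only 1 leaving, `{a, b, c}` looks like a community in the sense of the last bullet.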

  

  

  

  

 

How do we find communities in a network? Two approaches are introduced here; the first focuses on local properties.

Clique - a tightly knit group in which every node is connected to every other node
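The clique condition is easy to test directly: every pair of nodes in the set must share an edge. A minimal sketch on an invented undirected graph:

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given nodes is directly connected."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

# Triangle a-b-c with a tail c-d-e hanging off it.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(is_clique(adj, ["a", "b", "c"]))       # True
print(is_clique(adj, ["a", "b", "c", "d"]))  # False: d is linked only to c
```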

  

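In practice, cliques with more than 3 or 4 nodes are hard to find, so the definition of a clique is relaxed. There are two kinds of relaxation, one based on distance and one based on density. The two distance-based definitions are n-clique and n-clan; the density-based one is k-core.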
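For the density-based relaxation, a k-core is the largest subgraph in which every node has at least k neighbours inside the subgraph. A common way to compute it is to keep peeling off nodes whose degree has dropped below k. A minimal sketch, on an invented undirected graph:

```python
def k_core(adj, k):
    """Return the node set of the k-core by repeatedly removing low-degree nodes."""
    remaining = {n: set(nbrs) for n, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Remove the node and erase it from its neighbours' lists.
                for nb in remaining.pop(node):
                    remaining[nb].discard(node)
                changed = True
    return set(remaining)

# Triangle a-b-c with a tail c-d-e hanging off it.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(sorted(k_core(adj, 2)))  # ['a', 'b', 'c']
```

Removing e drops d to degree 1, so d is peeled off in the next pass; only the triangle survives as the 2-core.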

Below is the illustration of n-clique; Mike does not belong to the 2-clique formed by the green nodes

  

The n-clan definition corrects this situation
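The difference between the two distance-based definitions comes down to where the short routes are allowed to run. For an n-clique, members must be within n hops of each other in the whole graph, so a route may pass through outsiders. For an n-clan, the routes must stay inside the group. The sketch below checks only these distance conditions, not maximality, and uses an invented 5-node ring rather than the graph from the lecture.

```python
from collections import deque
from itertools import combinations

def hop_distances(adj, start, allowed=None):
    # BFS; if `allowed` is given, the search may only step onto those nodes.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def within_n_hops(adj, nodes, n, inside_only):
    """True if every pair in `nodes` is at most n hops apart."""
    nodes = set(nodes)
    allowed = nodes if inside_only else None
    for u, v in combinations(nodes, 2):
        d = hop_distances(adj, u, allowed).get(v)
        if d is None or d > n:
            return False
    return True

# Ring a-b-c-d-x-a; the candidate group leaves x out.
adj = {
    "a": {"b", "x"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "x"},
    "x": {"a", "d"},
}
group = {"a", "b", "c", "d"}
print(within_n_hops(adj, group, 2, inside_only=False))  # True: a reaches d via x
print(within_n_hops(adj, group, 2, inside_only=True))   # False: inside the group a-d takes 3 hops
```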

  

  

The above covers community detection based on local properties; next comes community detection based on global properties.

By the end of this video you will be able to...

  • Describe modularity conceptually -- a graph is more modular if there are more edges in that community than would be likely if edges were randomly assigned.
  • Given two graphs, identify visually which is more modular.
  • Identify the 6 ways in which communities are often tracked over time.
  • Identify which of the 6 changes would happen when two companies merge and when a company spins off a start-up.

The global property of particular interest is modularity.
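To make the idea concrete, here is a sketch that scores a given partition with the standard Newman modularity formula, Q = sum over communities of (internal edges / m) - (total degree / 2m)^2, where m is the number of edges. The notes do not say which exact formula the course uses, so treat this as one common choice; the graph is invented.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # edges with both ends in the community
    degree_sum = {}  # sum of the degrees of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Two triangles joined by a single bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}
print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting along the bridge scores higher than lumping everything together, which matches the intuition that a modular graph has more edges inside communities than random placement would give.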

The lecture then introduces a modularity-based community detection method called the Louvain method. For details, see https://www.youtube.com/watch?v=dGa-TXpoPz8

  

  

In contrast to static graphs, evolving graphs are also common. There are 6 types of evolution.

  

4. Centrality Analytics

By the end of this video you will be able to...

  • Define an influencer in a social network as a node (e.g. person) who can reach all other nodes quickly.
  • Identify the 2 key (central) player problems: Which nodes when removed will maximally disrupt a network and which set of nodes can reach (almost) all other nodes in a network.
  • Identify the 4 types of centrality discussed: degree, group, closeness, and betweenness.
  • Identify which type of centrality you would want to identify in the following 2 cases: when you have information you want to inject quickly into a graph that follows shortest paths for dissemination and when you have a commodity that flows through a network.
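  Three of the four centrality types can be sketched in a few lines for an undirected, connected, unweighted graph (group centrality is left out). Closeness is taken as (n - 1) divided by the sum of hop distances, and betweenness follows Brandes' algorithm without normalisation; both are common conventions rather than something stated in the course notes. The graph is invented.

```python
from collections import deque

def degree_centrality(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    # (n - 1) / total hop distance to all other nodes; assumes a connected graph.
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[s] = (len(adj) - 1) / sum(dist.values())
    return result

def betweenness_centrality(adj):
    # Brandes' algorithm: count shortest paths through each node.
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both ends.
    return {v: b / 2 for v, b in bc.items()}

# Small tree: a - b - c, with d and e both attached to c.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}
print(degree_centrality(adj))
print(closeness_centrality(adj))
print(betweenness_centrality(adj))
```

  Node c scores highest on all three here; in larger graphs the measures often disagree, which is why the choice depends on whether information spreads along shortest paths or a commodity flows through the network.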

  

  

Two concepts: centrality and centralization

Centralization is for a network. As the number of nodes in a network with high centrality increases (orange nodes), then the centralization of a network decreases -- because there is less variation in the centrality values of the nodes in the network.
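One way to put a number on this is Freeman's degree centralization: sum the gaps between the highest degree and every node's degree, then divide by (n - 1)(n - 2), the largest value that sum can take. The notes do not say which formula the course uses, so this is a swapped-in standard measure. A star scores 1 (one dominant node) and a ring scores 0 (no variation).

```python
def degree_centralization(adj):
    """Freeman degree centralization of an undirected simple graph (0 to 1)."""
    n = len(adj)
    if n < 3:
        raise ValueError("needs at least 3 nodes")
    degrees = [len(nbrs) for nbrs in adj.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
ring = {
    "a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d", "a"},
}
print(degree_centralization(star))  # 1.0
print(degree_centralization(ring))  # 0.0
```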

  

There are more than 30 ways to measure centrality

  

    

  

  

The connectivity lecture mentioned a power law algorithm; I am not sure what exactly it is

Ref

https://afteracademy.com/blog/dijkstras-algorithm  An illustrated and very clear walkthrough of Dijkstra's algorithm
