网站行为跟踪 Website Activity Tracking

访客信息处理

Log Aggregation   日志聚合

Apache Kafka http://kafka.apache.org/uses

In comparison to log-centric systems like Scribe or Flume    Scribe or Flume 是以日志处理为中心

Use cases

Here is a description of a few of the popular use cases for Apache Kafka®. For an overview of a number of these areas in action, see this blog post.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good solution for large scale message processing applications.

In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.

Activity tracking is often very high volume as many activity messages are generated for each user page view.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.

Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and published the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.

Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

Commit Log

Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to Apache BookKeeper project.

网站行为跟踪 Website Activity Tracking Log Aggregation 日志聚合 In comparison to log-centric systems like Scribe or Flume的更多相关文章

  1. 1.2 Use Cases中 Website Activity Tracking官网剖析(博主推荐)

    不多说,直接上干货! 一切来源于官网 http://kafka.apache.org/documentation/ Website Activity Tracking 网站活动追踪 The origi ...

  2. 1.2 Use Cases中 Log Aggregation官网剖析(博主推荐)

    不多说,直接上干货! 一切来源于官网 http://kafka.apache.org/documentation/ Log Aggregation 日志聚合 Many people use Kafka ...

  3. /VAR/LOG/各个日志文件分析

     /VAR/LOG/各个日志文件分析 author:headsen  chen    2017-10-24   18:00:24 部分内容取自网上搜索,部分内容为自己整理的,特此声明. 1.   /v ...

  4. 超酷的实时颜色数据跟踪javascript类库 - Tracking.js

    来源:GBin1.com 今天介绍这款超棒的Javascript类库是 - Tracking.js,它能够独立不依赖第三方类库帮助开发人员动态跟踪摄像头输出相关数据. 这些数据包括了颜色或者是人, 这 ...

  5. SQL Server 更改跟踪(Chang Tracking)监控表数据

    一.本文所涉及的内容(Contents) 本文所涉及的内容(Contents) 背景(Contexts) 主要区别与对比(Compare) 实现监控表数据步骤(Process) 参考文献(Refere ...

  6. 【转载,备忘】SQL Server 更改跟踪(Chang Tracking)监控表数据

    一.本文所涉及的内容(Contents) 本文所涉及的内容(Contents) 背景(Contexts) 主要区别与对比(Compare) 实现监控表数据步骤(Process) 参考文献(Refere ...

  7. /var/log各种日志

    文章为装载 1)/var/log/secure:记录登录系统存取数据的文件;例如:pop3,ssh,telnet,ftp等都会记录在此. 2)/ar/log/btmp:记录登录这的信息记录,被编码过, ...

  8. logback的使用和logback.xml详解,在Spring项目中使用log打印日志

    logback的使用和logback.xml详解 一.logback的介绍 Logback是由log4j创始人设计的另一个开源日志组件,官方网站: http://logback.qos.ch.它当前分 ...

  9. SharePoint ULS Log Viewer 日志查看器

    SharePoint ULS Log Viewer 日志查看器 项目描写叙述 这是一个Windows应用程序,更加轻松方便查看SharePoint ULS日志文件.支持筛选和简单的视图. 信息 这是一 ...

随机推荐

  1. 分布式协调服务ZooKeeper工作原理

    分布式协调服务ZooKeeper工作原理 原创 2016-02-19 杜亦舒 性能与架构 性能与架构 性能与架构 微信号 yogoup 功能介绍 网站性能提升与架构设计 大数据处理框架Hadoop.R ...

  2. Memcache应用场景介绍,说明[zz]

    转于:http://www.cnblogs.com/literoad/archive/2012/12/23/2830178.html 面临的问题 对于高并发高访问的 Web应用程序来说,数据库存取瓶颈 ...

  3. C#将URL中的参数转换成字典Dictionary<string, string>

    /// <summary> /// 将获取的formData存入字典数组 /// </summary> public static Dictionary<String, ...

  4. 浅谈 Objective-C 下对象的初始化

    转自:http://www.oschina.net/question/54100_32468 众所周知,Objective-C是一门面向对象的语言,一般情况下,我们在Objective-C中定义一个类 ...

  5. APK反编译之一:基础知识

    作者:lpohvbe | http://blog.csdn.net/lpohvbe/article/details/7981386 这部分涉及的内容比较多,我会尽量从最基础开始说起,但需要读者一定的a ...

  6. hadoop单节点配置

    首先按照官网的单机去配置,如果官网不行的话可以参考一下配置,这个是配置成功过的.但是不一定每次都成功 http://hadoop.apache.org/docs/r2.6.5/ centos 6.7 ...

  7. 查看网络连接数目(解决TIME_WAIT过多造成的问题_转)

      转自:解决TIME_WAIT过多造成的问题 (eroswang的csdn) #netstat -n | awk '/^tcp/ {++S[$NF]} END { for(a in S) print ...

  8. Xcode 调试方法总结

    前言:编写代码过程中出现错误.异常是不可避免的.通常我们都需要进行大量的调试去寻找.解决问题.这时,熟练掌握调试技巧将很大程度上的提高工作效率.接下来就说说开发过程中Xcode的调试方法. 1. En ...

  9. oracle11g卸载,10g之类版本通用

    鉴于oracle一些比较稀奇的问题,本人没碰到的不能帮忙解决的.而且比较着急赶时间的亲们,我就只能推荐先卸载在安装的办法了,这个方法一般用时也就1个小时到1个半小时之间,切记按步骤删除,别漏删了,不然 ...

  10. ChemDraw绘制DNA结构的技巧

    对生物有一定了解的朋友都知道DNA是染色体的重要组成部分,DNA结构中包含重要的遗传物质,孩子的DNA来自父母DNA的组合,这就是为什么“一家人相像”的奥秘所在.ChemDraw虽然号称是化学结构绘制 ...