进阶功能_Logstash_数据采集_用户指南_日志服务-阿里云 https://help.aliyun.com/document_detail/49025.html

Logstash Reference [6.4] | Elastic https://www.elastic.co/guide/en/logstash/current/index.html

https://opensource.com/article/17/10/logstash-fundamentals

No longer a simple log-processing pipeline, Logstash has evolved into a powerful and versatile data processing tool. Here are basics to get you started.

Logstash, an open source tool released by Elastic, is designed to ingest and transform data. It was originally built to be a log-processing pipeline to ingest logging data into ElasticSearch. Several versions later, it can do much more.

At its core, Logstash is a form of Extract-Transform-Load (ETL) pipeline. Unstructured log data is extracted, filters transform it, and the results are loaded into some form of data store.

Logstash can take a line of text like this syslog example:

Sep 11 14:13:38 vorthys sshd[16998]: Received disconnect from 192.0.2.11 port 53730:11: disconnected by user

and transform it into a much richer datastructure:

 {
  "timestamp": "1505157218000",
  "host": "vorthys",
  "program": "sshd",
  "pid": "16998",
  "message": "Received disconnect from 192.0.2.11 port 53730:11: disconnected by user",
  "sshd_action": "disconnect",
  "sshd_tuple": "192.0.2.11:513730"
}
Depending on what you are using for your backing store, you can find events like this by using indexed fields rather than grepping terabytes of text. If you're generating tens to hundreds of gigabytes of logs a day, that matters.
 

Internal architecture

Logstash has a three-stage pipeline implemented in JRuby:

The input stage plugins extract data. This can be from logfiles, a TCP or UDP listener, one of several protocol-specific plugins such as syslog or IRC, or even queuing systems such as Redis, AQMP, or Kafka. This stage tags incoming events with metadata surrounding where the events came from.

The filter stage plugins transform and enrich the data. This is the stage that produces the sshd_action and sshd_tuple fields in the example above. This is where you'll find most of Logstash's value.

The output stage plugins load the processed events into something else, such as ElasticSearch or another document-database, or a queuing system such as Redis, AQMP, or Kafka. It can also be configured to communicate with an API. It is also possible to hook up something like PagerDuty to your Logstash outputs.

Have a cron job that checks if your backups completed successfully? It can issue an alarm in the logging stream. This is picked up by an input, and a filter config set up to catch those events marks it up, allowing a conditional output to know this event is for it. This is how you can add alarms to scripts that would otherwise need to create their own notification layers, or that operate on systems that aren't allowed to communicate with the outside world.

Threads

In general, each input runs in its own thread. The filter and output stages are more complicated. In Logstash 1.5 through 2.1, the filter stage had a configurable number of threads, with the output stage occupying a single thread. That changed in Logstash 2.2, when the filter-stage threads were built to handle the output stage. With one fewer internal queue to keep track of, throughput improved with Logstash 2.2.

If you're running an older version, it's worth upgrading to at least 2.2. When we moved from 1.5 to 2.2, we saw a 20-25% increase in overall throughput. Logstash also spent less time in wait states, so we used more of the CPU (47% vs 75%).

Configuring the pipeline

Logstash can take a single file or a directory for its configuration. If a directory is given, it reads the files in lexical order. This is important, as ordering is significant for filter plugins (we'll discuss that in more detail later).

Here is a bare Logstash config file:

input { }
filter { }
output { }

Each of these will contain zero or more plugin configurations, and there can be multiple blocks.

Input config

An input section can look like this:

 input {
  syslog {
    port => 514
    type => “syslog_server”
  }
}
 

This tells Logstash to open the syslog { } plugin on port 514 and will set the document type for each event coming in through that plugin to be syslog_server. This plugin follows RFC 3164 only, not the newer RFC 5424.

Here is a slightly more complex input block:

# Pull in syslog data
input {
  file {
    path => [
      "/var/log/syslog",
      "/var/log/auth.log"
    ]
    type => "syslog"
  }
}

# Pull in application-log data. They emit data in JSON form.
input {
  file {
    path => [
      "/var/log/app/worker_info.log",
      "/var/log/app/broker_info.log",
      "/var/log/app/supervisor.log"
    ]
    exclude => "*.gz"
    type    => "applog"
    codec   => "json"
  }
}

This one uses two different input { } blocks to call different invocations of the file { } plugin: One tracks system-level logs, the other tracks application-level logs. By using two different input { } blocks, a Java thread is spawned for each one. For a multi-core system, different cores keep track of the configured files; if one thread blocks, the other will continue to function.

Both of these file { } blocks could be put into the same input { } block; they would simply run in the same thread—Logstash doesn't really care.

Filter config

The filter section is where you transform your data into something that's newer and easier to work with. Filters can get quite complex. Here are a few examples of filters that accomplish different goals:

filter {
  if [program] == "metrics_fetcher" {
    mutate {
      add_tag => [ 'metrics' ]
    }
  }
}

In this example, if the program field, populated by the syslog plugin in the example input at the top, reads metrics_fetcher, then it tags the event metrics. This tag could be used in a later filter plugin to further enrich the data.

filter {
  if "metrics" in [tags] {
    kv {
      source => "message"
      target => "metrics"
    }
  }
}

This one runs only if metrics is in the list of tags. It then uses the kv { } plugin to populate a new set of fields based on the key=value pairs in the message field. These new keys are placed as sub-fields of the metrics field, allowing the text pages_per_second=42 faults=0 to become metrics.pages_per_second = 42 and metrics.faults = 0 on the event.

Why wouldn't you just put this in the same conditional that set the tag value? Because there are multiple ways an event could get the metrics tag—this way, the kv filter will handle them all.

Because the filters are ordered, being sure that the filter plugin that defines the metrics tag is run before the conditional that checks for it is important. Here are guidelines to ensure your filter sections are optimally ordered:
  1. Your early filters should apply as much metadata as possible.
  2. Using the metadata, perform detailed parsing of events.
  3. In your late filters, regularize your data to reduce problems downstream.
    • Ensure field data types get cast to a unified value. priority could be boolean, integer, or string.

      • Some systems, including ElasticSearch, will quietly convert types for you. Sending strings into a boolean field won't give you the results you want.
      • Other systems will reject a value outright if it isn't in the right data type.
    • The mutate { } plugin is helpful here, as it has methods to coerce fields into specific data types.

Here are useful plugins to extract fields from long strings:

  • date: Many logging systems emit a timestamp. This plugin parses that timestamp and sets the timestamp of the event to be that embedded time. By default, the timestamp of the event is when it was ingested, which could be seconds, hours, or even days later.
  • kv: As previously demonstrated, it can turn strings like backup_state=failed progress=0.24 into fields you can perform operations on.
  • csv: When given a list of columns to expect, it can create fields on the event based on comma-separated values.
  • json: If a field is formatted in JSON, this will turn it into fields. Very powerful!
  • xml: Like the JSON plugin, this will turn a field containing XML data into new fields.
  • grok: This is your regex engine. If you need to translate strings like The accounting backup failed into something that will pass if [backup_status] == 'failed', this will do it.

Output config

Elastic would like you to send it all into ElasticSearch, but anything that can accept a JSON document, or the datastructure it represents, can be an output. Keep in mind that events can be sent to multiple outputs. Consider this example of metrics:

output {
  # Send to the local ElasticSearch port, and rotate the index daily.
  elasticsearch {
    hosts => [
      "localhost",
      "logelastic.prod.internal"
    ]
    template_name => "logstash"
    index         => "logstash-{+YYYY.MM.dd}"
  }

if "metrics" in [tags] {
    influxdb {
      host         => "influx.prod.internal"
      db           => "logstash"
      measurement  => "appstats"
      # This next bit only works because it is already a hash.
      data_points  => "%{metrics}"
      send_as_tags => [ 'environment', 'application' ]
    }
  }
}

Remember the metrics example above? This is how we can output it. The events tagged metrics will get sent to ElasticSearch in their full event form. In addition, the subfields under the metrics field on that event will be sent to influxdb, in the logstash database, under the appstats measurement. Along with the measurements, the values of the environment and application fields will be submitted as indexed tags.

There are a great many outputs. Here are some grouped by type:

  • API enpoints: Jira, PagerDuty, Rackspace, Redmine, Zabbix
  • Queues: Redis, Rabbit, Kafka, SQS
  • Messaging Platforms: IRC, XMPP, HipChat, email, IMAP
  • Document Databases: ElasticSearch, MongoDB, Solr
  • Metrics: OpenTSDB, InfluxDB, Nagios, Graphite, StatsD
  • Files and other static artifacts: File, CSV, S3

There are many more output plugins.

A codec is a special piece of the Logstash configuration. We saw one used on the file {} example above.

# Pull in application-log data. They emit data in JSON form.
input {
  file {
    path => [
      "/var/log/app/worker_info.log",
      "/var/log/app/broker_info.log",
      "/var/log/app/supervisor.log"
    ]
    exclude => "*.gz"
    type    => "applog"
    codec   => "json"
  }
}

In this case, the file plugin was configured to use the json codec. This tells the file plugin to expect a complete JSON data structure on every line in the file. If your logs can be emitted in a structure like this, your filter stage will be much shorter than it would if you had to grok, kv, and csv your way into enrichment.

The json_lines codec is different in that it will separate events based on newlines in the feed. This is most useful when using something like the tcp { } input, when the connecting program streams JSON documents without re-establishing the connection each time.

The multiline codec gets a special mention. As the name suggests, this is a codec you can put on an input to reassemble a multi-line event, such as a Java stack dump, into a single event.

input {
  file {
    path => '/var/log/stackdumps.log'
    type => 'stackdumps'
    codec => multiline {
      pattern => "^\s"
      what    => previous
    }
  }
}

This codec tells the file plugin to treat any log line that starts with white space as belonging to the previous line. It will be appended to the message field with a new line and the contents of the log line. Once it hits a log line that doesn't start with white space, it will close the event and submit it to the filter stage.

Warning: Due to the highly distributed nature of Logstash, the multiline codec needs to be run as close to the log source as possible. If it reads the file directly, that's perfect. If the events are coming through another system, such as a centralized syslog system, reassembly into a single event will be more challenging.

Architectures

Logstash can scale from all-in-one boxes up to gigantic infrastructures that require complex event routing before events are processed to satisfy different business owners.

In this example, Logstash is running on each of the four application boxes. Each independent config sends processed events to a centralized ElasticSearch cluster. This can scale quite far, but it means your log-processing resources are competing with your application resources.

This example shows an existing centralized logging infrastructure based on Syslog that we are adding onto. Here, Logstash is installed on the centralized logging box and configured to consume the file output of rsyslog. The processed results are then sent into ElasticSearch.

Further reading

Learn more in Jamie Riedesel's talk, S, M, and L Logstash Architectures: The Foundations, at LISA17, which will be held October 29-November 3 in San Francisco, California.

 
 

Logstash Reference Getting started with Logstash的更多相关文章

  1. 构建Logstash+tomcat镜像(让logstash收集tomcat日志)

    1.首先pull logstash镜像作为父镜像(logstash的Dockerfile在最下面): 2.构建my-logstash镜像,使其在docker镜像实例化时,可以使用自定义的logstas ...

  2. logstash快速入门实战指南-Logstash简介

    作者其他ELK快速入门系列文章 Elasticsearch从入门到精通 Kibana从入门到精通 Logstash是一个具有实时流水线功能的开源数据收集引擎.Logstash可以动态统一来自不同来源的 ...

  3. logstash日志分析的配置和使用

    logstash是一个数据分析软件,主要目的是分析log日志.整一套软件可以当作一个MVC模型,logstash是controller层,Elasticsearch是一个model层,kibana是v ...

  4. 使用logstash+elasticsearch+kibana快速搭建日志平台

    日志的分析和监控在系统开发中占非常重要的地位,系统越复杂,日志的分析和监控就越重要,常见的需求有: * 根据关键字查询日志详情 * 监控系统的运行状况 * 统计分析,比如接口的调用次数.执行时间.成功 ...

  5. 【原创】运维基础之Docker(2)通过docker部署zookeeper nginx tomcat redis kibana/elasticsearch/logstash mysql kafka mesos/marathon

    通过docker可以从头开始构建集群,也可以将现有集群(配置以及数据)平滑的迁移到docker部署: 1 docker部署zookeeper # usermod -G docker zookeeper ...

  6. Filebeat+Logstash+ElasticSearch+Kibana搭建Apache访问日志解析平台

    对于ELK还不太熟悉的同学可以参考我前面的两篇文章ElasticSearch + Logstash + Kibana 搭建笔记.Log stash学习笔记(一),本文搭建了一套专门访问Apache的访 ...

  7. logstash日志分析的配置和使用(转)

    logstash是一个数据分析软件,主要目的是分析log日志.整一套软件可以当作一个MVC模型,logstash是controller层,Elasticsearch是一个model层,kibana是v ...

  8. logstash 主题综合篇

    一.[logstash-input-file]插件使用详解(配置) logstash input 监听多个目标文件. 二.Logstash Reference(官方参数配置说明)

  9. Ubuntu 16.04安装Elasticsearch,Logstash和Kibana(ELK)Filebeat

    https://www.howtoing.com/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-16-04 ...

随机推荐

  1. import { Subject } from 'rxjs/Subject';

    shared-service.ts import { Observable } from 'rxjs/Observable'; import { Injectable } from '@angular ...

  2. atitit.系统架构图 的设计 与工具 attilax总结

    atitit.系统架构图 的设计 与工具 attilax总结 1. 架构图的4个版式(标准,(左右)悬挂1 2. 架构图的层次结构(下属,同事,助手)1 3. wps ppt1 4. 使用EDraw画 ...

  3. binutils工具集之---objdump

    在嵌入式软件开发中,有时需要知道所生成的程序文件中的段信息以分析问题,或者需要查看c语言对应的汇编代码,此时,objdump工具就可以帮大忙了.obj——object  dump:转储. #inclu ...

  4. Android之2D图形(圆、直线、点)工具类 (持续更新)

    public class Circle { private PointF centerPoint; private float radius; public PointF getCenterPoint ...

  5. Unix系统编程(三)通用的I/O

    UNIX  IO模型的显著特点之一是输出输出概念的通用性,这意味着,4个同样的系统调用open,read,write和close可以对所有类型的文件执行IO操作,包括终端之类的设备.因此仅用这些系统调 ...

  6. Linux制作wifi热点/无线路由

    参考: http://blog.csdn.net/u011641885/article/details/495121991.工具/原料    有无线网卡(usb接口的RT3070无线网卡).有线网卡的 ...

  7. 类加载器与Web容器

    在关于类加载器中已经介绍了Jvm的类加载机制,然而对于运行在Java EE容器中的Web应用来说,类加载器的实现方式与一般的Java应用有所不同.不同的Web容器的实现方式也会有所不同. Tomcat ...

  8. mybatis实现分页

    实现分页实质上就是截取查询结果,还是先贴代码,再来分析 下面是mapper.xml里面的配置 <select id="queryRoleByPage" resultMap=& ...

  9. sql语句中3表删除和3表查询

    好久没来咱们博客园了,主要近期在忙一些七七八八的杂事,包括打羽毛球比赛的准备和自己在学jqgrid的迷茫.先不扯这些没用的了,希望大家能记得小弟,小弟在此谢过大家了. 回归正题:(以下的sql是本人在 ...

  10. 修改一些IntelliJ IDEA 11的设置,使Eclipse的使用者更容易上手(转)

    用惯了Eclipse,再来使用IntelliJ IDEA真是很难适应. 设置1:字体 修改IDE的字体:设置-Appearance-Look and Feel-OverRide设置你想要的字体.我设置 ...