In an end-to-end big data pipeline, how the data flows is an important concern. This post records how data gets from HTTP requests into Kafka.

Technologies used (a minimal wiring sketch follows this list):

1. The Vert.x framework for Java

  (for Vert.x usage examples, see: http://www.programcreek.com/java-api-examples/index.php?api=io.vertx.core.Vertx)

2. KafkaProducer

  (for usage examples, see: http://www.programcreek.com/java-api-examples/index.php?api=org.apache.kafka.clients.producer.KafkaProducer)
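Before the full implementation, here is a minimal sketch of how the two pieces fit together: a Vert.x HTTP server that forwards each request body to a Kafka topic. The broker address, port, and topic name are placeholders, and the production code below adds validation, caching, and statistics on top of this:

import io.vertx.core.Vertx;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class MiniHttpToKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        Vertx vertx = Vertx.vertx();
        vertx.createHttpServer().requestHandler(request ->
            // read the full request body, forward it to Kafka, then answer the client
            request.bodyHandler(body -> {
                producer.send(new ProducerRecord<>("demo_topic", body.toString()));
                request.response().setStatusCode(200).end("OK");
            })
        ).listen(8080);
    }
}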

Features the data-forwarding service needs to implement:

1. Path parsing: take the last segment of the request path as the appkey (a client-side posting sketch follows this list);

2. Data caching: when Kafka cannot be reached, buffer the data in a local cache directory;

3. Security check: validate that the appkey in the request is legitimate;

4. Automatic appkey refresh: fetch the latest appkey list at a fixed interval;

5. IP enrichment: add an ip field to every record;

6. Logging: record basic statistics as well as errors and exceptions.
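Seen from the client side, a caller posts newline-delimited JSON records to /mininfo/v1/logs/{appkey}. The sketch below is only an illustration: it assumes the service runs locally on the port 20000 configured in application.conf, and uses a hypothetical appkey demo_app that would already have to exist in the MySQL config_table:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class LogClient {
    public static void main(String[] args) throws Exception {
        // demo_app is a hypothetical appkey; the last path segment becomes the Kafka topic
        URL url = new URL("http://localhost:20000/mininfo/v1/logs/demo_app");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);

        // one JSON record per line; the server splits the body on "\n"
        String body = "{\"event\":\"login\",\"uid\":1}\n{\"event\":\"click\",\"uid\":2}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}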

The framework structure diagram is shown below:

The overall data computation diagram is shown below:

The computation flow chart is shown below:

The code structure in IDEA:

The main code is as follows:

Configuration.java

package com.donews.data;

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

/**
 * Created by reynold on 16-6-23.
 */
public class Configuration {
    public static final Config conf = ConfigFactory.load();
}

Counter.java

package com.donews.data;

import io.vertx.core.Vertx;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Created by reynold on 16-6-22.
 */
public class Counter {
    private Logger LOG = LoggerFactory.getLogger(Counter.class);
    AtomicLong messages = new AtomicLong(0L);
    AtomicLong bytes = new AtomicLong(0L);
    private long start = System.currentTimeMillis();

    private void reset() {
        messages.set(0L);
        bytes.set(0L);
        start = System.currentTimeMillis();
    }

    public void start(Vertx vertx) {
        LOG.info("start Counter");
        long delay = Configuration.conf.getDuration("server.counter.delay", TimeUnit.MILLISECONDS);
        vertx.setPeriodic(delay, h -> {
            long time = System.currentTimeMillis() - start;
            double rps = messages.get() * 1000.0 / time;
            double mbps = (bytes.get() * 1000.0 / 1024.0 / 1024.0) / time;
            Runtime runtime = Runtime.getRuntime();
            double totalMem = runtime.totalMemory() * 1.0 / 1024 / 1024;
            double maxMem = runtime.maxMemory() * 1.0 / 1024 / 1024;
            double freeMem = runtime.freeMemory() * 1.0 / 1024 / 1024;
            LOG.info("{0}:Message/S, {1}:MBytes/S", rps, mbps);
            LOG.info("totalMem:{0}MB maxMem:{1}MB freeMem:{2}MB", totalMem, maxMem, freeMem);
            reset();
        });
    }
}

KafkaHttpServer.java

package com.donews.data;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpServer;
import io.vertx.core.http.HttpServerResponse;
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.RoutingContext;
import io.vertx.ext.web.handler.BodyHandler;

import java.io.*;
import java.sql.*;
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by reynold on 16-6-22.
 */
public class KafkaHttpServer {
    private static final Logger LOG = LoggerFactory.getLogger(KafkaHttpServer.class);
    private final Counter statistic = new Counter();
    private static final String DBDRIVER = "com.mysql.jdbc.Driver";
    private static final String URL = Configuration.conf.getString("mysql.url");
    private static final String USER = Configuration.conf.getString("mysql.user");
    private static final String PASSWORD = Configuration.conf.getString("mysql.password");
    private static HashSet<String> appkeys = new HashSet<>();
    private static boolean deleteFile = true;

    private void error(HttpServerResponse response, String message) {
        response.setStatusCode(500).end(new JsonObject()
                .put("code", 3)
                .put("msg", message)
                .encode());
    }

    private void ok(HttpServerResponse response, String message) {
        response.putHeader("Access-Control-Allow-Origin", "*");
        response.setStatusCode(200).end(new JsonObject()
                .put("code", 0)
                .put("msg", message)
                .encode());
    }

    private void startService(int port) {
        KafkaProducerWrapper sender = new KafkaProducerWrapper();
        Vertx vertx = Vertx.vertx();
        HttpServer server = vertx.createHttpServer();
        Router router = Router.router(vertx);
        router.route().handler(BodyHandler.create());
        router.route("/mininfo/logs").handler(ctx -> {
            try {
                JsonArray array = ctx.getBodyAsJsonArray();
                String[] messages = new String[array.size()];
                for (int i = 0; i < array.size(); i++) {
                    JsonObject message = array.getJsonObject(i);
                    message.put("ip", ctx.request().remoteAddress().host());
                    if (!message.containsKey("timestamp")) {
                        message.put("timestamp", Instant.now().toString());
                    }
                    messages[i] = array.getJsonObject(i).encode();
                }
                sendMessages(sender, ctx, "appstatistic_production", messages);
            } catch (Exception e) {
                error(ctx.response(), e.getMessage());
            }
        });
        router.routeWithRegex("/mininfo/v1/logs/[^/]+").handler(routingContext -> {
            String path = routingContext.request().path();
            String topic = path.substring(path.lastIndexOf("/") + 1);
            LOG.info("topic (appkey) being handled: " + topic);
            if (appkeys.contains(topic)) {
                LOG.info("the topic (appkey) passed validation");
                String[] messages = routingContext.getBodyAsString().split("\n");
                // executeBlocking runs blocking tasks off the event loop; by default submitted tasks run in order
                vertx.executeBlocking(future -> {
                    sendMessages(sender, routingContext, topic, messages);
                    future.complete();
                }, result -> {
                });
            } else {
                LOG.info("the topic (appkey) is not configured yet, please configure it in MySQL first");
                error(routingContext.response(), "please configure " + topic + " (appkey) in MySQL first! It will take effect within 10 minutes");
            }
        });
        router.route("/mininfo/v1/ip").handler(ctx -> {
            LOG.info("x-real-for:" + ctx.request().getHeader("x-real-for"));
            LOG.info("x-forwarded-for:" + ctx.request().getHeader("x-forwarded-for"));
            ok(ctx.response(), ctx.request().getHeader("x-forwarded-for"));
        });
        router.route("/*").handler(ctx -> error(ctx.response(), "wrong! check your path..."));
        server.requestHandler(router::accept).listen(port, result -> {
            if (result.succeeded()) {
                LOG.info("listen on port:{0}", String.valueOf(port));
                this.statistic.start(vertx);
            } else {
                LOG.error(result.cause());
                vertx.close();
            }
        });
        // A shutdown hook is useful when something must run before the JVM exits, similar to a finally block
        Runtime.getRuntime().addShutdownHook(new Thread(sender::close));
    }

    private void sendMessages(KafkaProducerWrapper sender, RoutingContext ctx, String topic, String[] messages) {
        AtomicInteger counter = new AtomicInteger(0);
        for (String message : messages) {
            if (message == null || "".equals(message)) {
                ok(ctx.response(), "Success");
                continue;
            }
            // add the client ip to the record's ip field
            JSONObject jsonObject = JSON.parseObject(message);
            if (jsonObject.get("ip") == null) {
                LOG.info("adding ip field");
                String ip;
                String header = ctx.request().getHeader("x-forwarded-for");
                if (!(header == null || header.trim().length() == 0 || header.trim().equals("null"))) {
                    ip = header.split(",")[0];
                } else {
                    ip = ctx.request().remoteAddress().host();
                }
                jsonObject.put("ip", ip);
                LOG.info("ip field added");
            }
            // topic, message, callback; the lambda implements the callback's onCompletion method
            sender.send(topic, jsonObject.toString(), (metadata, exception) -> {
                if (exception != null) {
                    LOG.warn(exception);
                    String msg = new JsonObject()
                            .put("error", exception.getMessage())
                            .put("commit", counter.get())
                            .encode();
                    error(ctx.response(), msg);
                    cacheLocal(jsonObject.toString(), "/home/yuhui/httpkafka/data_bak/" + topic + ".txt");
                    LOG.info("failed to reach kafka, message written to the local cache directory as a backup");
                } else {
                    statistic.messages.incrementAndGet(); // Counter
                    statistic.bytes.addAndGet(message.length());
                    if (counter.incrementAndGet() == messages.length) {
                        ok(ctx.response(), "Success");
                    }
                }
            });
        }
    }

    /**
     * Cache messages locally when sending to kafka fails
     *
     * @param message   message
     * @param cachePath cachePath
     */
    private void cacheLocal(String message, String cachePath) {
        try {
            FileWriter fileWriter = new FileWriter(cachePath, true);
            BufferedWriter bw = new BufferedWriter(fileWriter);
            bw.write(message);
            bw.newLine();
            bw.flush();
            bw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Send the cached data to kafka; on success delete the cache file, otherwise retry 10 minutes later
     *
     * @param path directory holding the cached data
     */
    private static void sendToKafka(String path) {
        String message;
        KafkaProducerWrapper sender = new KafkaProducerWrapper();
        File file = new File(path);
        if (file.isDirectory()) {
            String[] fileList = file.list();
            if (fileList != null && fileList.length != 0) {
                LOG.info("sending the backup data in the cache directory to kafka...");
                for (String str : fileList) {
                    String topic = str.split("\\.")[0];
                    try {
                        BufferedReader reader = new BufferedReader(new FileReader(path + str));
                        while ((message = reader.readLine()) != null) {
                            sender.send(topic, message, (metadata, exception) -> {
                                if (metadata != null) {
                                    LOG.info("cached backup messages are being inserted into kafka one by one");
                                } else {
                                    // rerun the program on failure
                                    // exception.printStackTrace();
                                    LOG.error("kafka connection error: ===> will retry automatically in 10 minutes, " + exception.getMessage(), exception);
                                    deleteFile = false;
                                }
                            });
                        }
                        // note: the send callbacks are asynchronous, so this check may run before all callbacks have completed
                        if (deleteFile) {
                            LOG.info("deleting backup data that has been inserted into kafka");
                            deleteFile(path, topic);
                            LOG.info("deletion finished!");
                        }
                        reader.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            } else {
                LOG.info("no backup files in the cache directory");
            }
        }
    }

    private static void deleteFile(String path, String appkey) {
        String appkeyPath = path + "/" + appkey + ".txt";
        File file = new File(appkeyPath);
        file.delete();
        LOG.info("deleted the cached data for appkey " + appkey);
    }

    private static Set<String> getAppkeys() {
        Set<String> appkeys = new HashSet<>();
        String sql = "select appkey from config_table";
        try {
            Class.forName(DBDRIVER);
            Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
            PreparedStatement ps = conn.prepareStatement(sql);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                appkeys.add(rs.getString(1));
            }
            rs.close();
            conn.close();
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
        return appkeys;
    }

    public static void main(String[] args) throws Exception {
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                appkeys.addAll(getAppkeys());
                LOG.info("synced the appkey list from the database (every 10 minutes)");
                sendToKafka("/home/leixingzhi7/httpkafka/data_bak/");
                // sendToKafka("C:\\Dell\\UpdatePackage\\log");
            }
        }, 0L, 10 * 60 * 1000L);
        try {
            int port = Configuration.conf.getInt("server.port");
            KafkaHttpServer front = new KafkaHttpServer();
            front.startService(port);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

KafkaProducerWrapper.java

package com.donews.data;

import com.typesafe.config.Config;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

/**
 * Created by reynold on 16-6-22.
 */
public class KafkaProducerWrapper {
    private Logger LOG = LoggerFactory.getLogger(KafkaProducerWrapper.class);
    private KafkaProducer<String, String> producer = init();

    private KafkaProducer<String, String> init() {
        Config conf = Configuration.conf.getConfig("kafka");
        Properties props = new Properties();
        props.put("bootstrap.servers", conf.getString("bootstrap.servers"));
        props.put("acks", conf.getString("acks"));
        props.put("retries", conf.getInt("retries"));
        props.put("batch.size", conf.getInt("batch.size"));
        props.put("linger.ms", conf.getInt("linger.ms"));
        props.put("buffer.memory", conf.getLong("buffer.memory"));
        props.put("key.serializer", conf.getString("key.serializer"));
        props.put("value.serializer", conf.getString("value.serializer"));
        LOG.info("KafkaProducer Properties: {0}", props.toString());
        return new KafkaProducer<>(props);
    }

    public void send(String topic, String message, Callback callback) {
        producer.send(new ProducerRecord<>(topic, message), callback);
    }

    public void close() {
        producer.close();
        LOG.info("Kafka Producer Closed");
    }

    public static void main(String[] args) {
        //KafkaProducerWrapper sender = new KafkaProducerWrapper();
        //sender.producer.partitionsFor("xxxxx").forEach(System.out::println);
    }
}
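As a quick standalone usage sketch (test_topic and the message payload are placeholders, not part of the original project), the wrapper can be driven like this:

package com.donews.data;

public class KafkaProducerWrapperDemo {
    public static void main(String[] args) {
        KafkaProducerWrapper sender = new KafkaProducerWrapper();
        // the callback receives either metadata (success) or an exception (failure)
        sender.send("test_topic", "{\"event\":\"demo\"}", (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();
            } else {
                System.out.println("sent to partition " + metadata.partition() + " offset " + metadata.offset());
            }
        });
        sender.close(); // close() waits for pending sends before shutting down the producer
    }
}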

application.conf

server {
    port = 20000
    counter.delay = 30s
}

kafka {
    bootstrap.servers = "XXX"
    acks = all
    retries = 1
    batch.size = 1048576
    linger.ms = 1
    buffer.memory = 33554432
    key.serializer = "org.apache.kafka.common.serialization.StringSerializer"
    value.serializer = "org.apache.kafka.common.serialization.StringSerializer"
}

mysql {
    url = "jdbc:mysql://XXX/user_privileges"
    user = "XXX"
    password = "XXX"
}
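These keys are resolved through the Typesafe Config library loaded in Configuration.java. The short sketch below shows how the values above map to code, assuming application.conf is on the classpath; note that -D system properties override file values in Typesafe Config's default precedence:

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

import java.util.concurrent.TimeUnit;

public class ConfigDemo {
    public static void main(String[] args) {
        // ConfigFactory.load() reads application.conf from the classpath
        Config conf = ConfigFactory.load();
        int port = conf.getInt("server.port");                                          // 20000
        long delayMs = conf.getDuration("server.counter.delay", TimeUnit.MILLISECONDS); // "30s" -> 30000
        String brokers = conf.getConfig("kafka").getString("bootstrap.servers");
        System.out.println(port + " " + delayMs + " " + brokers);
    }
}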

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.donews.data</groupId>
<artifactId>kafkahttp</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<!-- https://mvnrepository.com/artifact/com.typesafe/config -->
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>io.vertx</groupId>
<artifactId>vertx-web</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.9.0.1</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.38</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.11</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4</version>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<!-- put your configurations here -->
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.donews.data.KafkaHttpServer</mainClass>
</transformer>
</transformers>
<outputFile>${project.build.directory}/${project.artifactId}-fat.jar</outputFile>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
