Flink Async I/O for Accessing External Data (MySQL Edition)
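I was reading an expert's blog recently and remembered that Async I/O is one of the major features Blink contributed to the community: it lets a job fetch external data asynchronously. I decided to implement it myself so that I don't have to go hunting for an example when a project needs it.
At first I wanted to write a Scala demo that reads HBase data, following the demo from the official documentation: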
/**
 * An implementation of the 'AsyncFunction' that sends requests and sets the callback.
 */
class AsyncDatabaseRequest extends AsyncFunction[String, (String, String)] {

    /** The database specific client that can issue concurrent requests with callbacks */
    lazy val client: DatabaseClient = new DatabaseClient(host, port, credentials)

    /** The context used for the future callbacks */
    implicit lazy val executor: ExecutionContext = ExecutionContext.fromExecutor(Executors.directExecutor())

    override def asyncInvoke(str: String, resultFuture: ResultFuture[(String, String)]): Unit = {
        // issue the asynchronous request, receive a future for the result
        val resultFutureRequested: Future[String] = client.query(str)

        // set the callback to be executed once the request by the client is complete
        // the callback simply forwards the result to the result future
        resultFutureRequested.onSuccess {
            case result: String => resultFuture.complete(Iterable((str, result)))
        }
    }
}

// create the original stream
val stream: DataStream[String] = ...

// apply the async I/O transformation
val resultStream: DataStream[(String, String)] =
    AsyncDataStream.unorderedWait(stream, new AsyncDatabaseRequest(), 1000, TimeUnit.MILLISECONDS, 100)
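This failed: I could not get the parts marked in red in the figure above to work.
1. I could not find a usable implementation class for Future.
2. unorderedWait kept reporting errors.
The examples in the source code also include a Scala case: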
def main(args: Array[String]) {
    val timeout = 10000L
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val input = env.addSource(new SimpleSource())

    val asyncMapped = AsyncDataStream.orderedWait(input, timeout, TimeUnit.MILLISECONDS, 10) {
        (input, collector: ResultFuture[Int]) =>
            Future {
                collector.complete(Seq(input))
            } (ExecutionContext.global)
    }

    asyncMapped.print()
    env.execute("Async I/O job")
}
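That is the main part. As a beginner I could not get any further with it; what I wanted was to extend RichAsyncFunction so that the connection can be initialized in the open method.
I went through quite a few blog posts, but most of them just translate the principles from the official documentation, and none of their examples actually ran, which was frustrating.
So the Scala attempt failed.
I switched to Java. When I asked in a chat group yesterday, someone gave me a Java version: https://github.com/perkinls/flink-local-train/blob/c8b4efe33620352aea0100adef4fae2a068a3b65/src/main/scala/com/lp/test/asyncio/AsyncIoSideTableJoinMysqlJava.java I have not read it yet, because I can follow the Java example in the official documentation.
Here is the source code for the MySQL version (the HBase version is untested, because my local HBase is down):
The business logic is:
Receive data from Kafka, convert it to a user object, call the async function, look up the phone number by user.id, set it on the user object, and output the result.
Main class: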
import com.alibaba.fastjson.JSON;
import com.venn.common.Common;
import org.apache.flink.formats.json.JsonNodeDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.concurrent.TimeUnit;

public class AsyncMysqlRequest {

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        FlinkKafkaConsumer<ObjectNode> source = new FlinkKafkaConsumer<>("async", new JsonNodeDeserializationSchema(), Common.getProp());

        // receive Kafka data and convert it to a User object
        // (the element type of this stream must match the input type declared by AsyncFunctionForMysqlJava)
        DataStream<User> input = env.addSource(source).map(value -> {
            String id = value.get("id").asText();
            String username = value.get("username").asText();
            String password = value.get("password").asText();
            return new User(id, username, password);
        });

        // async I/O lookup against MySQL: timeout 1 s, capacity 10
        // (more than 10 in-flight requests back-pressures the upstream operator)
        // the unit has to be MILLISECONDS for 1000 to mean 1 s
        DataStream<User> async = AsyncDataStream.unorderedWait(input, new AsyncFunctionForMysqlJava(), 1000, TimeUnit.MILLISECONDS, 10);

        async.map(user -> JSON.toJSON(user).toString())
                .print();

        env.execute("asyncForMysql");
    }
}
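A note on the timeout argument of unorderedWait in the main class above: the number is interpreted in the TimeUnit passed next to it, so the same 1000 can mean very different durations. The snippet below is only a small illustration using the standard java.util.concurrent.TimeUnit conversion; the class and method names are made up for this example and are not part of Flink.

```java
import java.util.concurrent.TimeUnit;

class TimeoutUnitSketch {

    // Converts a (timeout, unit) pair into milliseconds using the JDK's own conversion.
    static long toMillis(long timeout, TimeUnit unit) {
        return unit.toMillis(timeout);
    }

    public static void main(String[] args) {
        // 1000 milliseconds is the 1 s timeout described in the comment
        System.out.println(toMillis(1000, TimeUnit.MILLISECONDS)); // prints 1000
        // 1000 microseconds is only 1 ms, far too short for a database round trip
        System.out.println(toMillis(1000, TimeUnit.MICROSECONDS)); // prints 1
    }
}
```

With a 1 ms timeout nearly every request would end up in the timeout handler, so it is worth double-checking the unit whenever all records come back marked as timed out.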
Function class:
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.*;

public class AsyncFunctionForMysqlJava extends RichAsyncFunction<AsyncUser, AsyncUser> {

    private static final Logger logger = LoggerFactory.getLogger(AsyncFunctionForMysqlJava.class);

    private transient MysqlClient client;
    private transient ExecutorService executorService;

    /**
     * initialize the connection in the open method
     *
     * @param parameters
     * @throws Exception
     */
    @Override
    public void open(Configuration parameters) throws Exception {
        logger.info("async function for mysql java open ...");
        super.open(parameters);

        client = new MysqlClient();
        executorService = Executors.newFixedThreadPool(30);
    }

    /**
     * use asyncUser.getId async get asyncUser phone
     *
     * @param asyncUser
     * @param resultFuture
     * @throws Exception
     */
    @Override
    public void asyncInvoke(AsyncUser asyncUser, ResultFuture<AsyncUser> resultFuture) throws Exception {

        // hand the blocking query to the thread pool so that asyncInvoke returns immediately
        executorService.submit(() -> {
            // submit query
            System.out.println("submit query : " + asyncUser.getId() + "-1-" + System.currentTimeMillis());
            AsyncUser tmp = client.query1(asyncUser);
            // always complete the resultFuture, otherwise every record ends up as a timeout
            resultFuture.complete(Collections.singletonList(tmp));
        });
    }

    @Override
    public void timeout(AsyncUser input, ResultFuture<AsyncUser> resultFuture) throws Exception {
        logger.warn("Async function for mysql timeout");
        // emit the input record with a marker value instead of failing the job
        List<AsyncUser> list = new ArrayList<>();
        input.setPhone("timeout");
        list.add(input);
        resultFuture.complete(list);
    }

    /**
     * close function
     *
     * @throws Exception
     */
    @Override
    public void close() throws Exception {
        logger.info("async function for mysql java close ...");
        super.close();
        // release the worker threads created in open
        if (executorService != null) {
            executorService.shutdown();
        }
    }
}
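The hand-off done in asyncInvoke above (submit the blocking query to a pool, then complete the result from the worker thread) can also be written with CompletableFuture. The following is a minimal sketch that runs without Flink: the Consumer<List<String>> parameter is a hypothetical stand-in for ResultFuture.complete, the lookup function stands in for the MySQL query, and the "error" marker value is an assumption chosen only for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

class HandOffSketch {

    // Runs the blocking lookup on the pool and returns immediately.
    // resultCallback plays the role of ResultFuture.complete (stand-in, not the Flink type).
    static CompletableFuture<Void> asyncInvoke(String id,
                                               Function<String, String> blockingLookup,
                                               ExecutorService pool,
                                               Consumer<List<String>> resultCallback) {
        return CompletableFuture
                .supplyAsync(() -> blockingLookup.apply(id), pool)
                // if the lookup throws, fall back to a marker value so the callback is still completed
                .exceptionally(err -> "error")
                .thenAccept(phone -> resultCallback.accept(Collections.singletonList(id + ":" + phone)));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // hypothetical lookup that always returns the same phone number
            asyncInvoke("1", id -> "12345678910", pool, System.out::println).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the exceptionally step is the same as the reminder in the function class: whatever happens inside the worker, the result has to be completed, or the record only leaves the operator through the timeout path.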
MysqlClient:
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class MysqlClient {

    private static String jdbcUrl = "jdbc:mysql://192.168.229.128:3306?useSSL=false&allowPublicKeyRetrieval=true";
    private static String username = "root";
    private static String password = "123456";
    private static String driverName = "com.mysql.jdbc.Driver";
    private static java.sql.Connection conn;
    private static PreparedStatement ps;

    // one connection and one prepared statement, created when the class is loaded
    static {
        try {
            Class.forName(driverName);
            conn = DriverManager.getConnection(jdbcUrl, username, password);
            ps = conn.prepareStatement("select phone from async.async_test where id = ?");
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }

    /**
     * execute query
     *
     * @param user
     * @return
     */
    public AsyncUser query1(AsyncUser user) {

        // sleep 10 ms before querying
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        String phone = "0000";
        // the single PreparedStatement is shared by every pool thread, so setting the
        // parameter and executing the query must not interleave between threads
        synchronized (MysqlClient.class) {
            try {
                ps.setString(1, user.getId());
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.isClosed() && rs.next()) {
                        phone = rs.getString(1);
                    }
                }
                System.out.println("execute query : " + user.getId() + "-2-" + "phone : " + phone + "-" + System.currentTimeMillis());
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        user.setPhone(phone);
        return user;
    }

    // test code
    public static void main(String[] args) {
        MysqlClient mysqlClient = new MysqlClient();

        AsyncUser asyncUser = new AsyncUser();
        asyncUser.setId("526");
        long start = System.currentTimeMillis();
        asyncUser = mysqlClient.query1(asyncUser);

        System.out.println("end : " + (System.currentTimeMillis() - start));
        System.out.println(asyncUser.toString());
    }
}
Function class (wrong example: asyncInvoke blocks on the database query, so it is effectively synchronous):
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class AsyncFunctionForMysqlJava extends RichAsyncFunction<User, User> {

    // connection settings
    private static String jdbcUrl = "jdbc:mysql://192.168.229.128:3306?useSSL=false";
    private static String username = "root";
    private static String password = "123456";
    private static String driverName = "com.mysql.jdbc.Driver";

    java.sql.Connection conn;
    PreparedStatement ps;
    private static final Logger logger = LoggerFactory.getLogger(AsyncFunctionForMysqlJava.class);

    /**
     * initialize the connection in the open method
     * @param parameters
     * @throws Exception
     */
    @Override
    public void open(Configuration parameters) throws Exception {
        logger.info("async function for mysql java open ...");
        super.open(parameters);

        Class.forName(driverName);
        conn = DriverManager.getConnection(jdbcUrl, username, password);
        ps = conn.prepareStatement("select phone from async.async_test where id = ?");
    }

    /**
     * use user.getId async get user phone
     *
     * @param user
     * @param resultFuture
     * @throws Exception
     */
    @Override
    public void asyncInvoke(User user, ResultFuture<User> resultFuture) throws Exception {
        // query by user id
        // WRONG: this JDBC call blocks inside asyncInvoke itself, so the next record
        // cannot be handled until this query has returned
        ps.setString(1, user.getId());
        ResultSet rs = ps.executeQuery();
        String phone = null;
        if (rs.next()) {
            phone = rs.getString(1);
        }
        user.setPhone(phone);
        List<User> list = new ArrayList<>();
        list.add(user);
        // hand the result back to the result queue
        resultFuture.complete(list);
    }

    @Override
    public void timeout(User input, ResultFuture<User> resultFuture) throws Exception {
        logger.info("Async function for mysql timeout");
        List<User> list = new ArrayList<>();
        list.add(input);
        resultFuture.complete(list);
    }

    /**
     * close function
     *
     * @throws Exception
     */
    @Override
    public void close() throws Exception {
        logger.info("async function for mysql java close ...");
        super.close();
        conn.close();
    }
}
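To make the difference between the blocking version above and a real hand-off more concrete, here is a small Flink-free sketch. It assumes a hypothetical lookupPhone method that sleeps for about 50 ms in place of the MySQL round trip; the class name, method names and the 50 ms figure are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingVsAsyncSketch {

    // Hypothetical stand-in for a database round trip of roughly 50 ms.
    static String lookupPhone(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "phone-" + id;
    }

    // Blocking style: each lookup has to finish before the next one starts.
    static List<String> blocking(int n) {
        List<String> out = new ArrayList<>();
        for (int id = 1; id <= n; id++) {
            out.add(lookupPhone(id));
        }
        return out;
    }

    // Async style: submit every lookup to the pool first, then collect the results.
    static List<String> async(int n, ExecutorService pool) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int id = 1; id <= n; id++) {
            final int key = id;
            pending.add(CompletableFuture.supplyAsync(() -> lookupPhone(key), pool));
        }
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            out.add(f.join());
        }
        return out;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            long t0 = System.nanoTime();
            blocking(8);
            long t1 = System.nanoTime();
            async(8, pool);
            long t2 = System.nanoTime();
            System.out.println("blocking ms: " + (t1 - t0) / 1_000_000);
            System.out.println("async ms:    " + (t2 - t1) / 1_000_000);
        } finally {
            pool.shutdown();
        }
    }
}
```

In the blocking style the waits add up one after another, while in the pooled style they overlap. That overlap is what the wrong example gives up by running the JDBC call directly inside asyncInvoke.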
Test data:
{"id" : 1, "username" : "venn", "password" : 1561709530935}
{"id" : 2, "username" : "venn", "password" : 1561709536029}
{"id" : 3, "username" : "venn", "password" : 1561709541033}
{"id" : 4, "username" : "venn", "password" : 1561709546037}
{"id" : 5, "username" : "venn", "password" : 1561709551040}
{"id" : 6, "username" : "venn", "password" : 1561709556044}
{"id" : 7, "username" : "venn", "password" : 1561709561048}
Output:
submit query : 1-1-1562763486845
submit query : 2-1-1562763486846
submit query : 3-1-1562763486846
submit query : 4-1-1562763486849
submit query : 5-1-1562763486849
submit query : 6-1-1562763486859
submit query : 7-1-1562763486913
submit query : 8-1-1562763486967
submit query : 9-1-1562763487021
execute query : 1-2-phone : 12345678910-1562763487316
1> {"password":"1562763486506","phone":"12345678910","id":"1","username":"venn"}
submit query : 10-1-1562763487408
submit query : 11-1-1562763487408
execute query : 9-2-phone : 1562661110630-1562763487633
1> {"password":"1562763487017","phone":"1562661110630","id":"9","username":"venn"} # The asynchrony is visible here: queries have been submitted up to id 11, but only ids 1 and 9 have executed and been returned (this run uses unorderedWait)
submit query : 12-1-1562763487634
execute query : 8-2-phone : 1562661110627-1562763487932
1> {"password":"1562763486963","phone":"1562661110627","id":"8","username":"venn"}
submit query : 13-1-1562763487933
execute query : 7-2-phone : 1562661110624-1562763488228
1> {"password":"1562763486909","phone":"1562661110624","id":"7","username":"venn"}
submit query : 14-1-1562763488230
execute query : 6-2-phone : 1562661110622-1562763488526
1> {"password":"1562763486855","phone":"1562661110622","id":"6","username":"venn"}
submit query : 15-1-1562763488527
execute query : 4-2-phone : 12345678913-1562763488832
1> {"password":"1562763486748","phone":"12345678913","id":"4","username":"venn"}
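The output above comes back in the order 1, 9, 8, 7, 6, 4 rather than in input order, because unorderedWait emits each record as soon as its request completes, whereas orderedWait holds results back until all earlier records have been emitted. The following Flink-free sketch imitates the two emission policies with manually completed CompletableFutures; it is only an illustration of the idea, not how Flink implements its queues, and all names in it are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class EmitOrderSketch {

    // arrival:    ids in the order the records enter the operator
    // completion: ids in the order their lookups finish
    // ordered:    true imitates orderedWait, false imitates unorderedWait
    static List<Integer> emit(List<Integer> arrival, List<Integer> completion, boolean ordered) {
        Map<Integer, CompletableFuture<Integer>> byId = new HashMap<>();
        Deque<CompletableFuture<Integer>> queue = new ArrayDeque<>();
        List<Integer> emitted = new ArrayList<>();

        for (Integer id : arrival) {
            CompletableFuture<Integer> f = new CompletableFuture<>();
            byId.put(id, f);
            queue.add(f);
            if (!ordered) {
                // unordered: emit from the completion callback, whatever the position
                f.thenAccept(emitted::add);
            }
        }

        for (Integer id : completion) {
            byId.get(id).complete(id);
            if (ordered) {
                // ordered: only emit from the head of the queue while it is finished
                while (!queue.isEmpty() && queue.peek().isDone()) {
                    emitted.add(queue.poll().join());
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> arrival = Arrays.asList(1, 2, 3, 4);
        List<Integer> completion = Arrays.asList(1, 4, 3, 2);
        System.out.println(emit(arrival, completion, false)); // prints [1, 4, 3, 2]
        System.out.println(emit(arrival, completion, true));  // prints [1, 2, 3, 4]
    }
}
```

In the ordered case a slow record at the head of the queue delays everything behind it, which is the usual trade-off between orderedWait and unorderedWait.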
Implementations for HBase, Redis, or other stores are similar.