Introduction to and Usage of Storm Trident
I. Introduction to Trident
Trident is the English word for a three-pronged spear. My take on the name: the topology/spout/bolt approach we studied earlier processes data perfectly well, and Trident is a higher-level abstraction over spouts and bolts that delivers the same functionality, just with more optimization and encapsulation built in. If raw processing performance is the top concern, working with spouts and bolts directly is still recommended; otherwise Trident is a good fit.
Another way to picture it: a topology fans out by nature, and a spout feeding two bolts looks rather like a trident. (My own interpretation.)
Because Trident is a higher-level abstraction over Storm, it handles data streams differently from the spouts and bolts covered before: Trident processes data in units of a batch (a group of tuples).
II. Trident API Operations
Trident processes data in batches, and its API expresses data-processing steps as functions. Typical operations include filter, sum, aggregator, and so on.
Function operations all act on the tuples in the stream.
The commonly used Trident APIs are introduced below; a short combined sketch follows the list.
.each(Fields inputFields, Filter filter)
Purpose: operates on the content of every tuple in the batch; normally used together with a Filter or a Function.
.peek(Consumer action)
Purpose: performs no transformation at all; it takes a Consumer and is useful for printing what flows past, much like System.out.println.
.partitionBy(Fields fields)
Purpose: redirects tuples to the next processing stage according to the configured fields; tuples with the same values in those fields are guaranteed to be handled by the same thread.
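To make these three calls concrete, here is a minimal sketch of my own (it is not one of the original examples) that chains each, peek, and partitionBy. The imports assume the Storm 1.x package layout (org.apache.storm...); on older releases the same classes live under backtype.storm and storm.trident.
import java.util.concurrent.TimeUnit;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
public class ApiChainSketch {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        // three tuples per batch, single field "word"
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 3,
                new Values("java"), new Values("scala"), new Values("java"));
        spout.setCycle(false);
        TridentTopology topology = new TridentTopology();
        topology.newStream("apiChain", spout)
                // each: run a Filter over every tuple of the batch
                .each(new Fields("word"), new BaseFilter() {
                    @Override
                    public boolean isKeep(TridentTuple tuple) {
                        return tuple.getString(0).startsWith("j"); // keep only "java"
                    }
                })
                // peek: observe passing tuples without changing the stream
                .peek(t -> System.out.println("peek: " + t))
                // partitionBy: equal "word" values always reach the same partition
                .partitionBy(new Fields("word"));
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ApiChainSketch", new Config(), topology.build());
        TimeUnit.SECONDS.sleep(10);
        cluster.killTopology("ApiChainSketch");
        cluster.shutdown();
    }
}
Full runnable examples of each operation follow in section IV.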
III. Common Trident Functions
.FilterFunction (filtering)
Purpose: filters the tuples of a batch.
How to implement: define a class that extends BaseFilter, override isKeep(), and pass an instance to each() (full example in IV.1).
.SumFunction (summing)
Purpose: performs arithmetic, such as addition, on the data in the stream.
How to implement: define a class that extends BaseFunction, override execute(), and use it in each() (full example in IV.2).
.MapFunction (one-to-one function)
Purpose: applies a custom operation to a tuple, producing exactly one output tuple per input.
How to implement: define a class that implements the MapFunction interface, override execute(), and apply it via map() (full example in IV.3).
.ProjectionFunction (projection)
Purpose: keeps only the specified fields of the stream.
How to implement: name the desired fields in project().
Example: a stream contains the fields ["x","y","z"]; applying project(new Fields("y","z")) yields a stream holding only ["y","z"] (full example in IV.4).
.repartition (redirection)
Purpose: a repartitioning operation decides how tuples are routed to the next layer:
shuffle: balances tuples across the partitions with a random allocation algorithm.
broadcast: every tuple is replicated to all partitions; this is very useful with DRPC, for instance to run a stateQuery on every partition.
partitionBy: partitions by the given field list, taking the hash of those fields modulo the number of partitions, so tuples with the same field values always land in the same partition.
global: every tuple is sent to one and the same partition, which therefore processes the entire stream; it runs on its own thread.
batchGlobal: all tuples of one batch go to the same partition, while different batches may go to different partitions.
partition: partitions through a custom function that implements backtype.storm.grouping.CustomStreamGrouping.
Runnable examples for most of these appear in section IV (5.2-5.6); shuffle has no dedicated example there, so a minimal sketch follows.
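A minimal shuffle sketch of my own, written in the same style as the later examples (the test data and class name are made up; package layout as in the sketch above):
import java.util.concurrent.TimeUnit;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ShuffleRepartitionSketch {
    private static final Logger LOG = LoggerFactory.getLogger(ShuffleRepartitionSketch.class);
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("hadoop",3));
        spout.setCycle(false);
        Config conf = new Config();
        conf.setDebug(false);
        TridentTopology topology = new TridentTopology();
        topology.newStream("ShuffleRepartitionSketch", spout).parallelismHint(1)
                // shuffle: random, load-balanced routing to the next layer's partitions
                .shuffle().peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ShuffleRepartitionSketch", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("ShuffleRepartitionSketch");
        cluster.shutdown();
    }
}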
.Aggregation (aggregation)
Trident processes data in batches, so aggregation likewise works on the data inside a batch. Tuples that pass through an aggregation do not keep their original shape; the stream carries the aggregated result instead.
The Aggregator interface has three methods to implement (a sketch of a custom Aggregator follows this list):
init(): runs when a batch starts arriving, initializing the state for that batch.
aggregate(): runs for every tuple received within the batch; as a stream operation, aggregate() is a repartitioning step, and the aggregation is carried out on a separately started thread.
complete(): runs when the batch ends.
6.1 partitionAggregate
Performs the aggregation over the batches on the current partition; it is not a repartitioning operation, i.e. it aggregates the tuples of each batch in place.
6.2 aggregate
Aggregates the tuple data of a batch.
6.3 ReduceAggregator
Folds over a batch, operating on the n-th field of each tuple.
6.4 CombinerAggregator
Aggregates the tuples of a batch; it is a repartitioning operation.
6.5 persistentAggregate
A persistent aggregator: the data is first stored in one place, and the aggregation is then applied to it.
6.6 AggregateChain (the example class below is named AggregateChina)
An aggregation chain: applies several aggregation criteria to the tuples of one batch.
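The examples in section IV use the built-in Count plus the Reducer and Combiner variants, but none implements the three-method Aggregator interface directly. Here is a minimal sketch of my own (assuming Storm 1.x's BaseAggregator helper; the class and state names are made up) showing where init(), aggregate(), and complete() fire:
import org.apache.storm.trident.operation.BaseAggregator;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;
public class CountBatchAggregator extends BaseAggregator<CountBatchAggregator.CountState> {
    // mutable per-batch state; a fresh instance is created by init() for every batch
    static class CountState {
        long count = 0;
    }
    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        // called once when a new batch starts
        return new CountState();
    }
    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        // called for every tuple of the batch
        state.count++;
    }
    @Override
    public void complete(CountState state, TridentCollector collector) {
        // called when the batch ends: emit the aggregated result
        collector.emit(new Values(state.count));
    }
}
It plugs in exactly where the built-in Count appears in the examples below, e.g. .aggregate(new Fields("name"), new CountBatchAggregator(), new Fields("count")).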
7. GroupBy
GroupBy splits the whole stream into grouped streams according to the given fields. An aggregation applied to a grouped stream then runs on each grouped stream rather than on the whole batch. If groupBy is followed by aggregate, it is an aggregation operation; if it is followed by partitionAggregate, it is not.
IV. Examples of the Common Trident Functions
1.FilterFunction
Requirement: from a set of data, keep the tuples whose first and second values add up to an even number.
public class FilterTrident {
private static final Logger LOG = LoggerFactory.getLogger(FilterTrident.class);
@SuppressWarnings("unchecked")
public static void main(String[] args) throws InterruptedException {
FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
new Values(1,4,7,10),
new Values(1,1,3,11),
new Values(2,2,7,1),
new Values(2,5,7,2));
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(4);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
// peek: performs no transformation; its parameter is a Consumer
// each: operates on the specified fields of the tuples from the spout
topology.newStream("filter", spout).parallelismHint(1)
.localOrShuffle()
.peek(input -> LOG.info("peek1 ================{},{},{},{}",input.get(0),input.get(1),input.get(2),input.get(3)))
.parallelismHint(2)
.localOrShuffle()
.each(new Fields("a","b"),new CheckEvenSumFilter())
.parallelismHint(2)
.localOrShuffle()
.peek(input -> LOG.info("peek2 +++++++++++++++++++{},{},{},{}",
input.getIntegerByField("a"),input.getIntegerByField("b"),
input.getIntegerByField("c"),input.getIntegerByField("d"))
).parallelismHint(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("FilterTrident", conf, topology.build());
LOG.warn("==================================================");
LOG.warn("the LocalCluster topology {} is submitted.","FilterTrident");
LOG.warn("==================================================");
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("FilterTrident");
cluster.shutdown();
}
private static class CheckEvenSumFilter extends BaseFilter{
@Override
public boolean isKeep(TridentTuple tuple) {
Integer a = tuple.getIntegerByField("a");
Integer b = tuple.getIntegerByField("b");
return (a + b) % 2 == 0;
}
}
}
2.SumFunction
Requirement: sum the first two values of each tuple in the data set.
public class SumFunctionTrident {
private static final Logger LOG = LoggerFactory.getLogger(SumFunctionTrident.class);
@SuppressWarnings("unchecked")
public static void main(String[] args) throws InterruptedException {
FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
new Values(1,4,7,10),
new Values(1,1,3,11),
new Values(2,2,7,1),
new Values(2,5,7,2));
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(4);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
// peek: performs no transformation; its parameter is a Consumer
// each: operates on the specified fields of the tuples from the spout
topology.newStream("function", spout).parallelismHint(1)
.localOrShuffle()
.peek(input -> LOG.info("peek1 ================{},{},{},{}",input.get(0),input.get(1),input.get(2),input.get(3)))
.parallelismHint(2)
.localOrShuffle()
.each(new Fields("a","b"),new SumFunction(),new Fields("sum"))
.parallelismHint(2)
.localOrShuffle()
.peek(input -> LOG.info("peek2 ================{},{},{},{},{}",
input.getIntegerByField("a"),input.getIntegerByField("b"),input.getIntegerByField("c"),input.getIntegerByField("d"),input.getIntegerByField("sum")))
.parallelismHint(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("SumFunctionTrident", conf, topology.build());
LOG.warn("==================================================");
LOG.warn("the LocalCluster topology {} is submitted.","SumFunctionTrident");
LOG.warn("==================================================");
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("HelloTridentTopology");
cluster.shutdown();
}
private static class SumFunction extends BaseFunction{
@Override
public void execute(TridentTuple tuple, TridentCollector collector) {
Integer a = tuple.getIntegerByField("a");
Integer b = tuple.getIntegerByField("b");
collector.emit(new Values(a+b));
}
}
}
3.MapFunction
Requirement: convert the tuples of a batch to upper case.
public class MapFunctionTrident {
private static final Logger LOG = LoggerFactory.getLogger(MapFunctionTrident.class);
@SuppressWarnings("unchecked")
public static void main(String[] args) throws InterruptedException, AlreadyAliveException, InvalidTopologyException, AuthorizationException {
boolean isRemoteMode = false;
if(args.length > 0){
isRemoteMode = true;
}
FixedBatchSpout spout = new FixedBatchSpout(new Fields("line"),3,
new Values("hello stream"),
new Values("hello kafka"),
new Values("hello hadoop"),
new Values("hello scala"),
new Values("hello java")
);
spout.setCycle(true);
TridentTopology topology = new TridentTopology();
Config conf = new Config();
conf.setNumWorkers(4);
conf.setDebug(false);
topology.newStream("hello", spout).parallelismHint(1)
.localOrShuffle()
.map(new MyMapFunction(),new Fields("upper"))
.parallelismHint(2)
.partition(Grouping.fields(ImmutableList.of("upper")))
.peek(input ->LOG.warn("================>> peek process value:{}",input.getStringByField("upper")))
.parallelismHint(3);
if(isRemoteMode){
StormSubmitter.submitTopology("HelloTridentTopology", conf, topology.build());
LOG.warn("==================================================");
LOG.warn("the remote topology {} is submitted.","HelloTridentTopology");
LOG.warn("==================================================");
}else{
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("HelloTridentTopology", conf, topology.build());
LOG.warn("==================================================");
LOG.warn("the LocalCluster topology {} is submitted.","HelloTridentTopology");
LOG.warn("==================================================");
TimeUnit.SECONDS.sleep(5);
cluster.killTopology("HelloTridentTopology");
cluster.shutdown();
}
}
private static class MyMapFunction implements MapFunction{
private static final Logger LOG = LoggerFactory.getLogger(MyMapFunction.class);
@Override
public Values execute(TridentTuple input) {
String line = input.getStringByField("line");
LOG.warn("================>> myMapFunction process execute:value :{}",line);
return new Values(line.toUpperCase());
}
}
}
4.ProjectionFunctionTrident
Requirement: keep only part of the fields of each tuple.
public class ProjectionFunctionTrident {
private static final Logger LOG = LoggerFactory.getLogger(ProjectionFunctionTrident.class);
public static void main(String [] args) throws InterruptedException{
@SuppressWarnings("unchecked")
FixedBatchSpout spout = new FixedBatchSpout(new Fields("x","y","z"), 3,
new Values(1,2,3),
new Values(4,5,6),
new Values(7,8,9),
new Values(10,11,12)
);
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
topology.newStream("ProjectionTrident", spout).parallelismHint(1)
.localOrShuffle().peek(tridentTuple ->LOG.info("================ {}",tridentTuple)).parallelismHint(2)
.shuffle()
.project(new Fields("y","z")).parallelismHint(2)
.localOrShuffle().peek(tridentTuple ->LOG.info(">>>>>>>>>>>>>>>> {}",tridentTuple)).parallelismHint(2);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("ProjectionTrident", conf, topology.build());
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("ProjectionTrident");
cluster.shutdown();
}
}
5.2 Broadcast
Requirement: send the tuples of a batch to every partition.
public class BroadcastRepartitionTrident {
private static final Logger LOG = LoggerFactory.getLogger(BroadcastRepartitionTrident.class);
public static void main(String [] args) throws InterruptedException{
@SuppressWarnings("unchecked")
FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("haddop",3),
new Values("java",4),
new Values("haddop",5)
);
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
topology.newStream("BroadcastRepartitionTrident", spout).parallelismHint(1)
.broadcast().peek(tridentTuple ->LOG.info("================ {}",tridentTuple))
.parallelismHint(2);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("BroadcastRepartitionTrident", conf, topology.build());
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("BroadcastRepartitionTrident");
cluster.shutdown();
}
}
5.3 PartitionBy
Requirement: send the tuples of a batch with the same value of the configured field to the same task.
public class PartitionByRepartitionTrident {
private static final Logger LOG = LoggerFactory.getLogger(PartitionByRepartitionTrident.class);
public static void main(String [] args) throws InterruptedException{
@SuppressWarnings("unchecked")
//FixedBatchSpout() parameters:
// 1. the spout's field names
// 2. how many tuples make up one batch
// 3. the field values
FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
new Values("java",23),
new Values("scala",3),
new Values("haddop",10),
new Values("java",23),
new Values("haddop",10)
);
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
.partitionBy(new Fields("language")).peek(tridentTuple ->LOG.info("++++++++++++++++ {}",tridentTuple))
.parallelismHint(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("PartitionByRepartitionTrident", conf, topology.build());
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("PartitionByRepartitionTrident");
cluster.shutdown();
}
}
5.4 Global
Requirement: perform a globally grouped count over the tuples of a batch.
public class GlobalRepatitionTrident {
private static final Logger LOG = LoggerFactory.getLogger(GlobalRepatitionTrident.class);
public static void main(String [] args) throws InterruptedException{
@SuppressWarnings("unchecked")
//FixedBatchSpout() parameters:
// 1. the spout's field names
// 2. how many tuples make up one batch
// 3. the field values
FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
new Values("java",23),
new Values("scala",3),
new Values("haddop",10),
new Values("java",23),
new Values("haddop",10)
);
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
.partitionBy(new Fields("language"))
.parallelismHint(3) //whatever parallelism is configured here, it has no effect
.peek(tridentTuple ->LOG.info(" ================= {}",tridentTuple))
.global()
.peek(tridentTuple ->LOG.info(" >>>>>>>>>>>>>>>>> {}",tridentTuple));
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("GlobalRepatitionTrident", conf, topology.build());
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("GlobalRepatitionTrident");
cluster.shutdown();
}
}
5.5 batchGlobal
Requirement: send the tuples of different batches to different tasks.
public class BatchGlobalRepatitionTrident2 {
private static final Logger LOG = LoggerFactory.getLogger(BatchGlobalRepatitionTrident2.class);
public static void main(String [] args) throws InterruptedException{
@SuppressWarnings("unchecked")
FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
topology.newStream("BatchGlobalRepatitionTrident2", spout).parallelismHint(1)
.batchGlobal().peek(tridentTuple ->LOG.info("++++++++++++++++ {}",tridentTuple))
.parallelismHint(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("BatchGlobalRepatitionTrident2", conf, topology.build());
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("BatchGlobalRepatitionTrident2");
cluster.shutdown();
}
}
5.6 partition
Requirement: a custom partitioning.
public class CustomRepartitionTrident {
private static final Logger LOG = LoggerFactory.getLogger(CustomRepartitionTrident.class);
public static void main(String [] args) throws InterruptedException{
@SuppressWarnings("unchecked")
FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("haddop",3),
new Values("java",4),
new Values("haddop",5)
);
spout.setCycle(false);
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
TridentTopology topology = new TridentTopology();
topology.newStream("CustomRepartitionTrident", spout).parallelismHint(1)
.partition(new HighTaskIDGrouping()).peek(tridentTuple ->LOG.info("++++++++++++++++ {}",tridentTuple))
.parallelismHint(2);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("CustomRepartitionTrident", conf, topology.build());
TimeUnit.SECONDS.sleep(30);
cluster.killTopology("CustomRepartitionTrident");
cluster.shutdown();
}
}
/**
* Custom grouping:
* let the task with the largest task ID do the work.
* @author pengbo.zhao
*
*/
public class HighTaskIDGrouping implements CustomStreamGrouping{
private int taskID;
@Override
public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
//List<Integer> targetTasks: 下游所有的tasks的集合
ArrayList<Integer> tasks = new ArrayList<>(targetTasks);
Collections.sort(tasks); //sort in ascending order
this.taskID = tasks.get(tasks.size() -1);
}
@Override
public List<Integer> chooseTasks(int taskId, List<Object> values) {
return Arrays.asList(taskID);
}
}
6.1 partitionAggregate
Requirement: count the tuples in each batch.
public class PartitionAggregateTrident {
private static final Logger LOG = LoggerFactory.getLogger(PartitionAggregateTrident.class);
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testPartitionAggregtor(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("PartitionAggregateTrident", spout).parallelismHint(2)//内部的优先级参数是1,所以我们写2是无效的
.shuffle()
.partitionAggregate(new Fields("name","age"), new Count(),new Fields("count"))
.parallelismHint(2)
// .each(new Fields("count"),new Debug());
.peek(input ->LOG.info(" >>>>>>>>>>>>>>>>> {}",input.getLongByField("count")));
this.submitTopology("PartitionAggregateTrident", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
}
6.2 aggregator
Requirement: count the data in the tuples.
public class AggregateTrident {
private static final Logger LOG = LoggerFactory.getLogger(AggregateTrident.class);
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testPartitionAggregtor(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("AggregateTrident", spout).parallelismHint(2)
.partitionBy(new Fields("name"))
.aggregate(new Fields("name","age"), new Count(),new Fields("count"))
// .aggregate(new Fields("name","age"), new CountAsAggregator(),new Fields("count"))
.parallelismHint(2)
.each(new Fields("count"),new Debug())
.peek(input -> LOG.info("============> count:{}",input.getLongByField("count")));
this.submitTopology("AggregateTrident", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
}
6.3 reduceAggregator
Requirement: sum field 0 of the tuples in each batch, i.e. however many tuples a batch contains, add up the specified field across all of them.
public class ReduceAggregatorTrident {
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testReduceAggregator(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("ReduceAggregator", spout).parallelismHint(2)
.partitionBy(new Fields("name"))
.aggregate(new Fields("age","name"), new MyReduce(),new Fields("sum"))
.parallelismHint(5)
.each(new Fields("sum"),new Debug());
this.submitTopology("ReduceAggregator", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
static class MyReduce implements ReducerAggregator<Integer>{
@Override
public Integer init() {
return 0; //initial value is 0
}
@Override
public Integer reduce(Integer curr, TridentTuple tuple) {
return curr + tuple.getInteger(0);
}
}
}
6.4 combinerAggregate
Requirement: sum a field of the tuples.
public class CombinerAggregate {
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testCombinerAggregate(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("CombinerAggregate", spout).parallelismHint(2)
.partitionBy(new Fields("name"))
.aggregate(new Fields("age"), new MyCount(),new Fields("count"))
.parallelismHint(5)
.each(new Fields("count"),new Debug());
this.submitTopology("CombinerAggregate", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
static class MyCount implements CombinerAggregator<Integer>{
@Override
public Integer init(TridentTuple tuple) {
return tuple.getInteger(0);
}
@Override
public Integer combine(Integer val1, Integer val2) {
return val1 + val2;
}
@Override
public Integer zero() {
return 0;
}
}
}
6.5 persistenceAggregator
Requirement: count the tuple elements of each batch, keeping the aggregate in a state store.
public class PersistenceAggregator {
private static final Logger LOG = LoggerFactory.getLogger(PersistenceAggregator.class);
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testPersistenceAggregator(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("testPersistenceAggregator", spout).parallelismHint(2)
.partitionBy(new Fields("name"))
.persistentAggregate(new MemoryMapState.Factory(), new Fields("name"), new Count(),new Fields("count"))
.parallelismHint(4)
.newValuesStream()
.peek(input ->LOG.info("count:{}",input.getLongByField("count")));
this.submitTopology("testPersistenceAggregator", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
}
6.6 AggregateChina
Requirement: run count, sum, and count aggregations over the tuples of a batch as one chain.
public class AggregateChina {
private static final Logger LOG = LoggerFactory.getLogger(AggregateChina.class);
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"),3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testAggregateChina(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("AggregateChina", spout).parallelismHint(2)
.partitionBy(new Fields("name"))
.chainedAgg()
.aggregate(new Fields("name"),new Count(), new Fields("count"))
.aggregate(new Fields("age"),new Sum(), new Fields("sum"))
.aggregate(new Fields("age"),new Count(), new Fields("count2"))
.chainEnd()
.peek(tuple->LOG.info("{}",tuple));
this.submitTopology("AggregateChina", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
}
7.GroupBy
Requirement: group the tuples of a batch by name, then count the tuples within each group.
public class GroupBy {
private static final Logger LOG = LoggerFactory.getLogger(GroupBy.class);
private FixedBatchSpout spout;
@SuppressWarnings("unchecked")
@Before
public void setSpout(){
this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
new Values("java",1),
new Values("scala",2),
new Values("scala",3),
new Values("haddop",4),
new Values("java",5),
new Values("haddop",6)
);
this.spout.setCycle(false);
}
@Test
public void testGroupBy(){
TridentTopology topoloty = new TridentTopology();
topoloty.newStream("GroupBy", spout).parallelismHint(1)
// .partitionBy(new Fields("name"))
.groupBy(new Fields("name"))
.aggregate(new Count(), new Fields("count"))
.peek(tuple -> LOG.info("{},{}",tuple.getFields(),tuple));
this.submitTopology("GroupBy", topoloty.build());
}
public void submitTopology(String name,StormTopology topology) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, createConf(), topology);
try {
TimeUnit.MINUTES.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
cluster.killTopology(name);
cluster.shutdown();
}
public Config createConf(){
Config conf = new Config();
conf.setNumWorkers(3);
conf.setDebug(false);
return conf;
}
}