Storm(1) - Setting Up Development Environment
Setting up your development environment
1. download the J2SE 6 SDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html and install it
chmod 775 jdk-6u35-linux-x64.bin
yes | ./jdk-6u35-linux-x64.bin
mv jdk1.6.0_35 /opt
ln -s /opt/jdk1.6.0_35/bin/java /usr/bin
ln -s /opt/jdk1.6.0_35/bin/javac /usr/bin
export JAVA_HOME=/opt/jdk1.6.0_35
export PATH=$PATH:$JAVA_HOME/bin
2. install Git
sudo apt-get install git
3. install Maven
sudo apt-get install maven
4. install Puppet, Vagrant, and VirtualBox
sudo apt-get install virtualbox puppet vagrant
5. install Eclipse
sudo apt-get install eclipse
Distributed version control
1. create the project directory and initialize the repository
mkdir FirstGitProject
cd FirstGitProject
git init
2. create some files in the repository
touch README.txt
vim README.txt
3. review the status of the repository
git status
4. add the file manually
git add README.txt
5. commit the file
git commit -a -m "The first commit"
6. add the remote repository to the local repository and push the changes
git remote add origin https://[user]@bitbucket.org/[user]/firstgitproject.git
git push origin master
Creating a "Hello World" topology
1. create a new project folder and init Git repository
mkdir HelloWorld
cd HelloWorld
git init
2. create Maven project file
vi pom.xml
3. create the basic XML tags and project metadata
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>storm.cookbook</groupId>
<artifactId>hello-world</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>hello-world</name>
<url>https://bitbucket.org/[user]/hello-world</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
</project>
4. declare the Maven repositories to fetch the dependencies from
<repositories>
<repository>
<id>github-releases</id>
<url>http://oss.sonatype.org/content/repositories/github-releases/</url>
</repository>
<repository>
<id>clojars.org</id>
<url>http://clojars.org/repo</url>
</repository>
<repository>
<id>twitter4j</id>
<url>http://twitter4j.org/maven2</url>
</repository>
</repositories>
5. declare the dependencies
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>storm</groupId>
<artifactId>storm</artifactId>
<version>0.8.1</version>
<!-- keep storm out of the jar-with-dependencies -->
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1</version>
</dependency>
</dependencies>
6. add the build plugin definitions
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>com.theoryinpractise</groupId>
<artifactId>clojure-maven-plugin</artifactId>
<version>1.3.8</version>
<extensions>true</extensions>
<configuration>
<sourceDirectories>
<sourceDirectory>src/clj</sourceDirectory>
</sourceDirectories>
</configuration>
<executions>
<execution>
<id>compile</id>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>test</id>
<phase>test</phase>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
</plugins>
</build>
7. complete required folder structure
src/main/java
src/test
8. generate the Eclipse project
mvn eclipse:eclipse
9. create the spout of HelloWorldSpout (HelloWorldSpout.java)
package storm.cookbook;
public class HelloWorldSpout extends BaseRichSpout {
private SpoutOutputCollector collector;
private int referenceRandom;
private static final int MAX_RANDOM = 10;
public HelloWorldSpout() {
final Random rand = new Random();
referenceRandom = rand.nextInt(MAX_RANDOM);
}
}
10. after construction, the Storm cluster will open the spout
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.collector = collector;
}
11. The Storm cluster will repeatedly call the nextTuple method, which will do all the work of the spout
public void nextTuple() {
final Random rand = new Random();
int instanceRandom = rand.nextInt(MAX_RANDOM);
if (instanceRandom == referenceRandom) {
collector.emit(new Values("Hello World"));
}
else {
collector.emit(new Values("Other Random Word"));
}
}
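The emission logic above can be checked outside Storm. The sketch below (plain Java, no Storm classes; the class and method names are invented for illustration) simulates repeated nextTuple() calls: since referenceRandom is fixed at construction, roughly one call in MAX_RANDOM emits "Hello World".

```java
import java.util.Random;

// Standalone simulation of HelloWorldSpout's emission decision.
public class EmissionSketch {
    static final int MAX_RANDOM = 10;

    // Counts how many of `calls` simulated nextTuple() invocations
    // would emit "Hello World" rather than "Other Random Word".
    static int countHelloEmissions(int calls, long seed) {
        Random rand = new Random(seed);
        int referenceRandom = rand.nextInt(MAX_RANDOM); // fixed, as in the constructor
        int hello = 0;
        for (int i = 0; i < calls; i++) {
            if (rand.nextInt(MAX_RANDOM) == referenceRandom) {
                hello++; // the spout would emit new Values("Hello World")
            }
        }
        return hello;
    }

    public static void main(String[] args) {
        // Expect roughly 100000 / MAX_RANDOM = 10000 matches.
        System.out.println(countHelloEmissions(100000, 42L));
    }
}
```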
12. tell the Storm cluster which fields this spout emits within the declareOutputFields method
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
13. create the bolt of HelloWorldBolt (HelloWorldBolt.java)
package storm.cookbook;
public class HelloWorldBolt extends BaseRichBolt {
private int myCount;
public void execute(Tuple input) {
String test = input.getStringByField("sentence");
if ("Hello World".equals(test)) {
myCount++;
System.out.println("Found a Hello World! My Count is now: " + Integer.toString(myCount));
}
}
}
14. create a main class to declare the Storm topology (HelloWorldTopology.java)
package storm.cookbook;
public class HelloWorldTopology {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("randomHelloWorld", new HelloWorldSpout(), 10);
builder.setBolt("HelloWorldBolt", new HelloWorldBolt(), 2).shuffleGrouping("randomHelloWorld");
Config conf = new Config();
conf.setDebug(true);
if(args != null && args.length > 0) {
conf.setNumWorkers(3);
StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
}
else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("test");
cluster.shutdown();
}
}
}
15. execute the cluster from the project's root folder
mvn compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.cookbook.HelloWorldTopology
Creating a Storm Cluster - provisioning the machines
. create a new project named vagrant-storm-cluster with the following data structure
vagrant-storm-cluster
vagrant-storm-cluster/data
vagrant-storm-cluster/manifests
vagrant-storm-cluster/modules
vagrant-storm-cluster/scripts
. create a file in the project root called Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
boxes = [
# IPs taken from the hosts file later in this recipe; memory values
# are illustrative (in MB) -- size them to suit your host machine
{ :name => :nimbus, :ip => '192.168.33.100', :memory => 512 },
{ :name => :supervisor1, :ip => '192.168.33.101', :memory => 512 },
{ :name => :supervisor2, :ip => '192.168.33.102', :memory => 512 },
{ :name => :zookeeper1, :ip => '192.168.33.201', :memory => 512 }
]
. define the hardware, networking, and operating system
boxes.each do |opts|
config.vm.define opts[:name] do |config|
config.vm.box = "ubuntu12"
config.vm.box_url = "http://dl.dropbox.com/u/1537815/precise64.box"
config.vm.network :hostonly, opts[:ip]
config.vm.host_name = "storm.%s" % opts[:name].to_s
config.vm.share_folder "v-data", "/vagrant_data", "./data", :transient => false
config.vm.customize ["modifyvm", :id, "--memory", opts[:memory]]
config.vm.customize ["modifyvm", :id, "--cpus", opts[:cpus]] if opts[:cpus]
. configure the provisioning of the application
config.vm.provision :shell, :inline => "cp -fv /vagrant_data/hosts /etc/hosts"
config.vm.provision :shell, :inline => "apt-get update"
# Check if the jdk has been provided
if File.exist?("./data/jdk-6u35-linux-x64.bin") then
config.vm.provision :puppet do |puppet|
puppet.manifests_path = "manifests"
puppet.manifest_file = "jdk.pp"
end
end
config.vm.provision :puppet do |puppet|
puppet.manifests_path = "manifests"
puppet.manifest_file = "provisioningInit.pp"
end
# Ask puppet to do the provisioning now.
config.vm.provision :shell, :inline => "puppet apply /tmp/storm-puppet/manifests/site.pp --verbose --modulepath=/tmp/storm-puppet/modules/ --debug"
end
end
. create installJdk.sh file in the scripts folder
#!/bin/sh
echo "Installing JDK!"
cd /root
yes | /vagrant_data/jdk-6u35-linux-x64.bin
mv jdk1.6.0_35 /opt
rm -rv /usr/bin/java
rm -rv /usr/bin/javac
ln -s /opt/jdk1.6.0_35/bin/java /usr/bin
ln -s /opt/jdk1.6.0_35/bin/javac /usr/bin
export JAVA_HOME=/opt/jdk1.6.0_35
export PATH=$PATH:$JAVA_HOME/bin
. create jdk.pp file in the manifests folder
$JDK_VERSION = "1.6.0_35"
package { "openjdk":
ensure => absent,
}
exec { "installJdk":
command => "installJdk.sh",
path => "/vagrant/scripts",
logoutput => true,
creates => "/opt/jdk${JDK_VERSION}",
}
. create provisioningInit.pp file in the manifests folder
$CLONE_URL = "https://bitbucket.org/qanderson/storm-puppet.git"
$CHECKOUT_DIR="/tmp/storm-puppet"
package {git:ensure=> [latest,installed]}
package {puppet:ensure=> [latest,installed]}
package {ruby:ensure=> [latest,installed]}
package {rubygems:ensure=> [latest,installed]}
package {unzip:ensure=> [latest,installed]}
exec { "install_hiera":
command => "gem install hiera hiera-puppet",
path => "/usr/bin",
require => Package['rubygems'],
}
. clone the repository, which contains the second level of provision
exec { "clone_storm-puppet":
command => "git clone ${CLONE_URL}",
cwd => "/tmp",
path => "/usr/bin",
creates => "${CHECKOUT_DIR}",
require => Package['git'],
}
. configure Puppet plugin of Hiera, which is used to externalize properties from the provisioning scripts
exec {"/bin/ln -s /var/lib/gems/1.8/gems/hiera-puppet-1.0.0/ /tmp/storm-puppet/modules/hiera-puppet":
creates => "/tmp/storm-puppet/modules/hiera-puppet",
require => [Exec['clone_storm-puppet'],Exec['install_hiera']]
}
#install hiera and the storm configuration
file { "/etc/puppet/hiera.yaml":
source => "/vagrant_data/hiera.yaml",
replace => true,
require => Package['puppet']
}
file { "/etc/puppet/hieradata":
ensure => directory,
require => Package['puppet']
}
file {"/etc/puppet/hieradata/storm.yaml":
source => "${CHECKOUT_DIR}/modules/storm.yaml",
replace => true,
require => [Exec['clone_storm-puppet'],File['/etc/puppet/hieradata']]
}
. create the Hiera base configuration file in data folder
hiera.yaml:
---
:hierarchy:
- "%{operatingsystem}"
- storm
:backends:
- yaml
:yaml:
:datadir: '/etc/puppet/hieradata'
. configure the hosts file
127.0.0.1 localhost
192.168.33.100 storm.nimbus
192.168.33.101 storm.supervisor1
192.168.33.102 storm.supervisor2
192.168.33.103 storm.supervisor3
192.168.33.104 storm.supervisor4
192.168.33.105 storm.supervisor5
192.168.33.201 storm.zookeeper1
192.168.33.202 storm.zookeeper2
192.168.33.203 storm.zookeeper3
192.168.33.204 storm.zookeeper4
. init the Git repository for this project and push it to bitbucket.org
Creating a Storm cluster - provisioning Storm
Once you have a base set of virtual machines that are ready for application provisioning, you need to install and configure the appropriate packages on each node.
. create a new project named storm-puppet
storm-puppet
storm-puppet/manifests
storm-puppet/modules
storm-puppet/modules/storm
storm-puppet/modules/storm/manifests
storm-puppet/modules/storm/templates
. create site.pp in the manifests folder
node 'storm.nimbus' {
$cluster = 'storm1'
include storm::nimbus
include storm::ui
}
node /storm.supervisor[1-9]/ {
$cluster = 'storm1'
include storm::supervisor
}
node /storm.zookeeper[1-9]/ {
include storm::zoo
}
. create init.pp in /modules/storm/manifests
class storm {
include storm::install
include storm::config
}
. create install.pp in /modules/storm/manifests
class storm::install {
$BASE_URL="https://bitbucket.org/qanderson/storm-deb-packaging/downloads/"
$ZMQ_FILE="libzmq0_2.1.7_amd64.deb"
$JZMQ_FILE="libjzmq_2.1.7_amd64.deb"
$STORM_FILE="storm_0.8.1_all.deb"
package { "wget":
ensure => latest
}
# call fetch for each file
exec { "wget_storm":
command => "/usr/bin/wget ${BASE_URL}${STORM_FILE}"
}
exec {"wget_zmq":
command => "/usr/bin/wget ${BASE_URL}${ZMQ_FILE}"
}
exec { "wget_jzmq":
command => "/usr/bin/wget ${BASE_URL}${JZMQ_FILE}"
}
#call package for each file
package { "libzmq0":
provider => dpkg,
ensure => installed,
source => "${ZMQ_FILE}",
require => Exec['wget_zmq']
}
#call package for each file
package { "libjzmq":
provider => dpkg,
ensure => installed,
source => "${JZMQ_FILE}",
require => [Exec['wget_jzmq'],Package['libzmq0']]
}
#call package for each file
package { "storm":
provider => dpkg,
ensure => installed,
source => "${STORM_FILE}",
require => [Exec['wget_storm'], Package['libjzmq']]
}
}
. create config.pp in the storm manifests
class storm::config {
require storm::install
include storm::params
file { '/etc/storm/storm.yaml':
require => Package['storm'],
content => template('storm/storm.yaml.erb'),
owner => 'root',
group => 'root',
mode => '0644',
}
file { '/etc/default/storm':
require => Package['storm'],
content => template('storm/default.erb'),
owner => 'root',
group => 'root',
mode => '0644',
}
}
. create params.pp in the storm manifests for Hiera
class storm::params {
#_ STORM DEFAULTS _#
$java_library_path = hiera_array('java_library_path', ['/usr/local/lib', '/opt/local/lib', '/usr/lib'])
}
. specify the nimbus, supervisor, ui, and zoo classes
class storm::nimbus {
require storm::install
include storm::config
include storm::params
# Install nimbus /etc/default
storm::service { 'nimbus':
start => 'yes',
jvm_memory => $storm::params::nimbus_mem
}
}
class storm::supervisor {
require storm::install
include storm::config
include storm::params
# Install supervisor /etc/default
storm::service { 'supervisor':
start => 'yes',
jvm_memory => $storm::params::supervisor_mem
}
}
class storm::ui {
require storm::install
include storm::config
include storm::params
# Install ui /etc/default
storm::service { 'ui':
start => 'yes',
jvm_memory => $storm::params::ui_mem
}
}
class storm::zoo {
package {['zookeeper','zookeeper-bin','zookeeperd']:
ensure => latest,
}
}
. init the Git repository and push it to bitbucket.org
. navigate to the vagrant-storm-cluster folder and run the provisioning
vagrant up
. connect to a provisioned node over SSH
vagrant ssh nimbus
Deriving basic click statistics
Getting ready
wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
sudo cp src/redis-server /usr/local/bin/
sudo cp src/redis-cli /usr/local/bin/
Then start the Redis server.
1. create a new Java project named click-topology, and create the pom.xml file and folder structure as per the "Hello World" topology project.
mkdir ClickTopology
src/test
src/main/java
vi pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>storm.cookbook</groupId>
<artifactId>click-topology</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>click-topology</name>
<url>https://bitbucket.org/[user]/click-topology</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
</project>
2. add the <dependencies> tag
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jmock</groupId>
<artifactId>jmock-junit4</artifactId>
<version>2.5.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jmock</groupId>
<artifactId>jmock-legacy</artifactId>
<version>2.5.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>2.1.0</version>
</dependency>
3. create the ClickTopology main class in the storm.cookbook package under src/main/java
public class ClickTopology {
private TopologyBuilder builder = new TopologyBuilder();
private Config conf = new Config();
private LocalCluster cluster;
public ClickTopology() {
builder.setSpout("clickSpout", new ClickSpout(), 10);
//First layer of bolts
builder.setBolt("repeatsBolt", new RepeatVisitBolt(), 10).shuffleGrouping("clickSpout");
builder.setBolt("geographyBolt", new GeographyBolt(new HttpIPResolver()), 10).shuffleGrouping("clickSpout");
//second layer of bolts, commutative in nature
builder.setBolt("totalStats", new VisitStatsBolt(), 1).globalGrouping("repeatsBolt");
builder.setBolt("geoStats", new GeoStatsBolt(), 10).fieldsGrouping("geographyBolt", new Fields(storm.cookbook.Fields.COUNTRY));
conf.put(Conf.REDIS_PORT_KEY, DEFAULT_JEDIS_PORT);
}
public void runLocal(int runTime){
conf.setDebug(true);
conf.put(Conf.REDIS_HOST_KEY, "localhost");
cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
if(runTime > 0){
Utils.sleep(runTime);
shutDownLocal();
}
}
public void shutDownLocal(){
if(cluster != null){
cluster.killTopology("test");
cluster.shutdown();
}
}
public void runCluster(String name, String redisHost) throws AlreadyAliveException, InvalidTopologyException {
conf.setNumWorkers(20);
conf.put(Conf.REDIS_HOST_KEY, redisHost);
StormSubmitter.submitTopology(name, conf, builder.createTopology());
}
}
4. add the main method.
public static void main(String[] args) throws Exception {
ClickTopology topology = new ClickTopology();
if(args!=null && args.length > 1) {
topology.runCluster(args[0], args[1]);
} else {
if(args!=null && args.length == 1) {
System.out.println("Running in local mode, redis ip missing for cluster run");
}
topology.runLocal(10000);
}
}
5. the topology assumes that the web server pushes messages onto a Redis queue. You must create a spout to inject these into the Storm cluster as a stream. Create the ClickSpout class, which connects to Redis when it is opened by the cluster.
public class ClickSpout extends BaseRichSpout {
public static Logger logger = Logger.getLogger(ClickSpout.class);
private Jedis jedis;
private String host;
private int port;
private SpoutOutputCollector collector;
@Override
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declare(new Fields(storm.cookbook.Fields.IP, storm.cookbook.Fields.URL, storm.cookbook.Fields.CLIENT_KEY));
}
@Override
public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
host = conf.get(Conf.REDIS_HOST_KEY).toString();
port = Integer.valueOf(conf.get(Conf.REDIS_PORT_KEY).toString());
this.collector = spoutOutputCollector;
connectToRedis();
}
private void connectToRedis() {
jedis = new Jedis(host, port);
}
}
6. the cluster will then poll the spout for new tuples through the nextTuple method
public void nextTuple() {
String content = jedis.rpop("count");
if (content == null || "nil".equals(content)) {
try {
Thread.sleep(300);
}
catch (InterruptedException e) {
}
}
else {
JSONObject obj = (JSONObject)JSONValue.parse(content);
String ip = obj.get(storm.cookbook.Fields.IP).toString();
String url = obj.get(storm.cookbook.Fields.URL).toString();
String clientKey = obj.get(storm.cookbook.Fields.CLIENT_KEY).toString();
collector.emit(new Values(ip, url, clientKey));
}
}
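For reference, each message the spout pops is assumed to be a JSON object with ip, url, and clientKey fields. A minimal sketch of the producer side (the class and helper names are invented; a real web server would push this string onto the "count" list with a Redis client's lpush):

```java
// Builds the JSON click event this topology expects on the Redis
// "count" queue. No Redis connection is made here; this only shows
// the assumed message shape.
public class ClickMessageSketch {
    static String clickJson(String ip, String url, String clientKey) {
        return String.format(
            "{\"ip\":\"%s\",\"url\":\"%s\",\"clientKey\":\"%s\"}",
            ip, url, clientKey);
    }

    public static void main(String[] args) {
        System.out.println(clickJson("192.168.33.100", "myintranet.com", "Client1"));
    }
}
```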
7. create the bolts that will enrich the basic data through database or remote API lookups.
public class RepeatVisitBolt extends BaseRichBolt {
private OutputCollector collector;
private Jedis jedis;
private String host;
private int port;
@Override
public void prepare(Map conf, TopologyContext topologyContext, OutputCollector outputCollector) {
this.collector = outputCollector;
host = conf.get(Conf.REDIS_HOST_KEY).toString();
port = Integer.valueOf(conf.get(Conf.REDIS_PORT_KEY).toString());
connectToRedis();
}
private void connectToRedis() {
jedis = new Jedis(host, port);
jedis.connect();
}
}
8. add the execute method, which looks up the previous visit flags from Redis, based on the fields in the tuple, and emits the enriched tuple
public void execute(Tuple tuple) {
String ip = tuple.getStringByField(storm.cookbook.Fields.IP);
String clientKey = tuple.getStringByField(storm.cookbook.Fields.CLIENT_KEY);
String url = tuple.getStringByField(storm.cookbook.Fields.URL);
String key = url + ":" + clientKey;
String value = jedis.get(key);
if(value == null) {
jedis.set(key, "visited");
collector.emit(new Values(clientKey, url, Boolean.TRUE.toString()));
}
else {
collector.emit(new Values(clientKey, url, Boolean.FALSE.toString()));
}
}
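The uniqueness check above is easy to exercise without a running Redis. In this sketch (class name invented) a HashMap stands in for the Redis key/value store:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for RepeatVisitBolt's Redis lookup: the first time
// a (url, clientKey) pair is seen it is unique; afterwards it is not.
public class RepeatVisitSketch {
    private final Map<String, String> store = new HashMap<>(); // replaces jedis

    // Returns the string the bolt would emit in its third field.
    public String checkVisit(String clientKey, String url) {
        String key = url + ":" + clientKey;
        if (store.get(key) == null) {
            store.put(key, "visited");
            return Boolean.TRUE.toString();  // first visit
        }
        return Boolean.FALSE.toString();     // repeat visit
    }

    public static void main(String[] args) {
        RepeatVisitSketch sketch = new RepeatVisitSketch();
        System.out.println(sketch.checkVisit("Client1", "myintranet.com")); // true
        System.out.println(sketch.checkVisit("Client1", "myintranet.com")); // false
    }
}
```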
9. create the GeographyBolt class
package storm.cookbook;
public class GeographyBolt extends BaseRichBolt {
private IPResolver resolver;
private OutputCollector collector;
public GeographyBolt(IPResolver resolver) {
this.resolver = resolver;
}
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple tuple) {
String ip = tuple.getStringByField(storm.cookbook.Fields.IP);
JSONObject json = resolver.resolveIP(ip);
String city = (String) json.get(storm.cookbook.Fields.CITY);
String country = (String) json.get(storm.cookbook.Fields.COUNTRY_NAME);
collector.emit(new Values(country, city));
}
}
10. create the HttpIPResolver class, and inject it into GeographyBolt at design time
public class HttpIPResolver implements IPResolver, Serializable {
}
11. split the stream into the GeoStatsBolt using a fields grouping on the country field
builder.setBolt("geoStats", new GeoStatsBolt(), 10).fieldsGrouping("geographyBolt", new Fields(storm.cookbook.Fields.COUNTRY));
12. create the GeoStatsBolt class
public class GeoStatsBolt extends BaseRichBolt {
private Map<String, CountryStats> stats = new HashMap<String, CountryStats>();
private OutputCollector collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple tuple) {
String country = tuple.getStringByField(storm.cookbook.Fields.COUNTRY);
String city = tuple.getStringByField(Fields.CITY);
if(!stats.containsKey(country)){
stats.put(country, new CountryStats(country));
}
stats.get(country).cityFound(city);
collector.emit(new Values(country, stats.get(country).getCountryTotal(), city, stats.get(country).getCityTotal(city)));
}
}
13. create the CountryStats class
public class CountryStats {
private int countryTotal = 0;
private static final int COUNT_INDEX = 0;
private static final int PERCENTAGE_INDEX = 1;
private String countryName;
private Map<String, List<Integer>> cityStats = new HashMap<String, List<Integer>>();
public CountryStats(String countryName) {
this.countryName = countryName;
}
public void cityFound(String cityName) {
countryTotal++;
if(cityStats.containsKey(cityName)){
cityStats.get(cityName).set(COUNT_INDEX, cityStats.get(cityName).get(COUNT_INDEX).intValue() + 1);
}
else {
List<Integer> list = new LinkedList<Integer>();
list.add(1);
list.add(0);
cityStats.put(cityName, list);
}
double percent = (double)cityStats.get(cityName).get(COUNT_INDEX)/(double)countryTotal;
cityStats.get(cityName).set(PERCENTAGE_INDEX, (int)(percent * 100));
}
public int getCountryTotal(){
return countryTotal;
}
public int getCityTotal(String cityName){
return cityStats.get(cityName).get(COUNT_INDEX).intValue();
}
}
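The bookkeeping in CountryStats can be tried standalone. This sketch (class name invented) reimplements just the counting, storing each city's share as a whole-number percentage, which is the assumed intent of PERCENTAGE_INDEX; note that a city's stored percentage only refreshes when that city is seen again:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Standalone version of the CountryStats counting logic.
public class CountryStatsSketch {
    static final int COUNT_INDEX = 0;
    static final int PERCENTAGE_INDEX = 1;
    private int countryTotal = 0;
    private final Map<String, List<Integer>> cityStats = new HashMap<>();

    public void cityFound(String city) {
        countryTotal++;
        List<Integer> entry = cityStats.computeIfAbsent(city, c -> {
            List<Integer> l = new LinkedList<>();
            l.add(0); // visit count for this city
            l.add(0); // city's percentage of the country total
            return l;
        });
        entry.set(COUNT_INDEX, entry.get(COUNT_INDEX) + 1);
        entry.set(PERCENTAGE_INDEX,
                  (int) (100.0 * entry.get(COUNT_INDEX) / countryTotal));
    }

    public int getCountryTotal() { return countryTotal; }
    public int getCityTotal(String city) { return cityStats.get(city).get(COUNT_INDEX); }
    public int getCityPercentage(String city) { return cityStats.get(city).get(PERCENTAGE_INDEX); }

    public static void main(String[] args) {
        CountryStatsSketch stats = new CountryStatsSketch();
        stats.cityFound("London");
        stats.cityFound("Leeds");
        stats.cityFound("London");
        // London accounts for 2 of 3 events -> integer percentage 66
        System.out.println(stats.getCityPercentage("London"));
    }
}
```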
14. add the final bolt, which counts total visitors and unique visitors
builder.setBolt("totalStats", new VisitStatsBolt(), 1).globalGrouping("repeatsBolt");
15. create the VisitStatsBolt class
public class VisitStatsBolt extends BaseRichBolt {
private OutputCollector collector;
private int total = 0;
private int uniqueCount = 0;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple tuple) {
boolean unique = Boolean.parseBoolean(tuple.getStringByField(storm.cookbook.Fields.UNIQUE));
total++;
if(unique) {
uniqueCount++;
}
collector.emit(new Values(total,uniqueCount));
}
}
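A standalone sketch of the VisitStatsBolt counting (names invented): every tuple increments the running total, and tuples flagged unique also increment the unique count; both values are emitted downstream.

```java
// Standalone version of VisitStatsBolt's counting logic.
public class VisitStatsSketch {
    private int total = 0;
    private int uniqueCount = 0;

    // Returns {total, uniqueCount}, the pair the bolt would emit.
    int[] record(boolean unique) {
        total++;
        if (unique) {
            uniqueCount++;
        }
        return new int[] { total, uniqueCount };
    }

    public static void main(String[] args) {
        VisitStatsSketch stats = new VisitStatsSketch();
        stats.record(true);   // first visitor, unique
        stats.record(false);  // repeat visitor
        int[] emitted = stats.record(true);
        System.out.println(emitted[0] + " " + emitted[1]); // 3 2
    }
}
```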
Unit testing a bolt
1. create the StormTestCase class under src/test/java
package storm.cookbook;
public class StormTestCase {
protected Mockery context = new Mockery() {{
setImposteriser(ClassImposteriser.INSTANCE);
}};
protected Tuple getTuple() {
final Tuple tuple = context.mock(Tuple.class);
return tuple;
}
}
2. create the TestRepeatVisitBolt class
@RunWith(value = Parameterized.class)
public class TestRepeatVisitBolt extends StormTestCase {
}
3. add the test method
@Test
public void testExecute() {
jedis = new Jedis("localhost", 6379);
RepeatVisitBolt bolt = new RepeatVisitBolt();
Map config = new HashMap();
config.put("redis-host", "localhost");
config.put("redis-port", "6379");
final OutputCollector collector = context.mock(OutputCollector.class);
bolt.prepare(config, null, collector);
final Tuple tuple = getTuple();
context.checking(new Expectations() {{
oneOf(tuple).getStringByField(Fields.IP);
will(returnValue(ip));
oneOf(tuple).getStringByField(Fields.CLIENT_KEY);
will(returnValue(clientKey));
oneOf(tuple).getStringByField(Fields.URL);
will(returnValue(url));
oneOf(collector).emit(new Values(clientKey, url, expected));
}});
bolt.execute(tuple);
context.assertIsSatisfied();
if(jedis != null) {
jedis.disconnect();
}
}
4. define the parameters
@Parameterized.Parameters
public static Collection<Object[]> data() {
Object[][] data = new Object[][] {
{ "192.168.33.100", "Client1", "myintranet.com", "false" },
{ "192.168.33.100", "Client1", "myintranet.com", "false" },
{ "192.168.33.101", "Client2", "myintranet1.com", "true" },
{ "192.168.33.102", "Client3", "myintranet2.com", "false" }
};
return Arrays.asList(data);
}
5. add the base provisioning of the values using Redis
@BeforeClass
public static void setupJedis() {
Jedis jedis = new Jedis("localhost",6379);
jedis.flushDB();
Iterator<Object[]> it = data().iterator();
while (it.hasNext()) {
Object[] values = it.next();
if (values[3].equals("false")) {
String key = values[2] + ":" + values[1];
jedis.set(key, "visited");
}
}
}
Notes: Storm concepts
You create Storm topologies and deploy them to a Storm cluster. A Storm cluster is similar to a Hadoop cluster, but whereas you run MapReduce jobs on Hadoop, you run topologies on Storm; a MapReduce job eventually finishes, whereas a topology processes messages forever until you kill it.
The master node runs a daemon called Nimbus (comparable to Hadoop's JobTracker), which is responsible for distributing code around the cluster. Each worker node runs a daemon called the Supervisor.
To run a topology:
storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2
The storm jar part takes care of connecting to Nimbus and uploading the JAR.
A stream is an unbounded sequence of tuples. Stream transformations are performed by spouts and bolts, which expose interfaces you implement.
A spout is a source of streams. A bolt consumes input streams, processes them, and possibly emits new streams; bolts can run functions, filter tuples, do streaming aggregations, do streaming joins, and talk to databases.
A tuple is Storm's data model.