Top 25 Reactive Big Data (JVM) Resources and Articles

Ingestion system in Scala/Akka/Spray/… seems like a pumped up super Flume
SMACK Stack (Mesos + )
Avro Resources
Kite works with Avro
Key Features:
  • Row-based, with a schema
  • Schema stored in the file
  • Schema written in JSON
  • Block compression, splittable files
  • Schema evolution
Spark: @databricks, Mesos: @mesosphere, Akka: @typesafe, Cassandra: @DataStax, Kafka: @ConfluentInc
Akka/Avro Persistence?
Akka / Event Sourcing
Apache Gora
Java 8 Streams
Do we store our Avro in HBase?
Apache Flume for Avro Consumption
Do we store and process Avro files on HDFS?
Replace Cron and Quartz
RDS Postgres or EC2 Postgres
jHiccup (JVM check)

Apache Drill with Avro and MongoDB

Drill with REST and JDBC -> Avro, MongoDB
Set up Zookeeper based cluster
The firewall was blocking it, so watch the ports Drill and Zookeeper use
select * from dfs.`/Users/timothyspann/Downloads/apache-drill-1.1.0/sample-data/region.parquet`;
SELECT * FROM cp.`employee.json` LIMIT 20;
show databases;
use mongo.`security-mydatabase`;
show tables;
Mongo issue: I have similar observations. At the moment I can only query elements that are strings; numbers and dates do not work.
This is an issue with Drill 1.1 and Mongo 3.0.
bin/sqlline -u "jdbc:drill:schema=dfs;zk=local"

Test Data and Generated Data  <- nice one  <- really, but they receive email

Monday Cool Links

Monitoring and Metrics



Servers and Services

DevOps Techniques, Tools and Processes



Big Data

Tools for Crushing High Availability and Scalability, Part I

One tip that seems obvious: look at what the Internet innovators and large-scale startups are doing. They have open sourced many of their tools: Square, Netflix, Google, Twitter, Facebook and LinkedIn, among others.

I.  Data in Memory

Persistent Key-Value Store for Java from the excellent Java Advent calendar for 2014.

NoSQL key-value stores that run in memory, like Redis, are incredibly helpful for scalability. Memcached is also an option. GemFire is insanely powerful and scales across WANs; it runs some huge transaction sites and is awesome with Java. Indian Railways is an incredibly interesting example of scaling on in-memory data grids.
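To make the idea concrete, here is a minimal read-through cache sketch in plain Java. It is only a local stand-in, assuming a ConcurrentHashMap in place of a real store such as Redis, Memcached or GemFire; the class name ReadThroughCache and its methods are hypothetical and do not belong to any of those products.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: an in-process stand-in for an in-memory key-value store.
class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                 // the slow lookup (database, remote call)
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value and invokes the loader only on a miss.
    V get(K key) {
        return store.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Drops an entry so that the next read goes back to the loader.
    void invalidate(K key) {
        store.remove(key);
    }

    // Number of times the slow loader actually ran.
    int loadCount() {
        return loads.get();
    }
}
```

Repeated reads of the same key hit memory instead of the backing system, which is where the scalability gain comes from. A real deployment would also need expiry and a cache shared across nodes, which this sketch leaves out.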

II.  Microservices / 12 Factor Apps on a PaaS

It’s hard to scale a lumbering beast; it’s easier when you have a swarm of agile services. With each service exposing an API that can be shared internally or externally, you open your architecture to extension and usage from outside sources, which is key to growth. This has worked well for Uber, Google, Facebook, Twitter and a host of others.

The platform to enable microservices: an open PaaS. With the Linux Foundation running it, CloudFoundry is the open choice; it’s the Tomcat of the PaaS. Extensible, standard, fast, flexible, elastic and open: CloudFoundry rocks.

III.  Reactive Programming

Using one of the reactive frameworks and following the Reactive Manifesto really helps drive your scalable apps.


Part II – Draft

IV.  Rapid Data Ingest

A key piece of the HA puzzle is having the data you need instantly available and

V.  Functional

Whether it’s Scala, MapReduce, Apache Spark, Groovy or Java 8 with lambdas, functional programming is bringing new ways of increasing performance and solving big data and real-time programming issues.
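As a small illustration of the functional style in Java 8, here is a word count written with streams and lambdas. It is a sketch only: the class name WordCount and the sample input are made up for this example, and the same map/group/count shape is what MapReduce or Spark would distribute across a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; uses only the Java 8 standard library.
class WordCount {
    // Splits each line into lower-cased words and counts the occurrences of each word.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())   // split can yield an empty leading token
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("Spark and Kafka", "Kafka and Akka")));
    }
}
```

There is no mutable counter and no explicit loop, so swapping stream() for parallelStream() is usually a small change if the input grows.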

VI.   Message Driven

Decoupling services and running them asynchronously greatly improves scalability and keeps one weak link from breaking your data chain.
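A minimal sketch of that decoupling, assuming an in-process BlockingQueue as a stand-in for a real broker such as Kafka: the producer only ever touches the queue and never calls the consumer directly. The class name QueueDemo, the relay method and the STOP marker are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo: producer and consumer communicate only through a queue.
class QueueDemo {
    private static final String STOP = "__stop__";   // made-up end-of-stream marker

    // Sends the messages through a queue to a consumer thread and returns what it processed.
    static List<String> relay(List<String> messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                String message;
                while (!(message = queue.take()).equals(STOP)) {
                    processed.add(message.toUpperCase());   // the "work" done per message
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String message : messages) {
                queue.put(message);      // the producer hands off and moves on
            }
            queue.put(STOP);
            consumer.join();             // wait so the results are safely visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while relaying", e);
        }
        return processed;
    }
}
```

If the consumer is slow, messages simply wait in the queue rather than blocking or failing the producer, which is the property the section is after.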

VII.  Netflix Architecture / Spring Cloud Netflix

Internet pioneer Netflix has created a number of amazing tools that keep applications scaling, failing fast and recovering. These tools have been augmented with Spring and are coming to the PaaS. They include critical pieces like circuit breakers and service discovery, which are very important for managing all the microservices in your architecture. There are also great tools for running on AWS.
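The circuit breaker idea can be sketched in a few lines of plain Java. This is not the Netflix Hystrix or Spring Cloud API; SimpleCircuitBreaker and its parameters are hypothetical, and the sketch assumes a failure is signalled by a RuntimeException.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical circuit breaker sketch; not the Hystrix API.
class SimpleCircuitBreaker {
    private final int failureThreshold;   // consecutive failures before the circuit opens
    private final long openNanos;         // how long to fail fast before retrying
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openNanos = TimeUnit.MILLISECONDS.toNanos(openMillis);
    }

    // Runs the call, or returns the fallback immediately while the circuit is open.
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (open && System.nanoTime() - openedAt < openNanos) {
            return fallback.get();                 // fail fast, do not touch the remote service
        }
        try {
            T result = remoteCall.get();           // closed circuit, or a trial call after the wait
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                open = true;
                openedAt = System.nanoTime();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

The synchronized methods keep the sketch thread-safe at the cost of serializing calls; production libraries add thread pools, timeouts and metrics on top of this basic state machine.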

VIII.  DevOps

IX.  Containerization

X.  Continuous Delivery

Having a tool that builds your applications from Git and runs automated tests and static analysis is critical.

Canary Deploys

Blue-Green Deploys

A/B Testing

Circuit Breaker for Inter-Service Communication and External Service Access

Reactive Streams / Asynchronous Streams

XI.  Responsive Front-Ends / Single Page Applications

WebSockets with SockJS, STOMP.  HTML5.  AngularJS / Backbone / Ember / …

XII.  Polyglot Programming

Play, MEAN, Spring Boot, DropWizard, Ratpack, Sinatra, Rails.   There are many different frameworks for building modern web applications.

XIII.  Front-End Tooling

Gulp, Grunt, Bower, …

XIV.  Clustering

YARN, CloudFoundry, Kubernetes.

XV.  Native Applications

iOS and Android applications have more sensors, a better user experience and a superior look and feel. There are awesome tools that make this development faster and cleaner.

XVI.  Using Netty

XVII.  API Documentation with Swagger


Big Data Links (16-May-2014) – Hadoop, Redis, SQLFire, HDFS, Pivotal HD

Map Reduce