
StreamSets Data Integration Blog

Where change is welcome.

Using StreamSets Control Hub for Scalable Deployment via Kubernetes

By January 15, 2018

In "Scaling out StreamSets with Kubernetes", I explained how to spin up Data Collectors as Kubernetes deployments along with Dataflow Performance Manager. I recommended using a deployment with one replica as the design environment and a deployment with many replicas for execution. We recently announced StreamSets Control Hub, which makes the Kubernetes integration way smoother! StreamSets Control Hub adds a Control Agent for Kubernetes, which supports creating and managing Data Collector deployments, and a Pipeline Designer, which lets you design pipelines for Kubernetes without having to install Data Collectors. In this blog post, I will demonstrate how to take advantage of these features.


Streaming Data from Twitter for Analysis in Spark

By January 10, 2018

Happy New Year! Our first blog entry of 2018 is a guest post from Josh Janzen, a data scientist based in Minnesota. Josh wanted to ingest tweets referencing NFL games into Spark, then run some analysis to look for a correlation between Twitter activity and game winners. Josh originally posted this entry on his personal blog, and kindly allowed us to repost it here. Over to you, Josh:

’Tis the season of NFL football, and one way to capture the excitement is Twitter data. I’ve tinkered around with Twitter’s Developer API before, but this time I wanted to use a streaming product I’ve heard good things about: StreamSets Data Collector.
