Skip to content

StreamSets Data Integration Blog

Where change is welcome.

Scaling out StreamSets with Kubernetes

By July 14, 2017

StreamSets, Docker, KubernetesUPDATE – Since this blog post was written, StreamSets Control Hub added a Control Agent for Kubernetes that supports creating and managing Data Collector deployments and a Pipeline Designer that allows designing pipelines without having to install Data Collectors. This blog entry has full details: Using StreamSets Control Hub for Scalable Deployment via Kubernetes.

In today’s microservice revolution, where software applications are designed as independent services that work together, two technologies stand out. Docker, the defacto standard for containerization, and Kubernetes, a container orchestration and management tool. In this blog I will explain how to run StreamSets Data Collector (SDC) Docker containers on Kubernetes.

Cache Salesforce Data in Redis with StreamSets Data Collector

By July 6, 2017

Redis LogoRedis is an open-source, in-memory, NoSQL database implementing a networked key-value store with optional persistence to disk. Perhaps the most popular key-value database, Redis is widely used for caching web pages, sessions and other objects that require blazingly fast access – lookups are typically in the millisecond range.

At RedisConf 2017 I presented a session, Cache All The Things! Data Integration via Jedis (slides), looking at how the open source Jedis library provides a small, sane, easy to use Java interface to Redis, and how a StreamSets Data Collector (SDC) pipeline can read data from a platform such as Salesforce, write it to Redis via Jedis, and keep Redis up-to-date by subscribing for notifications of changes in Salesforce, writing new and updated data to Redis. In this blog entry, I’ll describe how I built the SDC pipeline I showed during my session.

Back To Top