StreamSets Data Integration Blog

Where change is welcome.

DataOps Principles Start to Get Attention (thanks, Gartner!)

By August 7, 2018

The fact is, our founders built our organization on DataOps principles, and StreamSets was a DataOps company before the term was even coined in late 2015. Our founders recognized the serious operational challenges that unstructured, streaming data and hybrid cloud infrastructures would pose to enterprises used to static, batch-oriented, structured data integration. Since our inception, we’ve been focused on enabling teams to operationalize data movement, and we created the StreamSets Data Integration Platform to empower customers to capitalize on a DataOps approach.

StreamSets Enhances its DataOps Platform

By August 6, 2018

Today, StreamSets announced the immediate availability of StreamSets Data Collector 3.4.0 and StreamSets Control Hub 3.3.0. These enhancements are aimed at delivering a better and more connected cloud experience for users of StreamSets Data Collector and a refined…

Synchronize HDFS Data into S3 Using the Hadoop FS Standalone Origin

By July 10, 2018

Introduction: from HDFS Data to S3

I am very excited to announce the new Hadoop FS Standalone origin in StreamSets Data Collector 3.2.0.0. Data Collector has long supported the Hadoop FS origin, but only in cluster mode. The Hadoop FS (HDFS) Standalone origin does not need MapReduce or YARN installed and can run in multithreaded mode, with the threads working in parallel and each thread reading one file at a time.

Using StreamSets and MapR Together in Docker

By June 26, 2018

Today's guest blogger is Ian Downard, a Senior Developer Evangelist at MapR Technologies. Ian focuses on machine learning and data engineering, and recently documented how he brought together the MapR Persistent Application Client Container (PACC) with StreamSets Data Collector and Docker to build pipelines…

Streaming Extreme Data Made Simple with Kinetica and StreamSets

By June 21, 2018

Kinetica, just one of dozens of origins and destinations supported by StreamSets Data Collector, is a distributed, in-memory, GPU database designed for geospatial analysis, machine learning, predictive analytics, and other workloads requiring high-performance parallel processing. Mathew Hawkins, a Principal Solutions Architect at Kinetica, recently…
