StreamSets Partners

Hadoop meets Blockchain: Trust your (Big) Data

Minneapolis-based phData has long been a StreamSets partner, deploying the StreamSets DataOps Platform at customers across the US. It’s not surprising, then, that when phData principal solutions architect Keith Smith wanted to integrate the Ethereum blockchain platform with the Apache Hadoop filesystem and Apache Kudu, he reached for StreamSets Data Collector.

StreamSets Congratulates Voya Financial And Other Cloudera Data Impact Award Winners

StreamSets has a rich tradition of partnering with Cloudera to highlight companies that are pushing the limits of what is possible with data and advanced analytics. The Data Impact Awards is an annual event that recognizes organizations that put data at the center of their strategy to improve the business’s bottom line and better the world. Last year, […]

Using Docker Wrong: My Journey to a Better Container

Following on from last week’s guest post from MapR’s Ian Downard on integrating StreamSets Data Collector with MapR Persistent Application Client Container (PACC), MapR Distinguished Technologist John Omernik offers a cautionary tale on examining your assumptions before jumping into the world of Docker. We repost John’s original article here with his kind permission. Since starting at MapR […]

Using StreamSets and MapR Together in Docker

Today’s guest blogger is Ian Downard, a Senior Developer Evangelist at MapR Technologies. Ian focuses on machine learning and data engineering, and recently documented how he brought together the MapR Persistent Application Client Container (PACC) with StreamSets Data Collector and Docker to build pipelines for ingesting data into the MapR Converged Data Platform. We’re reposting Ian’s article here, with his […]

Streaming Extreme Data Made Simple with Kinetica and StreamSets

Kinetica, just one of dozens of origins and destinations supported by StreamSets Data Collector, is a distributed, in-memory, GPU database designed for geospatial analysis, machine learning, predictive analytics, and other workloads requiring high-performance parallel processing. Mathew Hawkins, a Principal Solutions Architect at Kinetica, recently wrote an excellent tutorial on integrating Data Collector with Kinetica. We repost it here with […]

Speed up Hive Data Retrieval using Spark, StreamSets and Predera

In this guest blog, Predera’s Kiran Krishna Innamuri (Data Engineer) and Nazeer Hussain (Head of Platform Engineering and Services) focus on building a data pipeline to perform lookups or run queries on Hive tables with the Spark execution engine using StreamSets Data Collector and Predera’s custom Hive-JDBC lookup processor.

Getting Started with Cloudera’s Cybersecurity Solution (feat. StreamSets, Arcadia Data and Centrify)

This post was originally published on the Cloudera VISION blog by Sam Heywood. StreamSets configurations and images of Apache Spot Open Data Model ingest pipelines can be found here on GitHub. A quick conversation with most Chief Information Security Officers (CISOs) reveals they understand they need to modernize their security architecture and the correct answer […]

Installing StreamSets Data Collector on Amazon Web Services EC2

Mike Fuller, a consultant at Red Pill Analytics, recently wrote Stream Me Up (to the Cloud), Scotty, a tutorial on installing StreamSets Data Collector (SDC) on Amazon Web Services EC2. Mike’s article takes you all the way from logging in to a fresh EC2 instance to seeing your first pipeline in action. We’re reposting it here courtesy of […]

Visualizing NetFlow Data with StreamSets Data Collector, Kudu, Impala and D3

Sandish Kumar, a Solutions Engineer at phData, builds and manages solutions for phData customers. In this article, reposted from the phData blog, he explains how to generate simulated NetFlow data, read it into StreamSets Data Collector via the UDP origin, then buffer it in Apache Kafka before sending it to Apache Kudu. A true big data enthusiast, Sandish spends […]

Creating a Post-Lambda World with Apache Kudu

Apache Kudu and Open Source StreamSets Data Collector Simplify Batch and Real-Time Processing. As originally posted on the Cloudera VISION Blog. At StreamSets, we come across dataflow challenges for a variety of applications. Our product, StreamSets Data Collector, is an open-source any-to-any dataflow system that ensures that all your data is safely delivered in the […]
