The StreamSets DataOps Platform was architected to scale to the largest workloads, particularly when working with continuous streams of data from systems such as Apache Kafka or Apache Pulsar. As well as the ability to scale, the platform offers a number of deployment options, allowing you to trade off complexity, performance, and cost. This blog […]
Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo, and a top-level Apache project since September 2018. StreamSets Data Collector 3.5.0, released soon after, introduced the Pulsar Consumer and Producer pipeline stages. In this blog entry I’ll explain how to get started creating dataflow pipelines for Pulsar.
Minneapolis-based phData has long been a StreamSets partner, deploying the StreamSets DataOps Platform at customers across the US. It’s not surprising, then, that when phData principal solutions architect Keith Smith wanted to integrate the Ethereum blockchain platform with the Apache Hadoop filesystem and Apache Kudu, he reached for StreamSets Data Collector.
Jowanza Joseph is a principal software engineer at One Click Retail with extensive experience building reliable and performant distributed data systems. Recently, Jowanza built a pair of data pipelines with StreamSets Data Collector to read data from Ford GoBike and send it to MapD via Kafka. It’s a great example of Data Collector’s versatility in dealing […]
Splunk is the tool of choice for many enterprises mining insights from machine-generated data such as server logs, but one problem with the default tools is that there is no way to filter the data as it is fed into Splunk. It’s easy to fill Splunk with redundant or irrelevant data, driving up costs without adding value. […]
June 2018 marked the fourth anniversary of StreamSets’ founding; here’s a look back at the past four years of StreamSets and the Data Collector product, from the early days in stealth-startup mode, to the recent release of StreamSets Data Collector 3.4.0. Girish Pancha and Arvind Prabhakar founded StreamSets on June 27th, 2014. Girish had been […]
A microservice is a lightweight component that implements a relatively small part of a larger system – for example, providing access to user data. A microservice architecture comprises a set of independent microservices, often implemented as RESTful web services communicating via JSON over HTTP, that together implement a system’s functionality, rather than a single monolithic […]
StreamSets solutions architect Alex Woolford is a data engineer with deep experience building robust and scalable solutions using technologies such as the StreamSets DataOps Platform, Apache Kafka, and the Cloudera and Hortonworks Hadoop distributions. In his role at StreamSets, Alex provides our customers with expertise including architecture design, demonstration systems, prototypes, presentations, and product configurations. […]
Following on from last week’s guest post from MapR’s Ian Downard on integrating StreamSets Data Collector with MapR Persistent Application Client Container (PACC), MapR Distinguished Technologist John Omernik offers a cautionary tale on examining your assumptions before jumping into the world of Docker. We repost John’s original article here with his kind permission. Since starting at MapR […]
Today’s guest blogger is Ian Downard, a Senior Developer Evangelist at MapR Technologies. Ian focuses on machine learning and data engineering, and recently documented how he brought together the MapR Persistent Application Client Container (PACC) with StreamSets Data Collector and Docker to build pipelines for ingesting data into the MapR Converged Data Platform. We’re reposting Ian’s article here, with his […]