StreamSets Data Integration Blog
Where change is welcome.
AWS Reference Architecture Guide for StreamSets
Using StreamSets DataOps Platform To Integrate Data from PostgreSQL to AWS S3 and Redshift: A Reference Architecture
This document describes…
Struggling with Bad Data? What to Do From the Enterprise
Last week we announced the results of a survey of over 300 enterprise data professionals conducted by Dimensional Research and sponsored by StreamSets. We were trying to understand the market’s state of play for managing their big data flows. What we discovered was that there is an alarming issue at hand: companies are struggling to detect and keep bad data…
Ingesting Sensor Data on the Raspberry Pi with StreamSets Data Collector
In the unlikely event you're not familiar with the Raspberry Pi, it's an ARM-based computer about the same size as a deck of playing cards. The latest iteration, Raspberry Pi 3, has a 1.2GHz ARMv8 CPU, 1GB of memory, integrated Wi-Fi and Bluetooth, all for the same $35 price tag as the original Raspberry Pi released in 2012. Running a…
Visualize StreamSets Data Collector Metrics with Datadog
Back in January, Adam blogged about StreamSets Monitoring with Grafana, InfluxDB, and jmxtrans. While the Grafana/InfluxDB/jmxtrans open source stack works great, there's quite a lot of setup and configuration to keep track of. Another tool that StreamSets customers are using to monitor their infrastructure is Datadog; in today's blog post I'll show you the basics of configuring Datadog with StreamSets…
Ingesting MQTT Traffic into Riak TS via RabbitMQ and StreamSets
Riak KV is an open source, distributed, NoSQL key-value data store oriented towards high availability, fault tolerance and scalability. First released in 2009, Riak KV is in use at companies such as AT&T, Comcast and GitHub. Last October, Basho, the vendor behind Riak KV, announced Riak TS. Riak TS, another distributed, NoSQL data store, is optimized for time…
Analyzing Salesforce Data with StreamSets, Elasticsearch, and Kibana
UPDATE - Salesforce origin and destination stages, as well as a destination for Salesforce Wave Analytics, were released in StreamSets Data Collector 2.2.0.0. Use the supported, shipping Salesforce stages rather than the unsupported code mentioned below! After I published a proof-of-concept Salesforce Origin for StreamSets Data Collector (SDC), I noticed an article on the Elastic blog, Analyzing Salesforce Data with Logstash,…