Continuous Ingest to Elasticsearch (video)
You can send log files to Elasticsearch using StreamSets Data Collector. Watch this short step-by-step to see how.
In my previous post I discussed the causes and impacts of data drift, a natural consequence of Big Data that creates serious data quality and pipeline operational issues. Now I will describe the features of StreamSets Data Collector, how they address ingesting data in a “drifty” environment, and some common use cases.
Big data has come a long way, with adoption accelerating as CIOs recognize the business value of extracting insights from the troves of data collected by their companies and business partners. But, as is often the case with innovations, mainstream adoption of big data has exposed a new challenge: how to ingest data continuously from any source and with high quality. Indeed, we have found that there are environmental causes that make it next to impossible to scale ingestion using current approaches, and this has serious implications for scaling big data projects.
We are very excited to announce the next version of the StreamSets Data Collector. This release resolves over 250 JIRA issues, bringing a host of new features, performance enhancements and bug fixes.
Without further ado, here’s what’s new in 1.2.0.0:
The ability to monitor your critical infrastructure is a must, and we designed the StreamSets Data Collector (SDC) with this in mind: metrics are exposed through both the REST API and JMX. While there are many approaches to monitoring these metrics, let’s walk through a specific end-to-end example using jmxtrans to collect metrics, InfluxDB to store them, and Grafana to visualize them.
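The glue in this chain is the jmxtrans configuration, which tells jmxtrans which JMX MBeans to poll and where to ship the results. Below is a minimal sketch of such a config; the host, port, MBean object pattern, and InfluxDB credentials are illustrative assumptions, not values from this post, so adjust them for your own SDC instance (which must be started with remote JMX enabled on the port you specify).

```json
{
  "servers": [
    {
      "host": "localhost",
      "port": "3000",
      "queries": [
        {
          "obj": "metrics:name=*",
          "attr": ["Count", "Value", "Mean"],
          "outputWriters": [
            {
              "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
              "url": "http://localhost:8086/",
              "database": "sdc_metrics",
              "username": "admin",
              "password": "admin"
            }
          ]
        }
      ]
    }
  ]
}
```

With metrics flowing into the `sdc_metrics` database, Grafana can then be pointed at InfluxDB as a data source to build dashboards over the collected series.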