Simple Kafka Enablement Using StreamSets (video)
You can simplify use of Kafka within your infrastructure using StreamSets Data Collector. Watch this short step-by-step tutorial to learn how.
We’re happy to announce a new version of the StreamSets Data Collector. This version includes a number of bug fixes and, most importantly, support for Elasticsearch 2.x.
You can send log files to Elasticsearch using StreamSets Data Collector. Watch this short step-by-step tutorial to see how.
In my previous post I discussed the causes and impacts of data drift, a natural consequence of Big Data that creates serious data quality and pipeline operational issues. Now I will describe the features of StreamSets Data Collector, explain how they address ingesting data in a “drifty” environment, and cover some common use cases.
Big data has come a long way, with adoption accelerating as CIOs recognize the business value of extracting insights from the troves of data collected by their companies and business partners. But, as is often the case with innovations, mainstream adoption of big data has exposed a new challenge: how to ingest data continuously, from any source, and with high quality. Indeed, we have found environmental causes that make it next to impossible to scale ingestion using current approaches, and this has serious implications for scaling big data projects.