
The DataOps Blog


State of the Art Data Ingestion

Posted in StreamSets News, September 29, 2015

Forward-looking, data-driven enterprises increasingly leverage Big Data platforms such as Hadoop, Elasticsearch, and Amazon Web Services to derive insights from non-transactional, machine-generated data. Many tools have emerged to power next-generation data pipelines and provide specialized analytic capabilities. To get value from these technologies, data must reside in intermediate data stores in a consumable form. However, existing data integration tools offer no way to continuously extract data from the exploding variety of machine data sources and load it into Big Data platforms in a consumption-ready state.

Consequently, next-generation analytic applications end up implementing custom data ingestion processes to support their specific use cases. Some enterprises have built their own data ingestion infrastructure to keep information flowing into Big Data platforms. These homegrown implementations typically leverage open source frameworks such as Apache Kafka, Flume, Spark Streaming, and Sqoop.

However, the data produced in source systems changes structure and format without notice. This often causes silent data corruption, data loss, or both. As data production rates increase, manual oversight to guard against such problems is neither economically viable nor sustainable.

The result is opaque, brittle, and ad hoc implementations of domain-specific data movement logic. Such ingestion infrastructure is also often constrained by the limitations of the underlying frameworks chosen and by the requirements of the use cases known at the time it was built. These problems make today’s ingestion infrastructure a fragile part of otherwise robust Big Data environments, and it ends up becoming the bottleneck to delivering business value. There is a better way to support modern data ingestion.

StreamSets Data Collector, a fast data ingestion engine, aims to address these shortcomings. We were inspired to re-envision data ingestion from the ground up. It’s free and open source. Try it out and let us know what you think.
