State of the Art Data Ingestion


Forward-looking, data-driven enterprises increasingly leverage Big Data platforms, such as Hadoop, Elasticsearch, and Amazon Web Services, to derive insights from non-transactional, machine-generated data. Many tools have emerged to power next-generation data pipelines and provide specialized analytic capabilities. To get value from these technologies, data must reside in intermediate data stores in a consumable form. However, existing data integration tools do not offer the means to continuously extract data from the exploding variety of machine data sources and load it into Big Data platforms in a consumption-ready manner.

Consequently, next-generation analytic applications end up implementing custom data ingestion processes in support of their specific use cases. Some enterprises have custom-built data ingestion infrastructure to keep information flowing into Big Data platforms. These homegrown implementations typically leverage open source frameworks such as Apache Kafka, Flume, Spark Streaming, and Sqoop.

However, the data produced in source systems changes structure and format without notice. This often causes silent data corruption, data loss, or both. With increasing rates of data production, manual oversight to safeguard against such problems is neither economically viable nor sustainable.

These pressures lead to opaque, brittle, and ad hoc implementations of domain-specific logic for data movement. Such ingestion infrastructure is often constrained by the limitations of the chosen underlying frameworks and by the requirements of the use cases known at the time it was built. These problems make today’s ingestion infrastructure a fragile part of otherwise robust Big Data environments, and it ends up becoming the bottleneck to delivering business value.
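To make the structure-drift problem concrete, here is a minimal Python sketch of a check that compares each incoming record against the field names and types seen earlier. It is an illustration only, not how StreamSets Data Collector or any of the frameworks named above work; the helper names `schema_of` and `detect_drift` and the sample fields are hypothetical.

```python
from typing import Any, Dict, List


def schema_of(record: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to the name of its Python type."""
    return {name: type(value).__name__ for name, value in record.items()}


def detect_drift(expected: Dict[str, str], record: Dict[str, Any]) -> List[str]:
    """Return human-readable differences between a record and the expected schema."""
    observed = schema_of(record)
    problems: List[str] = []
    # Fields the pipeline relies on that the source no longer sends.
    for name in sorted(expected.keys() - observed.keys()):
        problems.append(f"missing field: {name}")
    # Fields the source started sending without notice.
    for name in sorted(observed.keys() - expected.keys()):
        problems.append(f"unexpected field: {name}")
    # Fields that are still present but changed type.
    for name in sorted(expected.keys() & observed.keys()):
        if expected[name] != observed[name]:
            problems.append(
                f"type changed: {name} ({expected[name]} -> {observed[name]})"
            )
    return problems


# Hypothetical sample data: the baseline comes from a record known to be good.
baseline = schema_of({"user_id": 1, "amount": 9.5})
drifted = {"user_id": "2", "total": 1.0}
for problem in detect_drift(baseline, drifted):
    print(problem)
```

A pipeline that runs a check like this on every record can flag or divert drifted data instead of loading it silently, which is the failure mode described above.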

Adaptable Flows with StreamSets

StreamSets Data Collector aims to address these shortcomings. We were inspired to reenvision data ingestion from the ground up. Data Collector is free and open source. Try it out at www.streamsets.com/opensource and let us know what you think.
