The DataOps Blog
Where Change Is Welcome
The Next Chapter for StreamSets
Arvind Prabhakar and I co-founded StreamSets in 2014 with an audacious vision: data should be the lifeblood of the enterprise.…
Elasticsearch plus StreamSets for Reliable Data Ingestion
StreamSets Data Collector is open source software that lets you easily build continuous data ingestion pipelines for Elasticsearch. By being resistant to "data drift", StreamSets minimizes ingest-related data loss and helps ensure optimized indexes so that Elasticsearch and Kibana users can perform real-time analysis with confidence.
Ingesting Streaming Data from JMS into HDFS and Solr using StreamSets
Now we’ll start publishing messages to a JMS queue. They are simple text messages with random words. Periodically the program outputs two types of bad records: records without an id message header and records with empty content. We will use two of the StreamSets error handling facilities later on to catch these bad records.
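As a rough illustration of the kind of test data described above, here is a minimal Python sketch that simulates the message generator. It is not the program from the original post and does not talk to a real JMS broker: the word list, the `make_message` and `is_bad` helpers, and the "every tenth message" schedule for bad records are all assumptions made for this example.

```python
import random

# Hypothetical vocabulary; the original post does not list its words.
WORDS = ["stream", "pipeline", "ingest", "record", "queue", "drift"]


def make_message(seq, rng, bad_every=10):
    """Build one simulated text message as a (headers, body) pair.

    The schedule for bad records is an assumption for this sketch:
    every `bad_every`-th message loses its id header, and the message
    halfway between two of those gets an empty body.
    """
    headers = {"id": str(seq)}
    body = " ".join(rng.choice(WORDS) for _ in range(rng.randint(3, 8)))
    if seq % bad_every == 0:
        # Bad record type 1: no id message header.
        del headers["id"]
    elif seq % bad_every == bad_every // 2:
        # Bad record type 2: empty content.
        body = ""
    return headers, body


def is_bad(headers, body):
    """Return True for either of the two bad record types."""
    return "id" not in headers or body == ""


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for seq in range(1, 21):
        headers, body = make_message(seq, rng)
        status = "BAD " if is_bad(headers, body) else "ok  "
        print(status, headers, repr(body))
```

In a real setup, the `(headers, body)` pairs would be sent through a JMS client rather than printed, and the pipeline's error handling would play the role of `is_bad`.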
Introducing the StreamSets Data Collector (video)
Wondering how the StreamSets Data Collector works? Have a look at this quick four-minute introduction to the software.
What Is StreamSets?
This 2015 blog post has been updated. The original post is preserved below. StreamSets is a modern data integration platform dedicated to building the smart data pipelines needed to power DataOps across hybrid and multi-cloud architectures. StreamSets was founded in 2015 by a former Cloudera engineer and Informatica product leader to better manage data integration in the modern world. By…
State of the Art Data Ingestion
Forward-looking, data-driven enterprises increasingly leverage Big Data platforms, such as Hadoop, Elasticsearch and Amazon Web Services, to derive insights from non-transactional, machine-generated data. Many tools have emerged to power next generation data pipelines and provide specialized analytic capabilities. To get value from these technologies, data must reside in intermediate data stores in a consumable form. However, existing data integration tools do…