A step-by-step walkthrough of how Mac Noland implemented StreamSets to move away from hand-coded ETL and scale out an increasingly complex ingestion pipeline. Mac is a Solution Architect for phData, a Twin Cities services firm focused on Hadoop. He has spent 17 years as a software engineer and architect for projects in the legal, accounting, risk and medical device industries.
Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of infrastructure, semantics and schema that leads to corrupted data and broken pipelines. Download Open Source StreamSets Data Collector at www.streamsets.com/opensource.
StreamSets is an open source, enterprise-grade, continuous big data ingest infrastructure that accelerates time to analysis by bringing unprecedented transparency and processing to data in motion. Watch co-founders Girish Pancha (CEO) and Arvind Prabhakar (CTO) talk about the problems they are trying to solve with StreamSets.
Forward-looking, data-driven enterprises increasingly leverage Big Data platforms, such as Hadoop, Elasticsearch and Amazon Web Services, to derive insights from non-transactional, machine-generated data. Many tools have emerged to power next-generation data pipelines and provide specialized analytic capabilities. To get value from these technologies, data must reside in intermediate data stores in a consumable form. However, existing […]
Over his four-year career at Cloudera, Arvind had come to realize that the best practice for most customers ingesting data into Hadoop was to hand-code data processing logic and orchestrate it using open source frameworks. I was flabbergasted! As Chief Product Officer at Informatica, I had spent more than a dozen years at Informatica […]