Adaptive Data Cleansing with StreamSets and Cassandra
Thursday, September 8, 2016
Cassandra is a perfect fit for consuming high volumes of time-series data directly from users, devices, and sensors. Sometimes, though, when we consume data from the real world, systematic and random errors creep in. In this session, we'll see how to use open source tools like RabbitMQ and StreamSets Data Collector with Cassandra features such as User Defined Aggregates to collect, cleanse and ingest variable quality data at scale. Discover how to combine the power of Cassandra with the flexibility of StreamSets to implement adaptive data cleansing.