Upcoming Webinar: How Cox Automotive Democratized Data


Presented by: Nate Swetye and Michael Gay, Cox Automotive

Hear straight from data experts at Cox Automotive (brands such as AutoTrader, KBB and Manheim) on how they created a self-service data exchange ingesting hundreds of data sources for their 25 companies, cutting the time to make data available from weeks to hours and reducing developer time by 90%.

Enabling Next Gen Analytics at a Major Bank with Microsoft Azure & StreamSets

Presented by: Pranav Rastogi, Microsoft, Krishna Venkataram, Microsoft & Kirit Basu, StreamSets

In this webinar, we discuss how StreamSets Data Collector works in concert with Microsoft Azure Data Lake and show how a major bank is using StreamSets to transport its on-premises data to the Azure Cloud Computing Platform to take advantage of analytics tools with unprecedented scale and performance.

How to Manage Continuous Dataflows for a Customer 360 Application

Presented by: Kirit Basu, Director of Product Management, StreamSets & Will Ochandarena, Sr. Director of Product Management, MapR

Enjoy this presentation on the challenges companies face building out and maintaining Customer 360 apps, StreamSets tools for building and operating continuous data ingest from a variety of customer interaction sources, and MapR's platform for Customer 360, closing with a retail bank case study on gaining insights into customer profitability.

Take Control of Your Dataflow Operations: Introducing StreamSets Dataflow Performance Manager

Presented by: Arvind Prabhakar, CTO, StreamSets and Kirit Basu, Director of Product Management, StreamSets

Join StreamSets CTO Arvind Prabhakar and Director of Product Management Kirit Basu as they introduce and demonstrate StreamSets Dataflow Performance Manager, the first solution to manage operations of a company’s end-to-end dataflows within a single pane of glass.

Easy Big Data Ingestion into Microsoft Azure HDInsight

Presented by: Kirit Basu, Director of Product Management, StreamSets
Pranav Rastogi, Sr. Program Manager, Microsoft Azure HDInsight

Join StreamSets' Kirit Basu and Microsoft's Pranav Rastogi as they discuss the challenges and best practices associated with developing and operating ingest pipelines for cloud or hybrid environments and demonstrate end-to-end dataflows from streaming and batch sources into Microsoft Azure HDInsight using StreamSets Data Collector.

Evolving ETL in the Face of Data Drift

Presented by: Arvind Prabhakar, CTO, StreamSets
Pat Patterson, Community Champion, StreamSets

ETL, the process of Extracting data from a source, Transforming it and then Loading it into a target data store, is a perennial data movement activity that has recently become quite challenging with the emergence of big data and the new problem of data drift, those frequent changes to schema and semantics that occur due to unexpected updates to big data source systems. In this webinar, Arvind Prabhakar, co-founder and CTO of StreamSets, and Pat Patterson, StreamSets' community champion, will:

  • Explain the mechanics of how traditional data integration tools execute the ETL process.
  • Discuss the challenges big data sources like logs and IoT sensors create for these legacy approaches.
  • Describe intent-driven ingest, a new method designed to overcome the challenge of data drift.

Comparing Open Source Big Data Ingest Options

Presented by: Pat Patterson, Community Champion, StreamSets

In this webinar, Pat Patterson, Community Champion for StreamSets, will walk you through the various open source options for ingesting big data including Flume, Sqoop, NiFi and StreamSets. For each open source project he will:

  • Describe its features
  • Explain how it operates
  • Discuss the pros and cons of that option

Pat will conclude with a short demo of StreamSets Data Collector, showing how it can consume, transform and write data to a variety of destinations for analysis.

A New Paradigm for Managing Data in Motion

Presented by: Jason Stamper, Senior Analyst, 451 Research
Arvind Prabhakar, CTO, StreamSets

Big data operations are built on numerous flows of perishable data that need to travel from a variety of often untraditional, uncurated and unstable sources, through a fabric of transport, storage and compute components and into multiple analytic applications. The complexity of this new environment for data in motion creates a pressing issue: how does a company ensure that the sum total of the data flowing across a business is complete and accurate, and yet still fresh?

This is a management problem that requires a new paradigm and organizational discipline around the performance management of data flows. In this webinar, 451 Analyst Jason Stamper and StreamSets CTO Arvind Prabhakar will discuss:

  • The state of play for managing data in motion today and the need for adopting a ‘data performance management’ paradigm.
  • The objectives and key principles for such a data performance management system.
  • Practical advice for building a performance management practice in your organization today.

How to Build Continuous Ingestion for the Internet of Things

Cloudera & StreamSets

IoT creates a new challenge: how to build and operationalize continual data ingestion from such a wide and ever-changing array of endpoints so that the data arrives consumption-ready and can drive analysis and action within the business.

In this webinar, Sean Anderson from Cloudera and Kirit Basu, Director of Product Management at StreamSets, will discuss Hadoop's ecosystem and IoT capabilities and provide advice about common patterns and best practices. Using specific examples, they will demonstrate how to build and run end-to-end IoT data flows using StreamSets and Cloudera infrastructure.

Recipes for Success: How to Build Continuous Ingest Pipelines

Presented by: Arvind Prabhakar, CTO, StreamSets

Modern data infrastructures are fed by vast volumes of data, streamed from an ever-changing variety of sources. Standard practice has been to store the data as ingested and force data cleaning onto each consuming application. This approach saddles data scientists and analysts with substantial work, delays insights and makes real-time or near-real-time analysis practically impossible.

In this session you will discover:

  • recipes for building automated ingest pipelines that implement continual in-stream sanitization so that data lands in stores ready to consume, regardless of the complexity of collecting it.
  • methods for making your pipelines resistant to data drift – the inevitable changes in schema, semantics and infrastructure that break pipelines.
  • open source tools that allow you to create and maintain these pipelines with little to no hand coding.

Streaming Big Data with StreamSets and Cloudera

How to Implement Simple, Reliable, Continuous Data Delivery

Please join StreamSets CTO Arvind Prabhakar and Cloudera Director of Product Management Matthew Schumpert as they discuss:

  • The technology improvements that enable end-to-end data streaming in the enterprise.
  • The development and operational challenges of incorporating real-time streams into your data management processes, with a focus on data drift.
  • Best practices for stream processing with StreamSets Data Collector and Cloudera Enterprise, including tools like Apache Kafka and Apache Spark Streaming.

Case Study: Ingesting Diverse Data into Elasticsearch

Learn How Cisco Intercloud Services Performs Multi-Datacenter Log Ingest

  • The Cisco Intercloud Fabric is designed to help enterprises to create a seamless hybrid cloud by transparently extending their data centers or private clouds into public clouds and provider-hosted clouds.
  • The challenge for Cisco Intercloud Services is to manage service performance across the fabric, which requires efficient and reliable ingestion, processing and monitoring of real-time data from numerous data centers.
  • In this webinar, Dimitri Chtchourov from Cisco, along with experts from StreamSets and Elastic, will discuss how Cisco Intercloud Services and the MANTL Data Platform use StreamSets Data Collector plus Elasticsearch, Logstash and Kibana to manage ingest of internal operational and multi-datacenter logs with low latency, high reliability and intelligent monitoring.