Customer Success

Online Media Company

Challenge

A leader in digital media needed real-time personalization to improve recommendations, increase engagement and maximize revenue.

Solution

StreamSets ingests, sanitizes and scores data from Omniture, external ad platforms and internal databases to deliver personalization to a variety of online media properties.

Benefits

  • Clickstream analytics reduced from 24 hours to minutes.
  • “Self-service” data flows for data scientists to improve popularity ranking algorithms.
  • “Event Firehose” to enable rapid onboarding and sharing across all properties.

StreamSets has dramatically improved time to analysis and reliability of our data science efforts around revenue, quality control and ad inventory for our websites.

Senior Director, Data Science

Automotive Company

Challenge

Build an ingestion framework that un-silos data from 30+ brands and business units to inform business decisions and spur innovation across the enterprise.

Solution

An elastic, hub-and-spoke model for on-demand ingestion of new data sources. Data is auto-discovered, exposed in Hive tables and archived to Amazon S3. StreamSets handles data drift and arbitrarily complex data types.

Benefits

  • Scales across geographically dispersed business units.
  • Removes IT as a bottleneck to on-boarding data.
  • Proactive alerts around data drift.

We chose StreamSets as our enterprise-wide standard for our next generation “any-to-any” data flow infrastructure because of their singular focus on solving operations and deployment challenges, and their product roadmap focus on Dataflow Performance Manager.

VP, Enterprise Data Services

Software-as-a-Service Leader

Challenge

Build a new enterprise message fabric that aggregates data from distributed enterprise community instances into a single “community firehose.”

Solution

StreamSets ingests and sanitizes data from hundreds of community logs into Apache Kafka and concurrently moves aggregated data between Kafka and Amazon Kinesis.

Benefits

  • 2 TB/day passed with end-to-end transit time of less than 15 seconds.
  • Deployed 6 months faster vs. a hand-coded solution.
  • Real-time monitoring to detect and drill into any issues. Real-time data availability opens up innovative analyses.

We analyze behavior of the 100 million+ visitors who cross our platform monthly and log more than 12 billion daily interactions, all to make sure we are continually improving the experience for customers. StreamSets is the centerpiece of our enterprise message fabric. It allows us to easily ingest and route terabytes of log data daily into a unified community firehose and actively performance manage the latency and quality of these data flows.

Chief Technology Officer

Government Agency

Challenge

Make cybersecurity and intelligence data available quickly to all users despite a growing variety of sources, types and destinations.

Solution

StreamSets passes hundreds of thousands of records per second from numerous sources through Kafka and into HDFS, HBase, Kudu and Spark Streaming. Heavy analysis is then performed in Impala.

Benefits

  • Time to value: people who need the data can act on it quickly. The legacy system took weeks to add new data sources; it now takes less than a day.
  • No dedicated cluster. The legacy solution required a cluster larger than our Hadoop store.
  • Ability to scale and leverage existing investment in cloud.
  • Versatility to accommodate different data types and streams, monitor data quality and gracefully handle change to data.
  • No more hand coding required; you don’t need to be a developer to use StreamSets.