SAN FRANCISCO – September 24, 2015 – StreamSets Inc., a company that speeds access to enterprise big data, today announced it has closed a $12.5 million round of Series A funding co-led by experienced big data investors Battery Ventures and New Enterprise Associates (NEA), with participation from Accel Partners and Ignition Partners. In addition, StreamSets today launched a revolutionary new data ingest infrastructure, called StreamSets Data Collector, which helps businesses accelerate data analysis and decision-making. Available under an open source Apache license (ALv2), this technology automates data movement in order to give data scientists and analysts continuous access to big data.
As companies’ data volumes explode, operators are spending more time sanitizing raw data before it can be used to inform business decisions. The cause is “data drift”: constantly changing infrastructure and semantics in today’s data environment slow the collection and movement of data needed for reliable analytics. StreamSets ingests, cleanses and monitors data in motion to address this challenge and fuel real-time analysis.
“We invested in StreamSets because of the team’s expertise in delivering an enterprise-grade data management platform that enables timely operational decisions,” said Dharmesh Thakker, general partner at Battery Ventures.
“There is a massive opportunity for StreamSets’ technology to bring world-class transparency and monitoring to data — the next generation of performance management in enterprise IT,” added Pete Sonsini, general partner at NEA.
StreamSets co-founder and CEO Girish Pancha was previously chief product officer at Informatica, where he was responsible for the company’s entire data integration product portfolio. Co-founder Arvind Prabhakar was an early employee of Cloudera, where he led teams working on integration technologies such as Apache Flume and Apache Sqoop. A member of the Apache Software Foundation, Prabhakar is heavily involved in the open source community and was formerly an architect for the Informatica platform.
“Over the years, Arvind and I have seen firsthand that the single biggest barrier to a successful enterprise analytics platform is the challenge of ingesting data. That problem is exacerbated when the data is constantly shifting underfoot,” said Girish Pancha, StreamSets co-founder and CEO. “Current solutions are simply too opaque and brittle to handle a fluid data landscape. We were inspired to start over from the ground up and bring unprecedented transparency and event processing to data in motion.”
Lithium Technologies’ message fabric uses StreamSets to enhance customer experience by enabling near real-time message flow across its Total Community social software suite. In addition, Cisco will use StreamSets to more easily uphold its software-as-a-service (SaaS) customers’ expectations of data pipeline flexibility and uptime.
“We are constantly adding products and services to our Intercloud offering,” said Ken Owens, chief technology officer of cloud services at Cisco Systems. “StreamSets automatically handles such infrastructure changes, and provides intelligent monitoring and dynamic shaping of our internal operational log as well as multi-datacenter data ingestion logs to help us meet strict service level agreements for our development team as well as future customers.”
StreamSets developed this enterprise-grade data infrastructure to support data-intensive applications that rely on several disparate sources of real-time, streaming and batch data from both machine-generated feeds and transactional enterprise systems. As opposed to traditional schema-centric approaches, StreamSets leverages intent-driven machine learning techniques to automatically validate and continuously prepare all of this data for consuming applications. This approach saves DevOps teams the work of building, operating and maintaining custom-coded solutions.
Starting today, data infrastructure teams can download the open source StreamSets Data Collector software and join the community at streamsets.com/community, or purchase a commercial subscription license for development or production support. StreamSets will use its Series A funding to build a thriving open source community, advance the company’s product roadmap, and incrementally invest in partnerships and other go-to-market activities. In addition, Pete Sonsini from NEA and Dharmesh Thakker from Battery Ventures will join the company’s board of directors.
Founded in 2014, StreamSets provides data ingest technology for the next generation of big data applications. Its enterprise-grade infrastructure accelerates data analysis and decision-making by bringing unprecedented transparency and event processing to data in motion. The company was founded by Girish Pancha, a long-time executive and former chief product officer of Informatica, and Arvind Prabhakar, an early employee and engineering leader at Cloudera. StreamSets is headquartered in San Francisco, and backed by top-tier Silicon Valley venture capital firms and angel investors, including Accel Partners, Battery Ventures, Ignition Partners and New Enterprise Associates (NEA). For more information, visit streamsets.com.