It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software through which you can build and place into production complex batch and streaming pipelines using built-in processors for all sorts of data transformations.
We’re thrilled to announce that as of last month StreamSets Data Collector has been downloaded by over ⅓ of the Fortune 100! That’s several dozen of the largest companies in the U.S. And downloads of this award-winning software have been accelerating, with over 500% growth in the quarter ending in October versus the previous quarter.
In fact, this is probably a substantial understatement as we only know the corporate identity of a small sliver of the large number of developers who have downloaded the software.
Amongst the Fortune 500 companies where we have experienced download activity, the industry breakdown is interesting. As shown below, the largest single sector is financial services, accounting for 36% of the identified companies. Within this sector there is heavy representation from banks, credit institutions and insurance companies.
Following the financial folks are technology companies with 17% of the total, healthcare companies (11%) and services companies (8%). Other industries represented include media, energy, apparel, consumer packaged goods and even agriculture. This broad cross-section of the economy is a consequence of both the widening adoption of big data and the flexibility of StreamSets Data Collector to enable a diverse array of dataflow use cases including IoT, customer 360, cybersecurity, cloud migration and architectural modernization.
If you’re one of the many who have downloaded StreamSets Data Collector, we thank you for giving us a try; we’re honored to help you make the most of your data in motion. For those of you who have yet to try it, you can download it here. Also take a look at SDC’s companion product, StreamSets Control Hub, which helps you operationalize production of a large number of data pipelines and create a well-managed dataflow operation at your company.