I would be lying if I said that, a year ago, I had planned for the introduction of the StreamSets Summer ‘21 Beta to unfold exactly this way, so I won’t. A year ago, on this exact date, I joined the StreamSets team. I was excited about the opportunity at hand for StreamSets and hopeful about making a difference for our customers, partners and the market at large.
Today, I am thrilled to let you know that the next evolution of the StreamSets DataOps Platform is in public Beta and available for anyone to access immediately. Key highlights include the following:
- A single experience for all design patterns (including batch, streaming, CDC, ELT, ETL and machine learning), allowing data engineers to see benefits for real-world use cases in minutes. Additionally, data engineers can easily extend to sophisticated data engineering by constructing reusable fragments and templates, or by dropping into code if needed.
- Mission control for the full data lifecycle, from building and testing data pipelines all the way through to operating and monitoring them. Data teams will benefit from a single pane of glass across on-premises, cloud and hybrid environments.
- Smart data pipelines that are resilient to changes in data schema, structure or deployment, allowing data engineers to eliminate 80% of maintenance and break-fix tasks.
Why StreamSets Summer ‘21?
Many factors contributed to this release. Here’s a distilled version:
Plethora of architectural patterns
The last couple of decades have seen several technology shifts in the data industry, leading to a plethora of architectural choices that data engineers need to make. Do I go with ETL or ELT? Should I choose CDC or batch, or both, for ingestion? We use Kafka for streaming; should I modify our data pipelines to incorporate it? Which cloud storage do I use for my landing zone? The real answer, as always, is “it depends” on the use case and the business needs. We believe that for data engineers to truly shine and exceed the expectations of the business, they should not have to contend with the loss of productivity that comes from context switching between different tools for different patterns.
Data complexity, compounded
The pandemic accelerated the shift to the cloud in an almost “ludicrous mode” fashion. This has led to data teams across several verticals dealing with complex data architectures spread across cloud, on-premises and hybrid environments. This complexity is compounded by the fact that the schema and structure of the data produced everywhere in this landscape are highly dynamic.
Bar continually raised for data teams
Data teams responsible for taming this data chaos and making sense of the data need to be able to adapt to this dynamic environment. In the context of “remote everything”, the business’s expectations of data teams have, if anything, risen even further over the past year. Despite frequent changes, data teams are expected to deliver to the business at the speed of need, and with confidence.
Rise in high-scale cloud platforms
Besides the perfect storm of factors above, which stem from overall industry dynamics, we had some internally motivated reasons as well. Firstly, we needed to ensure that the platform was built on top of an infrastructure that would provide the scalability and performance we needed across multiple geographies and cloud platforms. Secondly, we wanted to increase the level of insights and analytics about our users and their use cases. We knew that we had to engineer for greater robustness around understanding the impact of feature rollouts for our users, ultimately allowing us to become even more responsive to our users’ needs.
The Value of StreamSets Summer ‘21 to our Community
We are only just beginning, but we are excited by what our early beta testers have experienced and what our design partners have let us know directly.
Fast time to value with rapid onboarding. Summer ‘21 offers a rich library of curated sample pipelines, productivity hacks like live data preview, and built-in version control that make it easy to build, debug and deploy pipelines in minutes. We also flatten the learning curve for new users with reusable pipeline fragments and templates that one data engineer can use to empower tens of ETL developers and hundreds of analysts and data scientists. Some of our early beta testers have already experienced rapid success with the product, going from product signup to running data pipeline jobs in just a few minutes.
Mission control for the full lifecycle. New features in Summer ‘21 address separation of concerns across dev/test/prod environments and teams, and facilitate team collaboration while still retaining centralized management. In addition, the release provides a single pane of glass to manage multi-cloud and hybrid environments for building and deploying data pipelines. Feedback from our early design partners has been both positive and brutally honest:
“This is exactly what we need”
“I’m glad to see this now, but wish we had this a few months ago – so awesome to see this”
Single experience for different architectural patterns. With Summer ‘21, users will work with the exact same pipeline build, run and monitor experience for different architectural patterns – CDC, batch, real-time, machine learning and the like. There is no context switching involved for users, as the pipeline designer and the end-to-end UI are applicable to the different patterns, based on the engine type that is chosen. This same framework will be extended in the future as we roll out additional capabilities for other architectural patterns.
Reduced latency and better scalability. The infrastructure-level changes we have made to the cloud platform provide reduced latency, improved user experience, flexibility, scalability and faster feature validation with A/B testing environments.
The Perfect DataOps Recipe: The Key Ingredients
Several key ingredients contributed to the making of our StreamSets Summer ‘21 Beta release and the confidence we feel about what it can offer to our end users.
Obsession with our users
At StreamSets, we have institutionalized our user personas to the point where every single employee in the company knows who Ana (Data Engineer persona) and Vik (Platform Administrator persona) are. While there is a cast of characters in our persona library, for StreamSets Summer ‘21 specifically, Ana and Vik have shaped most of our thinking. They are highlighted in our feature specs. Our UX was designed for them. Ana and Vik are even “called in” to settle our design discussions and debates. Through this process, we have been working with several real-world “Anas” and “Viks” who have been our early-stage design partners and are now active beta testers. We are grateful for their time and valuable feedback!
While we perform extensive qualitative interviews and research into our users and their needs, we also measure friction points and time to value along their journey. A committed cross-functional team of StreamSetters checks our internal analytics dashboard every day to understand whether there are broad usage patterns we need to troubleshoot, identify areas where we can reduce friction, and rapidly iterate those findings back into the product. We ensure we are tangibly measuring what we are aiming for, which allows us to sharpen our focus and prioritize our efforts.
Overarching strategy plus execution rigor
Our overarching strategy is to help data teams rise to meet modern data challenges and accomplish their goals by applying DataOps tooling and best practices to their data integration needs. We broke this strategy down into an actionable plan and applied some serious execution rigor to it. This rigor did not happen by chance. Our collective hundreds of years in the data industry, coupled with the operational excellence gained from launching several dozen SaaS releases, allowed us to apply the battle-tested lessons we had already learned in real-world scenarios.
StreamSets Summer ’21 secret sauce
While the rigor we put into the making of this release speaks to the team’s top-notch skills, our true secret sauce comes down to the winning culture and mindset that set StreamSets apart as a great place to work. This allows every individual on the team to bring their creative ideas and innovative best selves to work every single day.
The Future of DataOps in the Cloud: What’s Next?
We are just getting started! We are working hard to get the Summer ‘21 release ready to go live in late summer. We are also excited about the opportunity this platform offers for further rapid innovation that makes data engineering teams wildly successful. Our recent announcement of the StreamSets engine for Snowpark is one such initiative. Expect much more to come from the StreamSets team.
We are eager to show you what we have made so far in the Summer ‘21 Beta release. Sign up here and let us know what you think!