Create Apache Spark applications with a new drag-and-drop UI
Christian Hargrave in Enterprise, Monday, September 9, 2019
Want to create Apache Spark applications with a drag-and-drop UI? StreamSets has launched StreamSets Transformer, announced the beta of StreamSets Cloud, and detailed its vision for conquering data in the age of modern analytics.
StreamSets, Inc., provider of a DataOps platform for modern data integration, has released StreamSets Transformer, a simple-to-use, drag-and-drop UI tool to create native Apache Spark applications. Designed for a wide range of users, even those without specialized skills, StreamSets Transformer enables the creation of pipelines for performing ETL, stream processing and machine-learning operations. Now, data engineers, scientists, architects, and operators gain deep visibility into the execution of Apache Spark while broadening usage across the business.
Apache Spark delivers on the promise of advanced data processing and machine learning at scale. But there are drawbacks. Developing and operating applications on Apache Spark is complex and requires hand-coding. It is typically restricted to developers and companies with mature data engineering and data science practices. In addition, users often have very limited visibility into how their Apache Spark jobs are running. StreamSets Transformer solves these issues. Its easy-to-use, logical user interface and rich tools for designing data transformations eliminate the complexity and need for specialized skills. Pipelines instrumented with StreamSets Transformer provide unparalleled visibility into every Spark execution. Equally important, developers now have a single tool to build both batch and streaming pipelines.
The key features of StreamSets Transformer to create Apache Spark applications include:
- Continuous monitoring - Unparalleled visibility into Apache Spark application execution
- Continuous data - Runs in both batch and streaming modes
- Progressive error handling - Finds where and why errors occur without the need for Apache Spark skills to decipher complex log files
- Execute on Apache Spark anywhere - Works in the cloud, Kubernetes or on-premises
- Highly extensible - Higher-order transformation primitives for the ETL developer, SparkSQL for the analyst, PySpark for the data scientist, and custom Java/Scala processors for the Apache Spark developer
- Set-based processing - For ETL, machine learning and complex event processing
“With StreamSets Transformer, Apache Spark is finally available to a wide range of users, enabling visibility, monitoring and reporting for mission-critical workloads. In essence, StreamSets Transformer brings the power of Apache Spark to businesses, while eliminating its complexity and guesswork,” said Arvind Prabhakar, CTO of StreamSets.
“With StreamSets Transformer and Databricks integrated together, even more users can easily access the powerful capabilities of Delta Lake and our optimized Apache Spark for data science and analytics. Especially as organizations migrate from legacy on-premises platforms, our partnership will help them efficiently make that transition to manage their data and machine learning workloads in the cloud,” said Michael Hoff, senior vice president of Business Development and Partners at Databricks.
StreamSets Announces Beta of StreamSets Cloud
StreamSets, Inc., has opened pre-registration for the upcoming beta of StreamSets Cloud. With this new offering, data-driven businesses have an easy-to-use cloud service to design, deploy and operate smart pipelines for ingesting data into cloud data platforms, such as Snowflake, Azure Data Warehouse, Azure Data Lake Storage and Amazon S3. StreamSets Cloud is the company’s latest step in strengthening its commitment to the cloud and to DataOps practices, with a Software-as-a-Service (SaaS) offering for designing and operating data pipelines.
StreamSets Cloud makes it easy to create and manage smart data pipelines, enabling faster delivery of data without sacrificing data integrity and confidence. The beta program for StreamSets Cloud will open in the coming weeks. Designed for companies of all sizes that are adopting cloud data platforms, StreamSets Cloud is the first platform-neutral service that combines StreamSets’ full life cycle approach to designing and operating smart data pipelines with all the benefits of a hosted cloud service:
- Agility - A fully-managed cloud service gets data to stakeholders faster with integrated, easy-to-use tools for designing and operating pipelines.
- Flexibility - Pipelines can be executed anywhere - on-premises or in the cloud.
- Data confidence - Operationalized smart data pipelines are fully instrumented and data drift-tolerant.
- Scalability - Built-in Kubernetes containerization and native cloud scaling reliably handle the largest workloads.
“StreamSets has already proven its commitment to the cloud, with a number of SaaS solutions and support for cloud platforms. With StreamSets Cloud, we’re doubling down on our commitment to delivering an easy-to-use, cloud-native service for designing, deploying and operating data pipelines. At the same time, we’re continuing to focus on promoting DataOps practices to help organizations deal with the speed, fragmentation and the pace of change associated with modern analytics,” said Arvind Prabhakar, CTO, StreamSets.
“StreamSets Cloud is focused not just on developer productivity, but also on making operations far more efficient with fully instrumented pipelines, end-to-end monitoring, and drift-handling. Now, with smart data pipelines built for the modern, always-on and always-changing world, enterprises have the foundation for operationalizing the full data flow life cycle,” said Jobi George, general manager, StreamSets Cloud.
DataOps: Conquering Data in the Age of Modern Analytics
StreamSets, Inc., has detailed its vision for DataOps, the emerging practice for managing data in the age of modern analytics. With the first authoritative guide book on the subject, an inaugural event dedicated to DataOps and a long list of customers using the StreamSets DataOps Platform, StreamSets is driving the continued maturation of DataOps as well as pioneering the foundational data integration technology that underpins DataOps practices.
The current state of data
Data is fragmented and proliferating in more places, with more and more of it outside the direct control of the IT department. At the same time, data is more valuable than ever. Analytics has modernized to harness data in an always-on, always-changing world. How data is delivered to drive those analytics has to modernize, too. DataOps, based on the DevOps concepts of continuous delivery and bringing data providers and data consumers closer together, emerges as a new approach to bring data integration and management into the modern era.
As outlined in “DataOps: The Authoritative Edition,” a book written by industry expert John Schmidt and Kirit Basu, VP Product, StreamSets, DataOps is a rapidly emerging practice that is critical for modernizing and operationalizing data management and integration in order to keep pace with the insatiable and accelerating business demands of modern analytics. A DataOps practice enables data professionals to deal with the speed, fragmentation, and pace of change associated with modern analytics. Ultimately, by embracing DataOps practices and the technology enabling DataOps, data professionals are no longer forced to choose between rapid delivery of data and the confidence that it’s reliable; rather, they can have both.
Data management industry is taking notice
DataOps’ rise has been documented by a number of industry analysts and experts, including 451 Research and Eckerson Group. In Gartner’s “Hype Cycle for Data Management,” DataOps is positioned in the Innovation Trigger phase of the Hype Cycle and noted as “On the Rise.” In fact, the Gartner “Your Data Culture Is Changing Do You Need DataOps?” presentation states, “Given the tremendous pressure to achieve faster delivery of new and enhanced data and analytics capabilities, DataOps will quickly traverse the first half of the Hype Cycle.”
We believe the real testament to DataOps’ rise, however, is the growing number of leading companies that are building out their DataOps practices and investing in the people, processes, and technology to support them. Many of these companies, such as GlaxoSmithKline, RingCentral, and Shell, are StreamSets customers using the StreamSets DataOps Platform. This technology bolsters their DataOps practices by addressing data drift: the unpredictable, unannounced and unending changes in data characteristics caused by the operation, maintenance and modernization of the systems that produce the data. The platform mitigates data drift by detecting and handling these changes in an automated way, and it enables continuous design and operations.
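To make the idea of data drift concrete, here is a minimal Python sketch of one way structural drift could be detected and tolerated in a stream of records. It is an illustration only and does not reflect how the StreamSets DataOps Platform is implemented: the function names (`infer_schema`, `detect_drift`, `process`), the type-name schema representation, and the policy of adopting new fields rather than failing are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical schema representation: field name -> Python type name.
Schema = Dict[str, str]


def infer_schema(record: Dict[str, Any]) -> Schema:
    """Map each field in a record to the name of its Python type."""
    return {field: type(value).__name__ for field, value in record.items()}


def detect_drift(expected: Schema, record: Dict[str, Any]) -> List[str]:
    """Describe how a record's structure differs from the expected schema."""
    observed = infer_schema(record)
    changes: List[str] = []
    for field in sorted(observed.keys() - expected.keys()):
        changes.append(f"added field '{field}' ({observed[field]})")
    for field in sorted(expected.keys() - observed.keys()):
        changes.append(f"missing field '{field}'")
    for field in sorted(expected.keys() & observed.keys()):
        if expected[field] != observed[field]:
            changes.append(
                f"field '{field}' changed type: "
                f"{expected[field]} -> {observed[field]}"
            )
    return changes


def process(
    records: Iterable[Dict[str, Any]], expected: Schema
) -> Tuple[Schema, List[str]]:
    """Tolerate drift: log each change and evolve the schema instead of failing."""
    schema = dict(expected)  # copy so the caller's schema is not mutated
    log: List[str] = []
    for record in records:
        changes = detect_drift(schema, record)
        if changes:
            log.extend(changes)
            # Assumed policy: adopt new and retyped fields, keep old ones.
            schema.update(infer_schema(record))
    return schema, log


if __name__ == "__main__":
    expected = {"id": "int", "amount": "float"}
    incoming = [
        {"id": 1, "amount": 9.5},
        # An upstream change: amount arrives as text and a field appears.
        {"id": 2, "amount": "12.00", "currency": "USD"},
    ]
    final_schema, drift_log = process(incoming, expected)
    for line in drift_log:
        print(line)
```

In this sketch the pipeline keeps running when an upstream system changes its output, and the drift log records what changed and when. A production system would also need a policy for incompatible changes, which is left out here.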
“DataOps is transforming the delivery of data by using a continuous improvement approach to acquiring, aligning, rationalizing and evolving the data to meet the needs of the consumer. StreamSets is a critical technology enabler for that continual flow of data,” said Mark Ramsey, former Chief Data Officer, GlaxoSmithKline.
“It’s very early in the game for DataOps, with a number of different organizations and vendors offering their own definition of the term in an effort to be part of the excitement. But the practice is real, as evidenced by our customers who have built teams and processes around it. We are committed to not only innovating on the technology front with our StreamSets DataOps platform but also supporting the broader ecosystem of practitioners, thought leaders and vendors who are maturing this critical practice for modernizing data management,” said Girish Pancha, CEO, StreamSets.