A serverless search engine for Data Scientists and Developers emerges
Richard Harris in Big Data, Friday, March 22, 2019
Serverless search and analytics engines, such as Rockset, can ingest data in all popular semi-structured formats and automatically turn it into fast SQL tables.
Rockset announced the general availability of a cloud service that allows developers and data scientists to put complex data sets to use in minutes instead of weeks.
Current trends in technology - smart devices, digital lending, fraud detection, omnichannel retail, machine learning, microservices, and real-time business dashboards - all require clean data from a multitude of sources. However, businesses are struggling with tons of high-value, low-quality data in fragmented systems like data lakes, NoSQL databases, and data streams. Increasingly, data comes from many different sources and tends to be machine data in JSON, CSV, XML and Parquet formats or business data like XLSX and PDF. Rockset empowers teams to serve data-intensive applications using a wide range of data formats from multiple sources, without having to model schemas or manage fragile extract-transform-load (ETL) pipelines.
“Modern data sets are messy and unstructured. Traditional data management systems force teams to spend more than 50 percent of their time just preparing and loading these data sets,” said Venkat Venkataramani, co-founder and CEO of Rockset. “What's the point of collecting new data if you cannot use it immediately and iterate quickly? Rockset has reimagined the stack in a new developer-friendly way, with fast SQL on raw data -- so you can go from data to applications in a few minutes.”
Rockset is a serverless data backend that continuously ingests raw data as it is generated and delivers real-time SQL queries at scale. Its patent-pending Converged Indexing, which combines an inverted index, a columnar index, and a document index, is optimized out of the box for key-value, time-series, document, search, aggregation, and graph-type queries. It is backed by RocksDB, which, combined with Rockset's Cloud Native Distributed Query Engine, delivers the performance and scale necessary for real-time operational applications and interactive data science.
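To make the idea of indexing the same data three ways more concrete, here is a minimal toy sketch in Python. It is not Rockset's implementation, whose internals the announcement does not describe; the class name `ToyConvergedIndex` and its methods are hypothetical, and it assumes flat documents with hashable values held in memory.

```python
from collections import defaultdict


class ToyConvergedIndex:
    """Toy illustration: store every document three ways at write time.

    Hypothetical names throughout; assumes flat documents whose values
    are hashable (strings, numbers, booleans, None).
    """

    def __init__(self):
        self.documents = {}                # document index: doc_id -> whole document
        self.columns = defaultdict(dict)   # columnar index: field -> {doc_id: value}
        self.inverted = defaultdict(set)   # inverted index: (field, value) -> {doc_id}

    def add(self, doc_id, doc):
        """Write one document into all three structures."""
        self.documents[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)

    def get(self, doc_id):
        """Key-value style lookup, served by the document index."""
        return self.documents[doc_id]

    def search(self, field, value):
        """Search style lookup, served by the inverted index."""
        return sorted(self.inverted.get((field, value), set()))

    def column_sum(self, field):
        """Aggregation over one field, served by the columnar index."""
        return sum(self.columns.get(field, {}).values())


index = ToyConvergedIndex()
index.add(1, {"city": "Pune", "amount": 30})
index.add(2, {"city": "Delhi", "amount": 12})
index.add(3, {"city": "Pune", "amount": 8})

pune_orders = index.search("city", "Pune")   # ids of documents where city == "Pune"
total_amount = index.column_sum("amount")    # sum of the "amount" column
```

The trade-off this sketch shows is the general one behind multi-index designs: each write does more work and uses more storage so that lookups, searches, and aggregations can each read from the structure that suits them.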
This release delivers new functionality, including:
- Continuous schema-less ingest from data lakes, databases and streams including Apache Kafka, Amazon Kinesis, Amazon DynamoDB, Amazon S3, and Google Cloud Storage
- Smart schemas for JSON, XML, CSV, Parquet, Excel, and PDF
- SQL, including JOINs, with 1,000+ QPS
- SDKs for Java, Python, Node.js, and Go, plus a REST API
- Delivered as-a-service with self-serve access
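The announcement does not say how smart schemas work, so the following Python sketch only illustrates the general idea of schema-less ingest: accept raw JSON records as they arrive and derive the field names and types from the data itself. The function `infer_schema` and the sample records are hypothetical and are not part of any Rockset SDK.

```python
import json
from collections import defaultdict


def infer_schema(json_lines):
    """Return {field: sorted list of Python type names seen for that field}.

    Hypothetical helper for illustration; handles top-level fields only.
    """
    observed = defaultdict(set)
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            observed[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in observed.items()}


# Two raw events whose fields and types disagree, as messy data often does.
events = [
    '{"order_id": 1, "amount": 19.99, "store": "Mumbai"}',
    '{"order_id": "A-2", "amount": 5, "coupon": null}',
]

schema = infer_schema(events)
for field, types in schema.items():
    print(field, types)
```

Here `order_id` is recorded as both `int` and `str` rather than rejected, which is the point of inferring a schema from the data instead of declaring one before loading.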
Developer and Data Scientist Communities Embrace Rockset
“We need to carefully monitor our growth in real-time. Is a certain product suddenly selling more? Is there a fraudulent transaction? We easily generate 20-30 million events per day, all captured in Kafka streams. Our applications query the data every few seconds. By sending our raw event data directly from Kafka to Rockset, we save a lot of time and energy. We track over 40 metrics in real time and constantly take immediate actions,” says Amboj Goyal, principal engineer at Fynd, a leader in online-to-offline retail with a fashion e-commerce portal that helps millions of customers find fashion from local stores.
“We receive data from many different sources in many different formats - each with different delimiters, headers and escape characters. It is time-consuming to load the data to multiple systems, individually infer the schema and handle malformed data. We also deal with constantly changing schemas as new versions of the data are received. Rockset has the potential to free our data scientists to run ad-hoc SQL queries on raw data while minimizing ETL work and maintenance,” says Alex Izydorczyk, head of data science at Coatue, a technology sector hedge fund, known for its data prowess, that invests in public and private equity markets.