What It Means to Supercharge Analytics

Posted 3/11/2016 8:06:33 AM by TRAVIS OLIPHANT, CEO of Continuum Analytics

Making the process faster is one thing. We recently compiled our Python distribution with Intel MKL (Math Kernel Library), so that Anaconda runs up to 7x faster. We also added Microsoft R Open to our R-Essentials package, giving customers a high-performance version of R that can also be used with Intel MKL.
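For readers who want to see which math library their own NumPy build uses, here is a minimal sketch. It assumes NumPy is installed. `numpy.show_config()` prints the BLAS/LAPACK libraries the build links against, so an MKL-linked build should mention MKL there. The timing helper (`time_matmul`, a hypothetical name) runs a dense matrix multiply, which NumPy hands off to that BLAS library. The matrix size and repeat count are arbitrary choices, and this sketch does not reproduce the "up to 7x" figure, which depends on hardware and workload.

```python
import time

import numpy as np


def time_matmul(n: int = 1000, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of an n x n float64 matrix multiply.

    NumPy delegates `a @ b` to whatever BLAS it was built against
    (MKL in an MKL-linked build, OpenBLAS or another library otherwise).
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # the heavy lifting happens inside the linked BLAS library
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Prints the BLAS/LAPACK configuration this NumPy build was compiled with.
    np.show_config()
    print(f"best of 3, 1000x1000 matmul: {time_matmul():.3f} s")
```

Running the same script under two different NumPy builds on the same machine is one rough way to compare BLAS backends.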

What really puts the rocket fuel into analytics, however, is a connected ecosystem for data, analytics and computation: a massive undertaking otherwise known as Open Data Science. Proprietary big data software has inherent limitations and barriers that prevent agile collaboration and high performance for data science teams. By removing that ball and chain, Anaconda 2.5 isn’t just making analytics faster; it is freeing data professionals to use all the tools at their disposal.

One platform, many open tools

When you look at the state of data science today, you'll find a disjointed and disconnected ecosystem in which open source all too often reinforces the norms of proprietary vendors, creating barriers to simple interoperability and collaboration.

Open Data Science is an inclusive movement that treats the various pieces of data science (the data, the analytics, the computation) as parts of a connected ecosystem. Open source is at the heart of Open Data Science, but not all open source is created equal.

For example, some data scientists use Scala, the open source language that Spark is written in, to do machine learning. However, a huge amount of machine learning today happens in two other open source languages: R and Python. Unfortunately, because Scala runs on the Java Virtual Machine (JVM), it is difficult for R or Python to interoperate with it. Support for R and Python has since been added, but as second-class citizens, through APIs.

Non-interoperable tools drain time and energy

When you have to move data around, especially in big data, you pay a huge performance penalty. By default, R and Python analytics suffer a large penalty when used with Scala. If Scala had been designed with interoperability in mind from the outset, there would be no need to move data around, and developers could leverage the entire Python and R ecosystems while enjoying higher performance.
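To make the cost of moving data concrete, here is a small, single-process sketch using only the Python standard library. It is an illustration under stated assumptions, not a model of how Spark or any JVM bridge actually transfers data: `pickle` stands in for whatever wire format a runtime boundary might use, and the helper names (`sum_in_place`, `sum_after_round_trip`, `best_time`) are hypothetical. Real cross-runtime transfers also pay for sockets or pipes and memory copies, which this sketch leaves out.

```python
import pickle
import time


def sum_in_place(values: list[float]) -> float:
    """Compute a sum directly on data already in this process's memory."""
    return sum(values)


def sum_after_round_trip(values: list[float]) -> float:
    """Simulate handing data across a runtime boundary: serialize it to bytes
    (as a sender would), deserialize it (as a receiver would), then compute."""
    payload = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)
    received = pickle.loads(payload)
    return sum(received)


def best_time(fn, arg, repeats: int = 3) -> float:
    """Best-of-N wall-clock time for fn(arg), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    data = [float(i) for i in range(1_000_000)]
    local = best_time(sum_in_place, data)
    moved = best_time(sum_after_round_trip, data)
    print(f"in place: {local:.4f} s, with serialization round trip: {moved:.4f} s")
```

Both functions return the same answer; the difference is purely the time spent encoding and decoding data that never needed to leave memory, which is the overhead an interoperable design avoids.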

Could a developer write their own analytics in Scala? Yes, eventually, but it's not going to match the decades of work that have been put into R and Python. 

The basic problem is this: members of data science teams often need specific tools to get the job done, but receive the software equivalent of a sledgehammer when they need a penny hammer. With that in mind, and in keeping with our mission to empower data science teams to do their best work, Continuum Analytics has incorporated various technologies into Anaconda, our open source modern analytics platform powered by Python, to supercharge big data analytics and the "Open Data Science" movement.

A platform for innovation

With Anaconda, Continuum Analytics is taking a "big tent" approach. First, we created a platform for Python: a world-class distribution with enterprise-grade, value-added capabilities on top. Then we added R to the mix as a first-class citizen, not a second-class one. We give data scientists the right tools for the problems they have.

For statistics, we offer R; for data analysis, we offer Python. We are constantly innovating and integrating with other high-value platforms, as in our recent integration with Cloudera, to further expand the data scientist's toolkit.

This approach is crucial to the Open Data Science movement: different tools for different jobs across a connected ecosystem. Data science teams have different preferences, training, backgrounds and experiences, so there’s no one-size-fits-all approach. They need to be empowered with a variety of tools (open data science languages, UIs, analytics and modern architectures) to solve different problems.

So rather than insisting that the command line is the only way, the ideals of Open Data Science call for providing command line interfaces, notebooks, IDEs, visual programming environments and every other way in which you might develop your data science solution.

In Open Data Science, everything is open: at the data level, to support many different data platforms; at the analytics level, to support various libraries; and at the computational level, to support different computing architectures.

When we talk about Open Data Science, you get to choose. We don't tell you how to do your job. By facilitating an open and interoperable experience, Anaconda 2.5 supercharges all analytics, Python and beyond. 

Read More https://www.continuum.io/...


About the author: TRAVIS OLIPHANT, CEO of Continuum Analytics

Travis has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding contributor of the SciPy package. He is also the author of the definitive Guide to NumPy.

Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001 to 2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of the Biomedical Imaging Lab, where he researched satellite remote sensing, MRI, ultrasound, elastography, and scanning impedance imaging.

As CEO of Continuum Analytics, Travis engages customers in all industries, develops business strategy, and helps guide technical direction of the company. He actively contributes to software development and engages with the wider open source community in the Python ecosystem. He has served as a director of the Python Software Foundation and as a director of NumFOCUS.
