In its beta release of Intel Distribution for Python, Intel is introducing what it calls “something new and unusual for the Python world.” The new functionality is an experimental module that unlocks additional performance for multi-threaded Python programs by enabling threading composability between two or more thread-enabled libraries.
The Intel Distribution for Python promises faster performance from Python packages powered by Intel Math Kernel Library (Intel MKL). The beta product adds new Python packages such as scikit-learn, mpi4py, numba, conda, tbb (Python interfaces to Intel Threading Building Blocks) and pyDAAL (Python interfaces to Intel Data Analytics Acceleration Library). The platform also delivers performance improvements for NumPy/SciPy by linking them with performance libraries such as Intel MKL, Intel Message Passing Interface (Intel MPI), Intel TBB and Intel DAAL.
With threading composability, developers can accelerate programs by avoiding inefficient thread allocation (oversubscription), which occurs when a program runs more software threads than the hardware can execute at once.
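To make the idea of oversubscription concrete, here is a minimal arithmetic sketch. The function name `oversubscription_factor` and the thread counts are hypothetical, chosen only for illustration. It assumes each worker in an outer pool calls a library that starts its own fixed number of threads.

```python
import os


def oversubscription_factor(outer_workers, inner_threads, hardware_threads=None):
    """Ratio of requested software threads to available hardware threads.

    A value above 1.0 means more software threads are requested than the
    hardware can run simultaneously.
    """
    if hardware_threads is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        hardware_threads = os.cpu_count() or 1
    return (outer_workers * inner_threads) / hardware_threads


# Hypothetical case: 8 pool workers, each calling a library that starts
# 8 threads of its own, on a machine with 8 hardware threads.
factor = oversubscription_factor(8, 8, hardware_threads=8)
print(factor)  # 8.0
```

In this hypothetical case, 64 software threads compete for 8 hardware threads, which is the situation threading composability is meant to avoid.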
Intel says the biggest improvement is achieved when a task pool, such as the ThreadPool from the standard library or a library like Dask or Joblib (used in multi-threading mode), executes tasks that call compute-intensive NumPy/SciPy/pyDAAL functions, which in turn are parallelized using Intel MKL and/or Intel Threading Building Blocks (Intel TBB).
The module implements a Pool class with the standard interface, built on Intel TBB, which can be used to replace Python’s ThreadPool. Thanks to the monkey-patching technique implemented in the Monkey class, no source code changes are needed to unlock additional speedups.
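As a rough sketch of what a drop-in replacement might look like, the snippet below tries to import `Pool` from the `tbb` module and falls back to the standard `ThreadPool` when that module is not installed. It assumes that `tbb.Pool` accepts a worker count and provides `map`, `close` and `join` like the standard pool, which is what the article's "standard interface" suggests. The helper names `work` and `run` are hypothetical.

```python
from multiprocessing.pool import ThreadPool

try:
    # Assumption: the Intel TBB Python module is installed and exposes Pool
    # with the same interface as multiprocessing.pool.ThreadPool.
    from tbb import Pool
except ImportError:
    # Fall back to the standard library so the sketch still runs.
    Pool = ThreadPool


def work(n):
    """Hypothetical CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run(pool_cls, sizes, workers=4):
    """Run `work` over `sizes` using whichever pool class is supplied."""
    pool = pool_cls(workers)
    try:
        return pool.map(work, sizes)
    finally:
        pool.close()
        pool.join()


print(run(Pool, [10, 100]))  # [285, 328350]
```

Because the calling code depends only on the shared pool interface, swapping the import is the only change required. The Monkey class is described as removing even that step by patching the standard pool at run time.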