Posted 9/1/2015 4:04:04 PM by STUART PARKERSON, Publisher Emeritus
Intel has introduced Parallel Studio XE 2016, the latest installment of its developer toolkit for high performance computing (HPC) and technical computing applications. Intel Parallel Studio XE 2016 helps developers design, build, verify and tune code in Fortran, C++, C, and Java.
This suite of compilers, libraries, debugging facilities, and analysis tools targets Intel architecture, including support for the latest Intel Xeon processors (codenamed Skylake) and Intel Xeon Phi processors (codenamed Knights Landing).
Highlights of this year's tool release include:
- Intel Data Analytics Acceleration Library
- Vectorization Advisor
- MPI Performance Snapshot
- High performance support for industry standards, the latest processors, operating systems and their related development environments.
Intel Data Analytics Acceleration Library (Intel DAAL)
Intel DAAL helps speed big data analytics. It’s designed for use with data platforms including Hadoop, Spark, R, and Matlab, for efficient data access. Intel DAAL was built by the team behind the Intel Math Kernel Library (Intel MKL), and the team says that Intel DAAL can be thought of as “Intel MKL for Big Data.”
Vectorization Advisor
Vectorization is the use of a processor's SIMD instructions to operate on several data elements at once. It is part of modernizing applications to get top performance out of any modern processor, alongside multithreading and fabric scaling. Intel Advisor XE 2016 provides tools to help with multithreading and vectorization, including:
- Vectorization Advisor is an analysis tool that helps developers identify the loops that will benefit most from vectorization, pinpoint obstacles to vectorization that are particular to a program, explore the benefit of alternative data organization, and increase confidence that transformations aimed at increasing vectorization will preserve the correctness of the original program.
- Threading Advisor is a threading design and prototyping tool that lets developers analyze, design, tune, and check threading design options rapidly.
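To illustrate what "alternative data organization" can mean for vectorization, here is a minimal C++ sketch. It is not taken from Intel's tools or documentation; the type and function names (PointAoS, PointsSoA, scale_x_aos, scale_x_soa) are hypothetical. It performs the same update on an array-of-structures layout, where the loop touches memory with a stride, and on a structure-of-arrays layout, where the loop is unit-stride and is typically easier for a compiler to turn into SIMD instructions.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures (AoS): the x, y, z of one point sit next to each other,
// so a loop that touches only x reads memory with a stride of three floats.
struct PointAoS {
    float x, y, z;
};

// Structure-of-arrays (SoA): all x values are contiguous (unit stride).
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Scale the x coordinate of every point, AoS layout (strided access).
void scale_x_aos(std::vector<PointAoS>& pts, float factor) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        pts[i].x *= factor;
    }
}

// The same computation on the SoA layout (unit-stride access).
void scale_x_soa(PointsSoA& pts, float factor) {
    float* x = pts.x.data();
    const std::size_t n = pts.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

Both functions produce the same values. Whether the second is actually faster depends on the compiler, the flags, and the processor, which is the kind of question a compiler vectorization report or an analysis tool is meant to answer.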
In a recent blog post, Intel’s James Reinders says, “Vector Advisor cannot tell you anything we could not show you how to do yourself. However, when I teach ‘vectorization’ I tend to rattle off a list of things to check. Each item that I suggest to ‘check’ involves using a tool in a particular way. Bringing that into one tool, makes life easier and definitely makes the process faster and more efficient. One of the key Vectorization Advisor features is a Survey Report that offers integrated compiler report data and performance data all in one place, including GUI-embedded advice on how to fix vectorization issues specific to your code. This page augments that GUI-embedded advice with links to web-based vectorization resources.”
MPI Performance Snapshot
The MPI Performance Snapshot is a scalable, lightweight performance tool for MPI applications. It collects a variety of MPI application statistics (such as communication, activity, and load balance) and presents them in an easy-to-read format. The tool is provided as part of the Intel Parallel Studio XE 2016 Cluster Edition.
The MPI Performance Snapshot helps deal with the following problems in analyzing MPI applications that scale out to thousands of ranks:
- The size of clusters continues to grow, so applications are becoming more and more scalable.
- Profiling at larger scale collects large amounts of data, which can easily become unmanageable.
- It's hard to identify the key metrics to track when gathering such large amounts of data.
Reinders comments, “By addressing these three items, MPI Performance Snapshot improves scaling to at least 32K ranks which is an order of magnitude above what is tolerable with the prior Intel Trace Analyzer and Collector. Therefore, we can now recommend when aiming to optimize a large scale run (anything above one thousand MPI ranks), we suggest starting with the MPI Performance Snapshot capability first to figure out where you need to dig deeper (which processes are slowing the application down, where are the peaks in memory usage, etc.). Then, do another run with the Intel Trace Analyzer and Collector on a subset of selected ranks to get more detailed per-process information in order to visualize how a communication algorithm is implemented and see if there are apparent bottlenecks.”
MPI Performance Snapshot combines lightweight statistics from the Intel MPI Library with OS and hardware-level counters to provide a high-level categorization of an application, including MPI vs. OpenMP load imbalance information, memory usage, and a breakdown of MPI vs. computation vs. serial time.
Parallel Studio XE 2016 provides high performance support for the latest processors, including the Skylake and Knights Landing microarchitectures. Fortran support includes a feature from the draft Fortran 2015 standard which can help MPI-3 users.
Operating system support includes Debian 7.0, 8.0; Fedora 21, 22; Red Hat Enterprise Linux 5, 6, 7; SuSE Linux Enterprise Server 11, 12; Ubuntu 12.04 LTS (64-bit only), 13.10, 14.04 LTS, 15.04; OS X 10.10; Windows 7 through 10; Windows Server 2008-2012. These are the versions Intel has tested; additional operating systems, such as CentOS, should work as well.
Intel is holding a series of webinars, starting in September 2015, that cover many topics related to Intel Parallel Studio XE 2016. The webinars can be attended live, with interactive question and answer time, and will be available for replay afterward. The first webinar is on September 1 – “What’s New in Intel Parallel Studio XE 2016?”
Read More https://software.intel.com/intel-parallel-studio-x...