Posted 4/19/2016 7:00:29 AM by STUART PARKERSON, Publisher Emeritus
Google’s TensorFlow is an open source software library for numerical computation using data flow graphs. The architecture lets developers deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device through a single API.
In its recent release of TensorFlow 0.8, Google is introducing distributed computing support, including everything needed to train distributed models on a company’s infrastructure. Distributed TensorFlow is powered by the high-performance gRPC library, which supports training on hundreds of machines in parallel.
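In the distributed runtime, the machines participating in training are described by a cluster specification that maps job names (parameter servers and workers) to network addresses; each process then starts a gRPC server for its own task. A minimal sketch in plain Python, with hypothetical host names and the TensorFlow-specific call shown only in a comment:

```python
# A cluster specification maps job names ("ps" for parameter servers,
# "worker" for training replicas) to lists of host:port addresses.
# The addresses below are hypothetical placeholders.
cluster_spec = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],
}

# In TensorFlow 0.8-era Python, each process would then start its gRPC
# server with something like:
#   server = tf.train.Server(cluster_spec, job_name="worker", task_index=0)

def num_tasks(spec):
    """Count the total number of processes described by a cluster spec."""
    return sum(len(addrs) for addrs in spec.values())

print(num_tasks(cluster_spec))  # 3 processes: 1 ps + 2 workers
```

Each worker computes gradients against model parameters hosted on the parameter-server tasks, which is how training scales out across machines.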
Google has published a distributed trainer for the Inception image classification neural network in the TensorFlow models repository. Using the distributed trainer, Google reports that it trained the Inception network to 78% accuracy in less than 65 hours using 100 GPUs. Small clusters can also benefit from distributed TensorFlow, as adding more GPUs improves overall throughput and produces accurate results sooner.
In addition to distributed Inception, the 0.8 release includes new libraries for defining distributed models. TensorFlow's distributed architecture offers flexibility in defining the model, as every process in the cluster can perform general-purpose computation.
In TensorFlow, all computation, including parameter management, is represented in the dataflow graph, and the system maps the graph onto heterogeneous devices (such as multi-core CPUs, general-purpose GPUs, and mobile processors) in the available processes. Included are Python libraries to write a model that runs on a single process and scales to use multiple replicas for training.
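The mapping of a graph onto heterogeneous devices can be pictured as assigning each operation a device string. The sketch below is a toy illustration only, not TensorFlow's actual placement algorithm; the operation and device names are hypothetical:

```python
# Toy illustration of placing graph operations onto available devices
# round-robin. TensorFlow's real placer uses cost models and constraints;
# this only shows the idea of an op-to-device mapping.
from itertools import cycle

ops = ["read_batch", "matmul", "relu", "softmax", "update_params"]
devices = ["/cpu:0", "/gpu:0", "/gpu:1"]  # hypothetical device names

# zip stops when the ops list is exhausted, so every op gets a device.
placement = dict(zip(ops, cycle(devices)))
for op, dev in placement.items():
    print(f"{op} -> {dev}")
```

In TensorFlow itself, programmers can pin operations to devices explicitly (e.g. with a device scope such as `"/gpu:0"`) or let the runtime decide.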
TensorFlow’s data flow graphs describe mathematical computation with a directed graph of nodes and edges. Nodes typically implement mathematical operations, but can also represent endpoints to feed in data, push out results, or read/write persistent variables. Edges describe the input/output relationships between nodes. These data edges carry dynamically sized multidimensional data arrays, or tensors.
The flow of tensors through the graph is where TensorFlow gets its name. Nodes are assigned to computational devices and execute asynchronously and in parallel once all the tensors on their incoming edges become available.
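The dataflow model above can be sketched in a few lines of plain Python. This is an illustration of the concept, not TensorFlow's implementation: nodes hold an operation, edges are the references between nodes, and a node's value is produced once all of its inputs have been evaluated:

```python
# Toy dataflow graph: nodes are operations, edges carry values between
# them, and a node runs once all of its incoming values are available.

class Node:
    def __init__(self, name, op, inputs=()):
        self.name = name      # identifier for the node
        self.op = op          # function applied to the incoming values
        self.inputs = inputs  # upstream nodes whose outputs feed this one

def run(node, cache=None):
    """Evaluate a node by first evaluating all of its incoming edges."""
    cache = {} if cache is None else cache
    if node.name not in cache:
        args = [run(n, cache) for n in node.inputs]  # gather input tensors
        cache[node.name] = node.op(*args)            # fire the operation
    return cache[node.name]

# Build the graph for (a + b) * b.
a = Node("a", lambda: 2.0)
b = Node("b", lambda: 3.0)
add = Node("add", lambda x, y: x + y, (a, b))
mul = Node("mul", lambda x, y: x * y, (add, b))

print(run(mul))  # (2.0 + 3.0) * 3.0 = 15.0
```

The `cache` plays the role of the edges: a value flows out of a node once and is reused by every downstream consumer, which is also what lets independent nodes execute in parallel in the real system.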
Read More https://www.tensorflow.org/...