Heterogeneous Computing for AI

Braemac
Apr 2, 2020

Companies seeking to maximise the innovation and productivity gains from deep learning and artificial intelligence (AI) should consider using a computing platform optimised for the associated algorithms. Deep learning and AI algorithms tend to process large volumes of data and therefore rely on fast memory transfer, parallel computing and mixed-precision computation to keep latency low and AI models responsive. To handle these diverse requirements, an optimised computing platform will typically employ two or more different types of computing cores. This design approach is called heterogeneous computing.

AI WORKLOAD OPTIMISATION

To attain an optimised computing platform for deep learning and AI solutions, developers may need to evaluate several areas, including hardware selection, performance tuning and dynamic workload deployment.

Selecting Hardware

Determine which computing cores are best suited to run the required AI algorithms and how much computing power and I/O bandwidth are needed. Hardware selection should also consider size, weight, power consumption and cost constraints, particularly when deploying AI at the edge of the network.
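As a rough illustration of the selection step, the sketch below filters a list of candidate accelerators against throughput, I/O bandwidth, power and cost limits, then ranks the survivors by performance per watt. The `Accelerator` type, the `shortlist` helper and every figure in `CANDIDATES` are hypothetical and chosen only for illustration; they are not ADLINK specifications, and a real evaluation would also weigh size, weight and software support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical description of one candidate computing core."""
    name: str
    throughput_tops: float    # peak throughput, tera-operations per second
    io_bandwidth_gbps: float  # available I/O bandwidth
    power_w: float            # typical power draw in watts
    cost_usd: float           # unit cost


def shortlist(candidates, min_tops, min_io_gbps, max_power_w, max_cost_usd):
    """Return candidates meeting every constraint, most power-efficient first."""
    feasible = [
        c for c in candidates
        if c.throughput_tops >= min_tops
        and c.io_bandwidth_gbps >= min_io_gbps
        and c.power_w <= max_power_w
        and c.cost_usd <= max_cost_usd
    ]
    # Rank by performance per watt, which matters most at the edge.
    return sorted(feasible, key=lambda c: c.throughput_tops / c.power_w, reverse=True)


# Made-up figures for illustration only; not vendor specifications.
CANDIDATES = [
    Accelerator("gpu-module", 20.0, 32.0, 50.0, 900.0),
    Accelerator("vpu-module", 4.0, 8.0, 5.0, 150.0),
    Accelerator("fpga-card", 10.0, 64.0, 40.0, 1500.0),
]

if __name__ == "__main__":
    picks = shortlist(CANDIDATES, min_tops=3.0, min_io_gbps=8.0,
                      max_power_w=45.0, max_cost_usd=2000.0)
    for c in picks:
        print(c.name, round(c.throughput_tops / c.power_w, 2))
```

With these assumed numbers, the 45 W power limit excludes the GPU module, and the VPU module ranks ahead of the FPGA card on performance per watt. Tightening or relaxing any one limit changes the shortlist, which is the trade-off the selection step is meant to expose.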

Tuning AI Performance

Identify bottlenecks in the software or hardware due to insufficient platform resources, inefficient scheduling of software threads or contention between various running processes. This analysis typically requires software profilers and other types of performance tuning tools.
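A minimal sketch of this kind of analysis, using Python's standard `cProfile` and `pstats` modules, is shown below. The `preprocess`, `infer` and `pipeline` functions are hypothetical stand-ins for a real workload, and the approach only measures host-side Python code; profiling work that runs on a GPU, VPU or FPGA would need the relevant vendor tools instead.

```python
import cProfile
import io
import pstats


def preprocess(n):
    """Hypothetical stand-in for input preparation."""
    return [i * 0.5 for i in range(n)]


def infer(values):
    """Hypothetical, deliberately slow stand-in for model inference."""
    total = 0.0
    for v in values:
        for _ in range(50):
            total += v * v
    return total


def pipeline(n=2000):
    return infer(preprocess(n))


def profile_report(func, top=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(pipeline)
    print(report)
```

In the printed report, the rows with the largest cumulative time point to where the pipeline spends most of its time, which in this toy example is the `infer` function.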

Deploying Dynamic Workloads across the Network

Develop the ability to deploy and provision dynamic workloads across the network in order to improve inference results with refined AI models and address new challenges. Dynamic workloads require a flexible and adaptable computing architecture that enables a scalable, real-time, and reliable deployment environment.

REDUCE AI DESIGN EFFORT

System developers, OEMs and systems integrators can simplify the development of deep learning and AI solutions with help from ADLINK’s consultation services and Data River™. Working with ADLINK makes it easier to:

Lower solution cost: Select the right hardware platform for the target workload.
Increase system performance: Eliminate platform bottlenecks that slow down AI algorithms.
Simplify AI deployment: Leverage ADLINK’s Data River™ to more easily share data and distribute AI processing across the network.

DEPLOY AI AT THE EDGE

Optimised for AI, ADLINK heterogeneous computing platforms include a mix of GPUs, vision processing units (VPUs), FPGAs and ASICs that accelerate AI workloads running on board-, system-, and server-level products. With these platforms, system developers, OEMs and systems integrators can construct and optimise system architecture for both AI inferencing and training applications.

ADLINK GPU- and VPU-Powered Solutions

ADLINK’s portfolio of GPU-based and VPU-based solutions offers a range of performance levels, power efficiencies and form factors for AI applications.

ADLINK’s families of embedded graphics products provide hundreds or thousands of parallel cores for AI inferencing and training workloads that require sophisticated image processing and mathematical and statistical computations. Mobile PCI Express Module (MXM) graphics modules are well suited to applications constrained by size, weight and power (SWaP). PCI Express Graphics (PEG) cards connect via a common interface, are easy to integrate and are used in many embedded segments. For applications that must meet stringent size constraints and other requirements, onboard GPU integration offers a customised solution that accommodates both size and performance needs.
For image-based inferencing, embedded VPU modules dedicated to image processing offer low power consumption and a small footprint, and excel in computer vision applications.

System Components for AI Workload Integration

For SWaP-constrained applications, ADLINK’s Matrix compact fanless embedded computers offer the best performance-per-watt and high availability with expandability options, including MXM cards. Actively cooled DLAP platforms support applications in less thermally challenging settings. For mission-critical applications, ADLINK’s configurable embedded computers provide an expandable building block to host multiple accelerators, enabling the consolidation of workloads such as highly parallel graphics computing, motion control and data acquisition onto one system. For applications demanding even greater scalability, ADLINK offers highly configurable motherboards and rackmount industrial chassis that provide an enclosure for ADLINK’s large family of industrial ATX motherboards. The boards feature multiple PCIe, PCI, LAN and USB 3.0 interfaces, enable immediate multi-tasking deployment, and balance performance with expandability.