HIGH-PERFORMANCE COMPUTING (HPC)
High-performance computing is more widespread than ever. It is being optimized for specific tasks, adapted, and forming a world of its own. Below we consider several solutions from this world.
The main application of supercomputers is calculating mathematical models of various structures, phenomena, processes, and so on. It is not for us to decide whether you need such calculations, but we can help define the supercomputer architecture and configuration required for your purposes.
Solutions range from simple engineering calculations to fundamental scientific research using grids of hundreds of millions of cells. Our solutions have been proven in universities, enterprises, and leading industrial and academic research institutions.
Solutions Based on High-Performance SMP Systems of x86/x64 Family
The basic solution is a server with 64 or more CPU cores and 128 GB or more of RAM. We can offer different network interfaces: 1 Gigabit Ethernet, 10 Gigabit Ethernet, and InfiniBand QDR or FDR. The default interface is 10 Gigabit Ethernet because of its universality and bandwidth. Such solutions are in high demand for calculating large models in fundamental research.
Solutions Based on Clusters Whose Computing Nodes Communicate via MPI
This solution is in general use. Yet even standard solutions sometimes fail to deliver the expected results. This happens solely when the implementers do not understand the requirements of the tasks to be solved, or when the supercomputer is poorly optimized for those tasks.
Computing node configurations range from two to eight sockets, built on Intel Xeon or AMD Opteron processors of the x86/x64 family. RAM capacity ranges from 1 GB to 16 GB per physical core. We generally offer InfiniBand interconnect, but for a certain class of tasks 10 Gigabit Ethernet provides higher efficiency.
Scaling this solution makes it possible to create supercomputers with peak performance of up to several PFlops.
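As a rough illustration of how peak performance scales, the figure is simply the product of node count, cores, FLOPs per cycle, and clock rate. All parameters below are assumed, hypothetical values, not a description of any real machine:

```python
# Rough peak-performance estimate for an MPI cluster.
# Every parameter here is an illustrative assumption.
nodes = 4096             # computing nodes
sockets_per_node = 2     # two-socket configuration
cores_per_socket = 8
flops_per_cycle = 8      # double-precision FLOPs per core per cycle (assumed)
clock_hz = 2.6e9         # 2.6 GHz

peak_flops = (nodes * sockets_per_node * cores_per_socket
              * flops_per_cycle * clock_hz)
print(f"Peak: {peak_flops / 1e15:.2f} PFlops")  # prints "Peak: 1.36 PFlops"
```

Real sustained performance is, of course, well below this theoretical peak and depends heavily on the interconnect and the task.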
To increase computing node performance, various kinds of accelerators are used. The main types are accelerators based on stream processors and accelerators based on programmable logic arrays (FPGAs).
Typical representatives of stream accelerators are AMD FireStream, NVIDIA Fermi, and Intel Xeon Phi. These accelerators usually take the form of an expansion card with a PCIe interface. At present, the performance of the accelerators mentioned above is approximately 1 TFlops for double-precision operations.
Any supercomputer has several networks. A small one can manage with two or three; large ones usually use three to five. The most commonly used networks are InfiniBand and Ethernet. Specialized networks organized as a 2D torus, 3D torus, or higher-dimensional torus are also used in certain supercomputers. All of the abovementioned networks provide full bandwidth under any load and minimal latency when transferring data.
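To make the torus topology concrete, here is a small sketch, not tied to any vendor's network, of computing a node's neighbors in a 3D torus: every node has six links, and coordinates wrap around at the edges.

```python
# Neighbors of a node in a 3D torus (illustrative sketch only).
# Wraparound links mean even a corner node has all six neighbors.
def torus_neighbors(x, y, z, dims):
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

print(torus_neighbors(0, 0, 0, (4, 4, 4)))
```

The wraparound halves the worst-case hop count compared with a plain mesh, which is why torus networks suit nearest-neighbor communication patterns.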
Any supercomputer has its own data storage system (DSS). One large DSS rarely serves several supercomputers, though it is possible. Distinctive features of DSSs for supercomputers are high I/O performance, low latency, the ability to provide concurrent access to a single file for a large number of nodes (up to tens of thousands in large supercomputers), reliability, fault tolerance, and so on.
Sometimes an ordinary DSS without a parallel file system is enough for a small supercomputer. The choice of a DSS is therefore based primarily on the needs of the task to be solved.
DSS with a Parallel File System
In 2010 a new version of the widespread NFS protocol standard, NFS v.4.1, was adopted. What distinguishes this version from its predecessors is the chapter on concurrent access describing pNFS. It defines three ways of accessing data: file, block, and object. At the moment, only the file implementation is available; the block implementation is expected next year, and the object implementation not earlier than 2014.
But this does not rule out the use of proprietary technologies. The following file systems are widely known: GPFS, Lustre, OneFS, PanFS, StorNext, and ExaData. Each has unique capabilities and is indispensable for a specific range of tasks. For example, ExaData and StorNext are best suited for media processing, while GPFS, Lustre, and PanFS are best suited for supercomputers. Note, however, that PanFS runs only on Panasas DSSs and can therefore be used only with that company's equipment.
File system location is another distinguishing criterion: OneFS and PanFS are installed on the DSSs themselves, while GPFS, Lustre, StorNext, and ExaData are installed on dedicated servers. We suggest paying attention to the following parallel file systems:
Network File System (NFS) is a protocol for network access to file systems. NFS is abstracted from the file system types of both server and client; there are many implementations of NFS servers and clients for various operating systems and hardware architectures. An integral part of NFS v.4.1 is pNFS (Parallel NFS), a mechanism for parallel NFS client access to data on multiple distributed NFS servers. The availability of such a mechanism in the standard network file system helps build distributed "cloud" storage and information systems.
The GPFS parallel file system is designed and developed by IBM. IBM General Parallel File System (GPFS) is a high-performance shared-disk file system that provides fast access to data from all nodes of a homogeneous or heterogeneous cluster of IBM UNIX servers running the AIX 5L or Linux operating system.
Lustre is a massively parallel distributed file system, usually used for large-scale cluster computing. The name Lustre is a blend of the words Linux and cluster. Released under the GNU GPL, the project provides a high-performance file system for clusters with tens of thousands of network nodes and petabytes of storage.
DSS Architecture with a Parallel File System
The basic solution consists of data storage systems with block-level access and a group of servers connected to them on which the parallel file system code runs. This group is in turn divided into file servers and metadata servers. File servers store the data directly on the available disk space and provide fast access to it. The servers that consume the data do not know in advance which file server holds it; the initial request therefore goes to a metadata server, which indicates which part of the requested file should be fetched from which file server.
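The request flow above can be sketched as a toy model. All names here are hypothetical, and real parallel file systems stripe and replicate data in far more sophisticated ways:

```python
# Toy model of the metadata-server lookup described above: a client
# writes a file striped round-robin across file servers; on read, the
# metadata server reports the layout and the stripes are fetched
# directly from the file servers. Names are illustrative only.
FILE_SERVERS = {"fs0": {}, "fs1": {}, "fs2": {}}

class MetadataServer:
    def __init__(self, stripe_size=4):
        self.stripe_size = stripe_size
        self.layout = {}  # filename -> list of (server, stripe_index)

    def write(self, name, data):
        chunks = [data[i:i + self.stripe_size]
                  for i in range(0, len(data), self.stripe_size)]
        servers = list(FILE_SERVERS)
        self.layout[name] = []
        for i, chunk in enumerate(chunks):
            srv = servers[i % len(servers)]        # round-robin striping
            FILE_SERVERS[srv][(name, i)] = chunk   # data lives on file servers
            self.layout[name].append((srv, i))     # MDS keeps only the map

    def read(self, name):
        # Client asks the MDS for the layout, then contacts each
        # file server directly for its stripe.
        return "".join(FILE_SERVERS[srv][(name, i)]
                       for srv, i in self.layout[name])

mds = MetadataServer()
mds.write("model.dat", "abcdefghijkl")
print(mds.read("model.dat"))  # prints "abcdefghijkl"
```

The key property the sketch shows is that the metadata server never touches file contents, so bulk data traffic goes to the file servers in parallel.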
Another option is NetApp's solution. Data ONTAP Cluster Mode runs on each DSS controller and forms a single peer-to-peer storage system out of those controllers. A request arriving at any controller is processed by that controller, which requests any missing data from its neighbors. Thus there is no need for dedicated metadata servers.
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework for distributed processing of large data sets across computer clusters using a simple programming model. It is designed to scale from a single server to thousands, each acting both as a local compute node and as storage.
The Hadoop project applies the MapReduce computing paradigm. Hadoop MapReduce is a framework for programming and running distributed computations within the MapReduce paradigm, together with a complementary set of Java classes and executable utilities for creating batch jobs for MapReduce processing.
Developers of Hadoop MapReduce applications must implement, for each computing cluster node, a basic handler that transforms the original key-value pairs into an intermediate set of key-value pairs (a class implementing the Mapper interface, named after the higher-order map function), and a handler that reduces the intermediate set into the final, reduced set (a class implementing the Reducer interface). The framework sends the sorted outputs of the mappers to the reducer input. The reduction consists of three phases: shuffle (allocation of the required output partitions), sort (grouping the mapper outputs by key; a final sort is needed when different mappers return sets with identical keys, and the sorting rules in this phase can be customized in software to exploit the internal structure of the keys), and reduce proper, which produces the result set. For some types of processing no reduction is required, in which case Hadoop MapReduce returns the set of sorted pairs produced by the mappers.
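The map/shuffle/sort/reduce pipeline described above can be sketched in plain Python. This is a toy single-process model of the paradigm, not the actual Hadoop Java API:

```python
from collections import defaultdict

# Mapper: transform one input record into intermediate (key, value) pairs.
def mapper(line):
    for word in line.split():
        yield word, 1

# Reducer: fold all values for one key into a final (key, value) pair.
def reducer(key, values):
    return key, sum(values)

def map_reduce(lines):
    groups = defaultdict(list)
    for line in lines:                  # map phase
        for key, value in mapper(line):
            groups[key].append(value)   # shuffle: group values by key
    return dict(reducer(k, v)           # sort, then reduce each group
                for k, v in sorted(groups.items()))

print(map_reduce(["to be or not to be"]))
# {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

In Hadoop itself the shuffle and sort happen across the network between nodes; here they collapse into a dictionary and a `sorted` call, but the division of labor between Mapper and Reducer is the same.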
But experience has shown that Hadoop in its original form is not effective for all such tasks. An architecture with disk quotas located on high-performance, fault-tolerant disk storage and computing nodes connected directly to the DSS proved more flexible and efficient: each finished logical component of the architecture is independent of the others, and implementation experience has confirmed the effectiveness of this approach.
NetApp® E-Series are high-performance data storage systems that meet the highest corporate requirements for performance and capacity without compromising simplicity and efficiency. These balanced systems are designed to support equally well high-bandwidth parallel file systems and workloads dominated by many small operations. The ability to use different disk enclosures in E-Series systems allows special configurations adapted to any environment.
Data ONTAP is the operating system of NetApp data storage systems. It comes in several versions: Data ONTAP Seven Mode, Data ONTAP Cluster Mode, and Data ONTAP Seven Mode for V-Series.
Data ONTAP Cluster Mode is the most in demand for supercomputers. Its progenitor was the separate Data ONTAP GX branch (latest version 10), but starting with version 8, Data ONTAP obtained version numbering and a code base unified with Seven Mode and V-Series.
In addition to providing parallel data access, Data ONTAP Cluster Mode allows combining several NetApp DSSs, of various models and packaging, into a single DSS with a single management system and with volumes located concurrently on the disks of all the controllers. This makes it possible to increase the performance of every part of a single data storage system. Parallel access is provided by the NFS v.4.1 protocol; CIFS, iSCSI, and Fibre Channel are also supported.
In the near future, NetApp plans to merge the two branches, Seven Mode and Cluster Mode, into a single one supporting the features of both.