
Non Uniform Memory Architecture (NUMA)

Written by Madhura Oak | Aug 13, 2012 4:37:23 PM

Around two decades ago, Non Uniform Memory Architecture, also called Non Uniform Memory Access (NUMA), set a new trend in multiprocessing architectures. Many hardware vendors such as AMD, Intel, Unisys, HP, Silicon Graphics, Sequent Computer Systems, EMC, Digital and NCR started building commercial NUMA systems.

NUMA is a type of parallel processing architecture. There are two types of parallel processing architectures – Shared Memory Architecture and Distributed Memory Architecture. Shared Memory Architectures are of two types – Uniform Memory Access (UMA) and Non Uniform Memory Access (NUMA).

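In Shared Memory Architecture, all processors share a common memory. This architecture is also called Symmetric Multiprocessing (SMP). Because each processor may hold its own cached copy of the same memory location, cache coherency is a challenge for this architecture. A snooping ("snoopy") scheme, in which every cache monitors the shared bus for writes made by other processors, is a common way to handle it.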
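The snooping idea can be illustrated with a small simulation. This is only a toy sketch of a write-through, write-invalidate scheme. The `Bus` and `Cache` classes and their methods are hypothetical names made up for illustration, and real hardware protocols (MESI and its variants) track more states than this.

```python
class Bus:
    """Shared bus that every cache listens ("snoops") on. Toy model."""

    def __init__(self):
        self.caches = []
        self.memory = {}  # address -> value; unset addresses read as 0

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, address, source):
        # Every cache except the writer drops its copy of the line.
        for cache in self.caches:
            if cache is not source:
                cache.lines.pop(address, None)


class Cache:
    """Per-processor cache holding copies of memory locations."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, address):
        # On a miss, fetch the current value from shared memory.
        if address not in self.lines:
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        # Invalidate other copies, then update own line and shared memory.
        self.bus.broadcast_invalidate(address, source=self)
        self.lines[address] = value
        self.bus.memory[address] = value


bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
cpu0.read(0x10)
cpu1.read(0x10)          # both caches now hold the initial value 0
cpu0.write(0x10, 42)     # cpu1's stale copy is invalidated
print(cpu1.read(0x10))   # prints 42
```

Without the invalidation step, the second processor would keep reading its stale cached value of 0.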

UMA

In Uniform Memory Access (UMA), shown in Figure 1, all processors have the same latency to access memory. A hardware cache is usually present with each processor.

Figure 1. Uniform Memory Access

This architecture scales only to a limited number of processors. Beyond that number, the processors contend for the shared path to memory, and adding more of them yields diminishing returns.

NUMA

In Non Uniform Memory Access (NUMA), shown in Figure 2, each processor has its own local memory. A processor can also have a built-in memory controller, as in Intel’s QuickPath Interconnect (QPI) NUMA architecture.

Unlike in Distributed Memory Architecture, the memory of other processors is still accessible, but the latency to access it is higher than for local memory. Memory that is local to another processor is called remote memory or foreign memory.

A processor usually uses its local memory to store the data required for its processing, because accessing local memory has the lowest latency. It can also use remote memory, at a higher latency. As long as most accesses stay local, this architecture continues to scale as the number of processors grows.
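The effect of locality on access cost can be estimated with a simple weighted average. The sketch below is a back-of-the-envelope model, not a measurement. The function name `average_latency_ns` and the default latencies of 100 ns (local) and 150 ns (remote) are assumptions chosen only for illustration.

```python
def average_latency_ns(local_fraction, local_ns=100.0, remote_ns=150.0):
    """Expected memory latency when a given fraction of accesses is local.

    The default latencies are illustrative assumptions, not measured values.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


for fraction in (1.0, 0.5, 0.0):
    print(fraction, average_latency_ns(fraction))
```

Under these assumed numbers, a workload that keeps all of its data local sees 100 ns on average, while one that splits its accesses evenly between local and remote memory sees 125 ns.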

In Cache Coherent NUMA (ccNUMA), the hardware keeps the processor caches coherent, and processors can be linked directly to one another, for example via the HyperTransport bus.

NUMA provides better scalability than UMA when the number of processors is very large.

Figure 2. Non Uniform Memory Access

The hardware trend is to use NUMA systems with several NUMA nodes, as shown in Figure 3. A NUMA node is a group of processors that share memory. A NUMA node uses its local bus to interact with its local memory. Multiple NUMA nodes can be combined to form an SMP system, with a common SMP bus interconnecting all the nodes.

Figure 3. NUMA SMP

Having local memory reduces the lock contention of heap memory managers. When memory is allocated from shared memory, the heap memory manager locks the heap while it allocates memory to a thread, and all other threads have to wait until the lock is released. This becomes a performance bottleneck when a large number of processors access the same shared memory. Many heap memory managers also use lock-free algorithms for memory allocation. To allocate local memory, a heap memory manager needs to be NUMA aware.
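One way to picture a NUMA-aware heap is as one arena per node, each with its own lock, so that threads on different nodes do not compete for a single global lock. The sketch below is a toy model under that assumption. `NodeArena` and `NumaAwareHeap` are hypothetical names, blocks are represented by simple `(node, index)` tuples, and a real allocator would manage actual memory and discover the caller's node from the OS.

```python
import threading


class NodeArena:
    """Free list of fixed-size blocks that belong to one NUMA node."""

    def __init__(self, node_id, block_count):
        self.node_id = node_id
        self.lock = threading.Lock()  # one lock per node, not per heap
        self.free_blocks = [(node_id, i) for i in range(block_count)]

    def allocate(self):
        with self.lock:
            return self.free_blocks.pop() if self.free_blocks else None

    def release(self, block):
        with self.lock:
            self.free_blocks.append(block)


class NumaAwareHeap:
    """Prefers the caller's local arena and falls back to remote arenas."""

    def __init__(self, node_count, blocks_per_node):
        self.arenas = [NodeArena(n, blocks_per_node) for n in range(node_count)]

    def allocate(self, local_node):
        others = [n for n in range(len(self.arenas)) if n != local_node]
        for node in [local_node] + others:
            block = self.arenas[node].allocate()
            if block is not None:
                return block
        raise MemoryError("all arenas exhausted")

    def release(self, block):
        # A block always returns to the arena of the node that owns it.
        self.arenas[block[0]].release(block)


heap = NumaAwareHeap(node_count=2, blocks_per_node=2)
block = heap.allocate(local_node=1)
print(block[0])  # node that owns the block; prints 1
```

Only threads allocating on the same node contend for the same lock, and remote memory is used only once the local arena is exhausted.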

Virtual Memory allocation in NUMA

NUMA systems commonly use the First Touch memory allocation policy. Virtual memory is not initially mapped to physical memory, so the first access to a page causes a page fault. On the page fault, the operating system (OS) allocates physical memory, typically on the node of the processor that touched the page first, and maps the virtual memory address to it.
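The first-touch behavior can be mimicked with a small page-table simulation. This is a sketch only, assuming the page is placed on the node of whichever processor touches it first. `FirstTouchPageTable`, its `access` method and the 4096-byte page size are hypothetical choices for illustration.

```python
class FirstTouchPageTable:
    """Maps virtual pages to NUMA nodes on first access (toy model)."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.page_to_node = {}  # virtual page number -> owning node
        self.page_faults = 0

    def access(self, virtual_address, cpu_node):
        page = virtual_address // self.page_size
        if page not in self.page_to_node:
            # Page fault: back the page with memory from the toucher's node.
            self.page_faults += 1
            self.page_to_node[page] = cpu_node
        return self.page_to_node[page]


table = FirstTouchPageTable()
table.access(0, cpu_node=0)      # first touch of page 0, placed on node 0
table.access(100, cpu_node=1)    # same page, stays on node 0
table.access(4096, cpu_node=1)   # first touch of page 1, placed on node 1
print(table.page_to_node)        # prints {0: 0, 1: 1}
print(table.page_faults)         # prints 2
```

A practical consequence is that a thread which initializes a large buffer ends up owning its pages, so data is best initialized by the thread that will use it.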

NUMA aware OS

For optimum use of NUMA, the OS running on it should be able to recognize it. The OS can schedule threads on the processors that are local to their data, which is called processor affinity. It can also take care of load balancing by distributing jobs and data across multiple processors and memories while executing large jobs. These two goals may sound conflicting, but a NUMA-aware OS balances them to manage resources efficiently. Linux, Windows XP SP2 and Windows Server 2003 support NUMA. Windows Vista provides an API to allocate memory on a specific NUMA node (this increases the cost of memory management).
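On Linux, processor affinity can also be set from user code. The sketch below assumes a Linux system that exposes the NUMA topology under `/sys/devices/system/node` and uses Python's `os.sched_setaffinity`, which is available only on some platforms. The helper names `parse_cpulist`, `cpus_of_node` and `pin_current_process` are hypothetical.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_node(node):
    """Return the CPUs of a NUMA node, or None if the topology is unavailable."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    try:
        with open(path) as handle:
            return parse_cpulist(handle.read())
    except OSError:
        return None


def pin_current_process(cpus):
    """Restrict the current process to the given CPUs where supported."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # platform without this call, e.g. Windows or macOS
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)


node0_cpus = cpus_of_node(0)
if node0_cpus:
    print("running on CPUs:", pin_current_process(node0_cpus))
else:
    print("NUMA topology not available on this system")
```

Pinning a process to the CPUs of one node, combined with first-touch allocation, keeps most of its memory accesses local.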

NUMA aware Programming Languages

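For efficient memory management on NUMA, the memory manager of a programming language runtime also needs to be NUMA aware. Java is NUMA aware: the HotSpot JVM, for example, provides a NUMA-aware allocator that can be enabled with the -XX:+UseNUMA option.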

NUMA aware Databases

Oracle 11g needs to be explicitly enabled for NUMA support. Oracle disables NUMA when the page size is large. SQL Server 2008 R2 is NUMA aware and implicitly recognizes NUMA. Databases that are not NUMA aware, e.g. SQL Server 2000 SP3, perform very poorly on NUMA machines.