Tag Archives: memory

Kingston memory modules, part number vs revision number

Some Kingston DDR3 DIMMs

Same Kingston Part Number, different Kingston Revision Numbers.

Upgrading memory in some boxes, it turned out that the fairly detailed DMI memory module identification info available through dmidecode on a running Linux system shows a ‘Part Number’ for each memory module:

Handle 0x0015, DMI type 17, 28 bytes
 Memory Device
 Array Handle: 0x000F
 Error Information Handle: Not Provided
 Total Width: 72 bits
 Data Width: 64 bits
 Size: 4096 MB
 Form Factor: DIMM
 Set: None
 Locator: P1-DIMM2A
 Bank Locator: BANK2
 Type: DDR3
 Type Detail: Other
 Speed: 1333 MHz
 Manufacturer: Kingston
 Serial Number: 3C1A27E7
 Asset Tag:
 Part Number: 9965439-121.A00LF
 Rank: Unknown

… this part number is not always easy to look up, and it is not the same as a Kingston Part Number. On the interwebs, Kingston refers to part numbers like KVR16R11D4/8 or KVR1600D3D4R11S/8G, and has a helpful page at www.kingston.com/us/memory/valueram/valueram_decoder that explains what such a number means and how to decode it. In Kingston lingo, that is what ‘Part Number’ means.

The dmidecode-style ‘Part Number’ is, in Kingston speak, a ‘Revision Number’. Kingston does not say much about these, and not all revision numbers are mentioned anywhere on the indexed web; you might find them in data sheets. Kingston does tell you how to find the revision number, at https://legacy.kingston.com/support/help_master.asp, probably because when an issue is reported, it pinpoints internally at Kingston exactly which product version is involved. A Kingston Revision Number looks something like the one above, 9965439-121.A00LF. There is also a third number, called the Work Order, which is not very interesting as far as I can tell.
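For reference, this is roughly how to read them off a running system (a quick sketch; it assumes dmidecode is installed and you are root):

# Show the identification fields for every populated DIMM slot
dmidecode -t memory | grep -E 'Locator|Size|Manufacturer|Part Number|Serial Number'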

It might help somebody if I write down some Kingston Revision Numbers I have encountered and which Kingston Part Number they map to – if you have the physical module, it shows both. Here goes:

Kingston Revision Number  Kingston Part Number  More info
9931129-002.A00G          KVR1333D3D4R9S/4G     4GB DDR3 ECC Reg 1.5V DIMM
9931129-004.A00G          KVR1333D3D4R9S/4G     4GB DDR3 ECC Reg 1.5V DIMM
9931129-005.A00G          KVR1333D3D4R9S/4G     4GB DDR3 ECC Reg 1.5V DIMM
9965426-059.A00LF         KVR1333D3D8R9S/4G     4GB DDR3 ECC Reg 1.5V DIMM
9965426-199.A00LF         KVR13LR9D8/8          8GB DDR3L ECC Reg 1.35V DIMM
9965426-405.A00LF         KVR16LR11D8/8HB       8GB DDR3L ECC Reg 1.35V DIMM
9965439-121.A00LF         KVR13LR9S8/4          4GB DDR3L ECC Reg 1.35V DIMM
9965439-127.A00LF         KVR16R11S8/4          4GB DDR3 ECC Reg 1.5V DIMM
9965447-017.A00LF         KVR1333D3D4R9S/4G     4GB DDR3 ECC Reg 1.5V DIMM
9965516-490.A00LF         KVR16R11D4/16         16GB DDR3 ECC Reg 1.5V DIMM
9965600-012.A01G          KVR21R15D4/16HA       16GB DDR4 ECC Reg 1.2V DIMM
9965640-016.A00G          KVR21R15D4/32         32GB DDR4 ECC Reg 1.2V DIMM

Will add more as I come across them.

Linux virtualization with libvirt: NUMA management

It turns out that on any more than lightly loaded modern KVM + libvirt host system that houses more than a single physical CPU, NUMA tuning is essential.

If you see symptoms like high system CPU time peaks on the host and steal CPU time (or ‘missing’ CPU time) on the virtual instances, chances are the system is spending too much time transferring data between NUMA nodes. Another telltale sign is guests / VMs that get shot down by the host because of an OOM (Out of Memory) condition, while there seems to be a healthy margin of free/buffered/cached memory.
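A quick way to check whether that cross-node traffic is actually happening is to watch the per-node allocation counters (numastat comes with the numactl package); steadily growing numa_miss / numa_foreign counts mean allocations are being served from the ‘wrong’ node:

# Per-node allocation counters since boot
numastat
# Highlight what changes over time
watch -d numastat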

The problem here is that host system memory is distributed across the available NUMA nodes, and certain cores are associated with certain nodes. On current x86_64 machines, one physical processor package, typically containing multiple CPU cores, usually also holds one NUMA node. It helps performance a lot if a single virtual machine (or any old process) is contained within a single node. The picture below, output of hwloc-ls (apt-get install hwloc), gives an idea of how this is laid out.

[picture: hwloc-ls output]

For that reason you might want to pin guests to a defined subset of logical cores that are all in the same node. The memory assigned to the guests within a single NUMA node should, together, also fit within the memory that node has available. Having a non-NUMA-aware guest, even if there is only one, with more memory assigned to it than one NUMA node has available does not make a lot of sense: there is a big performance penalty, because part of the assigned RAM, when addressed, always needs to be transferred between nodes. If you need to do this anyway, have a look at e.g. http://docs.openstack.org/developer/nova/testing/libvirt-numa.html – you can make the guest itself NUMA aware so it can efficiently use memory from multiple NUMA cells.
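Besides the static cpuset approach shown at the end of this post, pinning can also be done from the command line; a rough sketch (the domain name guest1 and the CPU numbers are placeholders – pick cores that all sit in one cell according to virsh capabilities):

# Pin each vCPU of the guest to a host logical CPU in the same NUMA node
# (add --config to each command to also persist it in the domain XML)
virsh vcpupin guest1 0 8
virsh vcpupin guest1 1 9
virsh vcpupin guest1 2 20
virsh vcpupin guest1 3 21
# Check the resulting placement
virsh vcpuinfo guest1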

Helpful tools here are numastat -m (apt-get install numactl), lscpu, and the hwloc suite (apt-get install hwloc; see the picture above – you will need X forwarding, ssh -X, if you want the hwloc-ls graphics from a remote system).
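A non-graphical sketch of getting the same overview:

# Which logical CPUs and how much memory belong to which NUMA node
numactl --hardware
# The NUMA-related lines from lscpu (node count, per-node CPU lists)
lscpu | grep -i numa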

Peripherals such as disks and NICs are also NUMA node bound, which can be important for DMA performance.
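You can look that up in sysfs, for example (the interface name and PCI address are placeholders):

# NUMA node the NIC's PCI device is attached to (-1 means not reported)
cat /sys/class/net/eth0/device/numa_node
# Same attribute for any PCI device, e.g. a storage controller
cat /sys/bus/pci/devices/0000:01:00.0/numa_node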

Example numastat output:

root@host-system:~# numastat -m

Per-node system memory usage (in MBs):
                          Node 0          Node 1           Total
                 --------------- --------------- ---------------
MemTotal                64328.18        64510.16       128838.34
MemFree                 10501.73         2769.72        13271.46
MemUsed                 53826.45        61740.43       115566.88
Active                  48528.07        55331.48       103859.55
Inactive                 3082.79         4013.75         7096.54
Active(anon)            45496.47        51762.40        97258.87
Inactive(anon)              0.12            0.18            0.30
Active(file)             3031.60         3569.08         6600.68
Inactive(file)           3082.67         4013.56         7096.23
Unevictable                 0.00            0.00            0.00
Mlocked                     0.00            0.00            0.00
Dirty                       0.50            0.30            0.80
Writeback                   0.00            0.00            0.00
FilePages                6115.00         7584.15        13699.15
Mapped                     13.54            6.84           20.38
AnonPages               45495.84        51761.65        97257.49
Shmem                       0.73            0.94            1.67
KernelStack                 6.70            4.98           11.69
PageTables                 94.40          124.43          218.82
NFS_Unstable                0.00            0.00            0.00
Bounce                      0.00            0.00            0.00
WritebackTmp                0.00            0.00            0.00
Slab                     1462.05         1622.14         3084.19
SReclaimable              701.69         1071.01         1772.70
SUnreclaim                760.36          551.12         1311.49
AnonHugePages            1396.00          438.00         1834.00
HugePages_Total             0.00            0.00            0.00
HugePages_Free              0.00            0.00            0.00
HugePages_Surp              0.00            0.00            0.00

From virsh capabilities, example NUMA layout:

    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>65872056</memory>
          <cpus num='12'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0,12'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1,13'/>
            <cpu id='2' socket_id='0' core_id='2' siblings='2,14'/>
            <cpu id='3' socket_id='0' core_id='3' siblings='3,15'/>
            <cpu id='4' socket_id='0' core_id='4' siblings='4,16'/>
            <cpu id='5' socket_id='0' core_id='5' siblings='5,17'/>
            <cpu id='12' socket_id='0' core_id='0' siblings='0,12'/>
            <cpu id='13' socket_id='0' core_id='1' siblings='1,13'/>
            <cpu id='14' socket_id='0' core_id='2' siblings='2,14'/>
            <cpu id='15' socket_id='0' core_id='3' siblings='3,15'/>
            <cpu id='16' socket_id='0' core_id='4' siblings='4,16'/>
            <cpu id='17' socket_id='0' core_id='5' siblings='5,17'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>66058400</memory>
          <cpus num='12'>
            <cpu id='6' socket_id='1' core_id='0' siblings='6,18'/>
            <cpu id='7' socket_id='1' core_id='1' siblings='7,19'/>
            <cpu id='8' socket_id='1' core_id='2' siblings='8,20'/>
            <cpu id='9' socket_id='1' core_id='3' siblings='9,21'/>
            <cpu id='10' socket_id='1' core_id='4' siblings='10,22'/>
            <cpu id='11' socket_id='1' core_id='5' siblings='11,23'/>
            <cpu id='18' socket_id='1' core_id='0' siblings='6,18'/>
            <cpu id='19' socket_id='1' core_id='1' siblings='7,19'/>
            <cpu id='20' socket_id='1' core_id='2' siblings='8,20'/>
            <cpu id='21' socket_id='1' core_id='3' siblings='9,21'/>
            <cpu id='22' socket_id='1' core_id='4' siblings='10,22'/>
            <cpu id='23' socket_id='1' core_id='5' siblings='11,23'/>
          </cpus>
        </cell>
      </cells>
    </topology>

Example instance CPU/memory definition (via virsh edit instance) for the above layout:

  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static' cpuset='8-9,20-21'>4</vcpu>

The logical CPUs in the cpuset are all within one NUMA node (the cell with id 1), and the 16 GB of guest memory fits within the 64 GB available to that node.
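Pinning the vCPUs alone does not force the guest’s memory allocations onto that node; to also bind the memory, something along these lines can be used (a sketch, with ‘instance’ being the domain name from above):

# Restrict the guest's memory allocations to NUMA node 1 in the
# persistent definition (takes effect on the next guest start)
virsh numatune instance --mode strict --nodeset 1 --config
# Show the current memory binding
virsh numatune instance

This corresponds to a <numatune><memory mode='strict' nodeset='1'/></numatune> element in the domain XML.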