Changes between Version 15 and Version 16 of Installation/Hardware


Timestamp:
Jan 17, 2014 1:22:30 PM (11 years ago)
Author:
jhickey
Comment:

--

  • Installation/Hardware

    v15 v16  
    77 * Understand that DETER is about handing you physical nodes with physical networking.  Buy a bunch of cheap nodes instead of a few expensive nodes.  Given the way CPU prices scale you'll probably come out with more overall compute power anyway.
    88 * Consider the cost of people in addition to the cost of hardware.  Successfully setting up a DETER requires more than just hardware.
     9
     10=== Nodes ===
     11
     12DETER is a network testbed.  This means our nodes differ from those in a typical computing cluster, which presents a bit of an optimization problem based on budget.  If you have a fixed budget you should select your nodes as follows:
     13
     14==== Overall goal when purchasing nodes with a fixed budget ====
     15
     16You want to maximize the number of nodes in your testbed.  The more independent nodes you have, the more resilient your testbed will be.  If you have 20 nodes and 1 fails, you have lost 5% of your testbed capacity.  If you only have 4 high end nodes and one fails, you have lost 25%.  Experimental allocation is done at the node level, so more nodes means finer grained allocation if multiple researchers are making use of your testbed.  More nodes gives you the flexibility to duplicate experiments.  When we do major demos at DETER, we sometimes swap in a duplicate of an experiment so we have a hot spare should a problem with the main demo experiment arise.
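The capacity arithmetic above can be sketched in a few lines of Python. The function name `capacity_lost` is made up for illustration; it simply assumes every node contributes an equal share of testbed capacity, which is the assumption behind the 5% and 25% figures.

```python
def capacity_lost(total_nodes: int, failed_nodes: int = 1) -> float:
    """Fraction of testbed capacity lost, assuming all nodes count equally."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return failed_nodes / total_nodes


# Compare the two testbed sizes used in the text: 20 cheap nodes vs. 4 high end nodes.
for node_count in (20, 4):
    print(f"{node_count} nodes, 1 failure: {capacity_lost(node_count):.0%} lost")
```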
     17
     18==== The four key parts of a testbed node ====
     19
     20DETER is a network testbed, not a compute cluster.  We have found that people need to counter some of their instincts when selecting testbed nodes.  Every testbed node needs to emphasize four things:
     21
     22 * Network Connectivity.  Each node should have 6 Ethernet ports.
     23 * RAM in proportion to CPU cores
     24 * CPU with full virtualization support
     25 * IPMI control
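As a rough way to check the virtualization item on a candidate node, the sketch below looks for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in text taken from `/proc/cpuinfo`. It assumes a Linux x86 system, and the function name `has_hw_virtualization` is invented for this example. It only confirms that the CPU advertises basic hardware virtualization; it may not cover everything meant by "full virtualization support" here, and it does not tell you whether the feature is enabled in firmware.

```python
def has_hw_virtualization(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            if {"vmx", "svm"} & set(value.split()):
                return True
    return False


# Illustrative sample only, not output captured from a real DETER node.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse sse2\n"
print(has_hw_virtualization(sample))
```

On a real node you would pass in the contents of `/proc/cpuinfo`, for example `has_hw_virtualization(open("/proc/cpuinfo").read())`.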
     26
     27Here are some features that are not terribly important for testbed nodes:
     28
     29 * RAID controllers and high capacity storage.  Large, redundant storage should be centralized.
     30 * High end CPUs.  This is because we have to optimize for the maximum number of nodes in a fixed budget.
     31 * Redundant Power Supplies.  If you have more nodes and centralized storage, you can wait to get a power supply in the mail.
     32 * Generally exotic hardware design (blade centers, etc).  These typically emphasize compute power per node, which is not our goal.
     33
     34==== Budget, High Node Count, and Selecting Nodes ====
     35
     36As an example, in DETER's most recent build-out, I managed to add 128 nodes with a budget of around $300k (including switches).  I also had to factor in power and weight constraints that are unique to the DETER server room.  Here is what I came up with:
     37
     38 * I used SuperMicro Microcloud servers.  These 3U servers each contain 8 independent nodes.  They are simple Intel C20x based motherboards with two onboard NICs, an IPMI port, and a PCI-E slot.  They allowed me to fit 64 nodes per rack.
     39 * For CPUs I used Xeon E3-1260L processors to keep our power consumption and heat output low.
     40 * I was able to install 16GB of RAM per node.  I used unbuffered ECC RAM because DIMMs do go bad if you have a large enough population.
     41 * For hard drives, I picked the least expensive Western Digital RE edition hard drive.  I basically exchanged capacity for a more reliable drive with a 5 year warranty.  Drives were ~$80 each for 250GB of capacity.
     42 * A quad port Intel I350 NIC per node.  This means each node has 5 dedicated experimental interfaces and 1 control interface.
     43 * HP 5400 series switches.  I ended up using 4 5412zl switches for the experimental network and 2 5406zl switches for the control network.
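The figures in this list imply some simple totals for chassis, racks, and switch ports. The sketch below works them out from the numbers given above (128 nodes, 8 nodes per 3U chassis, 64 nodes per rack, two onboard NICs plus a quad port card, one IPMI port per node). The function and key names are invented for illustration, and the port counts are demand only; they say nothing about how ports were actually laid out across the switches.

```python
import math


def buildout_summary(nodes: int = 128,
                     nodes_per_chassis: int = 8,
                     chassis_height_u: int = 3,
                     nodes_per_rack: int = 64,
                     onboard_nics: int = 2,
                     addon_nic_ports: int = 4,
                     control_nics: int = 1) -> dict:
    """Derive chassis, rack, and port totals from per-node figures."""
    experimental_per_node = onboard_nics + addon_nic_ports - control_nics
    chassis_per_rack = nodes_per_rack // nodes_per_chassis
    return {
        "chassis": math.ceil(nodes / nodes_per_chassis),
        "racks": math.ceil(nodes / nodes_per_rack),
        "rack_units_per_rack": chassis_per_rack * chassis_height_u,
        "experimental_ports": nodes * experimental_per_node,
        "control_ports": nodes * control_nics,
        "ipmi_ports": nodes,  # one IPMI port per node
    }


for name, value in buildout_summary().items():
    print(f"{name}: {value}")
```

With the defaults this gives 16 chassis in 2 racks, 24U of servers per rack, and 640 experimental, 128 control, and 128 IPMI ports to connect.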
     44 
     45Areas where I could have reduced cost on this installation:
     46
     47 * Use separate commodity switches for the IPMI ports.  The IPMI ports don't technically need to be connected to a managed switch.  Personally, I did save a lot of time by using the advanced features of the HP switches to map the 128 IPMI MAC addresses to nodes (I used a known wiring pattern).
     48 * Use Pentium G low power CPUs.  The Xeon E3 CPUs were about $300 when I did the build.  I could have saved $200 per node by selecting specific sub-$100 Pentium CPUs which work well with the C20x chipset and include support for ECC RAM.
     49 * Less RAM.  ECC unbuffered RAM is still somewhat expensive.  Going with 8GB or 4GB per node would have saved ~$100-$150 per node.
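To put the per-node savings in context, the sketch below scales them to the 128-node build and compares them with the roughly $300k budget. The names are made up for the example, the budget is treated as exactly $300,000 even though the text only says "around $300k", and the IPMI switch saving is left out because no figure is given for it.

```python
NODES = 128
BUDGET_USD = 300_000          # approximation of "around $300k"
CPU_SAVING_PER_NODE = 200     # Xeon E3 replaced by a sub-$100 Pentium
RAM_SAVING_PER_NODE = (100, 150)  # low and high ends of the quoted range


def total_savings(nodes: int = NODES) -> tuple:
    """Return (low, high) total savings in USD across all nodes."""
    low = nodes * (CPU_SAVING_PER_NODE + RAM_SAVING_PER_NODE[0])
    high = nodes * (CPU_SAVING_PER_NODE + RAM_SAVING_PER_NODE[1])
    return low, high


low, high = total_savings()
print(f"CPU saving:   ${NODES * CPU_SAVING_PER_NODE:,}")
print(f"Total saving: ${low:,} to ${high:,}")
print(f"Share of budget: {low / BUDGET_USD:.1%} to {high / BUDGET_USD:.1%}")
```

Under these assumptions the CPU and RAM changes together come to somewhere between $38,400 and $44,800, on the order of an eighth of the budget.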
     50
     51Areas where I wouldn't want to save money:
     52
     53 * Configuring a node with a dual port card instead of a quad port card could save some money.  Also, not buying Intel could save some money, but the Intel cards support advanced virtualization features and generally have very good driver support.
     54 * Using something other than HP switches.  The HP switches come with free support and firmware updates.  They also come with a lifetime hardware warranty.  They are the only switch type currently in use at DETER and therefore will be the best supported switch.
     55
    956
    1057=== Network Switches ===