Changes between Version 1 and Version 2 of Internal/Switches/Cisco4


Timestamp:
Oct 18, 2010 6:24:00 PM (13 years ago)
Author:
jhickey
Comment:

--

  • Internal/Switches/Cisco4

    v1 v2  
    2 2
    3 3 === Resetting a port on Cisco4 ===
    4 From Ted's message on the subject:
    5 4
    6 As some of you know, we've been having problems with some of the Cisco switch ports locking up and disrupting connectivity of experiments.  Several had gotten into this state on the pc733s and were being a real hassle.  We're still unsure what triggers this, but we can now restore dead ports.  We think that ports weren't dying at any spectacular rate, but that the dead ones were accreting.  The rest of this message will tell you how to diagnose a dead port and report it so we'll reset it. I've also got a trick you might want to try on your own to fix it if we're not around.
    7 
    8 First, it's worth mentioning that port lockups seem rare.  Hopefully you'll never encounter this, but if you do, here's how you can help us diagnose it and get you back to experimentation quickly.
    9 
    10 A dead port usually shows up in an experiment as a dead link to a node.  The node can't hear packets sent to it, including ARP messages, though the routing and configuration are correct.  If you're seeing connectivity problems, the best way I've seen to diagnose a dead port is to ping the nodes in the experiment and then run
    11 {{{
    12   $ portstats <project> <experiment>
    13 }}}
    14 on "users".  If there's a dead port, it will show up as a row of all 0s in the output side of the table.  That's your cue that even ARP messages aren't going out.
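
The all-zeros check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual `portstats` column layout is not shown in the message, so the sketch assumes a header row of counter names in which output-side columns start with `Out`, followed by one row per port with a label and its counters. The function name `find_dead_ports` and the sample data are hypothetical.

```python
def find_dead_ports(portstats_text):
    """Return the labels of rows whose output-side counters are all zero.

    Assumed (hypothetical) layout: the first non-blank line is a header of
    counter names; each later line is a port label followed by one counter
    per header name. Output-side columns are those named "Out...".
    """
    lines = [ln for ln in portstats_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    out_cols = [i for i, name in enumerate(header) if name.startswith("Out")]
    dead = []
    for line in lines[1:]:
        fields = line.split()
        label, counters = fields[0], fields[1:]
        if len(counters) != len(header):
            continue  # row does not match the assumed layout; skip it
        values = [counters[i] for i in out_cols]
        # A port is flagged only if every output counter is a literal zero.
        if values and all(v.isdigit() and int(v) == 0 for v in values):
            dead.append(label)
    return dead


# Made-up sample in the assumed layout, not real portstats output.
SAMPLE = """\
            InOctets InUcast OutOctets OutUcast
pc101:eth2  5120     40      4890      38
pc102:eth2  2048     16      0         0
"""

if __name__ == "__main__":
    print(find_dead_ports(SAMPLE))
```

Pinging the nodes first matters: a port that has simply had no traffic would also show zeros, so the check is only meaningful after something should have been sent.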
    15 
    16 At this point you can just send us mail to testbed-ops and tell us you think you have a dead port and we'll breathe life into it.  It's a quick process.  If you can leave your experiment swapped in, please do, so the port doesn't bite anyone else.  Restoring the port won't adversely affect your experiment.  If you decide to swap your experiment out and then back in to avoid the bad port, do let us know which machine had the bad port.  The "Reserved nodes" table in the "Show experiment" page will have the name of the machine we want.  In most cases it'll be pcddd (or bpcddd), where the d's are integers.
    17 
    18 If it's 4 in the morning and the ISI ops team is irresponsibly sleeping, you can try this idea.  Now, I haven't tried this, but I think it will work.  It might also get your experiment swapped out, or not improve matters at all.  Caveat emptor.
      5 Once in a while a port will go down on Cisco4.  It would be nice to document how to reset the port here.