Question for the experts on Spanning Trees



Posted on Tue Aug 02, 2011 1:46 pm

Hi guys, I have a question about spanning trees.

So at work we have four Alcatel-Lucent 9700 or 7700 switches. They're arranged in a full mesh with redundant backbone links. The ultimate goal is to figure out which settings provide the best recovery time after a full switch or link failure. In addition, a guy I work with for some reason really wants to monitor all of the BPDUs going between each switch over each link. Is there a standard practice for determining these kinds of settings?

The bulk of our traffic is multicast. Using RSTP, a root bridge failure gives us a recovery time of roughly 7 seconds (maybe 5 at best). By recovery I mean our multicast stream starts making its way from a source to a destination again. Ideally we would like a two-second recovery (or, really ideally, one second).
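
One way to put a number on "recovery" as defined above is to record when each multicast packet arrives at the destination and look for the longest gap. The following is only a minimal Python sketch under that assumption; the function name and the sample timestamps are hypothetical and not taken from any real capture.

```python
from typing import Sequence


def longest_outage(arrival_times: Sequence[float]) -> float:
    """Return the longest gap, in seconds, between consecutive packet arrivals."""
    if len(arrival_times) < 2:
        return 0.0
    ordered = sorted(arrival_times)
    # Compare each arrival with the one that follows it.
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical arrival timestamps (seconds) of a stream sending every 0.5 s,
# with a failure injected shortly after t = 1.5 s.
arrivals = [0.0, 0.5, 1.0, 1.5, 8.0, 8.5, 9.0]
print(f"longest gap: {longest_outage(arrivals):.1f} s")
```

The gap includes one normal send interval, so the real outage is slightly shorter than the reported value.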

Do any of you have any experience trying to solve these kinds of problems?
ApockofFork
Gerbil First Class
 
Posts: 149
Joined: Thu Nov 30, 2006 9:34 pm

Re: Question for the experts on Spanning Trees

Posted on Wed Aug 03, 2011 1:48 pm

Sorry, I've done more routing than switching, and then more as a developer than as a user. Wikipedia mentions RSTP http://en.wikipedia.org/wiki/Spanning_T ... .28RSTP.29 as recovering in 3x hello time, so maybe you could try reducing the hello time to something like 500 ms if the switches support it.
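
The 3x hello time rule is simple arithmetic, sketched below in Python. It only covers the case where a neighbor is declared lost because its BPDUs stop arriving; a link that physically goes down is normally noticed sooner. The function name is hypothetical, and the 0.5 s value is an assumption that only applies if the switch accepts sub-second hello times.

```python
def bpdu_loss_detection_time(hello_time_s: float, missed_hellos: int = 3) -> float:
    """Seconds until a neighbor is declared lost after its BPDUs stop arriving."""
    return hello_time_s * missed_hellos


# 2 s is the usual default hello time; 1 s and 0.5 s are candidate reductions.
for hello in (2.0, 1.0, 0.5):
    print(f"hello {hello} s -> detection after {bpdu_loss_detection_time(hello)} s")
```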

Sniffing BPDUs isn't something I would consider unless you are troubleshooting, and then you are going to need to dump them into pcap format, load them into Wireshark, and sit there reading the standard and working out exactly what is going on. It can be the only thing to do when things go wrong, but it is painful and slow and not something I would consider doing normally.
notfred
Grand Gerbil Poohbah
 
Posts: 3650
Joined: Tue Aug 10, 2004 9:10 am
Location: Ottawa, Canada

Re: Question for the experts on Spanning Trees

Posted on Wed Aug 03, 2011 3:05 pm

The last time I worked on a large network where somebody wanted a very low recovery time, the people who did the design trimmed as much as they could, and in a bare FAT on a table everything worked. In reality, one year after implementation, we were still troubleshooting and resetting spanning tree settings to default timers, etc. Of course, that design sounds a whole lot different from yours, since our core was routed and spanning tree only ran in the distribution/access layer.

All things considered, if you are gonna screw around with the timers, be prepared to spend some time in the lab with a mirror setup of your production environment before you commit to anything. I have exactly zero experience with Alcatel-Lucent, but if they support something that works like the Nortel/Avaya SMLT, it might be a much better fit than spanning tree if you need really fast recovery times. But it depends on the design, and I don't really get what your design looks like from what you wrote in the post.
Aphasia
Grand Gerbil Poohbah
 
Posts: 3355
Joined: Tue Jan 01, 2002 6:00 pm
Location: Solna/Sweden

Re: Question for the experts on Spanning Trees

Posted on Wed Aug 03, 2011 5:57 pm

Aphasia wrote: All things considered, if you are gonna screw around with the timers, be prepared to spend some time in the lab with a mirror setup of your production environment before you commit to anything. I have exactly zero experience with Alcatel-Lucent, but if they support something that works like the Nortel/Avaya SMLT, it might be a much better fit than spanning tree if you need really fast recovery times. But it depends on the design, and I don't really get what your design looks like from what you wrote in the post.


Nortel/Avaya SMLT works like a champ. We implemented it where I work. I'm not a switch expert by any means, but it was pretty easy to get going.
IrateAdmin
Gerbil In Training
 
Posts: 6
Joined: Sun Apr 24, 2011 7:16 pm

Re: Question for the experts on Spanning Trees

Posted on Thu Aug 04, 2011 1:20 pm

Hi guys, thanks for the responses. Here are answers to a few of your questions.

Firstly, a better description of the network. It's a very simple setup and not very large: we have four switches, and each switch has two connections to each other switch. In most cases we don't even do any routing; the entire network is a single flat LAN. (In some cases we run multiple VLANs.)

SMLT looks to be some implementation of link aggregation, which makes a lot of sense. It would simplify the network so that the two connections between each pair of switches would really look like only one. The switches have the capability to do that, and to be honest I'm not sure why it hasn't been used in the past. I know I ran a really quick test one afternoon and it seemed to help with network recovery in the case of a single link failure. It would provide instant rollover to the remaining link, but didn't really speed things up if a whole switch went down. I could definitely investigate it further.

On the topic of timers, we actually run the lowest timers the switch will let us use: a max age of 6, a forward delay of 4, and a hello time of 1. Those are the only timer variables associated with the spanning tree that we use. I think there is another variable called hold count that we can mess with, but traditionally we haven't.
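
For reference, IEEE 802.1D ties these three timers together with two inequalities, which is likely why the switch refuses anything lower. The Python sketch below checks a set of timer values against those relationships; the function name is hypothetical, and any per-vendor limits on the switch itself are not modeled.

```python
from typing import List


def check_stp_timers(max_age: float, forward_delay: float, hello_time: float) -> List[str]:
    """Return a list of violated IEEE 802.1D timer relationships (empty if none)."""
    problems = []
    # 2 x (forward delay - 1 s) must be at least max age.
    if 2 * (forward_delay - 1) < max_age:
        problems.append("2 * (forward_delay - 1) must be >= max_age")
    # Max age must be at least 2 x (hello time + 1 s).
    if max_age < 2 * (hello_time + 1):
        problems.append("max_age must be >= 2 * (hello_time + 1)")
    return problems


# The values from the post: max age 6, forward delay 4, hello time 1.
print(check_stp_timers(max_age=6, forward_delay=4, hello_time=1))
# Lowering forward delay alone to 3 breaks the first relationship.
print(check_stp_timers(max_age=6, forward_delay=3, hello_time=1))
```

With max age 6, forward delay 4, and hello time 1, the first inequality holds with no slack (2 x 3 = 6), so forward delay cannot drop further without also lowering max age.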

We have at times pulled out Wireshark and mirrored some ports on the switch to watch the BPDUs going back and forth. Using that, we can watch for when topology change notifications stop showing up, which is helpful for determining when a change has propagated. At least in theory... usually the results are rather random and don't really tell us much.
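
The "watch for when topology changes stop" idea can be sketched in Python by decoding the flags byte that every RST BPDU carries (bit 0 is Topology Change, bit 7 is Topology Change Acknowledgment) and noting the last BPDU that still has the TC bit set. This assumes the timestamps and flags bytes have already been pulled out of the capture; the function names and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Port role encoding used in bits 2-3 of the RST BPDU flags byte.
PORT_ROLES = {0: "unknown", 1: "alternate/backup", 2: "root", 3: "designated"}


@dataclass
class BpduFlags:
    topology_change: bool
    proposal: bool
    port_role: str
    learning: bool
    forwarding: bool
    agreement: bool
    topology_change_ack: bool


def decode_flags(flags: int) -> BpduFlags:
    """Decode the single flags byte of an RST BPDU."""
    return BpduFlags(
        topology_change=bool(flags & 0x01),
        proposal=bool(flags & 0x02),
        port_role=PORT_ROLES[(flags >> 2) & 0x03],
        learning=bool(flags & 0x10),
        forwarding=bool(flags & 0x20),
        agreement=bool(flags & 0x40),
        topology_change_ack=bool(flags & 0x80),
    )


def last_topology_change(samples: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Return the timestamp of the last BPDU with the TC bit set, or None."""
    tc_times = [ts for ts, flags in samples if decode_flags(flags).topology_change]
    return max(tc_times) if tc_times else None


# Hypothetical (timestamp, flags byte) pairs extracted from a capture.
capture = [(0.0, 0x3C), (1.0, 0x3D), (2.0, 0x3D), (3.0, 0x3C)]
print(decode_flags(0x3D))
print("last BPDU with TC set at t =", last_topology_change(capture))
```

The TC bit only shows when switches stopped signaling a change; it does not by itself prove that the multicast stream is flowing again.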

I don't know if any of that information will actually help you guys solve my problem any better, but I appreciate the effort. Thanks!
ApockofFork
Gerbil First Class
 
Posts: 149
Joined: Thu Nov 30, 2006 9:34 pm

Re: Question for the experts on Spanning Trees

Posted on Thu Aug 04, 2011 6:18 pm

SMLT is a form of aggregation, or MLT, that lets you distribute an aggregated trunk over several switches instead of using spanning tree. Failover time is usually measured in milliseconds, around 100 ms up to 500 ms, instead of seconds as with spanning tree. In many cases you really don't notice anything at all from a client-server perspective. At the moment I don't know whether any company other than Nortel/Avaya implements it; if one does, it probably has a proprietary setup/name. I haven't been able to get hands-on with Cisco Nexus yet to see if they allow link aggregation over their "virtual switches" that can be set up to cross chassis. And as I said, I don't know if Alcatel-Lucent has something similar.
http://en.wikipedia.org/wiki/Split_multi-link_trunking


As for the spanning tree topology changes: the switch should be able to log them and output them as syslog/SNMP traps. At least Cisco logs them, so they're easily viewed on your choice of log server, and most log solutions can easily be set up to send you a mail/SMS or parse the event into whatever you need. But that only gives you a heads-up that the topology changed. If, on the other hand, you need to know the exact time of a topology change in a lab setup, it might not be of much use. Have you looked at the debug features of each switch instead of trying to use Wireshark?

A more specific solution is hard to really help you with. It would probably be best to ask somebody who has more experience with Alcatel-Lucent if generic solutions like spanning tree don't work. And it may be that the solution isn't necessarily limited to the network, but needs to be a hybrid of network/server, or could even be one of educating the users if it's not a technical must-have that you recover within 2 seconds. Even though your description of the network is simple enough, it really doesn't say anything about the function of the application. You mentioned that you use multicast. Are the sources distributed across multiple switches, or is there a single source connected to different switches, etc.? Do you need the full mesh because of bandwidth usage, etc.?
While I probably can't do anything about it, I'm a stickler for trying to state things clearly. A quick search turns up Ring RSTP support in some Alcatel-Lucent switches that is supposed to provide sub-100 ms convergence time, but that would require you to change your full mesh to a ring topology instead, etc.

From earlier discussions I would say that there aren't really more than a handful of people here in the TR forums who deal with larger enterprise networks, though, so some posts in other forums might be a good thing, or perhaps a mail to your Alcatel-Lucent supplier for ideas.
Aphasia
Grand Gerbil Poohbah
 
Posts: 3355
Joined: Tue Jan 01, 2002 6:00 pm
Location: Solna/Sweden

Re: Question for the experts on Spanning Trees

Posted on Fri Aug 05, 2011 6:40 am

Hi, thanks for the response. I might go ask elsewhere. I think you might be on to something. The correct solution to this is probably some combination of the switch's logging and Alcatel's monitoring software, called OmniVista. Sorry for being vague. The real problem with our network is less the hardware and more how we use it. It's for a military application (thus the piles of redundancy) and data distribution. We actually screw up the whole multicast system by statically assigning ports to certain addresses and not letting people join/leave. Trying to do things like that can sometimes confuse the switches. The other problem is that I'm not a network/IT guy at all, yet somehow have an IT job among people who really aren't IT people either (yay government...). Anyway, thanks for helping, guys; you at least gave me a few ideas on how to go about it.
ApockofFork
Gerbil First Class
 
Posts: 149
Joined: Thu Nov 30, 2006 9:34 pm

Re: Question for the experts on Spanning Trees

Posted on Fri Aug 05, 2011 11:15 am

You might try Ars Technica too, where there are a few more people than here who work with larger networks, although I don't know how many use Alcatel-Lucent; the main discussions usually revolve around Cisco on the switching side, and some Dell/HP for the budget-inclined. Had you used Cisco or Nortel I could've come much closer, since that is what I work with. Or anything non-brand-specific for that matter :P

Hope you find something that fits your needs, or that your OmniSwitch supports Ring RSTP and you can switch the mesh to a ring. But my answer to the customer would probably have been something like: if nobody thought of it before implementation or wrote a correct specification of the required capabilities, then it's either rebuild the network/get new gear if the application really is that important, or accept the 5-6 seconds that Rapid STP is able to give. Having a larger switch break down usually constitutes a larger problem than 5 seconds of downtime for many. Of course, there are always exceptions to anything. And I would hazard that the military, just like the financial world, has quite a few exceptions that really aren't ruled by anything remotely technical to begin with.
Aphasia
Grand Gerbil Poohbah
 
Posts: 3355
Joined: Tue Jan 01, 2002 6:00 pm
Location: Solna/Sweden

