Cisco Call Manager database replication failure due to incorrect MTU April 28, 2012

Posted by jamesisaac in Uncategorized.

This weekend we changed from one WAN ethernet circuit to a different one for our metro ethernet. We have a Call Manager publisher on one side and a subscriber on the other side. After changing the circuit, we discovered that we couldn’t dial out from the subscriber side to anything other than local (in the building) phones. What’s going on? The servers were unchanged; we used the same routers and interfaces for the new circuit. The only thing that happened was unplugging one cable and plugging in another one. Why would the phone routing fail in such a way?

Here’s the other interesting bit of business: during our troubleshooting, we rebooted the local (subscriber) call manager. We then tested the phones and found that we could dial out! Great, problem solved. Except that 10 minutes later we couldn’t dial out again. Putting two and two together, we realized that when the phone was registered to the publisher, we could dial out. When the subscriber rebooted and the phones re-registered to the local CCM server, our calls failed again. Hmm. It didn’t seem to be a routing problem, because all of our ping testing was successful. And when the phone is registered to the publisher, everything works! So clearly it can’t be a network problem.

At this point TAC transferred us to the database replication team and we started getting packet dumps and analyzing them with Wireshark. The smoking gun appeared. During the database replication from the publisher to the subscriber, there’s a large packet containing the CCM certificate. This packet is tagged as “do not fragment”. It was not reaching the subscriber, even though other packets around it were. Thus – the MTU.
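Our ping tests passed because default pings are small, so they never came near the limit. Here is a minimal sketch of how one could hunt for the largest packet that survives a path with "do not fragment" set. The `passes` probe is hypothetical: in practice it would wrap a ping with the DF bit set, and here it is simulated with a made-up 1492-byte limit, not a value we measured.

```python
from typing import Callable


def largest_passing_size(passes: Callable[[int], bool],
                         low: int = 576, high: int = 1500) -> int:
    """Binary-search the largest IP packet size in [low, high] that gets through.

    Assumes passes() is monotonic: if a size gets through, so does every
    smaller size. Raises ValueError if even the smallest size is dropped.
    """
    if not passes(low):
        raise ValueError(f"even {low}-byte packets are dropped")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if passes(mid):
            low = mid                 # mid fits, so the answer is mid or larger
        else:
            high = mid - 1            # mid is dropped, so the answer is smaller
    return low


def simulated_probe(size: int) -> bool:
    """Stand-in for a DF ping: pretend the path drops anything over 1492 bytes."""
    return size <= 1492


if __name__ == "__main__":
    print(largest_passing_size(simulated_probe))  # prints 1492
```

To use this for real, the probe would run something like `ping -M do -s <payload>` on Linux or `ping -f -l <payload>` on Windows, where the payload is the IP packet size minus 28 bytes of IPv4 and ICMP headers.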

We changed the MTU on both servers to 1400 and retested the database replication. The network tests immediately passed, and the replication began. Once the pub and sub were in sync, our calls outside the building were successful.

So why would this change? My guess is that our telco provisioned the replacement Ethernet circuit with their own VLAN tags on it, to run multiple companies’ traffic over the same circuit, which results in a smaller total frame available to us. Plus we had our own VLAN tags inserted (VLAN within VLAN). A large packet with the “do not fragment” flag set gets dropped instead of fragmented. We had to change the server behavior to build smaller packets, and once we did that, everything worked.
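The arithmetic behind that guess is simple enough to sketch. Each 802.1Q tag adds 4 bytes to a frame, so if the circuit's maximum frame size stays at the standard value (my assumption, I never got the real number from the carrier), every extra tag eats 4 bytes of the 1500-byte IP MTU. The function names here are just for illustration.

```python
ETHERNET_IP_MTU = 1500  # standard Ethernet payload with no extra tags
DOT1Q_TAG = 4           # bytes added to the frame by each 802.1Q tag
IPV4_HEADER = 20        # IPv4 header without options
ICMP_HEADER = 8         # ICMP echo header


def ip_mtu_after_tags(extra_tags: int, untagged_mtu: int = ETHERNET_IP_MTU) -> int:
    """IP MTU left over when extra tags must fit inside an unchanged frame limit."""
    return untagged_mtu - extra_tags * DOT1Q_TAG


def ping_payload_for(ip_mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for tags in (0, 1, 2):
        mtu = ip_mtu_after_tags(tags)
        print(f"{tags} extra tag(s): IP MTU {mtu}, max DF ping payload {ping_payload_for(mtu)}")
    # The 1400-byte MTU we set on the servers leaves plenty of headroom:
    print(f"MTU 1400: max DF ping payload {ping_payload_for(1400)}")
```

With two extra tags the sketch gives an IP MTU of 1492, which is why a full-size 1500-byte packet marked "do not fragment" would never arrive. Setting the servers to 1400 is well under that, so the exact overhead on the circuit stops mattering.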

 


VLAN across WAN July 9, 2009

Posted by jamesisaac in Uncategorized.

Stop me if you’ve heard this: you can’t extend a VLAN across a WAN. Or the alternative comment: you can, but why would you want to? After all, a VLAN is a container for a broadcast domain, right? And those are done with local, physical entities. Routers act to block broadcasts, so your broadcast domain can’t extend past a router.

Sure, that’s true to one degree or another. In a bandwidth-constricted environment, forwarding all your broadcasts across a small pipe is a recipe for disaster. But what if you’ve got a larger pipe, say, 10 Mbps Ethernet, and you promise that you’ll selectively forward some VLANs and not others? Then can you do it?

I pursued this for practical and theoretical reasons, and found that you can in fact span a VLAN across a WAN by reaching waaaaay back and building a bridge. Yep, we’re going to bridge that WAN.

I have two routers, with two ethernet interfaces each. Fast0/0 is the inside and Fast0/1 is the outside on both routers. The secret is to create subinterfaces and encapsulate dot1q for your subinterfaces. That puts the VLAN tag on that traffic. Then, just enable bridging for each respective subinterface, and you’re gold.

This config is for Cisco routers. YMMV.

bridge crb
!
!
interface FastEthernet0/0
 description Corp local network
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.1
 encapsulation dot1Q 3
 ip address 192.168.1.1 255.255.255.0
 no snmp trap link-status
!
interface FastEthernet0/0.102
 encapsulation dot1Q 102
 no snmp trap link-status
 bridge-group 102
!
interface FastEthernet0/0.103
 encapsulation dot1Q 103
 no snmp trap link-status
 bridge-group 103
!
interface FastEthernet0/1
 description Interface to DC
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/1.3
 encapsulation dot1Q 3
 ip address 192.168.2.1 255.255.255.252
 no snmp trap link-status
!
interface FastEthernet0/1.102
 encapsulation dot1Q 102
 no snmp trap link-status
 bridge-group 102
!
interface FastEthernet0/1.103
 encapsulation dot1Q 103
 no snmp trap link-status
 bridge-group 103
!
bridge 102 protocol ieee
bridge 103 protocol ieee

What I did was build three subinterfaces on this wire. VLAN 3 is routed, using a subnet on one side and a different subnet on the other, with a tiny subnet in between to glue the two networks together. We use VLANs here even though this is just a routed network because the network ports on either side are full 802.1Q trunk ports. VLAN 102 and VLAN 103 are true “broadcast” VLANs. There’s no IP information contained in them, because you don’t use IP routing with a bridge. The secret sauce is configuring a bridge-group for each VLAN and then turning on bridging for that group with the “bridge 102 protocol ieee” command, which selects the IEEE spanning-tree protocol for the bridge group. That behavior doesn’t show up as an explicit setting in the config, but it is not on by default (at least in the version of code I was using). The other router should be configured identically, except that the VLAN 3 information would be for the local network on the other side. Use the same VLAN encapsulation and bridge-group numbering.
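Since the bridged stanzas are identical on both routers and differ only by VLAN number, they are easy to generate. This is a minimal sketch, not something I actually used: the function name is made up, it follows the convention above of reusing the VLAN ID as both the subinterface number and the bridge-group number, and it leaves out the routed VLAN 3 subinterfaces because those differ per site.

```python
from typing import Sequence


def bridged_vlan_stanzas(inside: str, outside: str, vlans: Sequence[int]) -> str:
    """Build IOS-style config text that bridges each VLAN between two interfaces."""
    lines = []
    for parent in (inside, outside):
        for vlan in vlans:
            lines += [
                f"interface {parent}.{vlan}",
                f" encapsulation dot1Q {vlan}",  # tag traffic with the VLAN ID
                f" bridge-group {vlan}",         # same group on both sides of the router
                "!",
            ]
    for vlan in vlans:
        # One spanning-tree protocol statement per bridge group
        lines.append(f"bridge {vlan} protocol ieee")
    return "\n".join(lines)


if __name__ == "__main__":
    print(bridged_vlan_stanzas("FastEthernet0/0", "FastEthernet0/1", [102, 103]))
```

Run it with the same arguments for each router and paste the output in after the parent interfaces and `bridge crb` are configured. Check the output against your own IOS version before trusting it.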

I don’t recommend doing this with your main networks, as you will then be sending all of your broadcast traffic across the wire for (probably) no good reason. I’m doing it to fix some workstation deployment issues using a non-standard PXE boot appliance, as well as just to see if it is possible. Using VLANs in this manner essentially makes your VLAN’ed network portable between physical networks. Since the IP addresses don’t change (remember, it’s not a routed network), you can move your devices around from one site to another without having to renumber them. Keep in mind that their default gateway may be on the other side of the physical network, though, so you may want to fix that once you finish moving devices around.

Bits and Pieces June 18, 2009

Posted by jamesisaac in Uncategorized.

The datacenter-in-waiting is starting to take shape in the corner of the server room. I’ve got the Cisco routers set up so they talk to each other over ethernet, and our new data center network is logically separated from the rest of the network.

  • Configured the Belkin KVM-over-IP. It has a very simple interface; you connect to the web page and *bam*, you’re looking at the server switcher. Nothing extraneous here. Mouse tracking is a little finicky and seems to depend mostly on the “enhanced pointer” control inside the remote OS.
  • Looks like I will have to figure out how to trunk VLANs across the fiber to the DC for our phone integration. The Cisco guy was talking about a Layer 3 VLAN, or virtual interface, or something like that. Time to do some research.
  • Received one of our modem servers from www.siliconmechanics.com; they’re a systems integrator. Great to do business with. Problem of the day, though, is this: the new server has an Intel SATA controller. XP doesn’t have a native driver (and yes, we’re using XP on the server). With no floppy drive, how do you load XP? Check out www.nliteos.com if you haven’t yet – it’s amazingly easy to build a bootable Windows XP or 2003 CD with your text-mode drivers pre-installed. I’m making a few for each of our custom servers with the RAID controller, NIC, and video drivers pre-loaded. Had the same problem with an HP DL320 G5p server – I stuck in the SmartStart CD and it said, “Sorry, this disk controller is not supported.” What? What a monumental error HP made on that deal. How can they ship a server that doesn’t run SmartStart? Anyway, nLite to the rescue. I downloaded the SATA drivers from HP and built a new W2k3 installer CD and away we went.