“cannot change the host configuration” August 14, 2009

Posted by jamesisaac in Uncategorized.

Here’s a bizarre problem I encountered with vSphere and VMFS.

I have an iSCSI SAN presenting 5 LUNs to vSphere. I set up the first server, added the LUNs, formatted and named them, and all is well. I added the second server, and could only add two of the 5 LUNs. With the other three, I could see the LUN in the “add storage” configuration page, but after going through the wizard, vSphere errored out with “cannot change the host configuration”. There’s not a lot of documentation on this error, but I found the secret here – it’s a bug in Virtual Center. Or at least it appears to be.

The solution is to use the vSphere client to connect directly to the host, not Virtual Center, and add the LUNs there. Works perfectly and they show up in Virtual Center as you’re adding them.

DSS iSCSI failover solved July 25, 2009

Posted by jamesisaac in Uncategorized.

Two weeks ago I was in the throes of a confusing puzzle of how to make iSCSI failover work with the open-e DSS. I was searching for the magic switch that would make it work, and I found it – “it” being ASCII character 20h, 32 decimal, good ol’ space.

Here’s the rundown – the DSS high availability construction kit goes like this:

  1. Create a volume on your “source” DSS.
  2. Create an iSCSI lun in that volume.
  3. Create a replica volume on the “target” DSS (“target”, “replica”, whatever you want to call it)
  4. Create an identical iSCSI lun, same name, same lun number.
  5. Configure a volume replication job on your source DSS. Here’s the important thing: don’t create the job name with a space. So, “replicate lv0000” will work for the replication job, but it won’t even show up in the iSCSI failover job list. Create your job and call it “replicate_lv0000” instead.
  6. Start the replication job and wait until the volumes are synchronized.
  7. Configure iSCSI failover – you should see your replication job listed.

It’s amazing and a little disconcerting to think of all the time wasted because one part of the UI allowed a job with a space in the name, and another part of the UI wouldn’t list jobs with spaces in their names.
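
As a guard against repeating this mistake, here is a minimal Python sketch that checks a replication job name for whitespace before you type it into the DSS web UI. The function names are hypothetical and nothing here talks to the DSS itself; it only encodes the "no spaces" rule observed above.

```python
import re


def is_failover_safe(name: str) -> bool:
    """True if the job name contains no whitespace characters."""
    return not any(ch.isspace() for ch in name)


def failover_safe_job_name(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


if __name__ == "__main__":
    for candidate in ("replicate lv0000", "replicate_lv0000"):
        print(candidate, "->", failover_safe_job_name(candidate),
              "(safe as typed)" if is_failover_safe(candidate) else "(contains a space)")
```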

Now, admittedly, I didn’t run this by support, nor have I seen the source code, so I may be barking up the wrong tree. All I know is that 20 hours later, the space character is the one change that I made which allowed everything else to work.

DSS issues July 12, 2009

Posted by jamesisaac in Uncategorized.

Configuring one open-e DSS for iSCSI? Easy. Takes about 15 minutes. A little longer to format the volume, but actual config time is quick.

Configuring two DSS’es to talk to each other and do the iSCSI automatic failover? Well, I’m ten hours in and it doesn’t work yet. I’m sure there’s a magic switch somewhere that I’m just not finding, because it looks like it should work. My problem at this point is that I just don’t get, conceptually, how they’re doing the networking. I have a three-segmented network (LAN/management, iSCSI, and replication channel), and I can’t figure out what ip address to put where when doing the configs. I wish that open-e had a set of documentation that said, “Ok, here’s what’s going on behind the scenes. Figure it out.” That actually would help.

VLAN across WAN July 9, 2009

Posted by jamesisaac in Uncategorized.

Stop me if you’ve heard this: you can’t extend a VLAN across a WAN. Or the alternative comment: you can, but why would you want to? After all, a VLAN is a container for a broadcast domain, right? And those are done with local, physical entities. Routers act to block broadcasts, so your broadcast domain can’t extend past a router.

Sure, that’s true to one degree or another. In a bandwidth-constricted environment, forwarding all your broadcasts across a small pipe is a recipe for disaster. But what if you’ve got a larger pipe, say, 10mb ethernet, and you promise that you’ll selectively forward some VLANs and not others? Then can you do it?

I pursued this for practical and theoretical reasons, and found that you can in fact span a VLAN across a WAN by reaching waaaaay back and building a bridge. Yep, we’re going to bridge that WAN.

I have two routers, with two ethernet interfaces each. Fast0/0 is the inside and Fast0/1 is the outside on both routers. The secret is to create subinterfaces and encapsulate dot1q for your subinterfaces. That puts the VLAN tag on that traffic. Then, just enable bridging for each respective subinterface, and you’re gold.

This config is for Cisco routers. YMMV.

bridge crb
!
!
interface FastEthernet0/0
 description Corp local network
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.1
 encapsulation dot1Q 3
 ip address 192.168.1.1 255.255.255.0
 no snmp trap link-status
!
interface FastEthernet0/0.102
 encapsulation dot1Q 102
 no snmp trap link-status
 bridge-group 102
!
interface FastEthernet0/0.103
 encapsulation dot1Q 103
 no snmp trap link-status
 bridge-group 103
!
interface FastEthernet0/1
 description Interface to DC
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/1.3
 encapsulation dot1Q 3
 ip address 192.168.2.1 255.255.255.252
 no snmp trap link-status
!
interface FastEthernet0/1.102
 encapsulation dot1Q 102
 no snmp trap link-status
 bridge-group 102
!
interface FastEthernet0/1.103
 encapsulation dot1Q 103
 no snmp trap link-status
 bridge-group 103
!
bridge 102 protocol ieee
bridge 103 protocol ieee

I built three subinterfaces on this wire. VLAN 3 is routed using a subnet on one side and a different subnet on the other, with a tiny subnet in between to glue the two networks together. We use VLANs here even though this is just a routed network because the network ports on either side are full 802.1Q trunk ports. VLAN 102 and VLAN 103 are true “broadcast” VLANs. There’s no ip information contained in them, because you don’t use ip routing with a bridge. The secret sauce is configuring a bridge-group for each VLAN and then turning on broadcast traffic with the “bridge 102 protocol ieee” command (strictly speaking, this command selects the IEEE spanning-tree protocol for the bridge group). This doesn’t show up explicitly in the configs but is not on by default (at least in the version of code I was using). The other router should be configured identically, except that the VLAN 3 information would be for the local network on the other side. Use the same VLAN encapsulation and bridge-group numbering.
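
Because the bridged stanzas are the same on both routers and repeat for every VLAN, they are easy to generate. The following is a rough Python sketch, not a Cisco tool: the function name and its defaults are my own assumptions, and it emits only the bridged subinterfaces and the "bridge N protocol ieee" lines in the pattern shown above. The parent interfaces and the routed VLAN 3 subinterfaces still have to be written by hand for each site.

```python
def bridged_vlan_config(vlan_ids, inside="FastEthernet0/0", outside="FastEthernet0/1"):
    """Render IOS-style stanzas that bridge each VLAN between two interfaces.

    The VLAN ID is reused as the subinterface number and the bridge-group
    number, following the convention in the config above.
    """
    lines = ["bridge crb", "!"]
    for parent in (inside, outside):
        for vid in vlan_ids:
            lines += [
                f"interface {parent}.{vid}",
                f" encapsulation dot1Q {vid}",
                " no snmp trap link-status",
                f" bridge-group {vid}",
                "!",
            ]
    # One spanning-tree protocol line per bridge group.
    for vid in vlan_ids:
        lines.append(f"bridge {vid} protocol ieee")
    return "\n".join(lines)


if __name__ == "__main__":
    print(bridged_vlan_config([102, 103]))
```

Review the output before pasting it into a router; as noted above, this was only tried on one version of IOS.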

I don’t recommend doing this with your main networks, as you will then be sending all of your broadcast traffic across the wire for (probably) no good reason. I’m doing it to fix some workstation deployment issues using a non-standard PXE boot appliance, as well as just to see if it is possible. Using VLANs in this manner essentially makes your VLAN’ed network portable between physical networks. Since the ip addresses don’t change (remember, it’s not a routed network), you can move your devices around from one site to another without having to renumber them. Keep in mind that their default router may be on the other side of the physical network, though, so you may want to fix that once you finish moving devices around.

WAN and VLAN issues July 8, 2009

Posted by jamesisaac in Uncategorized.

With the arrival of the DSS SANs, the project is moving ahead with great speed. We purchased two SANs from Silicon Mechanics, with the goal of configuring synchronous replication between them. They arrived each in a large box, containing the 2U chassis and a large pink foam insert with each drive packaged up separately. The Silicon Mechanics technicians had loaded the drives, configured the RAID groups, then taken everything apart and packed the drives along with instructions for reassembling everything at the destination site. I assume this improves the reliability by not shipping the storage server with drives installed. They did a very thorough job of protecting everything. The end result is 3.5 TB of fast, SAS-based storage. I’m extremely happy that we are able to use 1TB SAS drives for one of our RAID groups and not have to burn drive slots just to accommodate our larger, but less i/o-intensive VMs.

I spent a few hours figuring out how to trunk VLANs across the WAN and into the datacenter setup. On the Cisco router side, this involves configuring subinterfaces on the inside and outside interfaces of each router, and setting the “encap dot1q” for each subinterface. It’s an interesting game to play to try to telnet into the appropriate interface and change the ip config of the other interface, so that you don’t cut off the limb that you’re standing on, so to speak. After a few tries I resorted to the console cable and got everything squared away.

The Netgear switches proved to be another mindbender, though, as even though I’ve done this before (and documented it), the VLAN trunking is just not intuitive for someone with a Cisco background. The other troublesome part of the equation is that some devices natively understand and inject their own VLAN information (i.e., the VMWare host servers), but others do not and have to have their native VLAN set at the port. In the end, I found it easiest to set the native VLAN for each device to something other than 1 – that way I was certain that if I was reaching a device, it was through the appropriate VLAN.

DD510 and Backup Exec cross-domain backup June 29, 2009

Posted by jamesisaac in Uncategorized.

We purchased one Data Domain DD510 appliance, which I intend to use as a target for backing up our data at the data center. This will replace three separate servers using BackupExec and Ultrium tape drives. I ran into a snag because there are two separate domains in our environment.

I initially installed the DD510 in “Active Directory” mode, which used the LDAP connector to authenticate into our AD. No problems there – everything worked fine and I could set security and map shares from any server in the joined domain. However, BackupExec in the other domain refused to allow me to create a “backup-to-disk” folder on the DD510. Apparently this is a known issue, as googling for “Backup Exec backup to disk access denied” returns many links.

I tried changing the Backup Exec services accounts to use pass-thru authentication and even tinkered with trusting across domains, but had no luck until I removed the DD510 from our domain and put it back into Workgroup authentication. After that, BE worked like a charm.

The key is to create a backup user on the DD510, create local users on whatever servers BE is running on with the same username and password, and then set BE to use that username and password for the services. So now the DD510 is back to being a backup appliance instead of a general-purpose file server repository – which is a little less flexible, but probably more controllable.

After running backups for a week onto the device, I am suitably impressed. Backup-to-disk is much faster than even the local Ultrium tape drive that I was using, and the dedupe reduces each additional full backup by 95% as promised. YMMV, of course – what remains is the delta between the two backups, which the on-disk compression reduces even further.
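
To put the 95% figure in perspective, here is a back-of-the-envelope Python sketch of how much disk a run of full backups might consume. It assumes the first full is stored at full size and that every later full shrinks by the same fraction; the 1000 GB backup size is a made-up number for illustration, not a measurement from our DD510, and on-disk compression is ignored.

```python
def stored_after_fulls(full_size_gb: float, num_fulls: int, reduction: float = 0.95) -> float:
    """Estimate disk consumed after num_fulls full backups.

    The first full is stored whole; each additional full is assumed to
    consume only (1 - reduction) of its size after deduplication.
    """
    if num_fulls < 1:
        return 0.0
    per_additional = full_size_gb * (1 - reduction)
    return full_size_gb + (num_fulls - 1) * per_additional


if __name__ == "__main__":
    # Hypothetical 1000 GB full backup, kept for 1 to 5 generations.
    for n in range(1, 6):
        print(n, "fulls ->", round(stored_after_fulls(1000, n)), "GB")
```

Under these assumptions, five fulls take roughly 1200 GB instead of 5000 GB.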

One remaining issue is that we have several folders full of many small files (like hundreds of thousands of small files), and performance is abysmal when backing up those files. I suppose it’s due to the overhead of all the security descriptors and other metadata that each file carries with it. I’m going to investigate doing an image backup instead of a file-by-file backup and see if that gives us the necessary performance.

Bits and Pieces June 18, 2009

Posted by jamesisaac in Uncategorized.

The datacenter-in-waiting is starting to take shape in the corner of the server room. I’ve got the Cisco routers set up so they talk to each other over ethernet, and our new data center network is logically separated from the rest of the network.

  • Configured the Belkin KVM-over-IP. It has a very simple interface; you connect to the web page and *bam*, you’re looking at the server switcher. Nothing extraneous here. Mouse tracking is a little finicky and seems to depend mostly on the “enhanced pointer” control inside the remote OS.
  • Looks like I will have to figure out how to trunk VLANs across the fiber to the DC for our phone integration. The Cisco guy was talking about a Layer 3 VLAN, or virtual interface, or something like that. Time to do some research.
  • Received one of our modem servers from www.siliconmechanics.com; they’re a systems integrator. Great to do business with. Problem of the day, though, is this: the new server has an Intel SATA controller. XP doesn’t have a native driver (and yes, we’re using XP on the server). With no floppy drive, how do you load XP? Check out www.nliteos.com if you haven’t yet – it’s amazingly easy to build a bootable Windows XP or 2003 CD with your text-mode drivers pre-installed. I’m making a few for each of our custom servers with the RAID controller, NIC, and video drivers pre-loaded. Had the same problem with an HP DL320 G5p server – I stuck in the SmartStart CD and it said, “Sorry, this disk controller is not supported.” What? What a monumental error HP made on that deal. How can they ship a server that doesn’t run SmartStart? Anyway, nLite to the rescue. I downloaded the SATA drivers from HP and built a new W2k3 installer CD and away we went.

Parts and issues June 14, 2009

Posted by jamesisaac in Uncategorized.

A month in, and the parts are starting to show up.

Received so far:

  • 2 Cisco 2800 routers, one for main office, one for DC.
  • 1 Data Domain DD510 disk appliance, for use as a backup-to-disk target.
  • 4 Netgear switches, 2 7224R 24-port and 2 7248R 48-port.
  • Belkin KVM with ip
  • Box o’ ethernet cables. Unfortunately my color requests didn’t get through purchasing so I have a box of 7 foot grey cables.

Issues identified so far:

  1. Installing the Cisco routers between the main office and the DC means re-addressing the ip space already in use at the DC. So far I’ve provisioned several devices at the DC (including our firewall) using our main office ip space and just bridging across the fiber. This will become unusable as we move more servers over and eventually move our voice lines, as I won’t be able to set QoS across the bridge. So I’m going to have to bite the bullet and actually route traffic to the DC. I should have done this from the beginning but didn’t have the routers until now.
  2. The Data Domain DD510 is a nice box, but I have two problems with it already – not its fault, just the architecture. First, I want to backup-to-disk using Veritas BackupExec. That’s fine, BE supports backup-to-disk folders. The problem is that I have servers in two domains. BE doesn’t provide any method of authenticating to the backup-to-disk folders. So if I put the DD510 in one domain, then I can’t backup from the BE instance in the other domain. Sux0rs. I think I will have to reach across from one domain to the other with a single instance of BE so I can write to the DD510. Second problem is that all of this traffic may put too high a load on the network. I think I may get another Netgear switch just for the backup network and dedicate a NIC on each server and VMWare host for the backup network.
  3. The Netgear switches were an interesting exercise in configuration. They’re clearly trying to look like Cisco IOS, but not quite exactly the same – probably due to legal reasons. If all you are doing is plugging things in – they work great with no configuration hassles. But for configuring VLANs, it’s a whole different ball of wax. I had an “a-ha!” moment when I figured out how they do VLAN trunks – essentially all traffic is tagged on the trunk port (i.e., your uplink port) and then the other ports are members of your vlan but not tagged. That means they will get traffic from the desired vlan and not have to deal with tagging and untagging the frames on the server. It makes sense once you see what Netgear is doing.
  4. Potential future issue: the vSphere licensing is coming, but I have now found out that the SAN software is not yet vSphere (4.0) certified. We’re using Open-E DSS version 5, which is certified for ESX 3.5. Supposedly DSS version 6 will be vSphere-compatible, but it’s in beta. I also believe there will be a charge to upgrade from 5 to 6. Shoulda waited another two months and just bought version 6 – but then we’d be two months behind. It will probably be released by the time we’re done testing, and then we’ll get to test how the production SAN deals with a software upgrade. That should be fun.

Planning May 14, 2009

Posted by jamesisaac in Uncategorized.

Now that the project is approved, it’s time to nail down the actual nuts and bolts of how the pieces are going to fit. We’re going to be installing a full 42U rack full of gear – internet and WAN routers, four switches, several VMWare host servers, and some non-virtualized physical servers, plus two SANs. The design goal for this project has been to identify failure points and work around them. For example, there will be two 48-port LAN switches to connect all the server LAN traffic together, and two 24-port SAN switches to connect the iSCSI network. We also have two SANs with a dedicated replication channel between them.

So, my big project at this point seems simple, but will have a lasting impact on the whole project: figure out how to cable everything together. I’m thinking of going with six colors of ethernet cabling, based on what is being carried by each cable:

  • LAN traffic to Switch A
  • LAN traffic to Switch B
  • iSCSI traffic to Switch C
  • iSCSI traffic to Switch D
  • internet-connected devices
  • KVM

I think I’ll go with red for internet and yellow for KVM. Then perhaps a light blue and dark blue for LAN and a light green and dark green for iSCSI. I just have this idea in my mind that I want to be able to look at a server and visually see that it has all the proper redundant cables plugged in, and then look at the switches and verify that they are all the right color and we’re not mixing traffic. It’s easier to think about this now rather than after everything is in place and there’s a rat’s nest behind the rack.

The other problem, though, is cable length. I have very little confidence in my ability to make custom-length cables – in my experience, it usually takes me three tries to come up with one RJ45 cable that tests out correctly. But we are talking about potentially 150+ cables, so I want to make sure that I don’t have extra loops dangling in the cage. I have the rack layout built, so the next exercise will be to estimate distance between each device and the switch it will plug into, and figure out if a combination of 3′ and 6′ cables will suffice. I’m just afraid of causing the Flying Spaghetti Monster to spontaneously emerge from a mess o’ cables, waving his noodly appendages at my servers.
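
The estimating exercise could be roughed out in a few lines of Python. This is only a sketch: it counts vertical distance between rack positions at 1.75 inches per U, adds a slack allowance for routing to the side of the rack and back (the 1.5 ft default is a guess, not a measurement), and picks the shortest stock length that fits. The function name and parameters are hypothetical.

```python
RACK_UNIT_INCHES = 1.75  # height of one rack unit (1U)


def pick_cable_feet(device_u, switch_u, slack_feet=1.5, stock=(3, 6)):
    """Return the shortest stock cable (in feet) that covers the run, or None.

    device_u and switch_u are rack positions in U; slack_feet is an assumed
    allowance for horizontal routing and strain relief.
    """
    vertical_feet = abs(device_u - switch_u) * RACK_UNIT_INCHES / 12
    needed = vertical_feet + slack_feet
    for length in sorted(stock):
        if length >= needed:
            return length
    return None  # neither stock length is long enough


if __name__ == "__main__":
    # Hypothetical layout: switch at U20, devices at various positions.
    for device_u in (18, 10, 1, 42):
        print("U%d -> %s ft" % (device_u, pick_cable_feet(device_u, 20)))
```

A result of None flags the runs where neither a 3′ nor a 6′ cable will reach and a longer one has to be ordered.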

Approved! May 12, 2009

Posted by jamesisaac in Uncategorized.

Requisitions are approved and off to purchasing. Had a lengthy meeting today to discuss all of the key redundancy and failure points still remaining. No matter what you do, there will always be a failure point somewhere. It’s just a matter of identifying them and deciding what the cost is to solve vs. the risk of failure.