Friday, 7 May 2010

Multihomed DNS and how Windows makes you lazy

I was called in to solve an interesting problem today. There's a good principle in effective IT security to segregate different services into LANs appropriate to their function. If a server has more than one function, then it needs more than one NIC (or perhaps a VLAN ID), each with its own unique IP address.

Windows is a great tool, a platform of continual development (and yes, that means it gets better, so don't think I've always found it to be great) over the last two decades that now runs a fair chunk of global business, and in some demanding environments. One of the simple beauties of the platform is the unified codebase, libraries and APIs from the smallest XP Embedded right up to Windows Server Datacenter Edition: the machine I'm developing on is almost identical, apart from scale, to the machine I'm likely to deploy on.

Yes, there are other differences, but I find most of them to be paid features like clustering and more speed (I can't do it captain). The hardware abstraction layer and other consistencies like the IP stack, filesystems and memory management are wonderful tools for developers. Unfortunately, so many admins cut their teeth on Windows desktop editions, or at least smaller servers under their absolute control, that they struggle to make the transition to enterprise administration.

My previous rant about NetBIOS is a case in point. With all this abstraction, details like network interfaces and network service location are so well hidden, they're essentially invisible. Ever try to catch the Invisible Man to ask him what he's doing?

The problem I had to solve today was around multihomed servers. Windows IT admins tend to be lazy, and NetBIOS broadcasts are only one area where we rely on the wizardry of the OS to figure out what we're trying to do and make it happen. Dynamic DNS registration removes some of the tedium and mistakes from the process of getting systems deployed, but blindly assuming it knows what you want is just wrong.

The convergence of an Active Directory domain and the DNS namespace is a nifty feat, but in multihomed systems it's a nightmare. If all interfaces are routable and reachable, then this is slightly moot, but put up a firewall or routing restriction in the way and intermittent problems (the worst kind) crop up, and troubleshooting without a solid foundation in networking is tough. DHCP, DDNS, NetBIOS, even APIPA, all seek to hide the complexity from Windows admins, and they end up woefully underskilled in the cornerstone that makes their network tick.

The problem in this instance is that the FQDN of the server is composed of the hostname and the AD DNS name. No problem for a typical, single-NIC server. Unfortunately, this abstraction breaks down when it comes to multiple NICs: just how is DNS supposed to know what your topology is when giving you an answer?
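To make that concrete, here's a rough sketch of what a client sees when one name carries an address for every NIC. Everything in it is made up for illustration: the name srv01.corp.example.com, the addresses, and the idea that this client can only route to 10.0.1.0/24. It doesn't query any real DNS server; it just models the answer.

```python
import ipaddress

# Hypothetical A records that dynamic DNS registration might publish for one
# multihomed server: one address per NIC, all under the same name.
RECORDS = {
    "srv01.corp.example.com": ["10.0.1.20", "10.0.2.20", "192.168.50.20"],
}

# Hypothetical networks this particular client is allowed to route to.
CLIENT_REACHABLE = [ipaddress.ip_network("10.0.1.0/24")]


def split_by_reachability(name, records, reachable):
    """Return (usable, unusable) addresses for name from this client's viewpoint."""
    usable, unusable = [], []
    for text in records.get(name, []):
        addr = ipaddress.ip_address(text)
        if any(addr in net for net in reachable):
            usable.append(text)
        else:
            unusable.append(text)
    return usable, unusable


usable, unusable = split_by_reachability(
    "srv01.corp.example.com", RECORDS, CLIENT_REACHABLE
)
print("usable:  ", usable)
print("unusable:", unusable)
```

DNS hands back the whole list without knowing which entries this client can reach. If the server rotates the order of the answers and the client only tries the first one, the connection works some of the time, which is exactly the intermittent flavour of problem described above.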

Trying to convince a religiously Microsoft admin to use a subzone to specify the interface is received as something approaching heresy. Do a traceroute (or tracert for Windows guys) to any internet address and you'll see the FQDNs of routers, with the hostname portion wildly different along the way, including multiple digit groups. Most of these are Internet routers, and the DNS entries correspond to the interface rather than the router itself.
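A minimal sketch of the subzone idea, assuming a hypothetical server srv01 in corp.example.com with three NICs. The subzone labels (prod, backup, mgmt) and the addresses are my own invention, not a naming standard, and the output is only zone-file-style text for reading, not something to paste into a live zone.

```python
def interface_fqdn(host, subzone, domain):
    """Build a per-interface FQDN such as srv01.mgmt.corp.example.com."""
    labels = [host, subzone, domain]
    return ".".join(label.strip(".") for label in labels if label)


# Hypothetical interfaces on one server: subzone label -> address on that network.
INTERFACES = {
    "prod": "10.0.1.20",
    "backup": "10.0.2.20",
    "mgmt": "192.168.50.20",
}


def a_records(host, domain, interfaces):
    """Return (fqdn, address) pairs, one per interface, sorted by subzone label."""
    return [
        (interface_fqdn(host, subzone, domain), address)
        for subzone, address in sorted(interfaces.items())
    ]


for name, address in a_records("srv01", "corp.example.com", INTERFACES):
    print(f"{name}.  IN  A  {address}")
```

Each name now answers with exactly one address, so the name itself tells you which network you're asking for, much like the per-interface router names in a traceroute.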

Of course, Windows has a mild cow if you try to refer to it by anything other than the system name as the first part of the FQDN, and always expects all interfaces to be present in the machine's DNS suffix. The best solution...

Change the way you think about finding servers. When you're connecting, you're probably interested in a particular interface anyway. Some services may not even be listening on particular interfaces. Getting your brain tuned to how your network is built, using that to figure out how your systems are connected, and habitually spelling out exactly which way you want to connect by an explicit FQDN can only do good.
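The point about services not listening everywhere is easy to see for yourself. This is a small sketch using only the loopback address, so it stands in for a real NIC rather than demonstrating one; the helper names can_connect and listen_on are mine.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def listen_on(address):
    """Open a TCP listener bound to one specific address, on any free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((address, 0))  # port 0 asks the OS for a free port
    server.listen(1)
    return server


server = listen_on("127.0.0.1")
port = server.getsockname()[1]
print("bound to", server.getsockname()[0])
print("reachable while listening:", can_connect("127.0.0.1", port))
server.close()
print("reachable after close:", can_connect("127.0.0.1", port))
```

A service bound this way accepts connections on that one address only. On a multihomed box, the same port on any other interface would simply refuse, so the name you connect with has to point at the interface the service is actually bound to.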

Of course, some applications take it as read that a server is reachable by the short FQDN. Sometimes system admins can be even more hardcoded. Both are very, very difficult to change.