Architecture Matters: WYCRMS Part 4. Windows Updates and File Locking

In 1997, a HP 9000 engineer wouldn't blink telling about a server that had been running continuously for over five years.I found this remarkable at the time, and couldn't imagine a Windows server lasting that long. I have moved on, and frankly expect my Windows servers to survive that long today. Very few share this position, and I'm trying to find out why it's so lonely on this side of the fence.

4. Windows Updates

It's Lab Time!

Open a console on Windows (7, 2008, whatever), and enter the following as a single line (unfortunately wrapped here, but it is a single line):

for /L %a in (1,1,30) do @(echo %a & waitfor /t 1 signal 2>NULL) >numbers

What's that all about you ask? Well, it sets up a loop, a counter (%a) that increases from 1 to 30. On each iteration, the value of %a is echod to the standard output, and WAITFOR (normally a Resource Kit file, included in Windows 7/2008 R2) pauses for one second waiting for a particular signal. 2>NULL just means I'm not interested that the signal never arrives and want to throw away the error message (there's no simple sleep command I can find in Windows available to the console). Finally >numbers sends the output to the file numbers.

As soon as you press enter, the value 1 is written to the first line of a file called numbers. One second later, a line is over-written with the value 2, and so on for thirty seconds.

Now open another console in the same directory while this first is running (you've got thirty seconds, go!) and type (note, type is a command that is typed in, it's not a command for you to start typing):

type numbers & del numbers

If the first command (the for loop) is still running, you'll get the contents of the file (a number), followed by an error when the delete command is attempty - this makes sense as the first loop is still busy writing to the file.

This demonstrates a very basic feature of Windows called File Locking. Put simply, it's a traffic cop for files and directories. The first loop opens a file and writes the first value, then the second, then the third, all while holding a lock on the file. This is a message to the kernel that nobody else is allowed to alter the file (deletes, moves, modifications) until the lock is released, which happens when the first program terminates or explicitly releases the file.

This is great for applications (think editing a Word document that someone else wants to modify), but when it comes time to apply updates or patches to the operating system or applications it can make things very complex. As an example, I have come across a bug in Windows TCP connection tracking that is fixed by a newer version of tcpip.sys, the file that provide Windows with an IP stack. Unfortunately, if Windows is running, tcpip.sys is in use (even if you disable every NIC), so as long as this file is being used by the kernel (always) it can never be overwritten. The only time to do this is when the kernel is not running - but then how do you process a file operation (performed by the kernel), when the kernel is not available.

Windows has a separate state it can enter before starting up completely where it processes pending operations. Essentially, when the update package notices it needs to update a file that is in use it tells Windows that there is a newer version waiting, and Windows places this in a queue. When starting up, if there are any entries in this queue, the kernel executes them first. If these impact the kernel itself (e.g. a new ntfs.sys), the system performs another reboot to allow the new version to be used.

This is the only time a reboot is necessary for file updates. Very often administrators simply forget to do simple things like shut down IIS, SQL or any number of other services when applying a patch for those components. A SQL Server hotfix is unlikely to contain fixes for kernel components, so simply shutting down all SQL instances before running the update will remove the reboot requirement entirely.

Similarly, Internet Explorer is very often left running after completing the download of updates, some of which may apply to Internet Explorer itself. Even though this is not a system component, the file is in use and it is scheduled for action at reboot. Logging in with a stripped-down, administrative privileged account to execute updates removes the possibility that taskbar icons, IE, an explorer right-click extension or anything else is running that may impede a smooth, rebootless patch deployment that updates interactive user components.

This is simply a function of the way Windows handles file locking, and a bit of planning to ensure no conflicts arise can remove unnecessary reboots in a lot of cases.

Previous: Part 3. Console Applications, Java, Batch Files and Other Red Herrings

Next: Part 5. Nobody Ever Runs a Server That Long

Architecture Matters

Thursday, 10 October 2013

WYCRMS Part 4. Windows Updates and File Locking

No comments:

Post a Comment

Twitter

Followers