Architecture Matters: 2013

Tuesday, 5 November 2013

WYCRMS Part 7. I Don't Think You Understand What A Server Is

In 1997, a HP 9000 engineer wouldn't blink telling about a server that had been running continuously for over five years.I found this remarkable at the time, and couldn't imagine a Windows server lasting that long. I have moved on, and frankly expect my Windows servers to survive that long today. Very few share this position, and I'm trying to find out why it's so lonely on this side of the fence.

7. I Don't Think You Understand What A Server Is

It's taken me a long time to find the right words to explain this title, because it's a bit contentious. This is a longer article, but I hope it's of value and I hope expresses my sense of pride and love of computing.

I've explained in previous posts that Windows Server is accessible to those new to IT engineering because it is a simple learning path from the system most of us have in our homes, schools, libraries and elsewhere. Moving to depending on a server to provide service takes different people different amounts of time to get right, and some organisations are more tolerant of slip-ups than others.

There is something that starts to take over. It's not obvious, and since it applies to something commonplace and often the subject of passion for those true geeks, it is also often unrecognised in oneself.

Fear

In no other industry would a consumer or customer tolerate advice such as rebuilding a product, service or transaction because of a minor fault. No mechanic would expect to be paid if they told their customers that the brakes may fail after an hour on the freeway, advising the driver to just come to a complete stop and then move off again, yet this is what we in the (Windows) IT Service industry resort to all-too-otfen. It's a mild paranoia that all things Microsoft (and others, as I will show) are prone to misbehave, and I've been told this explicitly, in writing, more than once from multiple providers after I've requested a feature or function be deployed. Providing vendor documentation in favour of the product's capabilities doesn't sway these types, as they've experienced their own horror stories of staying up until 2am while functionality is restored. I know, I've been in that trench.

But imagine for a moment if you'd gone to a garage affiliated directly with your car's manufacturer. A Ford mechanic providing me the aforementioned brake-fault workaround would be held to account if he also displayed his certification of training or affiliation with Ford Motor. I could challenge him and go to his management to insist that real advice and a fix is provided for my product. If he could show you that his advice was sound my beef would be with Ford selling me an underperforming product. I'd have recourse.

IT engineers seem to think they don't, either in being the place to receive this criticism (hey, they didn't write the software they just run it), or in being able to back up their fears with proof that vendor products in fact don't behave as they would hope. Yet the offices of IT service providers proudly display partner certifications, while engineers with IT credentials flowing of their resumes continue to fear the products that are their livelihood and the foundation for the logos they too show off with some pride. Doublethink indeed.

I am very active in reporting bugs in open-source and even paid-for products, because I expect that the product is only good as those who help make it better. I've already mentioned that failure to even start reporting faults to vendors is negligent, and how engineers should have more pride in their platforms and confidently defend it from detractors with authoritative sources. What I haven't spoken about is the relationship those engineers have with the platform itself.

There is a widely held belief in ICT over a decade old that is demonstrably false: Ethernet autonegotiation is quirky, and potentially dangerous. There was a time this was true, but not for at least five years since 100Base-TX lost it's leading position in datacenter, server and finally desktop connectivity method. Implementing Fast Ethernet (as 100Mb Ethernet was known) needs some knowledge of standards. When the Institute of Electrical and Electronics Engineers (IEEE) published their 802.3u standard for Fast Ethernet, I recall an interview with one of the panel members who stated that it is technically possible to run 100Mbit over stacheldracht (German for barbed wire) and you may have success. He made it clear though that your experience cannot be guaranteed as it is not 802.3u-compliant.

That's the crux: when a vendor states something as a standard, part of a reference architecture, or included in their documentation, they're making a promise.

The section of the new standard dealing with how the two nodes selected the operating speed and duplex setting was, unfortunately, not precise enough and open to some interpretation. Cisco and a few other vendors chose one interpretation while everyone else chose another, and the resulting duplex mismatch is notoriously hard to diagnose, occurring as it does only at moderate load and a ping test over an idle cable will likely succeed. It's insidious, and resulted in the universal abandonment of Autonegotiation in implementations (especially datacenters and core networks).

The problem is, Autonegotiation is not only working well in Gigabit Ethernet (over twisted-pair copper, or Cat5e/Cat6 cabling), it is mandatory. Even network professionals, burnt previously in the 90s and later with Fast Ethernet, advise against turning on a feature that is explicitly required to be a truly standards-compliant implementation, with all the promises attached. A prime reason is that the applicable line is buried deep inside section 28 of the IEEE-802 standard, as amended for Gigabit. It's dry reading...

Gigabit Ethernet was a big jump forward that started to seriously tax memory buses and CPUs like no other iteration before, and includes a highly valuable feature known as a Pause Frame to stop transfers flooding receive buffers and being dropped. This facility is only used if the opposite end cooperates, and the only mechanism to advertise this is autonegotiation.

I've seen an implementation of Microsoft Exchange 2010 come to its knees for lack of Pause Frames, and it is again an insidious failure since packets are only dropped under load, and ping tests and even high-load throughput tests succeed. It is the clinging to an old wisdom without knowing the cause, and then failing to keep up with developments that has caused this issue. Not running with Autonegotiation means you aren't running a standards-compliant Gigabit Ethernet network, and all promises are void.

Not following vendor advice is a bad idea. If the vendor promises a feature that you feel is not ready for primetime then by all means hold off. But if I expect something to work that a vendor promises will work, I don't expect to be told war stories of how this breaks - especially when I last saw that issue, myself, over 12 years ago. It's old thinking, stuck in past fears, and it's stopping you from unleashing your platform's potential. Windows Server especially has become a solid, dependable and performant platform, yet doubts linger and fears cling to dark corners, an uneasiness that is sometimes not even apparent to those harbouring it.

I enjoy reading on the history of computing, and contemplating how modern computers implement both Harvard and von-Neumann architectures depending on how closely you're looking. It's esoteric to speak of privilege rings or context switches, but knowing these things has been of immense help to round out my understanding of computing and gain trust in the models deployed. But the biggest thing I would like to see engineers embrace is this:

The Turing Machine

It's a simplistic representation of any computer, from your old calculator wristwatch to supercomputing clusters: A processor reads instructions from a sequence, implements those instructions with some data, and stores the result somewhere before moving to the next instruction. The next instruction may be a pointer to a different instruction, but all of computing boils down to this concept. there may be more than one processor, and there may be complex layouts of memory, but at its most basic form every computer works this way, and building on your model of a system's internals starts here.

It is deterministic, in that the state after an instruction is performed can be predicted from the initial state. In principal all of computing conforms with this principle, and any unexpected behaviour simply means the initial state was not well-understood enough. It is this mountain that engineers need to climb to truly excel at their profession, and I've met some expert climbers in my time. They have no fear of digging down to each root cause, and unearthing an even deeper root.

Rebooting is not the answer. It indicates a lack of knowledge on cause of faults. It is a sign of an unwillingness to investigate further. Worst, it is a misunderstanding of what your server is, what is meant to do, and the longer you allow that mentality to perpetuate the worse off you will be.

Old tales have value, but they are no substitute for knowledge and verifiable fact. If those facts contradict your experience, investigate, shout at vendors, check your implementation.

But most of all, be proud of your platform, because as obscure as It appears ot be, it is genuinely not that hard if you are willing to do better.

Previous: Part 6. It's OK, the Resilient Partner Can Take Over

Thursday, 17 October 2013

WYCRMS Part 6. It's OK, the Resilient Partner Can Take Over

A rolling reboot schedule should be implemented for the XenApp Servers so that potential application memory leaks can be addressed and changes made to the provisioned XenApp servers can be reset. The period between reboots will vary according to the characteristics of the application set and the user base of each worker group. In general, a weekly reboot schedule provides a good starting point.

More imprecise advice is hard to find in technical documents. How exactly does the administrator, engineer or designer know the level of his "potential" exposure to memory leaks? I've spent some time exploring this issue in the previous articles, and I stand by my point - if an administrator tolerates poor behaviour by applications or - worse - the OS itself without making an attempt to actually correct the flaw (e.g. contacting the vendor to demand a quality product), that administrator is negligent, scheduled reboots are a workaround, and nobody can have a reasonable expectation of quality service from that platform.

But most of all: How are you ever going to trust a vendor who has so little faith in their product that it cannot tolerate simply operating? I'm not singling out Citrix here, but their complacency in the face of bad code is shocking. I admire Citrix, so I'm not pleased at this display of indifference. Best practice I guess...

Then we get to sentences two and three of this three-sentence paragraph, which informs our reboot-happy administrator to try a particular schedule without a definitive measure in sight. There's a link on how to set up a schedule and how to minimise interruption while it happens, but not one metric or even a place to find them is proposed. He/she is given a vague "meh, a week?" suggestion with zero justification, apart from being "feels-right"-ish.

If a server fails, it is for a specific reason. Sometimes this is buried deep in kernel code, the result of interactions that can never be meaningfully replicated, or much more exotic reasons. In most cases however it is because of a precise reason (memory leaks included), and computing is honestly not so hard that these cannot be fixed.

You might tell I'm an open-source advocate, because I firmly believe in reporting bugs. I also believe I ge tto see the response to that bug. I've found some projects to be more responsive than others, but generally if I've found something that is not just broken but damaging I see people hopping to attention - and that's people volunteering.

If you're buying your software from a vendor they have that floor to start from in their response to you. Tolerate nothing less than attention, and get your evidence together before they start pointing fingers.

When you work in a large organisation you realise things have designations and labels for a reason. Resilient pairs are for unanticipated failures, and DR servers are for disasters.

You don't get to hijack a purpose just because it's unlikely it will be needed - they exist precisely for the unlikely.

Previous: WYCRMS Part 5. Nobody Ever Runs a Server That Long

Next: Part 7. I Don't Think You Understand What A Server Is

Monday, 14 October 2013

WYCRMS Part 5. Nobody Ever Runs a Server That Long

Next: Part 6. It's OK, the Resilient Partner Can Take Over

Thursday, 10 October 2013

WYCRMS Part 4. Windows Updates and File Locking

Next: Part 5. Nobody Ever Runs a Server That Long

Wednesday, 9 October 2013

WYCRMS Part 3. Console Applications, Java, Batch Files and Other Red Herrings

Next: Part 4. Windows Updates and File Locking

Tuesday, 8 October 2013

WYCRMS Part 2. Windows Just Isn't That Stable

Next: Part 3. Console Applications, Java, Batch Files and Other Red Herrings

Monday, 7 October 2013

WYCRMS Part 1: But I Have to Reboot My Own Windows System All the Time!

Next: Part 2. Windows Just Isn't That Stable

Why You Can't Reboot My Server

When I was an on-site server engineer in 1997, I stood next to a HP 9000 engineer waiting for a SCSI hard drive at our parts depot, and we got chatting about his next work order: He was off to install a tape drive. I asked him what the new hard drive had to do with it, and he mentioned that the server in question had been running continuously for over seven years, and at least one drive was likely to get stuck and refuse to spin again once he turned the frame back on.

I found this remarkable at the time, and couldn't imagine a Windows server lasting that long. I have moved on, and frankly expect my Windows servers to survive that long today. Very few share this position, and I'm trying to find out why it's so lonely on this side of the fence.

In this series of posts, I'll be looking at the most common complaints from Windows engineers and administrators they feel are adequate to justify rebooting servers, either as (or instead of) a diagnostic step, on a schedule that can best be described as arbitrary, or even artificially to apply fixes for problems the system doesn't have.

In this series:
Part 1: But I Have to Reboot My Own Windows System All the Time!
Part 2. Windows Just Isn't That Stable
Part 3. Console Applications, Java, Batch Files and Other Red Herrings
Part 4. Windows Updates and File Locking
Part 5. Nobody Ever Runs a Server That Long
Part 6. It's OK, the Resilient Partner Can Take Over
Part 7. I Don't Think You Understand What A Server Is

Next: Part 1: But I Have to Reboot My Own Windows System All the Time!

How to Make Your Customers Feel Like Meat in a Tube

Few things annoy me more than web-based forms for initiating customer contact. My experience of them ranges from poor to dismal, and even when I point out that I expect companies to fail in their response I am rarely surprised by brilliance (or even adequacy).

The first problem with these forms is actually the result: Your enquiry ends up not as an e-mail for a person, but a record in a database. Some forms are worse than others in betraying this, but if you even have to select your company size or decision-making company role you can be sure you're being slotted into a Customer Spamming Service machine.

From there, around four out of five responses make no reference to your original query. Unlike e-mail, where you can save your initial contact in your Sent folder, and typically hiting Reply generates a new mail on top of your original one, the first response you receive almost always has no history, so you're left scratching your head wondering if you really forgot to mention your product's model number, even when you remember having to look up the unicode for the unnecessarily accented é in the product name.Whether a human typed out your reply, selected a form response or some machine logic matched your keywords to information already available in the FAQ, I will offer odds, without knowing who the company is, that the original question is not included for reference.

It's a pain to fill in forms like these repeatedly for each individual question, so you might be tempted to put more than one question in your query. Beware traveller - the company will choose which answer most closely matches their prepared form responses and send that to you, regardless of the amount of prominence you try to give to the one you really need answered first.

Errors on the form? How about not residing in the US so skipping the "state" field, only to be told the field is mandatory. OK, I live in Wyoming, Netherlands. Ah, the form now tells me having a state filled in outside the US in an invalid choice? Check the dropdown - yep, only US states available and no way to not pick one. Don't bother complaining about the logic in your actual request - you see, the people choosing stock responses to send that don't adequately deal with your query, they're in no way connected with the end of the sausage maker that ruins your customer contact experience from the start. They just turn the handle.

While sending an enquiry to a prominent software vendor, I happened to have NoScript turned on and found the form broken beyond use. This is simply not justifiable. Oh well, I'll enable the site for JS, but lo! The form fails to complete again. This time it is because a piece of code from a marketing firm has not arrived. So prominent, it even has the name market in its' name - answering my question vs completing my digital profile for a third party: Which do you think they care about most?

All this from an IT Security company, that sells products to control mobile phone policies to stop users from doing things like installing untrusted software that sends their data to unknown parties, without telling you.

Why am I running a marketing company's JavaScript to collect my personal information to initiate an evaluation of your products?

I am just meat in a sausage to you people, aren't I?

Wednesday, 18 September 2013

Streetview and WiFi - Courts Need Some Education

I'm hanging my head in my palm in a manner not unlike the Jean-Luc Picard meme today, after reading a decision by a court in the United States. EFF has a great article summarising the effects, but I'd like to expand and go into the cause too.

So Google drove around with an antenna on their Street View cars for a few years, sniffing for wireless networks. This is very useful if you'd like to know your location but can't get a GPS signal, especially on devices with lower quality antennas or a location with poor sky visibility. As long as you have a data connection, you send the local Access Points off to Google's servers, and they will look up the location and feed it back to you. Simple, right?

Well, sniffing for access points (AP) is ridiculously simple. Your phone does that every time you look at a list of networks in your location, by looking for a special network frame called a beacon, which a regular access point send out roughly ten times a second. It's crucial to how WiFi works. It uses the same frame type as normal traffic, so even without a beacon you can still see at least the presence of traffic, and the MAC address associated with the AP. If the network is unencrypted, your WiFi network card automatically accepts those traffic frames too, then has it discarded by your kernel because it is not destined for your computer/smartphone/tablet etc ("your device").

I've done sniffing myself, in a practise known as "war driving". It sounds ominous, but it's also a very interesting excercise, for which I purchased a specific Atheros WiFi NIC, thanks to their products having excellent Linux support. Hook up a GPS receiver, and just go places. The software figures out where you are, looks at the list of APs nearby and pins them to a map. The problem here is that simply enabling your card to listen for APs does cause your system to store those traffic frames, since they are usefuol for determining the IP range in use on that network. Note I'm not trying to use those networks, I'm just interested in seeing how they're being used.

Now comes the interesting part: The court decided these signals being intercepted were not "radio communications" (despite being carried by photons) for the purposes of a legal interpretation, not "readily accessible to the general public" without "sophisticated hardware and software", and finally "most of the general public lacks the expertise to intercept and decode payload data transmitted over a Wi-Fi network".

On each point:

It's radio. Learn physics. The lower court's opinion that the law covered "predominantly auditory broadcast" is an inference form the court; 18 USC § 2511 mentions audio only for satellite transmission, and only then to describe an audio channel used as a carrier for digital communications therein. If I speak ones and zeros into my walkie-talkie, does this suddenly become digital communications? Your meaning is divorced from reality, and I think it suits an agenda instead of fact.
If I walk down a street, glancing into shops as I go, and see a person in one shop/office/etc handing a big photo with a red X through it to another person, with person in photo turning up dead and either of the two persons implicated at trial, I can't be prohibited for testifying because I got information not "readily accessible to the general public"(i.e. they were not standing on a street in plain view); a police officer seeing the same thing under the same circumstances is not prohibited from using the information for lack of a warrant or probable cause - it still doesn't affect the fact that photons bounced off the photo and got interpreted by "specialised hardware " (eyes) and "specialised software" (brain). Nobody's privacy is invaded, but information was blasted into the street nonetheless.
WiFi NIC cost: $10 (shipping costs vary), down to as low as one. These devices are sophisticated, don't get me wrong, but then most computers are mind-numbingly powerful today. Linux kernel cost: nothing (download costs may vary, but is unlikely to make you hit your ISP's download cap), easily installed in an array of distributions. Laptop cost: Variable, but if you've got one lying around you already have your solution.
Interception is easy stuff. Decoding is easy stuff, and your device "decodes" it as part of its' primary function.
A further point the court asserted is that regular (AM/FM) radio communications can be received miles away, versus WiFi that "fail to travel far beyond the walls" of a location. Again, physics. Oh, and define "far" - my balcony is less than 15 meters from my AP and my signal comes and goes, while the street is 30 metres away (in the opposite direction) and I still get an association (but not much throughput) fairly reliably. I despise vagueness in court proceedings.

The fact is, sophisticated software is required to not receive the payload. If somebody decides to configure and use an unencrypted access point and I happen to walk past with my phone doing the searching (I left the WiFi on when I left my home), or researching which models of router or service provider is prevalent on the street, or finally if the NSA, FBI, federal or local officials are parked in a van across the road, simply turning on the function puts the traffic in RAM. Even if it destroyed a microsecond later, under this ruling I (and they) have broken the law. The bar for warrant-required searches just shot up.

I'm used to seeing courts being out of touch with reality, especially in computing cases, but this is beyond unreasonable. I have no doubt Google had zero intention of capturing user detail such as e-mails, usernames or passwords (why would they, they run an e-mail service?), and are now being prosecuted for users' inability to secure their own networks.

But mostly: Physics!

Thursday, 24 January 2013

Dawson College: What Island Are You On?

I've been viewing the growing story about Ahmed Al-Khabaz, a Computer Science student at Dawson College in Montreal, Canada, who was expelled for running a security scan against their public web presence to discover if a flaw he found was resolved. I stumbled on a subsequent interview with prominent IT Security professional Chris Wysopal. I hadn't heard of him, but when I saw he was previously associated with l0pht Heavy Industries my eyes snapped open.

This guy has credentials, and I don't know of an IT professional active around 1997 to 2003 who hadn't heard of, or actually used, l0pthcrack, often to solve real-world problems. First and foremost a password auditing tool it can be used maliciously, but the so can a toaster oven. It is a piece of code art: Necessary, useful and (at the time) industry-shaking.

White Hat hacking is a tricky business. Even I've done it, against a bank no less, fully in the knowledge that I was doing something the system owners would be very unhappy about. In some cases it can get you arrested. I was was pleased with the results when my concerns were taken seriously and fixed fairly quickly. I've worked in financial services companies and know their software release process is iceberg slow so this was very reassuring. There's one thing Mr Al-Khabaz and I both know that drives thousands around the world to the same end: I'm at risk.

Dawson College is hand-wringing and special pleading: "the law ... forbids us from discussing your personal student files" is in this case weak. I am pretty sure the former student would agree to a waiver of his right to privacy to clear the air, but I have seen no mention of an offer. Fourteen out the fifteen professors convened voted for his expulsion, for doing what some professionals get paid extremely well to do (even I've been offered this job): Evaluate the security of publicly-accessible websites. I would like someone better informed than me to comment on what the implications would be for the institution if it was discovered a breach because of this flaw caused losses thanks to the personal information disclosed.

I can appreciate that the college does in fact have to abide by law, and is unwilling to get into a mudslinging match in the public forum. They have rules for ethical behaviour that may have been violated (I haven't seen them). But beyond those considerations, every one of the fourteen professors needs to answer one simple question:

Why, if these actions are so outrageous of a Computer Science graduate that it demonstrates
"behavior that is unacceptable in a computing professional" has the company whose software flaws he exposed taken it upon themselve to pay for his further education?

Academia is often seen as disconnected from reality; some lines of research beggar belief, and the same could be said of Computer Science. I've met a few graduates who arrive in the IT industry ill-prepared, full of theory of operation and design but unable to command a command-line. No matter what their actual instruction is, a critical point they need to learn is that the Internet is a hostile place. It is also a collaborative place, where FOSS abounds and Creative Commons is richly rewarding. Poking around is the norm, and if this college is telling their students that they are to accept their instruction blindly without considering real-world implications, or use those skills to explore, then they don't deserve to be associated with the term Higher Education.

They may perhaps be able to educate Code Monkeys, but thinking professionals able to design and protect systems that impact their lives? Not really.

Wednesday, 23 January 2013

How Important is MariaDB? Let's test the fork with butter.

MySQL has interested me for quite a long time. I first came across it in 2000 when trying to find a better way to analyse the contents of a 20,000 user Active Directory and needed more relational DB-stuff than Microsoft Access could deliver and cheaper than SQL Server (wow MSDE was terrible). I was deeply impressed (though probably because I was easily impressed back then) with the performance and cross-platform support, and ever since it's been around my life.

I currently use it for my XBMC and Logitech Media Server (SqueezeBox) media databases, as the back-end for my Gallery3 site, and other ad-hoc databases whenever I need to crunch data. Before my 64-bit processor created a new ISA that ensured a reasonably complete instruction set, it was a favourite of mine for optimising binary compiles over the stock i386 build supplied by most distros, but more for interest's sake than actually squeezing performance for any measurable benefit.

MySQL AB was of course the owner of the copyrights and code and opted for a relatively unique license, both proprietary and open. As the owners of the code, they could choose to do this, but anyone trying to make a buck out of the code was obliged to release their modifications. Now that Oracle (through their acquisition of Sun, who acquired MySQL AB) have that right, the open-source community is in a bit of a fluster. Can we trust a corporate giant with custody of the code that runs a significant fraction of the Internet's websites? The answer is slowly coming down on the side of "no".

Oracle (and others, and unsurprisingly) is being guarded about bugs and fixes. Stories of vendors forcing customers into NDAs before even admitting bugs exist, hiding bugs from other customers, and silently including fixes are common. It's face-saving. Andy Grove's "Only the Paranoid Survive" starts off with how Intel hoped to keep their Pentium FPU bug quiet while they implemented a workaround simply smacks of arrogance. While it doesn't yet seem Oracle are trying to hide any actual code and still supply source, MySQL has historically had test cases for bugs published alongside them to protect against regression and anyone can run the suite on their installation to verify code quality. Not only are they apparently now keeping some cases secret, they are also not clearly marking which code updates fix bugs they are refusing to publish.

This is not how open-source works, but I don't agree with the prevailing rationale. RedHat came into the firing line for being less than open they handled a code signing infrastructure breach, but in that instance I support the way they behaved as it was not their source they concealed, rather their own systems and controls that were embarrassingly compromised. They have shareholders, and revealing too much would have cost them. Oracle too have value invested in their products and would like to keep flaws hidden. This is not nefarious, it's capitalism.

MySQL as a product is different, no matter who owns it. It is very closely tied to the spirit of the open-source movement, being both highly regarded for performance and features, and for the competition it gives proprietary offerings. For Oracle to claim that ground back is entirely within their right, but the edge is gone. The most ardent supporters and influencers of purchasing are not happy and a slow exodus may be starting.

So Fedora and Wikipedia are both contemplating pulling out. The MariaDB fork has all the features and more, is fully open in the original spirit of the project, and is attracting attention including mine. I have no idea how easy it will be to do the fabled "drop-in replacement" every source claims is possible but I feel ethically compelled to leave MySQL in the dust. I have a server that runs my digital life and it is a conscious choice to run on open software only and it has not been easy, but as an experiment and learning tool it is invaluable.

The great thing about open-source is anybody can fork. I can clone a source and apply my changes as I like, but the moment I try to give it to anyone else (especially selling the result) I have to disclose my whole body of work. This can lead to some confusion as the early days of Linux showed, but in the end the market weeds out the under-performers and delivers better products through sheer market forces. MariaDB seems to be that winner.

I do know one thing: testing the transition is going to be a breeze: After switching from Fedora to Gentoo four months ago, I rolled the root over to BTRFS (once kernel 3.6 gave me the necessary confidence). Add a distinct IP to the NIC, snapshot, chroot, and I've got a clone of my server ready to go in about two seconds without that system-level virtualisation stuff and hideously slow LVM2 snapshots.

Rollback to base for a fresh attempt? Yep, two seconds.

Architecture Matters