Architecture Matters

Thursday 26 January 2017

Don't Just Tell Me "It's Encrypted"

"It's Encrypted"

These two words are a trigger. They seem innocuous enough, but the devil is in the details, and the details of cryptography are notoriously byzantine. It's tempting to allow ourselves to be soothed by these words, but they fall well short of an assurance.

In 2002, Bill Gates sent a memo inside Microsoft introducing his vision for Trustworthy Computing, a form of which we now know as Trusted Computing (these are distinct). In addition to Security and Privacy, the Availability goal in that memo has since been likened to a light switch. You don't think about whether the power will arrive when you flip that switch, you simply trust that it will. It's hard to see the forest for the trees sometimes, but when you consider the immense complexity and level of engineering in the products we depend on fifteen years later (tablets, laptops, mobile phones, streaming music systems to name a few), it is a remarkable achievement.

The light switch analogy is instructive. Power generation and distribution is a gargantuan field, and consumes spectacular amounts of money to provide you with the simplicity and affordability of that light switch. Similarly, achieving the level of stability and performance in our computing devices has consumed millions of productivity hours and billions of dollars. Privacy, the often forgotten cousin of Security and Availability in the original memo, is expected to varying degrees by different audiences but nonetheless depended on for our daily lives.

"It's Encrypted" sounds like the light switch - problem solved, it's encrypted, I can rest. It is a rare day indeed where that claim is picked apart that I do not find (sometimes significant) issues with the claim. If not done correctly, badly implemented security (but encryption especially) can be worse than no security.

Cryptography is an esoteric field. As I write this, Mozilla is due to start warning users of websites that continue to use certificates signed with the SHA-1 algorithm. This is likely to be Greek to the average user, but to security professionals this is a reasonable step in the march forward, and if naming and shaming is required to prod those last holdouts into the future, then at least users are informed that these providers are not up to snuff. This stuff is ubiquitous, and few users realise that their $150 smartphone is loaded with some of the most advanced cryptography available to the average Joe. They depend on it often more than they know, so interventions can be useful.

The problems are not in the ciphers, and handshakes & protocols are continually improved to counter new threats while enabling new functionality. The biggest headache (and the most frequent downfall of "It's Encrypted" claims) is key management. As Bruce Schneier reminds us in Applied Cryptography: "In the real world, key management is the hardest part of cryptography."

Private keys in PKI are often treated with trivial regard. The most recognisable for administrators would be the PFX, or .P12 file. Incorrectly referred to as a "Certificate file" (a wildly inappropriate shorthand), these few kilobytes are dealt with as arbitrary packages that a system uses to function, as opposed to critical pins in the security apparatus. This type of file (PKCS#12) as designed, is intended for key archival, yet is often sent via e-mail, stored on accessible drives to ease administration and troubleshooting.

On the other hand, Pre-Shared Keys are frequently recorded when they are not strictly required - if a system has issues, generate a new pair, then discard any record, though only a storage corruption would necessitate this, in which case you have bigger problems. TLS1.2 recommends this with their Ephemeral Keys, but IPSec VPN administrators have yet to learn the lesson and these are still recorded as if they will be required in the future, undermining the confidentiality and integrity of the link.

I worked on a proposal for a customer who demanded full encryption of all storage volumes in a cloud hosting provider, and quickly determined the theory and practise of these cryptosystems prohibited a robust solution. Of course, it could be fudged (and eventually was over my objections), but it took me a while to realise that the actual integrity of the system was not the point. It was merely to be able to make the claim "It's Encrypted".

The eventual implementation made use of "self-decryption". This is not a thing. By all means, system rely on obfuscation to hide key material and defeat trivial intrusions, but if a system is aware of how to decrypt its' own resources, an attacker will be able to achieve the same goal. The Trusted Computing Modules located on modern mobile devices, with the built-in Hardware Security Module, provides a slick solution to this question in performing both assurance that the system is trustworthy, and providing the eventual decryption keys once this assurance is performed. A virtual environment has not such opportunity.

Another issue in the eventual implementation was the use of a central key repository, using COTS. Since the designers have no real training or experience in cryptography or security principles, they made the fatal mistake of trusting the systems they were trying to protect too much. Every server is hosted at a provider, and on startup will request access to a database of keys over CIFS. Authentication is handled by means of the computer AD account. Since the single database is decrypted in its entirety by any participating server, every server has access to the keys for every other. This is disastrously insecure; the cloud provider has administrative access to these systems, and it is against their view that the cryptosystem is supposed to protect; ISVs and integrators are frequently given administrative credentials to install and test their products, again allowing them to retrieve as many keys as they please should a nefarious actor be present.

But hey, "It's Encrypted".

Monday 13 October 2014

Posturing over Encryption Defeats Everyone's Security

Apple and Google announced in short succession that they will be turning on encryption by default for all of their devices and the reaction from law enforcement (and some press elements who pander to them) is nothing short of incendiary. The nexus of national security, organised crime, think-of-the-children and technology concpepts barely understood by laypeople is fascinating to watch but one thing stands out that makes this posturing baffling: Law enforcement already knows what the limitations are of modern cryptography on mobile devices, and they're borderline lying when they decry this latest move.

Encryption on Android devices at least is very strong relying on the deep structures in the Linux kernel and associated tools to create and manage encrypted storage devices. Apple have improved things a bit in iOS8 but they've a ways to go. The announcements change nothing about the capabilities of these devices that law enforcement don't already worry about but that hasn't stopped them from launching a PR push in an attempt to show they're actually doing something - anything - to tackle the problems under their purview.

There is a serious flaw in this. The announcements merely indicate the starting position of the state of a new device, in that existing methods for securing the device will be turned on by default. Any savvy user who cares for their privacy (and unscrupulous ones probably more so) will have long ago figured out how to activate these features. In the grand tradition of DRM, where normal users were left exposed or under-served while underhanded consumers found the best release schedules by simply downloading as they wanted, viewing on devices that pleased them, and avoided limitations like being forced to watch ads or in formats sub-optimal; regular users of mobile devices were left woefully underprotected despite the facility being there to enhance their device security for the good while the security conscious (and I will keep admitting not always those with high-minded intentions) could exploit these features for their own protection.

Law enforcement knows this. Their hand-waving (and predictable, think-of-the-childrenness) responses are at best facile and at worst outright lies. If they did not know these devices were already capable of encryption then they are not competent, and if not but think this default setting changes the landscape for law enforcement then they are certainly dishonest - someone choosing to hide data from law enforcement already has those tools. These decisions are security for the rest of us.

This is not new. I used to have respect for Dianne Feinstein, the present (as of October 2014) chair of the US Senate Select Committee on Intelligence. She once remarked that Edward Snowden should have come forward with his concerns privately, even directly to her, rather than disclosing his information to the public; that this would still have resulted in the reforms and protections now underway at the NSA, CIA and other intelligence bodies in the US. Again, either incompetent in thinking the level of scandal this information would cause would be outweighed to safeguard individual rights in an environment of total secrecy (it won't) or posturing for effect and misleading the public about what she knows to be a false tale of her and her colleagues' desires to curtail intelligence gathering if it infringes those rights (it would never). Cue outrage when the CIA is found to be violating those rights, this time hers. She cannot be taken seriously.

Neither can any government official who says that the two largest smartphone OS vendors are hurting law enforcement because of a non-technical change. They are posturing, lying or showing incompetence (at best because of bad advice, but unlikely). They would like to intentionally leave us exposed to data theft, privacy violations based on the flimsiest of evidence, insufficient safeguards for that data and abuse by criminals who use government and law enforcement's own tools and methods.

I don't blame them for this as it's their job to pretend we're all at risk, that their job is hard, and that we should trust them. I know this and can't fault them (too much) for it.

My true outrage is with media outlets that don't just blindly parrot their talking points, but add on their own ill-informed scaremongering flourishes. None of these articles mention the present availability of these techniques, and in the omission imply that this is a new level of crime-friendly protection with no benefit for ordinary users. If these tech journalists stand by their stories and insist government needs to take action to cripple security or mandate backdoors, history has a lesson for you, as do real tech journalists.

Tuesday 5 November 2013

WYCRMS Part 7. I Don't Think You Understand What A Server Is

In 1997, a HP 9000 engineer wouldn't blink telling about a server that had been running continuously for over five years.I found this remarkable at the time, and couldn't imagine a Windows server lasting that long. I have moved on, and frankly expect my Windows servers to survive that long today. Very few share this position, and I'm trying to find out why it's so lonely on this side of the fence.

7. I Don't Think You Understand What A Server Is

It's taken me a long time to find the right words to explain this title, because it's a bit contentious. This is a longer article, but I hope it's of value and I hope expresses my sense of pride and love of computing.

I've explained in previous posts that Windows Server is accessible to those new to IT engineering because it is a simple learning path from the system most of us have in our homes, schools, libraries and elsewhere. Moving to depending on a server to provide service takes different people different amounts of time to get right, and some organisations are more tolerant of slip-ups than others.

There is something that starts to take over. It's not obvious, and since it applies to something commonplace and often the subject of passion for those true geeks, it is also often unrecognised in oneself.

Fear

In no other industry would a consumer or customer tolerate advice such as rebuilding a product, service or transaction because of a minor fault. No mechanic would expect to be paid if they told their customers that the brakes may fail after an hour on the freeway, advising the driver to just come to a complete stop and then move off again, yet this is what we in the (Windows) IT Service industry resort to all-too-otfen. It's a mild paranoia that all things Microsoft (and others, as I will show) are prone to misbehave, and I've been told this explicitly, in writing, more than once from multiple providers after I've requested a feature or function be deployed. Providing vendor documentation in favour of the product's capabilities doesn't sway these types, as they've experienced their own horror stories of staying up until 2am while functionality is restored. I know, I've been in that trench.

But imagine for a moment if you'd gone to a garage affiliated directly with your car's manufacturer. A Ford mechanic providing me the aforementioned brake-fault workaround would be held to account if he also displayed his certification of training or affiliation with Ford Motor. I could challenge him and go to his management to insist that real advice and a fix is provided for my product. If he could show you that his advice was sound my beef would be with Ford selling me an underperforming product. I'd have recourse.

IT engineers seem to think they don't, either in being the place to receive this criticism (hey, they didn't write the software they just run it), or in being able to back up their fears with proof that vendor products in fact don't behave as they would hope. Yet the offices of IT service providers proudly display partner certifications, while engineers with IT credentials flowing of their resumes continue to fear the products that are their livelihood and the foundation for the logos they too show off with some pride. Doublethink indeed.

I am very active in reporting bugs in open-source and even paid-for products, because I expect that the product is only good as those who help make it better. I've already mentioned that failure to even start reporting faults to vendors is negligent, and how engineers should have more pride in their platforms and confidently defend it from detractors with authoritative sources. What I haven't spoken about is the relationship those engineers have with the platform itself.

There is a widely held belief in ICT over a decade old that is demonstrably false: Ethernet autonegotiation is quirky, and potentially dangerous. There was a time this was true, but not for at least five years since 100Base-TX lost it's leading position in datacenter, server and finally desktop connectivity method. Implementing Fast Ethernet (as 100Mb Ethernet was known) needs some knowledge of standards. When the Institute of Electrical and Electronics Engineers (IEEE) published their 802.3u standard for Fast Ethernet, I recall an interview with one of the panel members who stated that it is technically possible to run 100Mbit over stacheldracht (German for barbed wire) and you may have success. He made it clear though that your experience cannot be guaranteed as it is not 802.3u-compliant.

That's the crux: when a vendor states something as a standard, part of a reference architecture, or included in their documentation, they're making a promise.

The section of the new standard dealing with how the two nodes selected the operating speed and duplex setting was, unfortunately, not precise enough and open to some interpretation. Cisco and a few other vendors chose one interpretation while everyone else chose another, and the resulting duplex mismatch is notoriously hard to diagnose, occurring as it does only at moderate load and a ping test over an idle cable will likely succeed. It's insidious, and resulted in the universal abandonment of Autonegotiation in implementations (especially datacenters and core networks).

The problem is, Autonegotiation is not only working well in Gigabit Ethernet (over twisted-pair copper, or Cat5e/Cat6 cabling), it is mandatory. Even network professionals, burnt previously in the 90s and later with Fast Ethernet, advise against turning on a feature that is explicitly required to be a truly standards-compliant implementation, with all the promises attached. A prime reason is that the applicable line is buried deep inside section 28 of the IEEE-802 standard, as amended for Gigabit. It's dry reading...

Gigabit Ethernet was a big jump forward that started to seriously tax memory buses and CPUs like no other iteration before, and includes a highly valuable feature known as a Pause Frame to stop transfers flooding receive buffers and being dropped. This facility is only used if the opposite end cooperates, and the only mechanism to advertise this is autonegotiation.

I've seen an implementation of Microsoft Exchange 2010 come to its knees for lack of Pause Frames, and it is again an insidious failure since packets are only dropped under load, and ping tests and even high-load throughput tests succeed. It is the clinging to an old wisdom without knowing the cause, and then failing to keep up with developments that has caused this issue. Not running with Autonegotiation means you aren't running a standards-compliant Gigabit Ethernet network, and all promises are void.

Not following vendor advice is a bad idea. If the vendor promises a feature that you feel is not ready for primetime then by all means hold off. But if I expect something to work that a vendor promises will work, I don't expect to be told war stories of how this breaks - especially when I last saw that issue, myself, over 12 years ago. It's old thinking, stuck in past fears, and it's stopping you from unleashing your platform's potential. Windows Server especially has become a solid, dependable and performant platform, yet doubts linger and fears cling to dark corners, an uneasiness that is sometimes not even apparent to those harbouring it.

I enjoy reading on the history of computing, and contemplating how modern computers implement both Harvard and von-Neumann architectures depending on how closely you're looking. It's esoteric to speak of privilege rings or context switches, but knowing these things has been of immense help to round out my understanding of computing and gain trust in the models deployed. But the biggest thing I would like to see engineers embrace is this:

The Turing Machine

It's a simplistic representation of any computer, from your old calculator wristwatch to supercomputing clusters: A processor reads instructions from a sequence, implements those instructions with some data, and stores the result somewhere before moving to the next instruction. The next instruction may be a pointer to a different instruction, but all of computing boils down to this concept. there may be more than one processor, and there may be complex layouts of memory, but at its most basic form every computer works this way, and building on your model of a system's internals starts here.

It is deterministic, in that the state after an instruction is performed can be predicted from the initial state. In principal all of computing conforms with this principle, and any unexpected behaviour simply means the initial state was not well-understood enough. It is this mountain that engineers need to climb to truly excel at their profession, and I've met some expert climbers in my time. They have no fear of digging down to each root cause, and unearthing an even deeper root.

Rebooting is not the answer. It indicates a lack of knowledge on cause of faults. It is a sign of an unwillingness to investigate further. Worst, it is a misunderstanding of what your server is, what is meant to do, and the longer you allow that mentality to perpetuate the worse off you will be.

Old tales have value, but they are no substitute for knowledge and verifiable fact. If those facts contradict your experience, investigate, shout at vendors, check your implementation.

But most of all, be proud of your platform, because as obscure as It appears ot be, it is genuinely not that hard if you are willing to do better.

Previous: Part 6. It's OK, the Resilient Partner Can Take Over

Thursday 17 October 2013

WYCRMS Part 6. It's OK, the Resilient Partner Can Take Over

A rolling reboot schedule should be implemented for the XenApp Servers so that potential application memory leaks can be addressed and changes made to the provisioned XenApp servers can be reset. The period between reboots will vary according to the characteristics of the application set and the user base of each worker group. In general, a weekly reboot schedule provides a good starting point.

More imprecise advice is hard to find in technical documents. How exactly does the administrator, engineer or designer know the level of his "potential" exposure to memory leaks? I've spent some time exploring this issue in the previous articles, and I stand by my point - if an administrator tolerates poor behaviour by applications or - worse - the OS itself without making an attempt to actually correct the flaw (e.g. contacting the vendor to demand a quality product), that administrator is negligent, scheduled reboots are a workaround, and nobody can have a reasonable expectation of quality service from that platform.

But most of all: How are you ever going to trust a vendor who has so little faith in their product that it cannot tolerate simply operating? I'm not singling out Citrix here, but their complacency in the face of bad code is shocking. I admire Citrix, so I'm not pleased at this display of indifference. Best practice I guess...

Then we get to sentences two and three of this three-sentence paragraph, which informs our reboot-happy administrator to try a particular schedule without a definitive measure in sight. There's a link on how to set up a schedule and how to minimise interruption while it happens, but not one metric or even a place to find them is proposed. He/she is given a vague "meh, a week?" suggestion with zero justification, apart from being "feels-right"-ish.

If a server fails, it is for a specific reason. Sometimes this is buried deep in kernel code, the result of interactions that can never be meaningfully replicated, or much more exotic reasons. In most cases however it is because of a precise reason (memory leaks included), and computing is honestly not so hard that these cannot be fixed.

You might tell I'm an open-source advocate, because I firmly believe in reporting bugs. I also believe I ge tto see the response to that bug. I've found some projects to be more responsive than others, but generally if I've found something that is not just broken but damaging I see people hopping to attention - and that's people volunteering.

If you're buying your software from a vendor they have that floor to start from in their response to you. Tolerate nothing less than attention, and get your evidence together before they start pointing fingers.

When you work in a large organisation you realise things have designations and labels for a reason. Resilient pairs are for unanticipated failures, and DR servers are for disasters.

You don't get to hijack a purpose just because it's unlikely it will be needed - they exist precisely for the unlikely.

Previous: WYCRMS Part 5. Nobody Ever Runs a Server That Long

Next: Part 7. I Don't Think You Understand What A Server Is

Monday 14 October 2013

WYCRMS Part 5. Nobody Ever Runs a Server That Long

Next: Part 6. It's OK, the Resilient Partner Can Take Over

Thursday 10 October 2013

WYCRMS Part 4. Windows Updates and File Locking

Next: Part 5. Nobody Ever Runs a Server That Long

Wednesday 9 October 2013

WYCRMS Part 3. Console Applications, Java, Batch Files and Other Red Herrings

Next: Part 4. Windows Updates and File Locking