They do take security seriously

Earlier today, I read an article about the plethora of information security breaches in recent history. Its title reads:

“We take security seriously”, otherwise known as “We didn’t take it seriously enough”

The article then lists a number of companies informing the public that they've been breached.

I think this article doesn't just blame the victims of those attacks, but subjects them to public ridicule. Neither helps anyone, least of all end users.

I'm surprised to hear such comments from Troy Hunt. He's certainly an accomplished professional with extensive security experience. This is not the first time people have expressed similar thoughts; the HN thread for that article is rife with them.

The explicit assumption is that these companies wouldn't have gotten in trouble if only they had taken security more seriously. In a world where the information these services store is increasingly valuable and software increasingly complex, breaches are going to happen. The idea that getting breached is their own darn fault is unrealistic.

This idea is also counterproductive. Firstly, there's one thing all of the victims being ostracized have in common: they disclosed the details of the breach. That is exactly what they should have done; punishing them creates a perverse incentive for victims to hide breaches in the future, a decidedly worse end-user outcome.

Secondly, if any breach is as bad as any other breach, there is no incentive to proactively mitigate damage from future breaches by hardening internal systems. Why encrypt records, invest in access control or keep sensitive information in a separate database with extensive audit logging? It might materially impact end-user security, but who cares -- all anyone is going to remember is that you got popped.

Finally, there's a subtle PR issue: how can the security industry build deep relationships with clients when we publicly ridicule them when the inevitable happens?

These commentators have presumably not been the victims of a breach themselves. I have trouble swallowing that anyone who's been through the terrifying experience of being breached, seeing a breach up close or even just witnessing a hairy situation being defused could air those thoughts.

If you haven't been the victim of an attack, and feel that your security posture is keeping you from becoming one, consider this:

  1. What's your threat model?
  2. How confident are you in your estimation of the capabilities of attackers?
  3. Would you still be okay if your database became three orders of magnitude more valuable? Most personal data's value will scale linearly with the number of people affected, so if you're a small start-up with growth prospects, you'll either fail to execute, or be subject to that scenario.
  4. Would you still be okay if the attacker has a few 0-days?
  5. What if the adversary is a nation-state?
  6. How do you know you haven't been breached?

That brings me to my final thesis: I contest the claim that all of the companies in the article didn't take security seriously. It is far more probable that all of the companies cited in the article have expended massive efforts to protect themselves, and, in doing so, foiled many attacks. It's also possible that they haven't; but the onus there is certainly on the accuser.

Clearly, that's a weak form of disagreement, since "taking something seriously" is entirely subjective. However, keep in mind that many targets actually haven't taken security seriously, and would not even have the technical sophistication to detect an attack.

(By the way, if you too would like to help materially improve people's security, we're hiring. Contact me at [email protected].)

HTTPS requests with client certificates in Clojure

The vast majority of TLS connections only authenticate the server. When the client opens the connection, the server sends its certificate. The client checks the certificate against the list of certificate authorities that it knows about. The client is typically authenticated, but over the inner HTTP connection, not at a TLS level.

That isn't the only way TLS can work. TLS also supports authenticating clients with certificates, just like it authenticates servers. This is called mutually authenticated TLS, because both peers authenticate each other. At Rackspace Managed Security, we use this for all communication between internal nodes. We also operate our own certificate authority to sign all of those certificates.

One major library, http-kit, makes use of Java's javax.net.ssl, notably SSLContext and SSLEngine. These Java APIs are exhaustive, and very... Java. While it's easy to make fun of these APIs, most other development environments leave you using OpenSSL, whose APIs are patently misanthropic. While some of these APIs do leave something to be desired, aphyr has done a lot of the hard work of making them more palatable with less-awful-ssl. That gives you an SSLContext. Request methods in http-kit have an opts map that you can pass a :sslengine object to. Given an SSLContext, you just need to do (.createSSLEngine ctx) to get the engine object you want.
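
For example, a minimal sketch might look like this (the file names are hypothetical, and I'm assuming less-awful-ssl's ssl-context function, which builds an SSLContext from the client key, client certificate and CA certificate):

(ns example.client
  (:require [less.awful.ssl :as ssl]
            [org.httpkit.client :as http]))

;; Hypothetical paths; the key needs to be in a format less-awful-ssl
;; understands (see its README).
(def ctx
  (ssl/ssl-context "client.key" "client.crt" "ca.crt"))

(defn get-with-client-cert
  "Perform a GET over mutually authenticated TLS."
  [url]
  ;; http-kit wants an SSLEngine; the SSLContext creates one for us.
  @(http/get url {:sslengine (.createSSLEngine ctx)}))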

Another major library, clj-http, uses lower-level APIs. Specifically, it requires KeyStore instances for its :key-store and :trust-store options. That requires diving deep into Java's cryptographic APIs, which, as mentioned before, might be something you want to avoid. While clj-http is probably the most popular library, if you want to do fancy TLS tricks, you probably want to use http-kit instead for now.

My favorite HTTP library is aleph by Zach Tellman. It uses Netty instead of the usual Java IO components. Fortunately, Netty's API is at least marginally friendlier than the one in javax.net.ssl. Unfortunately, there's no less-awful-ssl for aleph. Plus, I'm using sente for asynchronous client-server communication, which doesn't have support for aleph yet. So, I'm comfortably stuck with http-kit for now.

In conclusion, API design is UX design. The library that "won" for us was simply the one that was easiest to use.

For a deeper dive into how TLS and its building blocks work, you should watch my talk, Crypto 101, or the matching book. It's free! Oh, and if you're looking for information security positions (that includes entry-level!) in an inclusive and friendly environment that puts a heavy emphasis on teaching and personal development, you should get in touch with me at [email protected].

Call for proposal proposals

I'm excited to announce that I was invited to speak at PyCon PL. Hence, I'm preparing to freshen up my arsenal of talks for the coming year. The organizers have very generously given me a lot of freedom regarding what to talk about.

I'd like to do more security talks as well as shift focus towards a more technical audience, going more in-depth and touching on more advanced topics.

Candidates

Object-capability systems

Capabilities are a better way of thinking about authorization. A capability ("cap") gives you the authority to perform some action, without giving you any other authority. Unlike role-based access control systems, capability-based systems nearly always fail closed: if you don't have the capability, you simply don't have enough information to perform the action. Contrast this with RBAC systems, where authorization constraints are enforced with pinky swears, and therefore often subverted.
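
As a toy sketch of that fail-closed property (the names here are entirely hypothetical, and a real object-capability system hands out unforgeable references rather than strings), a capability can be as small as an unguessable token that is itself the authorization:

(def ^:private caps (atom {}))  ;; token -> zero-argument fn performing the action

(defn grant!
  "Mint a capability for performing action."
  [action]
  (let [token (str (java.util.UUID/randomUUID))]
    (swap! caps assoc token action)
    token))

(defn exercise
  "Perform the action behind a capability. Without the token, there is
   no way to even name the action, let alone perform it."
  [token]
  (if-let [action (get @caps token)]
    (action)
    (throw (ex-info "no such capability" {}))))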

I think I can make an interesting case for capability systems to any technical audience with some professional experience. Just talk about secret management, and how it's nearly always terrifying! This gives me an opportunity to talk about icecap (docs) and shimmer (blog), my favorite pastimes.

Putting a backdoor in RDRAND

I've blogged about this before, but I think I could turn it into a talk. The short version is that Linux's PRNG mixes in entropy from RDRAND in a way that would allow a malicious implementation to control the output of the PRNG in ways that would be indistinguishable to a (motivated) observer.

As a proof of concept, I'd love to demo the attack, either in software (for example, with QEMU) or even in hardware with an open core. I could also go into the research that's been done regarding hiding stuff on-die. Unfortunately, the naysayers so far have relied on moving the goalposts continuously, so I'm not sure that would convince them this is a real issue.

Retroreflection

An opportunity to get in touch with my languishing inner electrical engineer! It turns out that when you zap radio waves at most hardware, the reflection gets modulated based on what it's doing right now. The concept became known as TEMPEST, an NSA program. So far, there's little public research on how feasible it is for your average motivated hacker. This is essentially van Eck phreaking, with 2015 tools. There's probably some interesting data to pick off of USB HIDs, and undoubtedly a myriad of interesting devices controlled by low-speed RS-232. Perhaps wireless JTAG debugging?

The unfinished draft bin

Underhanded curve selection

Another talk in the underhanded cryptography section I've considered would be about underhanded elliptic curve selection. Unfortunately, bringing the audience up to speed with the math to get something out of it would be impossible in one talk slot. People already familiar with the math are also almost certainly familiar with the argument for rigid curves.

Web app authentication

Some folks asked for a tutorial on how to authenticate to web apps. I'm not sure I can turn that into a great talk. There's a lot of general stuff that's reasonably obvious, and then there's highly framework-specific stuff. I don't really see how I can provide a lot of value for people's time.

Feedback

David Reid and Dwayne Litzenberger made similar, excellent points. They both recommend talking about object-capability systems. Unlike the other two candidates, it will (hopefully) actually help people build secure software. Also, the other two will just make people feel sad. I feel like those points generalize to all attack talks; are they just not that useful?

Everything I've learned about running a financial aid program

For the past two years, I've been running PyCon's financial aid program. Starting this year, the event coordinator has asked all staff members to document what they do for PyCon. This helps to objectively recognize the hard work done by our (volunteer) staff, and it helps ensure continuity when the time comes to pass the torch. Since organizers of other conferences have expressed interest in my opinions for creating their own financial aid programs, I am posting my notes publicly instead.

This is a collection of hard-earned opinions, and is very much a work in progress. It's written as if it were a conversation with a hypothetical financial aid organizer; so, whenever I say "you", I mean you, the awesome person running financial aid at a conference somewhere.

Basics

Financial aid programs are one of the most effective ways a software foundation can spend its money. Even if you completely ignore the effect they have on diversity, the number of speakers, sprinters and other contributors that attended PyCon thanks to the financial aid program is staggering.

Naming your program

I inherited the term "Financial Aid". It's a fine term, but some other conferences have come up with different terms that you may want to consider, like "opportunity grants" and "diversity grants".

Taking care of yourself

Running a financial aid program is a lot of work. It scales linearly with the number of applicants; it's your job to create a process that keeps the work per applicant small, so that you can make the number of applicants large. (You'll know you've succeeded when the fixed overhead dominates, and it wouldn't really matter if you added another dozen people.)

It is also an exercise in deferred gratification. Typically, you will need to start preparing about a year before the conference. The gratification part only comes at the conference itself. Since you'll probably be quite busy as an organizer, it may only come after the conference is over. You probably want to make sure that you have an excellent social circle and that you're fairly self-motivated.

As the Financial Aid Chair, I have been on the receiving end of verbal abuse once. Don't put up with it.

Beat the drum

Your financial aid program is useless if people don't know about it.

Some very talented speakers refrain from sending talk proposals because they don't know if they can afford to attend.

Numbers

PyCon's financial aid program is quite large. There are a number of reasons for that:

  • PyCon's financial aid program has been around for many years, so it's quite mature.
  • The program can count on support from PyCon leadership and the Python Software Foundation.
  • Because PyCon US is the largest PyCon in the world, it acts as a nexus for people across the globe; therefore, it's important for the Python community that as many people as possible have a chance to attend.

This is just to give you a ballpark idea of our numbers.

Regardless of what your numbers will be, expect that the primary bottleneck of your financial aid program will simply be lack of funds.

Confusing parts and sad truths

No-shows

A lot of people will not show up, and not notify you (or notify you only in the days around the conference). Sometimes, there are good reasons (illness, emergencies...). Sometimes, the reasons are less great. Sometimes, you won't know the reason.

From the perspective of the Financial Aid Chair, no-shows are terrible. It's a dead grant: money that's been allocated that can't easily be translated into an extra attendee. Hence, many of the suggestions I make for running financial aid processes are focused on minimizing no-shows.

Visas and travel

Many recipients have to cancel because they are unable to acquire visas to Canada or the United States. In some cases, the visa process took several months and simply did not complete in time for the conference. In others the visas were declined for various reasons.

Occam's razor tells me that many countries (particularly the United States, to some extent now Canada, but also Schengen zone countries) are simply actively hostile to foreigners visiting.

Planning

Stand by your Chair

At the end of the day, someone's responsible for making your conference all it can be. That position is typically called the Conference Chair. They get help from teams like the Program Committee and Financial Aid to make that happen. Ultimately, they make the hard calls, and you have to execute within their guidelines. That typically includes the budget, but it also includes how you want to allocate grants. You will almost certainly be resource-constrained, so there are trade-offs to be made:

  • Do you want to help newbies, or advanced programmers?
  • Do you want to help marginalized groups? Which ones? How much?
  • Do you want repeat attendees, or first timers?
  • Do you want a few people from all over the globe, or many locals?
  • Do you want to benefit people who directly contribute to the conference? Which ones (speakers, staff...)? How much?
  • Do you care if people are receiving funds from other places? What if their employer is paying? What if their employer is also a sponsor? Does it make sense for them to give you $x in sponsorship fees when you're giving a significant portion of that back in financial aid?

Your budget is, unfortunately, zero-sum. Every group you benefit means less for everyone else. Helping everyone is the same as helping no one. Everyone wants to help everyone, but it's unlikely you'll get to do that. Make sure everyone understands exactly what you want to accomplish; you don't want to have this argument in the middle of trying to run a financial aid process.

Free or reduced-price tickets

For many conferences, the tickets themselves can be quite expensive. It makes sense to provide them to financial aid recipients at no (or reduced) charge as part of their grant.

PyCon previously provided free registration, but now provides reduced-cost registration. This helps with no-shows, giving applicants a financial incentive to let you know if they can't attend.

Make sure that you document clearly that people will be receiving tickets, at what price they'll be receiving them, and that their spots are reserved. Common concerns from financial aid applicants:

  • "Your conference blog says that you're sold out, and I haven't received financial aid yet. Will I be able to attend?"
  • "I've already registered to reserve my spot; what do I do now that I get financial aid?"
  • "I'm a student. Will I still be able to register at the student rate if I apply for financial aid?"
  • "I applied for a larger amount because I didn't think ticket price would be included." (Unfortunately, most people tell you this far too late.)

As usual, clarity in communication is key here.

In previous years, PyCon optionally provided free registration for people who asked. This optional part was somewhat confusing. Whatever you do, make it part of the default grant application. That also means that you should probably offer the reduced-price ticket to anyone who applies for financial aid, even if you can't otherwise give them a grant. Otherwise, someone who applies for financial aid for whom you simply don't have enough funds gets punished twice: no financial aid, and no access to the early bird ticket price.

Housing

Housing, in the context of financial aid, means that you pay for a bunch of hotel rooms for various dates at your conference, and then put financial aid recipients in them. To save costs, you want to pair them up, and you want to utilize the rooms maximally.

Some people think that it's a good idea to organize housing as part of your travel grants. Those people are mistaken. Housing is a terrible idea all round:

  • It doesn't scale, up or down. If you're small, you can't negotiate a worthwhile hotel block contract. If you're big, the system crashes under its O(N²) weight (see below).
  • It's not good for the financial aid recipients. While there are many nice things to be said about conference hotels, they are typically not economical. When we still organized housing, many financial aid recipients opted out: they could get significantly more bang for their buck otherwise.
  • It's not good for the conference. People leave, people join, people change dates, people have preferences (or hard constraints) about who they'll stay with... Doing this for any nontrivial number of people is a logistical nightmare; doing it for a trivial number of people isn't worth it.

Having humans solve the allocation problem produces inefficiencies at larger scales (i.e. humans typically come up with fairly suboptimal solutions). It's a pretty tricky problem to solve even with computers (believe me, I've tried extensively), but computers will never solve the logistic issues caused by human factors.

PyCon used to manage housing for financial aid recipients. Getting rid of this was the single best decision I've ever made for the financial aid process.

It worked out quite well for the attendees too. Providing simple tools (i.e. the equivalent of a classifieds section) is more than ample to help people find great groups to room-share with. This increased opportunities for roomsharing, because it made it much less of a hassle to mix-and-match between financial aid recipients and other attendees.

Plenty of FA people stayed in large AirBnBs or the like in groups of 6 or more, and ended up getting fantastic deals that allowed them to stay an extra few days to attend other events like sprints and tutorials.

Before the conference: from applications to allocation

Applications

Keep it simple. Use a form generator (Google Forms or Wufoo or something) for data collection. All of the processing was done with simple Python scripts, most of it in an IPython/Jupyter notebook. This enables you to create well-documented processes, which helps everyone. CSV files are your best friend.

Make as many fields on the application form as possible directly translatable to something in your allocation process. Multiple choice and boolean values are your friend; the review process will make sure the applications are accurate. Have free-form fields for documenting things, but only use them in the review process. For example, you can have a multiple choice field for Python expertise ranging from beginner to expert, and then have a free-form field for applicants' portfolios.

Names are weird. There are lots of falsehoods programmers believe about names. You probably want to ask for a legal name; it's quite likely that you need to keep the legal name around for your records. Ask a lawyer and/or an accountant for details. However, you probably want to ask for an (optional) preferred name as well, which you should always use when communicating with them. There's a bunch of reasons those might be different, and people may have excellent reasons for not using their legal names. For example, a legal name might give someone away as being transgender. Sometimes, you want to do that just for your own convenience. People will put all sorts of stuff down as their "name", but full legal names are quite consistent. This can be useful to match up records from different sources, such as your registration database.

Speaking of gender, asking for people's gender is also tricky. Make sure you have at least a cursory understanding of how gender works before you ask. Always have a "decline to answer" button, which is distinct from "other/nonbinary". If all you really want to know is whether someone qualifies for earmarked funds, just ask the specific question you want to know; e.g. "Apply for a PyLadies grant (people who self-identify as women only, please)".

As usual in programming, state is the bane of your existence. Keep it in one place whenever possible. A (Google Drive) spreadsheet works just fine. Your scripts should operate on data extracted from them (again, CSV works fine), but not store any state. This is trickier than it sounds, but the alternative is that you'll probably end up destroying some data.

Review

Reviews are easy to do in parallel, so get help from volunteers if needed. Establish clear guidelines for what your discrete values (e.g. Python experience) mean; not everyone agrees on what "expert" means.

Allocation

I've written about allocation before.

PyCon allocates approximately 15% over budget; i.e. the budget that we allocate is 1.15x the actual grant budget. This does not include aid in the form of e.g. reduced ticket prices; remember to account for those separately!

PyCon's no-show rate is somewhere between 10% and 20%, but it's very hard to predict if the factors that contribute to that will affect your conference equally.

As mentioned in that previous blog post, you don't always want to allocate the full grant asked for. People will still be able to attend with partial grants but typically wouldn't with an empty grant. Therefore it typically makes sense to reduce everyone's grant slightly if that allows you to provide more grants.

We ended up going with a fairly simple "flood fill" algorithm. An applicant's score maps to a fraction of the budget:

$$f_i = \frac{s_i}{\sum_j s_j}$$

where \(f_i\) is the fraction of the budget you're willing to assign to applicant \(i\) and \(s_i\) is that applicant's score.

If \(f_i \cdot B \ge q \cdot r_i\) (where \(B\) is the total grant budget, \(q\) is the fraction of the request you're willing to allocate, and \(r_i\) is the amount the applicant requested), you grant them \(q \cdot r_i\); otherwise, you grant them nothing.

Some applicants' reduced requests will fall below their share of the budget; some will be above. They could be below because they have a very high score (e.g. they are a speaker), or because they're "low-hanging fruit" who aren't asking for very much money.

That means that if you run the algorithm again, the fractions will be bigger; the people that received allocations in the previous round were allocated less than what would've been their "fair share". Rinse, repeat until you're out of money.
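
To make that loop concrete, here's a small sketch of the flood fill (in Clojure here; the applicant maps, the :score and :requested keys and the function name are all made up for this example, and the real scripts were the Python ones mentioned above):

(defn flood-fill-allocate
  "Sketch of the flood fill described above. Applicants are maps like
   {:id :alice :score 3 :requested 1000}; budget is the total grant
   budget and q the fraction of each request we're willing to fund.
   Returns a map of applicant id -> granted amount."
  [budget q applicants]
  (loop [budget budget, pending applicants, granted {}]
    (let [total-score (reduce + (map :score pending))
          ;; fund an applicant if their share of the remaining budget
          ;; covers the reduced request q * r_i
          fundable (when (pos? total-score)
                     (filter (fn [{:keys [score requested]}]
                               (>= (* budget (/ score total-score))
                                   (* q requested)))
                             pending))]
      (if (empty? fundable)
        granted  ;; nobody else can be funded: out of money or applicants
        (let [grants  (into {} (map (fn [{:keys [id requested]}]
                                      [id (* q requested)])
                                    fundable))
              spent   (reduce + (vals grants))
              funded? (set (keys grants))]
          ;; leftover budget flows back to the pool, so the remaining
          ;; fractions grow; rinse and repeat
          (recur (- budget spent)
                 (remove (comp funded? :id) pending)
                 (merge granted grants)))))))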

Communications

Have a central website where people can see their current status and updates, both specific to the applicant and generic to the entire process. Training people to expect information there will drastically reduce the number of repetitive questions you get to answer by e-mail, which will contribute enormously to your happiness. If people ask questions that are answered there, do answer, and be kind, but point out where they could have gotten that information.

Most of the things you will have to say will be generic points about the process. Nonetheless, I have spent huge amounts of time answering individual questions about that generic process, which can get quite tedious. Hence, even a static page that doesn't show any information specific to the applicant, but just explains the process in detail, is extremely valuable.

Being up-front about how your process works is also great for prospective applicants who wouldn't feel comfortable asking. This also attracts speakers to submit talk proposals; many speakers would not propose a talk because they know they can't afford to come without financial aid. Therefore, it's important to communicate clearly if you intend to support speakers, both through your financial aid communication and your call for papers.

When that fails, send e-mail. Once you're over a dozen or so recipients, use Mailgun. Sending e-mail, particularly automated e-mail, from your personal account is a good way to get stuck in spam filters.

Disbursement (giving out money)

First off, talk to your treasurer. Possibly talk to a lawyer, too. It's quite possible, particularly if you're in the United States, that giving out a bunch of money as a non-profit comes with some fairly complex strings attached. For example, you probably have certain standards in terms of what records you have to keep.

As a European, I found it somewhat comical to still see checks in active use, but hey, it works. For many parts of the world (apparently not the United States, though) wire transfers are how you send money to people. PayPal seems to work better for larger, established organizations that use it often. There is less of a problem with the accounts or funds being frozen. It does actively restrict (or, perhaps more accurately, enforces restrictions on) sending funds to certain countries, including Brazil and India.

Cash works, but is hard to scale. With PyCon's budget of well over $100,000.00, managing cash is clearly less than ideal.

At the end of the day, be pragmatic. Do whatever your recipients can accept. Be sure to ask ahead of time, as your financial aid programs may prove to be successful in bringing people in from very different countries, with very different cultures and very different means available to them for accepting your grant. This is the case for PyCon, but I consider that a superb problem to have.

Conclusion

I'd like to thank the following people in no particular order:

  • Ewa Jodlowska, for managing PyCon and being the love of my life
  • Van Lindberg, for continuously inspiring me to serve
  • Diana Clarke, for chairing PyCon US in Montreal

Conflicting threat models

As I mentioned in my previous post, we have a long way to go when it comes to information security. I'll be presenting a talk on building secure systems at PyCon 2015 next month, and I hope to blog more about interesting bits of comprehensible security.

I'm a strong believer in the importance of threat models. A threat model is your idea of what you're protecting against. It may seem obvious that you can't effectively protect anything without knowing what you're protecting it from. Sadly, simply contemplating your threat model puts you ahead of the curve in today's software industry.

Threat models often simply deal with how much effort you're willing to spend to prevent something from happening. In a world with finite resources, we have to make choices. Some threats are unrealistic or prohibitively expensive to defend against. These questions aren't all strictly technical: perhaps some risk is adequately covered by insurance. Perhaps you have a legal or a compliance requirement to do something, even if the result is technically inferior. These questions are also not just about how much you're going to do: different threat models can lead to mutually exclusive resolutions, each a clear security win.

Consider your smartphone. Our phones have a lot of important, private information; it makes sense to protect them. The iPhone 6 provides two options for the lock screen: a passcode and a fingerprint sensor. Passcodes have been around for about as long as smartphones have, while fingerprint sensors are new and exciting. It's clear that either of them is more secure than not protecting your phone at all. But which one is more secure?

Most people instinctively feel the fingerprint sensor is the way to go. Biometric devices feel advanced; up until recently, they only existed in Hollywood. Fingerprints have their share of issues, though. It's impossible to pick a new key or have separate keys for separate capabilities; you're stuck with the keys you have. A fingerprint is like a password that you involuntarily leave on everything you touch. That said, turning a fingerprint into something that will unlock your iPhone is out of reach for most attackers.

Passcodes aren't perfect either. People generally pick poor codes: important dates and years are common, but typically not kept secret in other contexts. If you know someone's birthday, there's a decent chance you can unlock their phone. At least with a passcode, you have the option of picking a good one. Even if you do, a passcode provides little protection against shoulder surfing. Most people unlock their phone dozens of times per day, and spend most of that day in the presence of other people. A lot of those people could see your passcode inconspicuously.

Two options. Neither is perfect. How do you pick one? To make an informed choice, you need to formalize your threat models.

In the United States, under the Fifth Amendment, you don't have to divulge information that might incriminate you. I am not a lawyer, and courts have provided conflicting rulings, but currently it appears that this includes computer passwords. However, a court has ruled that a fingerprint doesn't count as secret information. If you can unlock your phone with your fingerprint, they can force you to unlock it.

If your threat model includes people snooping, the fingerprint sensor is superior. If your threat model includes law enforcement, the passcode is superior. So, which do you pick? It depends on your threat model.

Disclaimer: this is an illustration of how threat models can conflict. It is not operational security advice; if it were, I would point out other options. It is not legal advice, which I am not at all qualified to dispense.

We're just getting started

Most conference talks are transactional. The speaker has a point to make. After the presentation, it's "over"; only spoken about in perfect tenses. You've communicated your thoughts, perhaps had a conversation or two, but, mostly, moved on.

I've given talks like these. However, about two years ago, I gave a talk that had a deep impact on my life. That talk was Crypto 101.

Right before the presentation, cryptanalytic research was released that popped RC4. I couldn't have asked for a better setup. Turns out it wasn't just luck; eventually, our systemic failure as an industry to take security seriously was bound to catch up with us. Since then, the proverbial piper has been well-paid. We've seen a plethora of serious security bugs. Huge corporations have been the victims of attacks costing billions of dollars a pop. As I'm writing this blog post, there's an article on a new TLS attack in my reading list.

It quickly became clear that this wasn't just a one-off thing. I started writing Crypto 101, the book, not too long after giving the talk. We were, unwittingly, at the crest of a wave that's still growing. Projects like PyCA and LibreSSL started fighting tirelessly to make the software we use better. Security talks became a mandatory part of the programming conference food pyramid. My friends Hynek and Ying gave fantastic talks. They, too, got "lucky" with a security bombshell: Heartbleed happened mere days before the conference.

Last week, I presented Crypto 101 again at rax.io, Rackspace's internal conference. It was well-received, and I think I provided value for people's time. More than anything, it crystallized where we are. We're not done yet. There's still a huge audience left to reach. Interest in information security has done nothing but grow. With a total of just over 100,000 downloads for the book and about half as many for the recording of the presentation, people are definitely listening. We've made real impact, and we have people's attention, but we need to keep going.

One of the two talks I'll be giving at PyCon is a more high-level overview of how we can build secure systems. More friends of mine will talk about TLS there too. Within Rackspace, I'm focusing on information security. There are awesome things brewing here, and I hope that we can continue the great work we've been doing so far.

We've accomplished a lot, but we're just getting started.

Updated GPG key

This message is also available as a Gist. I have also updated Keybase since writing the GPG signed message below.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

I am cycling my GPG key.

My old key has fingerprint:

D9DC 4315 772F 8E91 DD22 B153 DFD1 3DF7 A8DD 569B

My new key has fingerprint:

45DC 13EB 6A01 21E8 5219 8C09 8763 869B E2B2 663E

(If you're looking for the key ID, that's the last 8 hex characters of
the fingerprint.)

While my new key may superficially seem less secure, since I have gone
from a 4096 bit RSA key to a 3072 bit one, the new one has the
wonderful advantage of living on a smart card.

I have no reason to presume my old key to be compromised.  I have
changed the expiration date of my old key to March 7th of this year. I
have signed the new key with the old one.

I am in the process of updating https://keybase.io/lvh.
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJU54Q5AAoJEISZopctIA+8JqYP/2agk8RHNklaPqQ6JHdK7Rtu
ehtok2X7wcirWAridRK/l0Tfjl5x2lFJitb+rP5X3k30qw2FvoLF9YOvbQBzezR+
ma9S050GPlvs2knmRQb9f53KmjlmC9DLHT40f3BUJtUteH5X8KgEy2YfbThN2B4C
Z6P30w03gqMkOu5vpaUTe6wkTMpMeGfQz240Kwa3N84UkzzAP3dTBOkm1AiHDUeJ
yj4a9zz+qzayVGI0A1W5W8zd4+GK7Pant7I/lRd02jRQHoHtnbgiBm+5PbGvihFp
zdtrd9YDIhWJzo84qSawQCVAuhy+8CGMFqOHBtTo/BV6HklVLUuOdrfy+IwpV9Jh
cj2Cc5AauFYcWYzJkYL9MHj0b6UI4Uxx1OiAq7onBsajaIE97nLbt1j9A4I4Pb4d
7ub6YmTnwA5aLwqPbfl/egX5xKEIXq/TGcVnbpxY65fw4GsG/hJyq5JHrW43ATqX
sSTdnmIbjyw/PQFr+U0ddUfOnbITJKUElZnCami/JnZV6jDUOPY/Kn48nsxF6bk2
UatqaXpR7yQAvzHz9Yl2sZHcMw/TguumqwuYUQWLFUVZJmmc3iunCfFDVD9tiEz1
00M4PZxhIZt8zKDKIb0PSVa46yHt+kSlgtdgwIvvbuZn9TokXdp/n/DXvBkqohQg
De57mY9RnWwt5fy6AWd1
=lzks
-----END PGP SIGNATURE-----

Securing APIs with shims

Imagine that you had a capability URL, except instead of giving you the ability to perform a specific action, it gave you the ability to perform a (limited) set of operations on a third party API, e.g. OpenStack. The capability URL wouldn't just be something you exercise or revoke; it'd be an API endpoint, mostly indistinguishable from the real API. Incoming requests would be inspected, and based on a set of rules, either be rejected or forwarded to the API being shimmed.

Proof of concept

At my day job, we had a programming task that I thought logic programming would be well-suited for. Unfortunately, logic programming is kind of weird and esoteric. Even programmers with otherwise broad experiences professed to not being quite sure how it worked, or what to do with it.

Therefore, I used up my hack day (a day where we get to hack on random projects) to cook up some cool stuff using logic programming. I demoed the usual suspects (the monkey with the banana, and a sudoku solver), illustrating the difference between the relational nature of the logic programs and the imperative nature of the algorithms you might otherwise write to solve the same problems. Finally, I demoed the aforementioned proxying API shim. The proof of concept, codenamed shimmer, is up on Github.

Let's take a look at the handler function, which takes incoming requests and modifies them slightly so they can be passed on:

(defn build-handler
  [target-host target-port]
  (fn [incoming-request]
    (if (match (spy incoming-request))
      (let [modified-request (-> incoming-request
                                 (dissoc :scheme) ;; hack
                                 (assoc :host target-host
                                        :port target-port
                                        :throw-exceptions false))]
        (spy (request (spy modified-request))))
      {:status 403 ;; Forbidden
       :headers {"content-type" "text/plain"}
       :body "Doesn't match!"})))

(Those spy calls are from the excellent timbre library. They make it easy to log values without cluttering up your code; a godsend while developing with some libraries you're not terribly familiar with.)

The matching function looks like this:

(defn match
  "Checks if the request is allowed."
  [req]
  (not= (l/run 1 [q]
          (l/conde
           [(l/featurec req {:request-method :get})]
           [(l/featurec req {:request-method :post
                             :headers {"x-some-header"
                                       "the right header value"}})]
           [(l/featurec req {:request-method :post})
            (l/featurec req {:headers {"x-some-header"
                                       "another right header value"}})]))
        '()))

Future work

Make this thing actually vaguely correct. That means e.g. also inspecting the body for URL references, and changing those to go through the proxy as well.

Start collecting a library of short hand notations for specific API functionality, e.g. if you're proxying an OpenStack API, you should be able to just say you want to allow server creation requests, without having to figure out exactly what those requests look like.

The spec is hard-coded; it should be specified at runtime. That was trickier than I had originally anticipated: the vast majority of core.logic behavior uses macros. While some functionality is fairly easy to port, that's probably a red herring: I don't want to port a gazillion macros. As an example, here's conds, which is just conde as a function (except without support for logical conjunction per disjunctive set of goals):

(defn ^:private conds
  "Like conde, but a function."
  [goals]
  (if (empty? goals)
    l/fail
    (l/conde [(first goals)]
             [(conds (rest goals))])))

That's not the worst function, but let's just say I see a lot of macroexpand in my future if I'm going to take this seriously.

URLs and bodies should be parsed, so that you can write assertions against structured data, or against URL patterns, instead of specific URLs.

If I ever end up letting any of this be a serious part of my day job, I'm going to invest a ton of time improving the documentation for both core.logic and core.typed. They're fantastic projects, but they're harder to get started with than they could be, and that's a shame.

Reverse ungineering

(Title with apologies to Glyph.)

Recently, some friends of mine suggested that "software engineer" is not a good job title. While they are of course free to call their profession whatever they like, I respectfully disagree: I think "engineer" is a perfectly cromulent description of what we do.

This is an opinion piece. Although we arrive at opposite conclusions, the disagreement is feathery at best.

What if buildings failed as often as software projects?

To illustrate the differences between software development and other engineering disciplines, Glyph compares software to civil engineering.

For example, when it comes to getting things done, we're just not very good:

Most software projects fail; as of 2009, 44% are late, over budget, or out of specification, and an additional 24% are canceled entirely. Only a third of projects succeed according to those criteria of being under budget, within specification, and complete.

Such shenanigans would never be accepted in a Serious Engineering Discipline, like civil engineering:

Would you want to live in a city where almost a quarter of all the buildings were simply abandoned half-constructed, or fell down during construction? Where almost half of the buildings were missing floors, had rents in the millions of dollars, or both?

I certainly wouldn't.

Computers are terrible, but not quite that bad, as Glyph points out. "Failure" simply means something different for software projects than it does for construction projects. Many of those "failed" software projects were quite successful by other measures; the problem isn't with software projects, it's with applying civil engineering standards to projects that aren't civil engineering projects.

Software projects aren't civil engineering projects. Attempts to treat them as such have done much more harm than good. That said, that doesn't mean that software development isn't engineering.

Firstly, civil engineering is the outlier here. Other engineering disciplines don't do well according to the civil engineering success yardstick either. The few engineering endeavors that do are usually civil engineering in disguise, such as the construction of nuclear and chemical plants. Rank-and-file projects in most fields of engineering operate a lot more like a software project than the construction of a skyscraper. Projects are late and over budget, often highly experimental in nature, and in many cases also subject to changing requirements. It's true that we just can't plan ahead in software, but we're not the only ones.

Secondly, we may be confounding cause and effect, even if we overlook that not all engineering is civil engineering. Are software projects unable to stick to these standards because it's not engineering, or is civil engineering the only thing that sticks to them because they have no other choice? Conversely, do we fail early and often because we're not engineering, or because, unlike civil engineering projects, we can? [1]

Finally, software has existed for decades, but buildings have for millennia. Bridges used to collapse all the time. Tacoma Narrows wasn't so long ago. If the tour guide on my trip to Paris is to be believed, one of those bridges has collapsed four times already.

But this isn't science!

Supposedly, software engineering isn't "real" engineering because, unlike "real" engineering, it is not backed by "real" science or math. This statement is usually paired with a dictionary definition of the word "engineering".

I feel this characterization is incongruent with the daily reality of engineering.

Consider the civil engineer, presumably the engineeringest engineer there is. [2] If you ask me to dimension an I-beam for you, I would:

  • spitball the load,
  • draw a free-body diagram,
  • probably draw a shear and moment diagram,
  • and pick the smallest standard beam that'll do what you want.

If you want to know how far that beam is going to bend, I'll draw you some conjugate beams. I would also definitely not use the moment-area theorem, even though it wouldn't be too difficult for the reasonable uses of an I-beam.

Once upon a time, someone inflicted a variety of theories on me. Euler-Bernoulli beam theory, for example. Very heavy textbooks with very heavy math. Neither my physical therapist nor my regular one expects me to ever truly recover. Nonetheless, area moments and section moduli are the only way to understand where the I in I-beam comes from.

Nasty math didn't prevent me from dimensioning that I-beam. And I do really mean math, not physics: Euler-Bernoulli is a math hack. You get it by taking Hooke's law and throwing some calculus at it. Hooke's law itself is more math than physics, too: it's a first-order approximation based only on the observation that stuff stretches when you pull it. It's wrong all the time, even for fairly mundane objects like rubber bands. Both theories were put together long before we had materials science. We use them because they (mostly) work, not because they are a consequence of a physical model.

That was just one example from a single discipline, but it holds more generally, too. I analyze circuits by recognizing subsections. If you show me a piece that looks like a low-pass filter, I don't reach for Maxwell's equations to figure out what that little capacitor is doing. I could certainly derive its behavior that way; in fact, someone made me do that once, and it was quite instructive. But I'm not bothered with the electrodynamics of a capacitor right now; I'm just trying to understand this circuit!

This isn't just how engineers happen to do their jobs in practice, either. Engineering breakthroughs live on both sides of science's cutting edge. When Shockley et al. first managed to get a transistor to work, we didn't really understand what was going on. [3] Carnot was building engines long before anyone realized he had stumbled upon one of the most fundamental properties of the universe. Nobody was doing metaphysics. Sadi wanted a better steam engine.

To me, saying that I-beam was dimensioned with the help of beam theory is about as far from the truth as saying that a software project was built with the help of category theory. I'm sure that there's some way that that thing I just wrote is a covariant functor and you can co-Yoneda your way to proving natural isomorphism, but I don't have to care in order to successfully produce some software. It's easy to reduce an applied field to just the application of that field, but that doesn't make it so; especially if we haven't even really figured out the field yet.

So, even if the math and science behind computer engineering is somehow less real than that other math and science, I think that difference is immaterial, and certainly not enough to make us an entirely different profession.

But that isn't art!

Many people smarter than I have made the argument that programming is art, not dissimilar from painting, music or even cooking. I'm inclined to agree: many talented programmers are also very talented artists in other fields. However, I disagree with the idea that those things are art-like while engineering is just cold, hard science.

There's a not-so-old adage that science is everything we understand well enough to explain to a computer, and art is everything else. If that's true, there's definitely plenty of art to be found in engineering. (That was a little tongue-in-cheek. Nobody wants to get dragged into a semantic argument about what art is.)

Even with a much narrower view of art, engineers do plenty of it, as I've tried to argue before. Not all engineering calls are direct consequences of relativity, thermodynamics or quantum mechanics. Sometimes, it is really just down to what the engineer finds most palatable. Even civil engineers, the gray predictable stalwarts of our story, care about making beautiful things. The Burj Khalifa wasn't a consequence of a human following an algorithm.

Conclusion

I think the similarities run deep. I hope we don't throw that away just because our field is a little younger. We're all hackers here; and we're all engineers, too.

Footnotes

[1] I suppose this is really analogous to the anthropic principle, except applied to engineering disciplines instead of humans.
[2] I'm using civil engineer here in the strict American sense of a person who builds targets, as opposed to the military engineer, who builds weapons. Jokes aside, perhaps this is related to the disagreement. Where I come from, "civil engineer" means "advanced engineering degree", and encompasses many disciplines, including architectural (for lack of a better word; I mean the American "civil engineer" here), chemical, electrical, and yes, computer.
[3] While it is very easy to make up a sensible-sounding narrative time line after the fact for the breakthroughs in physics and engineering that eventually made the transistor possible, this ignores the strong disagreements between theoretical predictions and practical measurements of the time. Regardless of their cause, it would be foolish to assume that Shockley just sat down and applied some theory. The theory just wasn't there yet.

On multiplayer turn-based game mechanics

Most classic turn-based games, from chess all the way to Civilization V, are sequential in nature. A player makes a move, then the next player makes a move, and so on. The details can vary, for example:

  • There could be two players, or multiple. This number is tightly bound for scaling reasons, which we'll discuss later.
  • The game could have perfect information, like chess, where all players see a move as soon as it is played. The game could also have imperfect information, like Civilization V, where players see part of a move, but the effects may be obscured by fog of war.
  • The players may play in a consistent order (chess, Civilization V), or in a somewhat random one (D&D's initiative system).

All of those things are more or less orthogonal to the turn system. Players play turns sequentially, so I'm going to call these sequential turn-based games.

Sequential turns make scaling the number of players up difficult. Even with only 8 players, any given player will spend most of their time waiting. While 8 players are a lot for most turn-based games, it's nothing compared to an MMORPG.

An alternative to sequential turn-based play is simultaneous turn-based play. In simultaneous turn-based play all players issue their moves at the same time, and all moves are played out at the same time. The simplest example is rock-paper-scissors, but Diplomacy works the same way. More recently, this system has been explored by the top-down tactical game Frozen Synapse.

While simultaneous turn-based play gets us closer to making massively multiplayer turn-based games feasible by turning a linear scaling problem into a constant time one, we're not quite out of the woods yet.

Consider what happens when a player does not make a move. There are a few reasons that might happen:

  • The player is not playing the game right now.
  • The player has stopped playing the game altogether.
  • The player may be in a hopeless position, where stalling is better than losing. (Stalling may tie up lots of enemy resources.)

If you've ever gotten frustrated at a multiplayer game with a "ready" system because you had to wait for a player who disappeared before the game could begin, this is essentially the problem turn-based games face every turn.

There are a number of ways to mitigate this problem. Games can duplicate playing fields. That works for both sequential games like Hero Academy and simultaneous ones like Frozen Synapse. If a player doesn't make a move, that particular instance of the game world doesn't go anywhere; but you can play any number of games simultaneously.

For this strategy to work, the playing fields have to be independent. You don't lose heroes or soldiers because they're stuck on some stale game. The worst possible outcome is that your game statistics don't reflect reality.

That works, but rules out a permanent game world with shared resources. If there's a larger story being told, you would want these worlds to be linked somehow: be it through shared resources, or because they're literally the same game world.

There are a number of creative ways to get out from under that problem, usually by involving wall-clock time. For example, if a player doesn't respond within a fixed amount of time, they may forfeit their turn. Fuel consumption might be based on wall-clock time, not turns. [1] There are a lot of degrees of freedom here. Do you use a global clock, or one local to a particular area?

A global clock is probably simpler, but poses some gameplay challenges. How long is the tick? Too fast, and a player may see their empire annihilated while they're sleeping. Too slow, and the most trivial action takes forever. There isn't necessarily one right answer, either. In an all-out cataclysmic struggle between two superpowers, a complete tactical battle plan may take a long time. Any timescale that isn't frustratingly short for that situation will be frustratingly long for anyone trying to guide their spaceship (or kodo, depending on which universe you're in) across the Barrens.

Local clocks have their own share of difficulties. You still need to answer what happens for anything that isn't in a particular battle; you still need to answer what happens when battles merge or diverge.

I'm currently exploring the shared global clock. In order to mitigate the issues I described, I'm contemplating two ideas:

  • Allow programmable units; a la Screeps, CodeWars...
  • Allow players to plan several turns ahead of time.

These are, of course, not mutually exclusive.

Footnotes

[1] I don't particularly like this, because it "breaks the fourth wall" in a sense. If my engines are still consuming fuel real time, why can't the enemy fire missiles? Either time is stopped, or it isn't. Sure, games can be abstract, but that feels like an undue inconsistency.