I don't understand REST pagination

Hey.


So, in txYoga, I implemented pagination in terms of query parameters to a collection. If you want to access the range of elements [100, 110), you would say: http://whatever/collection?start=100;stop=110. The feature here is that pages give you the link to the next and previous page. The response (JSON) looks something like {"results": [...], "next": nextURL, "prev": prevURL}. That way, you pretty much just have to follow URLs to walk down the collection as a doubly linked list of pages. The supposed feature is that I can change how my pages work, and users hopefully won't notice, since they shouldn't be building their own URLs anyway.
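To make the linked-list idea concrete, here is a minimal sketch of how such a page could be built. It is not txYoga's actual code: the function name build_page and the in-memory list are assumptions made for illustration, and only the URL shape and the response keys come from the description above.

```python
def build_page(base_url, items, start, stop):
    """Build the JSON-ready dict for the half-open range [start, stop)."""
    size = stop - start

    def link(lo, hi):
        # Same URL shape as in the post: ?start=100;stop=110
        return f"{base_url}?start={lo};stop={hi}"

    page = {"results": items[start:stop], "next": None, "prev": None}
    if stop < len(items):
        # There are more elements after this page.
        page["next"] = link(stop, stop + size)
    if start > 0:
        # Clamp at zero so the first page never gets a negative start.
        page["prev"] = link(max(0, start - size), start)
    return page


items = list(range(1000))
page = build_page("http://whatever/collection", items, 100, 110)
```

A client only ever reads page["next"] and page["prev"], so the server stays free to change the query parameters later.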

Now, at the same time, I'm looking at dojo's JsonRest API. It's suggesting that I use the Range and Content-Range headers for pagination support. It also has a single JSON array as a response (so, [{...}, ...]) instead of my JSON object. I started RFC diving and hey look, they're right, HTTP understands ranges with arbitrary units. So, are my URLs wrong? Are people supposed to conjure Range URLs themselves? Why are these darned ranges inclusive? Another problem is that the specification doesn't really seem to support non-numeric ranges, whereas to txYoga, that's not really a problem as long as the underlying collection understands it.
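For comparison, a rough sketch of the header-based approach, converting between inclusive HTTP-style ranges and the half-open ranges used above. The helper names are hypothetical, and the "items" unit is an assumption about what a JsonRest-style client sends; only numeric ranges are handled, which is exactly the limitation mentioned above.

```python
import re

# Matches values such as "items=100-109" (unit, first index, last index).
RANGE_RE = re.compile(r"^(\w+)=(\d+)-(\d+)$")


def parse_range(header):
    """Parse an inclusive Range value into (unit, start, stop), stop exclusive."""
    match = RANGE_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {header!r}")
    unit = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    if last < first:
        raise ValueError(f"backwards range: {header!r}")
    # HTTP ranges are inclusive on both ends; add one for a half-open range.
    return unit, first, last + 1


def content_range(unit, start, stop, total):
    """Format the half-open [start, stop) as an inclusive Content-Range value."""
    return f"{unit} {start}-{stop - 1}/{total}"
```

With these, parse_range("items=100-109") gives the same [100, 110) range as the query-parameter URL, and content_range("items", 100, 110, 1000) produces the value for the response header.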

Isn't REST supposed to be hypertext-driven? Aren't people basically supposed to never construct URLs, and rely on me to provide them to them? Where am I supposed to put those URLs? Yaaargh.


cheers
lvh

Crowdsourcing opinions on a REST API: POST, PUT or both?

As some of you undoubtedly know, I'm writing a thing called txYoga, which is a REST framework for Twisted.

The fundamental operations in this post are CRUD:
  1. Creating elements
  2. Retrieving them
  3. Updating them
  4. Deleting them
Now, retrieving is obvious: send a GET request to the element. Getting rid of them is too: same thing, but a DELETE. Updates are done using PUT to the element. Creating elements, though, is a bit more ambiguous:
  1. If the client knows the complete path to the element (so, particularly the final part of the new element's URI), it PUTs to that URI (for which there is currently no element).
  2. If the client doesn't know or care, it POSTs to the collection. The collection creates the element, and probably returns the URL under which the new element is available.
One of the fundamental differences between POST and PUT is that PUT is idempotent, but POST is not. If you send me the same PUT request once, twice or ten times, you'll end up with the same state. POSTing to the collection might work once (or pretty much indefinitely, if you're producing objects with a UUID identifier for example).
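The difference is easy to see with a toy in-memory collection. This is only an illustration of the semantics, not txYoga's API: the Collection class and its put and post methods are made up for this sketch.

```python
import uuid


class Collection:
    """A toy collection mapping element names to element state."""

    def __init__(self):
        self.elements = {}

    def put(self, name, state):
        """Create or replace the element at a client-chosen name.

        Repeating the same call leaves the collection in the same state.
        Returns True if the element was newly created.
        """
        created = name not in self.elements
        self.elements[name] = dict(state)
        return created

    def post(self, state):
        """Create an element under a server-chosen name and return that name.

        Repeating the same call creates another element each time.
        """
        name = uuid.uuid4().hex
        self.elements[name] = dict(state)
        return name


collection = Collection()
collection.put("spam", {"color": "pink"})
collection.put("spam", {"color": "pink"})   # still exactly one element
first = collection.post({"color": "pink"})
second = collection.post({"color": "pink"})  # a second, distinct element
```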

This is because in txYoga, each element has exactly one identifying attribute (which is the thing that's used to access the element). That's what makes PUT feel so awkward: what you're putting contains the information to decide what the name of the object is (since it has to have the identifying attribute), but the object doesn't know what name it's going to be put under (since that name only exists after the object is created), so you have to repeat yourself. With POST, you don't really care.

Additionally, you'd be using the same thing for updating and creating new elements, which I guess sort-of makes sense, but reminds me of Perl's autovivification and not in a good way.

The question is: should I support both PUT and POST for creating elements? On the one hand, "There should be one-- and preferably only one --obvious way to do it." On the other, this is what REST pretty much does -- and practicality beats purity (although this isn't very practical for me...).

Reflections on stories from (disgruntled) Launchpad users

Hello again :-)


I tweeted something. I'm quite proud of it. It's exactly 140 characters, and it captures the essence of the last few days' worth of articles quite well. Obviously it makes concessions to accuracy for reasons of brevity, but here goes anyway:

It seems to me Github people use Github as a coping mechanism for git, and Launchpad people use Bazaar and a coping mechanism for Launchpad.

That explains virtually all of the feedback and workflow I've received. It seems Github users use git as a tool to get code into Github as quickly as possible, so magic and sprinkles can be added to it. That means the Github people have done an excellent job. They've managed to create an environment where you have awesome features like truly polished pull requests and a great code browser that lets you peek into someone else's repository without you having to worry about adding a remote, fetching objects, removing a remote...

Github tries to minimize your exposure to git -- hub even more so (with hub, clones and pulls look like they do with bzr for me. Huge win.). Even without having to call git's UX bad, that makes (business) sense: the Github people aren't the git people so they want you in their service as much as possible. And man, have they succeeded.

On the other side of that argument, there's Launchpad and Bazaar. Most of the discussions about Launchpad's UI warts (and, keeping in mind my last article, some of them are just a question of catering to a different audience but some of them really are simply faults) go something like this:
unhappy user: Hey lvh, take a look at $BOGUS_UI_CHOICE. Come on, you have to admit, that just sucks.
lvh: Huh. You're right -- that is pretty bad. I just never noticed because I use $BZR_INVOCATION to get to that information.

Case in point: merge proposals. Apparently they had an in-page diff. I had no idea.

Github users use Github to do things. Launchpad users use Launchpad as branch storage + metadata (bug tracker, merge proposals, primarily) so they can convince bzr to do things. When Github users go to Launchpad, perhaps without truly trying to use it, they see the prospect of using something like Github but worse. When Launchpad users go to Github, they see the prospect of getting to use git for everything instead of bzr. (And, let's be honest -- if you're diffing between remotes all day, Github is more pleasant than plain git, and bzr is too.)

cheers
lvh

On sysadmins, programmers, and reconciliation (a response to Zed Shaw's post)

Zed Shaw posted a response to my article on why people hate Launchpad. Apart from causing the number of readers to skyrocket by an order of magnitude, it's given me some new perspective on the problem. As always, I love feedback, especially if people agree with things I say ;-)

Assuming Zed's right (and I think he's at the very least got a point), my previous list of grievances splits up into two things:
  1. Things that make Launchpad more like Github. In Zed's terminology, make Launchpad less of a sysadmin place and more of a programmer place. Following Zed's conclusion, these are bad changes.
  2. Things that make everyone's life easier and aren't necessarily about one group versus the other. As the contrapositive to Zed's conclusion, they are good changes.
Now, I think at least some of the UI changes from that list are in that last group. Particularly the code browser UI issues (and they are legion -- Loggerhead is on occasion hard to like) are something I don't really see how anyone could object to. Concrete examples are:
  • Renaming "View branch content" to "View code". "Code" is a word programmers scan for. To quote Zed's article: code, code, code. Contrary to popular belief you can actually access trunk's code with a single click from the project's front page! It's just cleverly hidden.
  • Merging the branch and branch content pages, embedding Loggerhead in the page like Github's file browser, instead of making it a separate page. Sysadminny types probably wouldn't ever have looked at that page in the first place.
  • Removing dead project features (like translations, blueprints, answers) for J. Random User
There are, of course, also ideas that would piss off the sysadminny group, like moving series around or putting a Github style code browser on the overview page. (Probably explains why I dislike that last feature.)

But yeah: if you fixed every single point in that list of perceived flaws, my in-brain mockup of what Launchpad 2.0 would look like would still decidedly be Launchpad and not Github: which is probably what Zed is talking about. And, like he said, that doesn't actually have to be a problem.

I agree that both Github and Launchpad would be very hard-pressed to transform into something everyone likes, but the "why" of that wasn't quite clear enough in my head yet for me to write a blog post about it (my posts are bad enough when I'm convinced I do know what I want to say). Zed's article helped quite a bit there.

The difference is I have no idea what this third system that caters to both in the same place would look like yet. Zed probably has a better idea of what he's talking about than I do.

My idea is different. It's probably a bit worse, since you'd have N places where code lives instead of 1, but at least in the short run it seems like less effort to end up with a working thing. The idea goes like this: if Github and Launchpad are really different beasts catering to different audiences, maybe we shouldn't try to make them be the same, and instead let them cooperate. This is why I think (well, hope) the idea of bridging Launchpad and Github -- having them play nicely together in the same sandbox instead of the current situation where they're direct competitors -- may have some value to it.

I'd really like some feedback from Github fans on that. I know Launchpad people are going to like it since they can pretty much just use Launchpad and don't have to care Github exists. I'm hoping people don't find Launchpad so revolting that using its bug tracker and merge proposals (as cited in that article I linked, features at least as good on Launchpad as on Github) becomes a contribution blocker.

cheers
lvh

Bridging the gap between Launchpad and Github

Hey.


Ever the diplomat, I've been trying to figure out a way to use Launchpad while keeping the people that dislike it happy.

It is based on two premises. If you disagree with them, you may as well stop reading now. (or, even better: tell me why)
  • Launchpad's bug tracker is preferable, or at least equal, to Github Issues.
  • Launchpad's merge proposals are identical to Github Pull Request 2.0's, with the exception of bug tracker integration.
First of all, the project uses Launchpad as a bug tracker, and Launchpad merge proposals as the mandatory code review for getting code into trunk. Launchpad trunk is constantly mirrored into a Github repository.

Launchpad users that want to develop the project pretend Github doesn't exist.

Github users that want to develop the project pretend Launchpad doesn't exist. They just fork the repository and make some changes, as they normally would. Once they want to get some code into trunk, they file a pull request, which is where the magic starts.

A bot creates an alternative merge proposal equivalent to the pull request. There are two options for doing that:
  • Grabbing the pull request diff
  • Using bzr-git to import the git branch into bzr
(The latter is preferable because it preserves commit structure. The former is probably easier to implement and less hairy to maintain, since it sidesteps the impedance mismatches.)

Either way, a new branch is created with the same name as the Github branch. Ideally, the merge proposal is linked to the appropriate bug. That means the branch should contain information about the bug it relates to (possibly in the name, eg 54321-fixQuantumTransmogrifier), since, as far as I know, git has no equivalent to bzr commit's --fixes. The pull request gets a link to the merge proposal. All review-related work happens in Launchpad, not Github. Based on the second premise, this is acceptable.
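Pulling the bug number out of a branch name is the easy part. A small sketch, assuming the naming convention from the example above (digits, a dash, then a description); the function name is hypothetical and nothing here talks to Launchpad or Github.

```python
import re

# Branch names such as "54321-fixQuantumTransmogrifier".
BRANCH_RE = re.compile(r"^(?P<bug>\d+)-(?P<slug>.+)$")


def bug_from_branch(name):
    """Return the bug number encoded in a branch name, or None if there is none."""
    match = BRANCH_RE.match(name)
    if match is None:
        return None
    return int(match.group("bug"))
```

A bot could use the result to link the merge proposal to the bug, and fall back to an unlinked proposal when the function returns None.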

The obvious problem here is that Github users can't really monitor what's going on in terms of development in Launchpadland and vice-versa. A solution may be to push more branches, but I dislike doing that. Perhaps using their respective APIs, the data can be syndicated into a common source. Ideas welcome.

cheers
lvh

A compiled list of Launchpad's perceived flaws

(Note: just because I say "perceived" flaws doesn't mean I don't agree they are. It's just a way to encourage feedback from people who disagree.)

I've compiled the responses I've received so far to my previous article about Launchpad. This is intended as constructive criticism and hopefully stuff that can be used to build a better Launchpad.

I'd like to thank everyone who bothered to have a sensible, adult discussion about Launchpad with me. I very much appreciate your feedback, even the rants. The Launchpad people have also expressed interest in this previously unreceived feedback (which makes sense, I guess). There certainly were a lot of you! If it ends up making Launchpad a nicer place for everyone maybe it's worth the time it took, though.

The vast majority of complaints I've seen are about Launchpad's interface. People perceive the Launchpad interface as hard to use and anything but obvious. They particularly feel there's just too much noise, making useful features hard to find.

I tried to distill stuff that smells like it's easy to fix to me. It didn't always work, of course. Some problems (primarily about Loggerhead) require non-trivial amounts of work.

Here are some concrete examples:
  1. As a user, I don't care that a project doesn't use Launchpad for translations, but it takes up the spot of a primary feature in the main menu and the Get Involved box. It should only be shown to the Maintainer (a single user or a group). This also goes for Blueprints and Answers, except it's less obvious what they mean to people, apparently. Most projects really only need Overview, Code and Bugs (in fact, Github and Bitbucket do away with the entire Overview tab, too -- but that's not universally seen as a requirement for a usable thing).
  2. Latest reported bugs, latest questions and the FAQ box on the landing page are considered unnecessary noise: it's duplicated on the respective specific pages, and the people who actually need that kind of notification probably should get e-mails anyway.
  3. Announcements on the other hand are a far more important feature, but they're tucked away: on Bazaar's Launchpad Overview page, I have to scroll down to find them. Perhaps they should have a more prominent position on the page.
  4. Downloads are important enough that they probably warrant a separate page. The Overview page's Packages in Distributions pages can be moved there, too, further decluttering the Overview page.
  5. A lot of people don't understand Blueprints or consider it a misfeature. Fixing the first point would obviously alleviate that since it'd be opt-in.
  6. As a user, I can't figure out how to browse the code. "View branch content" ought to be called "View code", or something -- it makes sense to a Bazaar/Launchpad user but not to J. Random Contributor.
  7. People really like Github's Readme Driven Development. Perhaps the overview could be grabbed from the development focus' README file? Admittedly some READMEs may be too large for this to work well with the current landing page.
  8. Subscribe to bug mail is a primary feature on the project landing page, even above the Get Involved box. This doesn't make sense: that's a Bugs specific feature, and the Bugs page has an identical feature already. It could just be removed without loss of functionality.
  9. Series and milestones are a code feature: I don't particularly care about them on the overview page, and they would probably be better off on the code page. Preferably after the branches or at least foldable, since it does take up quite a bit of screen real estate. In order to parse it and turn it into useful information, you probably need to be more than a little involved in the project already. The idea is that not sufficiently many users care to warrant such a prominent place on the landing page.
  10. Loggerhead's date format is hard to parse. Github and Bitbucket use relative, human-readable timedeltas, except for dates sufficiently far away (where they use a plain date). You could still make the real date accessible on mouseover if people want that feature.
  11. Loggerhead is very poorly integrated with the rest of Launchpad: it feels like a tacked on feature. Compare to Github and Bitbucket where code viewing is an integral part of the user experience.
  12. Somewhat conjoined with the previous point: there's a separate page for a branch and the branch content and that confuses people. Perhaps they could be integrated without overloading branch pages. (One particularly important feature is merge proposals, but they're accessible from the Code page).
  13. People are unhappy with the lack of progress on Loggerhead. I feel kind of bad about this point since it makes me sound like I'm calling people lazy bums, but journalistic integrity demands I report it. I had quite a few bugs listed as examples. I don't know if these are unique or illustrations of a more general problem.
  14. Although merge proposals are used for code review, people are confused by the Code page saying "code review" when it means "merge proposals". It would be nice if everything said merge proposals. Or is the point here that Launchpad might integrate with other code review tools?
  15. Some people can't use the e-mail interface because the DKIM whitelist is hardcoded.
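As an illustration of point 10, relative timestamps with a plain-date fallback take very little code. This is a sketch, not Loggerhead's, Github's or Bitbucket's implementation: the function name, the 30-day cutoff and the exact wording of the labels are all assumptions.

```python
from datetime import datetime, timedelta


def humanize(when, now, cutoff=timedelta(days=30)):
    """Describe `when` relative to `now`, or as a plain date if it is far away."""
    delta = now - when
    if delta < timedelta(0) or delta >= cutoff:
        # Future or sufficiently old timestamps: show the plain date.
        return when.strftime("%Y-%m-%d")
    seconds = int(delta.total_seconds())
    if seconds < 60:
        return "just now"
    # Largest unit first, so 90000 seconds reads as "1 day ago".
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        count = seconds // size
        if count:
            plural = "" if count == 1 else "s"
            return f"{count} {unit}{plural} ago"


now = datetime(2011, 3, 1, 12, 0)
label = humanize(now - timedelta(hours=3), now)
```

The full timestamp can still go into a tooltip, so nothing is lost for people who want the real date.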


To be continued.

cheers
lvh

Why do people hate Launchpad so much?

Following a discussion on Convore recently, I noticed some people (names withheld to protect the guilty -- I'm sure they will speak up if they feel like it) strongly dislike Launchpad. Apparently they dislike it so much they would rather use a different project if they find something that uses it.

I don't get it.

First off, I'm not trying to make this a git vs hg vs bzr vs whatever discussion, except perhaps to the extent that those tools shape their respective hosting system. Not that I think those discussions have to devolve into religious wars (a subject on which I have blogged recently), it's just that I'd like to limit the scope so we end up with a bunch of things that can actually be improved.

My main problem with the 500lb gorilla in the room is its issue tracker. I know, it's a simple feature tacked on later, and all in all the ability to link to branches and revisions and to be able to close tickets from commit messages ain't so bad.

One particular unfortunate choice is that they decided to make pull requests issue tracker artifacts. I'm convinced it should be an attachment to one, instead. Pull requests are about merging code. There must be a reason to do that -- that reason is a ticket. That means if you use pull requests as a mechanism for code review (which they seem to be marketed as), ergo having all code be added using them, all changes to trunk have at least two issue tracker artifacts: the ticket describing the change, and the pull request carrying the code that does it. I think that's a bit silly. Pull requests hence often become somewhat of a ticket in their own right, which works well for one-time contributions which are typically limited in scope, but it seems this results in an inconsistent workflow which is a bit more annoying for things that actually do require a ticket.

Launchpad's issue tracker, on the other hand, is one I particularly like.

It's unique in the sense that it lets you track the state of bugs in different parts of downstream. Additionally, bugs aren't unique to a project -- allowing people from different groups to collaborate on the same issue. It lets me assign bugs to a person: a feature I sorely miss in Github. The equivalent of Github Issues' voting feature is just a "this affects me" link: similar to Google Code's starring idea. The goal is always the same: measure the importance of a bug without getting the noise of a million "me, too" messages. (Github and Launchpad's way of doing it seems more effective, though.)

Launchpad's alternative to pull requests, called merge proposals, is quite similar to Github's pull requests 2.0. The main difference is the bug tracker interaction: like I've described above, they attach to issues, instead of being issues themselves.

A complaint I often hear yet don't quite grok is the apparent difficulty of getting to code. Personally, I don't quite see the problem. The main branch (lp:projectname) and its contents are one click away from the front page. All of the branches of a project (from everyone! not just the project leads), are visible with a single click from the project front page, under the "Code" menu. I must admit I rarely use that web UI: instead, I just bzr co lp:projectname if I want to take a look.

The contributor story is also fairly pleasant, in my opinion. I don't even need to fork anything (at least not explicitly): just get the branch, make some changes, push them to lp:~lvh/projectname/mybranch and done. That magic syntax isn't Launchpad-specific: that's called a bookmark, and you can just as well have your own if you don't use Launchpad.

So what exactly is the problem? I'm planning on moving a project from Fossil to something else at Pycon -- but if everyone hates it so much, I'm willing to reconsider. For example: I'd use Github if I can find a useful issue tracker to go with it. I'd use Bitbucket if there's some reasonable way to have in-repo pull requests (like Github pull requests 2.0 or Launchpad merge proposals).

Religious wars considered harmful considered harmful

Hey.


Can we stop avoiding subjects because we've elevated them to the state of personal opinion? Especially since "personal opinion" usually ends up meaning "dogma", and is quite often used as an excuse to cover up ignorance.

The usual suspects here are of course:
  • vim versus emacs versus whatever
  • git versus mercurial versus bazaar versus whatever
  • twisted versus gevent versus eventlet versus tornado versus whatever
  • django versus pyramid versus web2py versus web.py versus cherrypy versus whatever
Here's why. Not every discussion about them has to devolve into a full-blown flame war. Yes, there are trolls. Yes, there are clueless fan-boys who've drunk one glass too many of the delicious Kool-Aid. We have those pretty much everywhere else, too, though: and we just ignore them. But we're not trolls -- we're reasonable adults with different favorite tools. Perhaps that's based on extensive experience. Perhaps it isn't. Perhaps it's based on stuff that's not even true anymore. Can't we talk about them as one reasonable adult to another? Preferably without resorting to name calling and blatant demagogy.

Dismissing discussions a priori effectively means you've decided, up front, that they're going to have a net negative outcome. At best, that's insulting to the people you'd be discussing with. At worst, it's a monumental display of arrogance, since you've just decided that there is nothing that can be learned from people who have done things differently.

I'm not talking about people who mindlessly promote their favorite version control system or editor or library, without having given the alternatives the light of day. They're annoying in their own right, but they're nothing compared to the people who then proceed to trash-talk the alternatives without having any real arguments. Like something someone smarter than myself said: "if you promote your project by dissing the competition, I'll assume that your project sucks or you're a dick. Either way I won't use it." -- except it's not even their project.

If you don't know what a mark ring or a kill ring is, any disparaging comment you make about Emacs is probably worth a troll stamp, sorry. If you haven't used anything but git, I'm not particularly interested that you think everything that isn't git is dumb. If you didn't manage to get through the finger tutorial and you've never used any of the alternatives either, I don't really care if you think Twisted is complicated. Quite frankly, you don't know what you're talking about. Your braindead statements don't contribute to anything but my blood pressure.

I've tried to have sensible discussions countless times, and when I succeeded, it ended up being perfectly civil and showing someone how maybe not everything the enemy is doing is so bad. Perhaps they have ideas worth stealing. Perhaps we can stop worrying about what separates us, silence the trolls like we do everywhere else, and focus on what we have in common.

cheers
lvh

Pycon US talk

Finances permitting, I will attend Pycon US 2011. I will also try to give a talk about most of the meat and potatoes on this blog.

What do you think should be included? What do you think shouldn't be?

Cautious deployment for continuous availability

In my last post, I talked about cautious deployment as a technique for recovering from botched deployments. Cautious deployment also solves a second problem by helping you get continuous availability.

In the article about cautious deployment, I had the following graph for hypothetical, optimistic deployment:
[Graph: hypothetical, optimistic deployment]

Obviously, that's a bit simplistic. It would be correct if all your servers decided to restart the service in a perfectly synchronized fashion, all of your servers managed to do that operation in precisely identical amounts of time, and there were no interdependencies between your services.

Typically, none of those things are true. Starting services is a stochastic process and you really don't quite know how long it'll take. Stuff depends on each other all the time. Hence, the graph is more likely to look something like this:

[Graph: the same deployment under more realistic conditions]

Of course, it's occasionally way worse than this. Your users generally don't know nor care that you're upgrading, so those few early starters get to deal with a disproportionately high load. That could happen up to the point where watchdogs mistakenly believe the process has just crashed. That watchdog then orders for that process to be restarted, which of course only makes it worse for all of the other processes still up.

(If you're thinking "Oh come on. It's nowhere near that bad" -- how do you know? Did you measure that? For some people, it definitely is.)

This problem is often overlooked, because there's tons of ways of missing it completely (mostly because statistics is hard).
  1. Not even bothering to analyze (unfortunately the most common)
  2. Not recognizing the worst case
  3. Normal distribution not being a good fit
  4. Failure to recognize the weakest link
Unfortunately, most people fall into group one. Come on guys, we can all do better than this. If there are two recurring patterns in modern (I hate to use the word, but...) agile businesses that come back across the board, they're "measure" and "iterate". Even rudimentary tests are better than flying blind.

Number two, failure to recognize the worst case, generally happens when people do some kind of basic analysis. That generally amounts to taking a bunch of values and then computing the mean/median and working from there. Not recognizing the worst case is really a special case of not really assessing the distribution of your values correctly.

A mean is pretty interesting, but you can't get much useful information out of it by itself. Knowing a service starts within a second on average is pretty useless when it has a standard deviation of a minute. (If that's the case, descriptive statistics just showed you that your data probably doesn't fit a normal distribution very well: the lower limit for physically possible events is at Z=-1/60, unless your servers start up in negative amounts of time...)
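A quick way to run that sanity check on your own measurements is sketched below. The sample data is invented for illustration (mostly fast starts with two slow outliers) and summarize is a hypothetical helper; the z-score of zero is computed as (0 - mean) / stdev, the same calculation that gives -1/60 in the example above.

```python
import statistics


def summarize(samples):
    """Summarize start-up times (in seconds) beyond just the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {
        "mean": mean,
        "median": statistics.median(samples),
        "stdev": stdev,
        "worst": max(samples),
        # How many standard deviations below the mean is "zero seconds"?
        # A value close to 0 means a normal fit would put a large share
        # of its probability on impossible, negative start-up times.
        "z_of_zero": (0 - mean) / stdev,
    }


# Invented data: 98 fast starts and two very slow ones.
samples = [0.2] * 98 + [30.0, 50.4]
summary = summarize(samples)
```

Here the mean comes out at about one second while the median is 0.2 seconds and the worst case is over 50, which is the kind of gap a mean alone will never show you.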

First of all, that service might have things that are waiting on it to get started. This is the weakest link argument: your slowest step just became a defining factor for the performance of the entire process. Secondly, people perceive jittery things as worse than consistent things, all other things being equal. Imagine if Google often loaded a lot faster (would you even really notice?), but just as often took five seconds.
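The weakest link argument can be sketched in a few lines: if a service only starts once everything it depends on is up, its ready time is its own start-up time plus that of its slowest dependency. The service names, durations and the ready_times helper below are made up for illustration, and the sketch assumes the dependency graph has no cycles.

```python
def ready_times(start_durations, depends_on):
    """Return when each service is up, given start-up durations in seconds.

    A service begins starting only once all of its dependencies are up.
    Assumes the dependency graph is acyclic.
    """
    memo = {}

    def ready(name):
        if name not in memo:
            deps = depends_on.get(name, ())
            # Wait for the slowest dependency; no dependencies means no wait.
            waited = max((ready(dep) for dep in deps), default=0.0)
            memo[name] = waited + start_durations[name]
        return memo[name]

    return {name: ready(name) for name in start_durations}


durations = {"db": 40.0, "cache": 1.0, "web": 2.0}
dependencies = {"web": ["db", "cache"]}
times = ready_times(durations, dependencies)
```

The web service itself starts in two seconds, but it is not up until 42 seconds in, because the database is the slowest link.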

Even when you have people who do some pretty reasonable statistical analysis on individual parts, that becomes only half the story in a distributed system. Even with measurements from real deployments, it's extremely hard to predict how these things will behave.

Bottom line? Statistics is hard, and it's easy to get fooled into thinking you've got it right even when you're miles off (I've been bitten by this myself, more than once). Once you've done your analysis, set up a sandboxed miniature model for your entire (distributed) system, and check the assumptions from your statistics against it. After that, you're only likely to miss scale issues.

Even when you take all of that into account, you will get things wrong. That was my original point about cautious deployment: as long as you have that and it works, you're allowed to get it wrong once in a while. Fail gracefully. It's not so scary to leap the chasm when there's a net to catch you.