What is Everest? - Rogue Science

A potted history of Everest by our Founder, Rohan Byrne.

Explain like I’m five...

An example from the toy box

Here’s a question: what is the ideal collection of LEGO? To be clear, we’re not talking about the collector’s idea of what’s ideal (first edition toy sets still in mint condition), but the five-year-old’s idea. What is the ideal (minimal and optimal) set of individual LEGO pieces to own if one’s intention is to use them to build things?

This question occurred to me one day when I was playing with my young half-brother: the proud owner of a sprawlingly immense collection of LEGO. Picking through the vast array of unique and nifty pieces, I only occasionally came across a hand-me-down - a remnant of the (in hindsight) rather small collection that had sufficed for me and all my older siblings back in the early 90s. Despite the limited total volume of our kit, I only rarely recall being frustrated for lack of a certain kind or quantity of pieces, even as I recall the range, detail, and scope of my many proud accomplishments in pre-teen engineering.

By contrast, my younger brother seemed constantly frustrated with the tools at his disposal, even though he had access to everything I had had and much, much more. What was the issue? The much higher proportion of highly-specialised pieces was certainly part of it: the standard two-by-four brick in four primary colours may not have been glamorous, but it was certainly versatile. And the daunting total quantity of pieces was also self-evidently a problem: sifting through all that clutter for the one piece one actually needed was an imagination-stifling chore.

Clearly some optimum had been reached and passed sometime in the intervening thirty years. But what was that optimum? What is it sensitive to? And how could such a solution even be articulated, let alone calculated?

As it turns out, not only does this specific problem have many important real-world equivalents, but the general form of it speaks directly to what Everest is all about, how it works, and why it’s needed.

First, some quick terminology (yes, LEGO has official guidelines for these). A LEGO ‘brick’ is the piece that leaps to mind when we think of LEGO: a cuboid slab of plastic of a standard height, coming in a range of lengths and widths, with the two-by-four size a particular classic. Taller bricks also exist, but only in multiples of that standard height. A ‘plate’ is like a brick, but a third of the standard height: three of the same kind of plate stacked atop each other reproduce the brick of the same length and width. Multiple pieces connected together in this way are collectively called a ‘build’. The little round pips and holes that allow the pieces to snap together to form builds are called ‘studs’ and ‘antistuds’ respectively, and the big stud-covered mats that most LEGO sets begin with are called ‘baseplates’. There’s a lot more we could name, but let’s restrict our analysis to just these for now.

One thing that’s clear already is that many LEGO pieces can be reproduced as builds of other pieces. Consequently, one trivial answer to the question of what makes an ‘ideal’ kit is that you only really need a sufficient supply of two-by-one and one-by-one plates. The two-by-one plates alone can snap together to reproduce any brick with an even number of studs (and much more besides); for the odd-numbered ones, such as the one-by-one or one-by-three brick, a single one-by-one plate in each layer fills the gap the two-by-ones leave behind.
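To put rough numbers on the parsimony problem, here is a minimal Python sketch that counts the fewest two-by-one and one-by-one plates needed to reproduce a brick of a given footprint. It only counts how the studs are covered layer by layer, and deliberately ignores how the plates interlock or how sturdy the result would be; the function name and the dictionary it returns are illustrative choices, not anything from LEGO or Everest.

```python
PLATES_PER_BRICK = 3  # three plates stack to the height of one standard brick


def plates_to_reproduce(width, length, height_in_bricks=1):
    """Fewest 2x1 and 1x1 plates whose layers cover the footprint of a width x length brick.

    Each plate covers at most two studs, so a layer needs at least ceil(studs / 2) plates.
    That bound is reachable: tile the layer with 2x1s and, if the stud count is odd,
    fill the one remaining gap with a 1x1. Interlocking and sturdiness are ignored.
    """
    studs = width * length
    layers = PLATES_PER_BRICK * height_in_bricks
    twos, ones = divmod(studs, 2)
    return {"2x1": twos * layers, "1x1": ones * layers, "total": (twos + ones) * layers}


print(plates_to_reproduce(2, 4))  # {'2x1': 12, '1x1': 0, 'total': 12}
print(plates_to_reproduce(1, 1))  # {'2x1': 0, '1x1': 3, 'total': 3}
```

A single classic two-by-four brick thus costs twelve plates; even a humble one-by-one brick costs three.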

We might put our hypothesis to the test by buying our half-brother a gleaming new LEGO kit comprising nothing but tiny atomic plate pieces. But we don’t need to run that experiment to guess where it will end up. Part of what makes LEGO such an effective substrate for the human imagination is that it is chunky and tactile. Ask any kid what their favourite piece is and they will most likely point to one of the big ones: not because big is better per se, but because the right choice of big pieces can send you a long way toward your envisaged endpoint in a few short strides.

In other words, parsimony is important. A good LEGO build not only uses the right pieces in the right places: it also uses as few pieces overall as possible.

With this in mind, we could leap to the opposite extreme and envisage a LEGO kit where every single piece is unique: in other words, every possible build is available as a single piece. The total number of pieces in any given build will therefore be one, with the tradeoff that the total number of kinds of pieces will be infinite. But this would of course be both silly and impossible.

Clearly, somewhere between these two extreme options - one minimising the variety of pieces, one minimising the complexity of the builds - lies the optimum we’re looking for. We don’t know what that optimum is, but we can deduce some facts about it: we can figure out what it is sensitive to.
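The trade-off between these two extremes can be made concrete with a deliberately simplified model: forget colours and widths, and treat every desired build as a single row of a given number of studs. The sketch below assumes that simplification, plus a made-up ‘wish list’ of one row of each length from 1 to 16; it uses the standard dynamic programme for the coin-change problem to find the fewest pieces each hypothetical kit needs for each row.

```python
import math


def min_pieces(length, kit):
    """Fewest pieces from `kit` (a set of piece lengths, in studs) that exactly fill a row.

    Unbounded coin-change dynamic programme; returns None if the row cannot be filled.
    """
    best = [0] + [math.inf] * length  # best[n] = fewest pieces for a row of n studs
    for n in range(1, length + 1):
        for piece in kit:
            if piece <= n:
                best[n] = min(best[n], best[n - piece] + 1)
    return None if best[length] == math.inf else best[length]


# Hypothetical wish list: one row of every length from 1 to 16 studs.
wish_list = range(1, 17)

# Three hypothetical kits, from minimal variety to one piece per possible row.
kits = {
    "atomic only": {1},
    "powers of two": {1, 2, 4, 8},
    "every length": set(range(1, 17)),
}

for name, kit in kits.items():
    total = sum(min_pieces(n, kit) for n in wish_list)
    print(f"{name}: {len(kit)} kinds of piece, {total} pieces in total")
```

Under these assumptions, the all-atomic kit needs 136 pieces across the wish list, the powers-of-two kit 34, and the one-piece-per-length kit just 16, but at the cost of sixteen kinds of piece rather than one or four. Where the optimum sits depends on how heavily we penalise each extra kind of piece.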

One sensitivity is the nature and variety of the LEGO builds we desire to construct. If we’re interested in building battleships and castles, we probably want a good supply of big, chunky bricks in a small handful of colours - mostly grey. If on the other hand we’re more interested in making colourful mosaics, we really only need a selection of plates - especially the maximally versatile one-by-one kind - but in every colour under the rainbow. A kit that is optimal for one purpose will be an absolute pain for the other - even if the same range of builds is ultimately possible with either kit.
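The same toy model of single rows can show this sensitivity, though only for length, not colour. Suppose, purely as an illustrative assumption, that a kit may contain only two kinds of piece: the one-stud atom plus one longer piece of length k. The sketch below picks the best k for two hypothetical demand profiles, each mapping a row length to how often that row is wanted; with only one long piece on offer, using as many of it as possible is always optimal, so no search over builds is needed.

```python
def pieces_needed(length, k):
    """Pieces needed for a row of `length` studs from the two-piece kit {1, k}.

    With only one long piece available, using as many of it as possible is optimal:
    each k-piece replaces k single studs, so maximising their number minimises the count.
    """
    return length // k + length % k


def expected_pieces(demand, k):
    """Average pieces per build, weighting each row length by how often it is wanted."""
    total = sum(demand.values())
    return sum(weight * pieces_needed(n, k) for n, weight in demand.items()) / total


def best_partner(demand, candidates=range(2, 9)):
    """The long-piece length k that makes the kit {1, k} cheapest for this demand."""
    return min(candidates, key=lambda k: expected_pieces(demand, k))


# Hypothetical demand profiles (row length -> relative frequency), standing in very
# loosely for the battleship builder (long rows) and the mosaicist (short rows).
long_rows = {8: 5, 16: 3, 24: 2}
short_rows = {1: 6, 2: 3, 3: 1}

for label, demand in [("long rows", long_rows), ("short rows", short_rows)]:
    k = best_partner(demand)
    print(f"{label}: best kit is {{1, {k}}}, {expected_pieces(demand, k):.1f} pieces per build")
```

With these made-up profiles, the long-row builder’s best partner is the eight-stud piece and the short-row builder’s is the two-stud piece. Swap the kits and the pain is obvious: {1, 8} averages 1.5 pieces per short-row build against 1.1 for {1, 2}, while {1, 2} averages 6.8 pieces per long-row build against 1.7 for {1, 8}.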

Another sensitivity is to the pain and inconvenience of outliers. Anyone who has played with LEGO will have felt the frustration of working towards a beautiful plan only to fall short by one or two key pieces. For most kids, this is not the end of the world. The imagination is adaptable: the lord of the castle will not object if his walls are somewhat technicolour; the captain of the battleship can make do with one less turret. Conversely, for many adult fans of LEGO - and especially professional LEGO sculptors - such compromises are totally unacceptable. They would rather own thousands of dollars of redundant pieces than be short by even one, even once. (The market for pieces-on-demand is also, accordingly, vast.)

These inquiries may seem frivolous, but we only need to view the situation from the LEGO corporation’s viewpoint to raise the stakes. This is a company that almost went out of business at several points because of overstocking. And whereas we only need to solve for the preferences of a single obstreperous kid, the toymakers need to solve for the model-building preferences of the whole world.

Thinking about thinking

No doubt a room of highly-paid mathematicians in Legoland work around the clock to keep on top of it all, but of course, the primary way such large systems are coordinated is with the aid of free-market economics. Pieces that sell get remade; pieces that don’t, don’t. Substitute LEGO pieces for any kind of commodity, and substitute LEGO builds for any kind of industrial process, and you find that our purely whimsical speculation has the most serious and far-reaching implications. The difficulty of attaining or even defining the ‘optimum’ arrangement doomed the socialist central planning of the 20th century, just as the waste and volatility of decentralised approaches plagues the capitalist markets of the 21st century.

But it should not surprise us that LEGO proves such an apt microcosm of something as consequential as the world economy. The dynamics of the LEGO system are ‘scale invariant’ - they manifest at every level, great and small - because they emerge naturally from the root matter of cognition: the subtle art of saying more than we meant.

The mathematical field that most obviously relates to our LEGO problem is ‘combinatorics’: the study of discrete pieces and how they can be combined. Combinatorics shows up in everything from biomedicine to game theory; its unanswered questions are the subject of multimillion-dollar bounties. But combinatorics merely picks up a common thread that in fact unites many different disciplines: how to construct complex things from a choice of simpler things according to a set of rules. When you put it that way, it becomes clear that combinatorics overlaps with core elements of dozens of other fields. Linguistics has formal grammar; pure maths has abstract algebra; physics has statistical mechanics; computing has type theory; management has systems ontology. LEGO has this essay. In short - to drive our various carts - we have reinvented the wheel ninety-nine times.

What we have here is a blind spot: something important, sitting right in the middle, that we don’t even know we can’t see. And it’s a perfectly natural blind spot too - because we humans (and systems generally) are really bad at describing those things we do instinctively. In this case, the thing we’re doing instinctively - the thing that’s crowding out our understanding - is the thing we’re doing right now: language. Language is the spoken manifestation of a deep facility that humans possess for chunking and combining experience to build abstract mental impressions: impressions that can feel as real to us as, or even more real than, the world of the senses.

This is a thing we do so well that we don’t even know that we’re doing it. Indeed, we only realise we’re doing it when we try to formalise and systematise some other endeavour - e.g. organising the world economy or building a LEGO model. Only then do we find our fingertips brushing up against the granular matter of “what we’re talking about”, and - thinking it unique to the purpose that motivated us - construct new metalinguistic and metacognitive instruments whose universality we never even suspect.

I am all too familiar with this trajectory because it is exactly how I arrived where I am today. I began with a particular class of planetary models. Interrogating those models deeply, I arrived at questions - not even as explicit as questions; call them ghosts - that I couldn’t even articulate, let alone answer. It was my inability to do what I was told that led me down the many rabbit holes of combinatorics, algebra, grammar, looking for ‘the’ model - the model of my model - that would help me make sense of what I was doing. My notebooks from this period are full of question marks; not even question marks at the end of sentences, but floating in the margins, detached from meaning, bobbing on the surface of the deep.

I gradually found myself to be in good company. There is a long history of attempts at what you might call ‘universal systematics’: a history littered with aborted projects, like the proliferation of mobile phone connectors, each claiming to be the standard to end all standards, only to become yet another competing standard. But in truth, it’s really not important for there to be a ‘universal’ system: it only matters that we have at least one ‘system of reference’. It doesn’t even have to be a very good system. It only needs to be capable of capturing the essential features of every other system, and thereby provide a protocol for communication between them, and what’s more, a ‘ground truth’ that allows them to understand each other without needing to be personally acquainted.

Such a system is what Everest endeavours to be: or rather, what Everest’s core object model, Ptolemaic, endeavours to be, with Everest at large being just a set of tools for creating, combining, and organising such objects.

It really is embarrassingly simple. But it works.

From toys to models

It may still be a bit unclear in what way Everest - and the business of knowledge-making generally - relates to the LEGO metaphor we have gone to such trouble to expound.

Let’s take a really specific example, from early in my PhD - before Everest was even envisaged.

I had built a model I was pretty proud of. Picture a two-dimensional box with a bunch of swirling colours in it, representing a volume of sticky flowing stuff: runny and buoyant when hot, sticky and heavy when cold. The idea was to characterise how the flow of stuff inside the box changed as a function of lots of different variables - the shape and size of the box, the temperature differential, the initial configuration of the fluid, and more. Naturally I would have to run this model lots and lots and lots of times - with each model ‘run’ being a kind of ‘sub-model’ of its own.
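Even a handful of variables multiplies quickly. The sketch below enumerates a full grid of runs with Python’s `itertools.product`; the three axes and their values are hypothetical stand-ins for the box shape, temperature differential, and initial configuration, not the parameters of the original model.

```python
from itertools import product

# Hypothetical sweep axes; names and values are illustrative assumptions.
axes = {
    "aspect_ratio": [1.0, 2.0, 4.0],      # shape of the box
    "delta_T": [500, 1000, 1500, 2000],   # temperature differential
    "initial_seed": range(10),            # stand-in for the initial configuration
}


def sweep(axes):
    """Yield one parameter set (one 'run', or sub-model) per point in the full grid."""
    names = list(axes)
    for values in product(*axes.values()):
        yield dict(zip(names, values))


runs = list(sweep(axes))
print(len(runs))   # 120, i.e. 3 * 4 * 10
print(runs[0])     # {'aspect_ratio': 1.0, 'delta_T': 500, 'initial_seed': 0}
```

Three modest axes already give 120 runs; add a couple more and the count climbs into the thousands.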

There are lots of challenges that arise when you try to run a model many times - all the challenges which Everest is uniquely designed to surmount - but one specific one we’ll dwell on here is the challenge of how to store all that data efficiently. Of course, I wanted to store each model’s data together with all the metadata I would need to run that same model again: after all, if you can’t reproduce it, it didn’t happen. But the metadata was not trivial. In fact, it was pretty heavy stuff. That’s no problem when you’re just running a handful of models. But when you’re running thousands of models, storing the full metadata with each one is not only wasteful: it may not even be feasible. What’s more, there’s the question of at what level exactly the metadata should be appended. After all, in a sense, every single number produced by a model is a legitimate and potentially important result in itself. Should every number be tagged with all that metadata, just in case?

The solution, of course, is to bundle the models as much as possible, so that the metadata shared between all the models is at the top, and the metadata shared by only a portion of the models is at the next layer down, and so on. Then you just have to be disciplined when you pull the larger dataset apart, being sure to take not just the metadata immediately attached to the target, but all the metadata in all the layers above that as well.
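The layered layout can be sketched with nested dictionaries, where each node keeps only the metadata shared by everything beneath it. Retrieving a run then means walking down from the top and merging every layer passed on the way; Python’s `collections.ChainMap` does that merge, letting a deeper layer override a shallower one if the same key appears twice. The store layout, keys, and values here are illustrative assumptions, not Everest’s actual format.

```python
from collections import ChainMap

# Hypothetical layered store: each node holds only the metadata shared by
# everything beneath it, plus its named children.
store = {
    "meta": {"model": "convection-2d", "code_version": "1.4", "viscosity_law": "arrhenius"},
    "children": {
        "aspect_2": {
            "meta": {"aspect_ratio": 2.0},
            "children": {
                "run_001": {"meta": {"delta_T": 1000, "seed": 1}, "children": {}},
                "run_002": {"meta": {"delta_T": 1500, "seed": 2}, "children": {}},
            },
        },
        "aspect_4": {
            "meta": {"aspect_ratio": 4.0},
            "children": {
                "run_003": {"meta": {"delta_T": 1000, "seed": 3}, "children": {}},
            },
        },
    },
}


def full_metadata(root, path):
    """Return the metadata for the node at `path`, including every layer above it."""
    layers = [root["meta"]]
    node = root
    for name in path:
        node = node["children"][name]
        layers.append(node["meta"])
    # ChainMap searches its maps left to right, so the deepest layer goes first
    # and wins if it redefines a key set higher up.
    return dict(ChainMap(*reversed(layers)))


print(full_metadata(store, ["aspect_2", "run_002"]))
```

Each run stores only two numbers of its own, yet `full_metadata` still recovers everything needed to reproduce it.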

(Though I stumbled upon this approach independently, it actually happens to be a widely acknowledged best practice: an important part, for instance, of the relational paradigm underpinning SQL and many other data systems, where it goes by the name of ‘normalisation’. Unfortunately, like a lot of best practices, it is astonishingly often ignored.)
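In relational terms, the same idea is a pair of tables: the shared metadata lives in one row of a parent table, each run stores only its own parameters plus a reference to that row, and a `JOIN` reassembles the full picture. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database; the table and column names are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical, chosen for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE model (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        code_version TEXT NOT NULL
    );
    -- REFERENCES declares the link (SQLite only enforces it if PRAGMA foreign_keys is on).
    CREATE TABLE run (
        id INTEGER PRIMARY KEY,
        model_id INTEGER NOT NULL REFERENCES model(id),
        aspect_ratio REAL NOT NULL,
        delta_t REAL NOT NULL
    );
""")

# The shared metadata is written once...
db.execute("INSERT INTO model (id, name, code_version) VALUES (1, 'convection-2d', '1.4')")
# ...and each run stores only what is specific to it, plus a pointer to its parent.
db.executemany(
    "INSERT INTO run (model_id, aspect_ratio, delta_t) VALUES (1, ?, ?)",
    [(2.0, 1000.0), (2.0, 1500.0), (4.0, 1000.0)],
)

# Pulling a run back out means joining it to the layer above it.
rows = db.execute("""
    SELECT run.id, model.name, model.code_version, run.aspect_ratio, run.delta_t
    FROM run JOIN model ON run.model_id = model.id
    ORDER BY run.id
""").fetchall()
for row in rows:
    print(row)
```

Deeper hierarchies follow the same pattern, with one table (and one join) per layer.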

Now think about what we just did here in LEGO terms. The overarching model is a bit like the LEGO kit: the whole box with all the pieces in it, together with the rules allowing their interconnection. The ‘pieces’ are the parameters of the model, which, in this very simple example, are basically just being squished together in a hodge-podge to define a particular ‘instance’ or sub-model ‘under’ the larger model.

Already we’ve saved an enormous amount of trouble. The information about the model as a whole can be written down all in one place, and every unique ‘run’ of the model can be written down very efficiently and concisely as just a bag of numbers ‘under’ that top-level overview. The approach can be repeated indefinitely. We could say that bundling numbers ‘A’, ‘B’, and ‘C’ together under a particular model creates a sub-model, under which we may then bundle numbers ‘D’, ‘E’, ‘F’, and ‘G’ to create a sub-sub-model, and so on and so on. What you end up with looks a lot like a file tree, and that’s basically what it is: a particular model is nothing more than a ‘path’ to ‘reach’ that model from some set point of reference (in this case, the ‘LEGO box’ as a whole).
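Treating a model as a path can be sketched with `pathlib.PurePosixPath`, which handles paths as pure strings without touching the filesystem. In the hypothetical encoding below, each layer’s bundle of parameters becomes one path segment (sorted by name so that the same bundle always yields the same segment), and the labels ‘A’ to ‘G’ stand in for the numbers in the text. It assumes no key or value contains a slash.

```python
from pathlib import PurePosixPath


def bundle(params):
    """Render one layer's parameter bundle as a single, order-independent path segment."""
    return ",".join(f"{k}={v}" for k, v in sorted(params.items()))


def model_path(*layers):
    """A (sub-)model is just the sequence of bundles leading to it from the top-level model."""
    return PurePosixPath("/", *(bundle(layer) for layer in layers))


sub = model_path({"A": 1, "B": 2, "C": 3})
subsub = model_path({"A": 1, "B": 2, "C": 3}, {"D": 4, "E": 5, "F": 6, "G": 7})

print(sub)                    # /A=1,B=2,C=3
print(subsub)                 # /A=1,B=2,C=3/D=4,E=5,F=6,G=7
print(sub in subsub.parents)  # True: the sub-sub-model lives 'under' the sub-model
```

Because a sub-sub-model’s path begins with its sub-model’s path, everything ‘under’ a given model can be found by prefix, just as with a directory.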

The very first version of Everest was really not much more than a tool to do exactly this, but for arbitrary models - not just my planetary models. Even this rudimentary tool enabled a huge step up: from running mere dozens or hundreds of related models to running thousands or tens of thousands. And it was my ability to do this, quickly and reproducibly, that got me involved in the pandemic (though I ended up working on something quite different at first).

But I was all too aware of the limitations: they nagged at me. I knew there was something much deeper going on. Enter the latest version of Everest.