Why Wagn Matters

New ways of organizing information can help build a brighter future, but not by using the interfaces of the past.

 

Wagn, which began as a wiki with a clever twist, has evolved into something considerably more: a window into the future of data interfaces.

 

Conventional, relational databases can't keep up with a world working towards deeper and richer data integration. Why should more of us care about something so hopelessly nerdy?  Because data integration will transform economies and enrich lives. Deep data integration opens up the possibility of small and medium-sized economic players collaborating as efficiently and effectively as departments within a large corporation. Any serious approach to advancing local economies includes bringing local businesses out of the Paper Age and into a world of richly networked information. The feedback systems vital to flourishing ecosystems, households, and communities won't be stuck in isolated data tables; they'll be woven together in rich data tapestries.  Deep integration can mean thriving in ways we have scarcely considered possible.

 

Integration calls for distribution.  If my data lives in my silo, and yours lives in yours, it's very difficult to integrate them together. Distributed databases, which can store and query structured information in disparate locations across the web, are vital to opening up deeper collaboration. But distributed databases have been cursed by ill-fitting interfaces. Almost all distributed database projects have focused exclusively on new back-end architectures while defaulting to the same old front-ends that evolved around conventional databases. It's a round datum in a square table.  Distributed data is of necessity organized differently, and the user experience should reflect that new organization. Failing to do so complicates development, adoption, and the critical work of determining appropriate early applications. Could it be that the greatest challenge with distributed data (and thus integrated data) is learning how to interact with it?


Wagn can teach us how to work with distributed databases.  Its data aren't distributed, but they're distributed-data-like and distribution-friendly, and that's enough to make it an excellent interface laboratory.  Not that Wagn was conceived as a lab; it's been developed all along to get things done.  Its data structure and interface have co-evolved to be effective and powerful, not academic. It's pretty much just dumb luck that the resultant data model so closely parallels the structures used in distributed databases.  Wagn is exploring several key questions that really must be answered for distributed databases to flourish:

  1. How do we break information into appropriately-sized chunks?
  2. How do we move fluidly between interacting with a single chunk and multiple chunks of information?
  3. How do we manage permissions on all this information?
  4. How do we integrate this information into attractive public websites?
  5. How do we display query results that can contain all different sorts of information?

 

 

Wagn is not only helping to answer these questions; it's also providing a roadmap for developing distributed databases incrementally.  As Wagn has evolved, it has explored many aspects of distributed database interface independently, and it turns out they're highly useful even before the full system is in place.

 

In all technology, interface and functionality have a lot to teach each other.  Anyone interested in the future of distributed databases (and that should be lots of us) should pay attention to what Wagn has to offer.

 

Distributed Databases 

 

First off, what are distributed databases?

 

Data distribution refers to storing connected information in disconnected places.  In the age of the internet, distributed data is commonplace: web data comes from all over.  But far rarer are distributed databases.

 

Technical aside:  For present purposes I'm using "database" as a shorthand to mean a collection of data organized with a queryable database management system: one you can search, update, delete, etc. with database "queries."  And I'm using "distributed database" to refer to databases that can perform queries on data stored at different locations.  Admittedly, it's not always clear what's meant by "stored" when caching is involved.  Suffice it to say that for present purposes anything that networks data through import/export or complete replication of an authoritative relational database doesn't count as distribution.  We're talking about systems in which it's possible to retrieve a datum from a remote system in order to satisfy a live query. 

 

A Google search may give us access to data distributed across the web, but it does not use a database distributed across the web.  Instead it's searching its own local database(s) on its own servers that store information that they've already gathered about websites.  By contrast, when you query a distributed database, it might actually go out all over the internet to produce its reply.  This won't necessarily produce a superior search engine, but it could be tremendously useful for integrating various sorts of business knowledge, from inventory to supply chain tracking to availability of shared equipment to meeting materials.
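The difference between searching a local copy and querying at the source can be sketched in a few lines. This is only an illustration: the node names, keys, and numbers are invented, and plain dictionaries stand in for what would really be remote systems reached over a network.

```python
# Each "node" stands in for a remote system; here they are just local dicts.
# All names and values are hypothetical.
nodes = {
    "warehouse": {"widgets_in_stock": 40},
    "supplier": {"widgets_on_order": 100},
}

# Search-engine style: gather everything into one local index ahead of time.
local_index = {key: value for data in nodes.values() for key, value in data.items()}


def distributed_lookup(key):
    """Ask each node at query time and return (node name, current value)."""
    for name, data in nodes.items():
        if key in data:
            return name, data[key]
    return None


# The source changes after the index was built.
nodes["warehouse"]["widgets_in_stock"] = 35

print(local_index["widgets_in_stock"])         # the indexed copy is now stale
print(distributed_lookup("widgets_in_stock"))  # the live query sees the change
```

The local index answers quickly but only knows what it gathered earlier; the live lookup pays for a trip to each node and gets the current value.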

 

In order to achieve this kind of distribution, we have to organize information differently.  It's not a matter of flipping a switch on all the standard "relational" databases that house the vast majority of our structured data today.  Relational databases are distribution-unfriendly.  They organize information in a way that makes it very hard to split into chunks and store different chunks in different places.  They lack three interwoven qualities of distributed data that I'll call autonomy, atomic equivalence, and relational equivalence.

 

Autonomy refers to information that stands on its own.  Distributing data means putting one chunk of data here, another chunk there.  To do that, obviously, the data must be broken into autonomous chunks.   Relational databases are hard to distribute because a piece of data in such systems has no autonomy – there's no way to refer to a datum except by its context.  In a spreadsheet, you can only identify the contents of A4 as "A4".   By contrast, any piece of data in a distributed system has a unique identifier that is not based on its context.
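A small sketch may make the contrast concrete. The Atom class and the two stores are hypothetical, invented for illustration; a random UUID is just one way to get an identifier that does not depend on where the datum lives.

```python
import uuid

# Context-based addressing: the value is only reachable through its position.
sheet = {"A4": "12 Elm Street"}


def move_cell(cells, old, new):
    """Moving a spreadsheet cell changes the only name it ever had."""
    cells[new] = cells.pop(old)


class Atom:
    """A datum that carries its own identifier (hypothetical class)."""

    def __init__(self, content):
        self.id = str(uuid.uuid4())  # independent of location or context
        self.content = content


address = Atom("12 Elm Street")
store_here = {address.id: address}
store_there = {}

# Relocate the atom to a different store; its identifier is unchanged,
# so anything that referred to it by id can still find it.
store_there[address.id] = store_here.pop(address.id)

# Relocate the cell; anything that referred to "A4" is now broken.
move_cell(sheet, "A4", "B7")
```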

 

Atomic Equivalence refers to the idea that all these chunks of information, regardless of size, can be considered atoms, or basic building blocks, of the database. Distributed databases usually achieve this through the identification mechanism -- the way you identify data is the same whether it's a single number (like the number of candy bars in stock) or a table or a dissertation or a project or anything else.  In relational databases, there is no computationally meaningful way to complete this sentence: "cells, rows, fields, columns, tables, and databases are all ____s".  Atomic equivalence means you can say "everything is an ______".

 

Relational Equivalence means that the relationships between different bits of information are stored the same way as any other piece of information.  In distributed databases, relationships themselves are atoms.  My father is an atom, my mother is an atom, and their relationship is a third.  In a relational database, we might represent their marriage as a spouse field in a person table or a marriage cross-reference table or some other way, but in every case the relationship is built into the structure, which is anything but atomic.
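Here is a minimal sketch of atomic and relational equivalence together. Everything in it (the Atom class, the "refs" convention, the store) is an assumption made for illustration, not a description of any particular distributed database.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)


@dataclass
class Atom:
    """Every piece of data, large or small, is one of these (hypothetical)."""
    content: object
    id: int = field(default_factory=lambda: next(_ids))


store = {}


def put(content):
    atom = Atom(content)
    store[atom.id] = atom
    return atom


father = put("my father")
mother = put("my mother")

# The relationship is itself an atom; it points at other atoms by id.
marriage = put({"kind": "marriage", "refs": (father.id, mother.id)})

# Because the relationship is an atom, other atoms can refer to it in turn.
note = put({"kind": "note", "refs": (marriage.id,), "text": "a note about the wedding"})


def referring_to(atom_id):
    """Return every atom whose content refers to the given id."""
    return [
        atom for atom in store.values()
        if isinstance(atom.content, dict) and atom_id in atom.content.get("refs", ())
    ]
```

Asking what refers to my father returns the marriage, and asking what refers to the marriage returns the note. No table structure had to change to add either one.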

 

(Give some examples of distributed data projects, including RDF / semantic web)

 

Cards


Wagn's data are not yet distributed. In fact, Wagn data distribution is still a good ways away. At the time of writing Wagn's latest release is version 1.4; the distributed "WagNet" functionality isn't due out until Wagn 3.0.  We haven't committed to porting to any distributed data engine, mapped WQL to RDF, or settled all key design decisions on our data distribution blueprint.

 


So how could Wagn possibly be a window into the future of distributed data interfaces?  The answer is that Wagn data has all the key structural elements of distributed data, and with this distribution-friendly data it has pioneered distribution-friendly interfaces that may well forecast where distributed database interfaces can go.  (Note: the ideas that follow will make a lot more sense if you have at least a cursory look at the Wagn interface on Wagn.org.)

What makes Wagn distribution-friendly?  The fact that it organizes data into cards.

 

Cards are autonomous.  Every card has a unique name that doesn't depend on its context.  This design decision is basically inherited from wikis, which make use of unique names to make it easy to create and link to different pages.  But Wagn uses the term "card" rather than "page" because a given webpage often has many different cards on it.

 

Everything is a card – cards have atomic equivalence.  This is a design principle that Wagn has pushed further than any other application we've seen; all user accounts are associated with cards, configuration is done with cards, fields within cards are cards, card types are cards...   All cards have a dedicated page, a revision history, permissions, etc. 

 

Relationships between cards are stored as cards themselves – cards have relational equivalence.  Wagn relates cards through three primary mechanisms: types, names (plus cards), and references (links and inclusions).  The relationships vary in the degree to which they map to .........


There's nothing particularly impressive about distribution-friendliness, but there is something impressive about Wagn, in that it has deeply explored how best to interact with this kind of data.

 

 

Inclusion


Much of Wagn's magic comes from "inclusion" (or to be more technical and scary-sounding, "transclusion").  Inclusion just means including one card inside another card.  So if you're creating a "Contact Me" card and you've already made a card called "My Address", you can simply include "My Address" inside "Contact Me."  Then if you update "My Address", the new address will show up automatically on "Contact Me."
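The mechanics are simple enough to sketch. In the example below the double-brace marker, the render function, and the card contents are all assumptions chosen for illustration; this is not Wagn's actual code.

```python
import re

# Cards keyed by their unique names (contents are made up).
cards = {
    "My Address": "12 Elm Street",
    "Contact Me": "Write to me at {{My Address}}.",
}

# Hypothetical inclusion marker: {{Card Name}}
INCLUSION = re.compile(r"\{\{(.+?)\}\}")


def render(name, depth=0, max_depth=10):
    """Expand inclusions at render time, so edits to an included card show through."""
    if depth > max_depth:
        return ""  # guard against cards that include each other
    content = cards.get(name, "")
    return INCLUSION.sub(
        lambda match: render(match.group(1).strip(), depth + 1, max_depth),
        content,
    )


before = render("Contact Me")
cards["My Address"] = "98 Oak Avenue"  # edit only the included card
after = render("Contact Me")
```

Because "Contact Me" stores a reference rather than a copy, the second rendering picks up the new address without anyone touching "Contact Me" itself.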

Notice how smoothly inclusion translates into a distributed data system.  You can include my address on your system.  I can include yours on mine.  The web actually already does this in some ways, as with images and iframes, which have built-in browser support, and widgets like YouTube embeds, which do not.

 

Technical aside: for fellow geeks this claim might trigger a slew of implementation questions about caching and namespaces (which we won't go into here) and permissions (which we will briefly).  If so, you might check out this page to see what we're currently planning and join the discussion: http:/asdfasdsdfsd

 

So Wagn is far from the first application to experiment with inclusion, but it's arguably taken it furthest.  On Wagn inclusions can be collapsed and expanded, edited in place (without page reloads), and displayed in a variety of "views".  Inclusion is used in layouts, formatting, configuration, and relationships.  Inclusion plays a role in almost everything that Wagn does.

 

Wagn's inclusions are powerful for the same reasons they're distribution-friendly: cards are autonomous (thus simple to display in varied contexts), they have unique identifiers (thus easy to find and include), and everything is a card (so we only have to build one interface).   It is extremely difficult to offer such a uniform interface in a relational database system, precisely because it has no "everything-is-an-X" model.   But almost all distributed databases do, meaning not only could Wagn's inclusion system fit with a distributed database, it could fit almost any distributed database.

 

A key benefit of inclusion is fewer context shifts.  Rather than having to navigate to separate editing pages to edit a piece of data, you edit it where it is, regardless of its context.  So in Wagn, unlike in most other web interfaces, it's often very handy to edit one piece of data at a time.  If this feature is handy for today's Wagn users, it's vital for tomorrow's distributed database users.  Autonomous data needs autonomous editing.  When a datum and its context are often stored separately, bulk edits can even get silly.  How practical is it to assemble an edit interface for twenty pieces of contextual data found all over the web in order to edit one line of an address or comment or product description?

 

I would go so far as to predict that any evolved graphical interface for a distributed database will feature a single-atom interface as a central component, and that this interface will likely bear some resemblance to Wagn's inclusions.  The arrangement and appearance will undoubtedly evolve further, but every tab on Wagn's current card interface is likely to remain relevant; honing them on Wagn will only teach us more.

 

 

Formatting


The most natural graphical interface for interacting with relational databases is the "web form".  Say you have your relational database table "companies" with its list of fields (id, name, address_line_1, address_line_2, etc.).  This is often translated fairly directly on webpages into a form with fields for name, address line 1, etc.  Even fairly direct translations from a database definition to the graphical interface require some technical know-how.  Over the last decade technical architectures have come a long way in making this translation easier for developers, but it's still certainly a developer's job.

In Wagn, it's often much more sensible to edit a single card than to edit an entire form, thanks to inclusions.  However, there is still a lot of value in the kind of bulk data entry that is allowed through a form.  Wagn has recognized that value and has evolved a way to build forms using patterned inclusions.  A card about a company might include cards for addresses, phone numbers, notes, etc. that correspond with each field in a relational database model, and the resultant web form does little if anything to betray to novice users that the data beneath is organized in any way differently from a standard database table.  This is surely a strength of Wagn; well-configured sites keep all the inclusion syntax out of casual users' way.  Contrast that with the bewildering blast of syntax that greets a Wikipedia newbie upon editing any substantial article.

The key lesson for distributed databases is that we can still do forms in a fairly straightforward way with complex structures built from simpler ones.  And whenever possible the interface and back end should have the same relationships between wholes and parts.  Wagn lets users move fluidly between editing single cards and editing forms precisely because the interface, like the back-end, is built bottom-up with inclusions.  This may ultimately prove a huge benefit of distributed databases, as this bottom-up approach may mean developers spend more time focusing on data structure, and less time focusing on custom layers to translate between database and interface.
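To show what building a form bottom-up might look like, here is a rough sketch. The "+" naming for field-like cards follows the plus-card idea, but the pattern list, function names, and card contents are assumptions invented for this example.

```python
# Every field of a company is its own card, named "<company>+<field>".
cards = {
    "Acme": "",
    "Acme+address": "12 Elm Street",
    "Acme+phone": "555-0100",
}

# Hypothetical pattern shared by all company cards: which plus cards to include.
company_pattern = ["address", "phone", "notes"]


def form_for(name, pattern):
    """Assemble a form as a list of (card name, current content) pairs."""
    rows = []
    for field_name in pattern:
        card_name = f"{name}+{field_name}"
        rows.append((card_name, cards.get(card_name, "")))  # blank if not yet created
    return rows


def save_card(card_name, content):
    """Single-card edit: the basic operation."""
    cards[card_name] = content


def save_form(rows):
    """Bulk edit: nothing more than the single-card edit, repeated."""
    for card_name, content in rows:
        save_card(card_name, content)


form = form_for("Acme", company_pattern)
```

The form is just a list of cards, so saving one field and saving the whole form go through the same code path. Note too that "Acme+notes" appears in the form with a name before the card exists.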

 

Any massive infrastructural shift depends on translating old memes into new ones.  Distribution won't mean the death of the web form, but it will mean a make-over.

 

 

Permissions


Like a lot of wiki lovers, I'm a proponent of transparency.  Unlike many wiki lovers, I'm also a big fan of nuanced permissions systems, which I think are critical to building transparency.  I've often used the analogy of a (fictional) diary.  If I can keep you from reading the chapter about Bangkok, I might let you read the rest of my diary.  If not, you can't see any of it.  More nuanced permissions yield more openness.

Similarly, while distributed database systems make all sorts of new sharing and integration possible, they won't necessarily lead to greater transparency unless they have a nuanced permissions system that lets folks see one thing but not another.  And in fact, it's hard to imagine almost any of the distributed data use cases mentioned above ever taking flight without pretty fine-grained control over who sees what.

At the time of writing, Wagn is transitioning from its old card-based permissions system to a new "set-based" system.  The former is a study in how awkwardly old relational permissions models often fit nonrelational data.  The latter is an exciting demonstration of the possibilities for distributed systems.

In the old system, every single card has an independent setting for who can read, update, and delete it.  The defaults can be configured in a patterned way, but once set they must be changed one at a time.  Of course, we could improve this within a traditional relational paradigm by adding some sort of bulk update system.  Instead, we're going to do a much richer overhaul.

Over the past several months we've been converting many of Wagn's configuration options into a "set/setting" model.  The idea is that any setting (e.g., layout or captcha) is associated with a set of cards, which can be as general as all cards, as specific as a single card, or somewhere in between.  Settings on more specific sets override more general ones, so that if captchas are on for the set containing all cards, but off for the set containing just one card called "open forum", then captcha is turned off for that card.  We're now working on converting our permissions system to this same set/setting model.  So you can set permissions on as general or specific a set of cards as you like.
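The override rule can be sketched briefly. This is an illustration under stated assumptions, not Wagn's implementation: the three levels of set (single card, card type, all cards), the type names, and the permission values are all invented, and real sets would come from queries rather than being hard-coded.

```python
def sets_for(card):
    """Sets a card belongs to, ordered from most specific to most general.

    The three levels here are an assumption made for this sketch.
    """
    return [("card", card["name"]), ("type", card["type"]), ("all", None)]


# Settings are attached to sets, not to individual cards.
settings = {
    (("all", None), "captcha"): True,
    (("card", "open forum"), "captcha"): False,
    (("all", None), "read"): "anyone",
    (("type", "Diary"), "read"): "owner only",
}


def setting_for(card, setting):
    """Return the value from the most specific set that defines the setting."""
    for card_set in sets_for(card):
        if (card_set, setting) in settings:
            return settings[(card_set, setting)]
    return None


forum = {"name": "open forum", "type": "Basic"}
recipe = {"name": "pancakes", "type": "Basic"}
bangkok = {"name": "Bangkok", "type": "Diary"}
```

Captcha is off for "open forum" and on for everything else, and the Bangkok chapter is restricted by a single rule on its type rather than a flag on each card. Permissions and captcha go through exactly the same lookup.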

Such a scheme might(?) be possible in a relational database, but it would be a ton of work, because relational data lacks atomic equivalence.  The natural approach is to set permissions on the level of tables or rows.  To do something like set/setting you'd have to be able to represent permissions separately for entire databases, tables, rows, and cells, and probably for awkward groupings thereof.  In Wagn, it's all cards, so we really just had to add the concept of a "set" of cards, which piggybacked heavily on our existing query system, which was already designed to return lists of cards.

It stands to reason that the same model could be replicated in almost any distributed data system.  And the benefits are enormous: permissions control is extremely nuanced and powerful, permissions administration is dramatically more concise and tractable, and the permissions interface adheres to the same structural paradigm as any other set/setting configuration that the system might require, making it much easier to get one's head around.

 

 

Queries and Names

 

The lessons Wagn has to offer distributed databases in the realm of queries are, to date, a bit more abstruse.  Wagn does have an excellent, powerful, proven query language (WQL).  WQL is highly useful and promises to map well to RDF, which, together with its query language SPARQL, is the closest thing we have to a standard for querying non-relational data structures. But WQL does not yet have a graphical interface, and interface is our focus here.

 

So I will save the exploration of WQL's lessons for distributed databases for another time, but suffice it to say (a) that Wagn's success in combining snippets of queries together in generative ways is just beginning to hint at what distributed databases will be able to do, and (b) we very much look forward to creating that graphical interface!

 

Similarly, Wagn's use of human-readable names and relationally significant naming conventions may contribute ideas to distributed databases, but most are not related to interface.  In any case human-readable names are likely to be a point of departure with many distributed systems as is the "plus card" convention Wagn uses to extend human-readable names into creating cards that act like relational fields.

 

However, it's worth making the wikiphile's point that it's extremely useful to be able to name what is not yet there.  Wagn actually expands on the wiki's capacity to do this by creating naming patterns.  Distributed databases, which will always be particularly susceptible to data gaps, will be well advised to remember that names matter: rows by other names may smell sweeter.

 

 

And So...

 

Wagn, which has no distributed data and an under-polished interface, is now arguably the premier laboratory for distributed database interfaces. Odd and accidental as it all is, Wagn will embrace this role. 

 

Our key focus for Wagn 2.0 is a modular plug-in system.  It will now become a key priority for this system that both the API and the meta-structures for these plug-ins respect and enhance Wagn's distribution-friendliness.

 

Wagn 3.0 is already expected to center around WagNet, an umbrella term for Wagn as a distributed data tool.  As we move in this direction, we will make every effort to respect and advance emerging distributed database standards.

 

But perhaps most importantly, we will look to engage much more deeply with the human network that's already working to make these kinds of information networks a reality.  If distributed back ends have not learned enough from interfaces, then it is certainly likely that interfaces should be learning a lot more from back ends.  We can do so much more if we integrate; that's the whole idea.

 


I haven't read your whole thing yet, but have been thinking in broadly similar directions for a while. Have you read Tim O'Reilly's The State of the Internet Operating System? (and part 2)

 

 

One payoff graf re Wagn:

 

 

The breakthroughs that we need to look forward to may not come from explicitly social applications. In fact, I see "me too" social networking applications from those who have other sources of identity data as a sign that they don't really understand the platform opportunity. Building a social network to rival Facebook or Twitter is far less important to the future of the Internet platform than creating facilities that will allow third-party developers to leverage the social data that companies like Google, Microsoft, Yahoo!, AOL - and phone companies like ATT, Verizon and T-Mobile - have produced through years or even decades of managing user's social data for communications.

 

No doubt i will have more to say/edit as i work my way through your manifesto...

  --John Abbe.....Thu Jul 15 19:38:08 -0700 2010