Why the web is subpar for building apps
Many web developers have, at one time or another, reflected on the inefficiency of the way they are forced to construct interfaces on the web. Most don't dwell on the thought because there's not much to be done about it: it's the way of the web, and it buys a lot of portability at the expense of some network overhead. Put another way, what do web developers do to increase the performance of their sites?
The common answer is "decrease download time", which is directly related to available bandwidth at the client. Client bandwidth is that most scarce and uncontrollable of resources. This relationship can be represented as:
If we let t be web site download time,
let S be the total size (in Kbytes) of a site,
let i be the initial time required to respond to an HTTP request,
and let r be the connection speed, measured in Kbytes/second,
then we can say that:

t = S / r + i
In essence, "lighter is faster". This is an oversimplification that neglects factors such as network latency, number of unique requests, and protocol version, but it's still useful for our discussion as many of these other factors can be given as components of i and ignored. The relationship between t and the success of a site is debatable, but a sufficiently high t is often detrimental to the success of a web application. Since the only component of t that the designer has any control over is S, it makes sense to attempt to optimize away S first.
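The relationship above is easy to sketch in a few lines of JavaScript. The function name and the sample figures below are illustrative only, not measurements:

```javascript
// Estimate download time t = S / r + i.
// sizeKB (S): total site size in Kbytes
// speedKBps (r): connection speed in Kbytes/second
// initialDelay (i): initial response time in seconds
function downloadTime(sizeKB, speedKBps, initialDelay) {
  return sizeKB / speedKBps + initialDelay;
}

// A 200 KB page over a 5 KB/s modem link with a 0.5 s initial delay:
console.log(downloadTime(200, 5, 0.5)); // 40.5 seconds

// Halving S halves the transfer component of t, but i is untouched:
console.log(downloadTime(100, 5, 0.5)); // 20.5 seconds
```

Note that i acts as a floor: no amount of size optimization removes it, which is exactly why "lighter is faster" is an oversimplification.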
No matter what other optimizations a site employs (ubiquitous caching, fast databases, etc...), decreasing the overall size (in bytes) of a site will increase its responsiveness and speed. Factors such as page complexity and the rendering speed of various browsers will affect this, but all things being equal, data transfer dwarfs them. Interface developers will attest that perceived speed equates to real speed in the mind of the user, and so loading and inter-page delay are of prime concern if the user is not to get horrendously bored and go read The Onion instead of experiencing one's web application masterpiece. This all boils down to knowing that page size (bandwidth) is directly proportional to load time, which in turn is likely related by some exponential function to the user's propensity to leave. And that's a Bad Thing™.
Thinking a bit about the page-by-page nature of the web in the context of applications, one can easily start to see how the need to re-generate an entire interface every time someone clicks on a link or wants fresh data can be incredibly wasteful of that most precious commodity: bandwidth (that translates directly into response time). It's not hard to see how the overhead of having to send an entire set of formatting along with the requested data can put a huge burden on both servers and network connections, not to mention the patience of the average user. This is what I'm going to call the "round trip problem"; a term I'm borrowing from Joel Spolsky.
The round trip problem dictates that in order to provide a useful interface for an end user, it's necessary to do as much in a single page view as possible and to limit the number of page refreshes. Basically, from an interface standpoint, the single most "expensive" thing you can do is refresh the page. Not only do you lose the attention of the end user, but you are also hindered in your ability to provide feedback, because the user's interface is unavailable for comment until the full round trip (client-server-client) is complete. The good news is that there are effective (enough) ways of addressing these shortcomings. CSS provides a language that allows developers to separate data from formatting (to an extent), and web developers have become adept at teaching browsers how to cache frequently used items like images and scripts. But the central problem remains: it's not possible to provide an interaction-rich environment to the end user with acceptable performance characteristics if it's necessary to ungracefully jerk the user away from the current interface and then just as ungracefully insert him/her into another one.
By now, I'm quite sure that some of you are sitting there thinking "but for a lot of data the page paradigm makes sense! Why are you maligning it so?" Let's be clear about this: I have nothing against the page paradigm, and in fact it works wonders for static data and for presenting data that lends itself easily and naturally to these representations. I'm not addressing those data sets when I talk about the round trip problem; I'm referring specifically to the problems of interface design for web applications.
The evolving web first provided developers with static, useful, page-based information. And it was a great thing. Gopher could hardly compete with the rich linking abilities of HTML, and as the formats evolved, the web became an ever more powerful tool for presenting information. But evolution is rarely confined, and it wasn't long before encryption, state-keeping mechanisms (cookies), and other pseudo-application tools became available and were exploited by web developers. Online tools such as web mail and online banking have solidified the web's usefulness in providing applications as well as data to users. It is for applications such as these that the round trip problem is most acute. If you're still unclear on the difference between web pages and web applications, odds are pretty high that you don't need what netWindows can do (aside from the "ooh! ahh!" factor), but read on anyway; it might help somewhere down the road.
The problem in (more) detail
If one puts on one's web application developer hat for a minute and goes back to thinking about the round trip problem, it might be useful to think about the situations where developers actually incur the penalty of a round trip. Even for information presentation. Seriously, how often does all of the navigation information and layout change? Heck, how often does any of that information actually change between page views? Be honest. Don't think about what gets sent to the client; think about the actual structure of the information in question. How often does that honestly change?
Does that answer surprise you?
Next question. How much of that data is present for each page view? How much formatting information is sent with that purely structural data? How much CPU and bandwidth are you chewing on the server side to make all of that possible?
Next question. What would you do with that CPU and bandwidth if you didn't have to send formatting with every piece of data, or re-send all of that data every time you changed a single interface element or data set in your application interface?
And what of all that modular code you've got on the back-end of any moderately complex web application? What is the net effect of having to coalesce all manner of components and modules together to generate a single page view? One can easily see that understanding any one of these components isn't necessarily difficult. Understanding how they all fit together, well...that's another story...
What we're starting to see is that it's not the round trips that are (necessarily) killing performance; it's the repetition. It's all of that data that needs to be re-sent endlessly, just so an interface can give the user something to click on so that the server can change a bit of data (that will need to be re-sent all over again). Granted, this is a gross overgeneralization of the situation, but more often than not it describes, by degrees, a given web application or its components.
It's likely that every DHTML coder and his brother is in a tizzy at this point. The argument goes something like this: "ahh! but with DHTML I can create interfaces that allow for interaction! they can give the user feedback! they can even perform useful application functions!" And they're right. But how does that address the round trip problem? I mean, with (normal) DHTML aren't you just inventing a bunch of new stuff that you have to re-send from the server every time you load the page? For most interface interaction, DHTML doesn't necessarily provide anything that can't be gotten less expensively from judicious use of that most hated of HTML elements, the FRAMESET. And what happens when application state changes? Does anyone honestly think that the kludge of sending new configuration information to the client for each widget every time state changes is an elegant solution? Or does one just omit that kind of sophistication?
Just like the argument about static interfaces, it can be argued that it's not the DHTML that's slow, it's downloading the DHTML that's so painful.
Interface by network protocol event sucks. There's gotta be a better way.
Some bad news
But it could be a lot worse.
Non-DOM browsers need not apply for the following approach. The netWindows framework currently supports IE 5.0, 5.5, and 6.0 for the PC; Mozilla (NN6, NN6.1, NN6.2, Galeon, K-meleon, etc...); and IE 5.1 for the Mac. Partial support for IE 5.0 for the Mac is in the works, but miscellaneous issues remain. These browsers make up better than 75% of the current browser market, according to most statistics available as of this writing. The penetration numbers for DOM-compatible browsers are likely to be close to 100% in corporate environments, where homogeneous desktops and quarterly (or continuous) upgrades leave most users with generation-old technology (but rarely much older).
It should also be restated that this paper does not describe a panacea for web application interface problems. This approach does not degrade gracefully to platforms that do not (or cannot) support it, and so for many applications it may be a "handy" interface option for those clients that can support it, but a plan B may always be necessary. Honestly, if you're using netWindows for simple informational sites, there's probably a problem. It's a powerful approach for difficult problems, but it may not necessarily be the best for your application. Consider netWindows as a layer on top of your content that can give it new presentation, but can never replace the shape or meaning of your content. If you're not saying anything meaningful, all the prettiness and sophistication in the world isn't going to help (despite MTV's apparent insistence to the contrary).
Some good news
But not as much as one might hope for.
The good news is that it's possible with DOM-based DHTML to help remove portions of the round-trip problem. We're not talking about some magical utility that will let developers move data over the ether at unimaginable speeds with absolutely no latency. Instead, we're talking about increasing perceived performance. This is acceptable, as we've already established that perceived performance is what matters to the user.
What netWindows proposes is forcing the client to work harder in order to minimize bandwidth usage (or at least even it out over time) and increase interface responsiveness. Moore's Law implies that this is a pretty good gamble: as inefficient as building dynamic interfaces can be, relying on speedier last-mile connections isn't a timely solution to our problem, as broadband media suppliers will attest. One is not tied to the netWindows framework to implement this approach, but most of the pieces are already built if one uses it, and so for purposes of explanation we'll refer to NW and the DHTML round-trip shortcut interchangeably, as the first is merely a familiar (and documented) implementation of the second.
netWindows builds a client-side infrastructure for building interface components and communicating with servers in a way that's much more asynchronous than the page-by-page model currently employed by most sites. Request data, get data. Not formatting. Sample applications shipped with the framework demonstrate the use of the client as a data repository and application platform, allowing web applications to move the heavy lifting of manipulating and creating interface elements to the client. The server interaction model used by netWindows promotes abstraction of formatting (widgets) from data (data blocks). They work in unison, but are delivered separately from each other, each capable of forcing the environment to instantiate the other at will. To boil it down, it's an environment that lets interface developers communicate with both users and servers in ways that are more natural to both: ways that are driven by data and application events, not protocol events.
Moving the front end
All of the sample applications shipped with netWindows have one thing in common: they defer much of the work of building an interface to the NW framework. Instead of managing state, figuring out interface status, and any number of other things, the back ends for these applications simply receive and act upon data requests. JavaScript applications manage state for interface events on the client side in order to provide instantaneous response to events. These applications need not be complex, as they have no need to attempt state management across page views or to somehow fake state across a stateless protocol. Given this "correct" separation of interface state management and application logic, it's not hard to see why solving the round trip problem by avoiding it is such an attractive proposition.
It is not possible to entirely sidestep the stateless nature of HTTP, but netWindows lets developers create and update interfaces independently of protocol events, which we've established are slow and cumbersome from an interface feedback perspective.
The web mail sample application provided in the netWindows distribution offers an example of how batching data requests, combined with instant interface updates, gives users a level of interface usability that most web applications just can't touch. What makes it all work together well is offloading much of the display-layer logic to the client. This enforces an interesting side effect of keeping state on the client: your server pages only do one thing at a time. In contrast to current web application development, which takes many components and packs them together at the last minute to generate a page, NW subtly encourages a "one page, one function" design ethic that often leverages modular programming techniques already in use on the server side.
User Experience Improvements: the content loader
The next usability-improving feature that NW provides is the content loader. The content loader is a "hidden" inline frame around which a queue mechanism is built to manage server interaction. It's this mechanism that handles all of the seemingly out-of-band server communication in a netWindows environment. Others have called this technique various things (web-RPC, event dispatching, an HTTP "buffer", remote scripting, etc...), and there's not really any consensus right now about what to call it.
The content loader is managed by the netWindows core and provides the preferred mechanism for round-trip server communication in netWindows. It's possible to request that an arbitrary URL be loaded into an arbitrary DOM node (or into none at all). The simplest way to use the content loader is to provide a reference to a target node; a simple example follows.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>content loading example</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <!-- include the netWindows core -->
  <script language="JavaScript1.2" type="text/javascript" src="winScripts/netWindows.js"></script>
  <!-- load a url into the element with ID "foo" -->
  <script language="JavaScript1.2" type="text/javascript">
    function loadFoo(){
      // get a reference to the container node
      var fooRef = document.getElementById("foo");
      // load the url foo.html into the container
      __env__.addPageToQueue("foo.html", fooRef, false);
    }
    // set function pointer loadFoo to execute when the page loads:
    __env__.addEvtFP(loadFoo, 'onload');
  </script>
  <style type="text/css" media="screen">
    .container {
      width: 300px;
      height: 300px;
      background-color: #cecece;
      border: 1px solid black;
      overflow: auto;
    }
  </style>
</head>
<body>
  <!-- a container for the url to be loaded -->
  <div id="foo" class="container"></div>
</body>
</html>
And a listing for foo.html, which includes a single script file (an absolute URL is fine here, as it doesn't need to execute from any particular domain). This include allows the file to "tell" the NW environment that it has finished loading, which is why the onLoad handler is necessary.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>a content page for the loading example</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <!-- include the callback script -->
  <script language="JavaScript1.2" type="text/javascript" src="winScripts/load_page_callback.js"></script>
</head>
<body onLoad="load_page();">
  <!-- content of page... -->
</body>
</html>
These two files together outline the skeleton use of the content loading system. Our content loader provides some benefits beyond what an iframe can offer. First, it's not as resource intensive: a single iframe versus as many iframes as there are re-loadable content sections makes an interesting argument for our system. Second, one can z-index the HTML elements one is loading content into, whereas that's not possible with an iframe (in a cross-browser fashion). We can also use the content loader in many situations where loading an entire iframe would be over and above the call of duty.
While loading arbitrary content into arbitrary locations on a page is nice, the big interface win is in allowing developers to request data from the server without having to refresh the entire page. Think about it: even for a "normal" DHTML interface, allowing the server to send only what you need to update in an interface lets developers create long-lived, stateful interfaces using common DHTML techniques. Moving a step beyond that, one can begin to create client-side data repositories in JavaScript that can mediate these types of transactions and implement a subscribe-notify data publishing model.
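The subscribe-notify idea is simple enough to sketch in plain JavaScript. Everything here (DataStore, subscribe, publish) is an illustrative name of my own, not part of the netWindows API:

```javascript
// A minimal client-side data repository with a subscribe-notify model.
// Widgets register interest in a data key; when fresh data arrives
// (e.g. via the content loader), every subscriber is notified.
function DataStore() {
  this.data = {};       // keyed data blocks
  this.listeners = {};  // key -> array of callback functions
}

// Register a callback to fire on every update of the given key.
DataStore.prototype.subscribe = function (key, callback) {
  if (!this.listeners[key]) { this.listeners[key] = []; }
  this.listeners[key].push(callback);
};

// Store a new value and notify all subscribers for that key.
DataStore.prototype.publish = function (key, value) {
  this.data[key] = value;
  var subs = this.listeners[key] || [];
  for (var i = 0; i < subs.length; i++) { subs[i](value); }
};

// Usage: a mail widget subscribes to "inbox" and redraws itself
// whenever the server publishes fresh data.
var store = new DataStore();
store.subscribe("inbox", function (messages) {
  console.log("inbox updated: " + messages.length + " messages");
});
store.publish("inbox", ["msg1", "msg2"]); // prints "inbox updated: 2 messages"
```

The point is that the interface reacts to data events, not protocol events: the widget never knows or cares which HTTP request produced the update.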
User Experience Improvements: widgets
netWindows builds everything in its environment on classes that define components and widgets. Components are the building blocks for widgets, which in turn are templates for the presentation of data or interface elements. Widget definitions are loaded when the environment is created (the page is loaded), but they are external JS files that are cached across pages (incurring their expense only once). netWindows keeps track of components and widgets, allowing developers to create arbitrary numbers of each and reference them without worrying about identifying them when an environment is generated. The framework handles all of that.
This seemingly trivial independence of components from foreknowledge of their existence by the page author buys developers some very important gains in usability. First, it's possible to create interface elements as they are needed, without worrying about how many were created before or how many were allocated when the page was generated, and without needing to refresh the entire page simply to add another interface element. Second, it allows developers to build long-lived pages that can host application interfaces with arbitrary degrees of richness. After all, one creates and removes the rich interface elements as one goes, right? A rich set of user interactions becomes possible when one can instantaneously provide feedback to the end user for their actions, many times deferring the "real" application interaction to behind-the-scenes utilities and scripts.
Benefits of using widgets become evident when talking about formatting and sending data to and from a server. Oftentimes, a widget is simply a container for data. Whereas in the past it was necessary to send formatting instructions (table layouts, CSS classes, etc...) along with data, netWindows provides another layer of abstraction on top of this, allowing developers to send data without any formatting other than the structure of the data itself. Widgets may be populated with data independently of their creation. Additionally, it's possible to change the layout and look of any of these pieces of data in response to interface events via the DOM and/or widget-specific methods.
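To make the "send only data" point concrete, here is a minimal sketch of a client-side template. The server ships a bare data block (here, a JS array); a template function on the client turns it into markup. renderList and the data-block class name are illustrative, not netWindows methods:

```javascript
// A client-side "template": all formatting lives here, once,
// so the server never sends markup along with the data.
function renderList(items) {
  var html = '<ul class="data-block">';
  for (var i = 0; i < items.length; i++) {
    html += "<li>" + items[i] + "</li>";
  }
  return html + "</ul>";
}

// The same widget can be re-populated whenever new data arrives,
// with no formatting sent over the wire:
var markup = renderList(["alpha", "beta"]);
// markup is '<ul class="data-block"><li>alpha</li><li>beta</li></ul>'
```

In a real page one would assign the result to a container's innerHTML (or build the nodes via the DOM); the wire cost is just the data array either way.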
Since widgets are created and deleted dynamically from what amount to "templates" for each widget, the netWindows team has called this "truly dynamic DHTML", which sounds a bit pretentious, but they haven't got a better term right now. The bandwidth savings of this client-side template approach can be substantial, and response times for data-intensive widgets can improve substantially over "traditional" DHTML widgets by allowing them to be created and then populated with data (independently), or by allowing them to "batch" data requests via the content loader. It should be noted that bandwidth requirements for pages built using the current page-by-page paradigm will not be significantly reduced, and current techniques such as script caching will play the same role in these situations as they do now. Only in longer-lived interfaces will the proposed bandwidth savings be realized.
Using an object-oriented framework, one defines the behavior and appearance of a widget once, and it propagates to every instance of that widget, meaning that one can upgrade and change the look and feel of widgets without changing any code generated by the server. We have also shown that the content loader promotes modular programming, a practice that can further separate content from application form.
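This is just JavaScript's prototype mechanism at work, and it can be shown in a few lines. Widget and render are illustrative names, not the framework's:

```javascript
// Define a widget's behavior once, on the prototype.
function Widget(title) { this.title = title; }
Widget.prototype.render = function () {
  return "[ " + this.title + " ]";
};

var a = new Widget("inbox");
var b = new Widget("compose");

// Upgrading the shared method restyles every existing instance at
// once; no server-generated code has to change.
Widget.prototype.render = function () {
  return "<< " + this.title + " >>";
};
console.log(a.render()); // prints "<< inbox >>"
console.log(b.render()); // prints "<< compose >>"
```

Because instances look methods up on the prototype at call time, a look-and-feel change is a single assignment, not a per-instance update.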
In closing
As web application complexity increases and more data-intensive web applications arise, techniques such as the ones netWindows employs will come to play a bigger role in the arsenal of web developers. Shortcutting the round-trip problem by building interface components on the client side and deferring much of the data-transfer until it's needed will provide interface developers a better way to address the shortcomings of HTML. If you have questions about this article or would like to contact the author, Alex Russell may be reached by email.