The keyword geek: CSS

    If you are a follower of the OxfordWords Blog, you may have seen the launch of the OxfordWords Text Analyser, coinciding with the 200th anniversary of Charles Dickens's birth. What follows is a technical description of the feature: what it does and how it works.
    The challenge was to show the logophile visitors to OxfordWords some of the computational linguistics techniques used in preparing the Oxford English Corpus, and so give some insight into how a modern dictionary is prepared.
Since we could not share the corpus itself with the public, we settled on delivering a much smaller text, analysed with techniques similar to those used by our corpus analysis software. As we already publish a huge range of classic texts in the Oxford World's Classics series, it made most sense to use those as our sources and to provide collocate and frequency analysis, as well as example sentences, for each text. This presented a data problem: incorporating all of this in a single web page would add several megabytes of data to it, making the page unacceptably slow to load.
The obvious solution was to create a lightweight page containing just the JavaScript display code, and to deliver all the data on an as-needed basis via AJAX calls. There was a time when making this work reliably on all browsers would have been the bane of a programmer's life, but fortunately we now have the jQuery library to abstract away such nasty jobs, so taking that route was a no-brainer.
    With AJAX decided upon, the next decision concerned the server-side component. I wrote a piece last summer about how experience of database-driven back ends for language analysis had led me to precompute data as JSON files rather than query a database. Disk space is cheap and static files are fast to serve; server processing power isn't. So my next task was to write a set of command-line PHP scripts that generated a tree of JSON files containing the collocates, frequencies and example sentences for each word in the source text.
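    The original scripts were written in PHP; the following is only a minimal Python sketch of the "precompute to static files" idea, not a port of them. It assumes a hypothetical layout in which each word's data lives in its own file, sharded by first letter (root/b/bridal.json), and the sample data is made up for illustration.

```python
import json
import os
import tempfile


def word_path(root, word):
    """Hypothetical layout: one file per word, sharded by first letter, e.g. root/b/bridal.json."""
    return os.path.join(root, word[0], f"{word}.json")


def write_word_tree(root, data_by_word):
    """Write one small JSON file per word so the web server can serve it as a static file."""
    for word, data in data_by_word.items():
        path = word_path(root, word)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False)


# Usage: build a one-word tree in a temporary directory (illustrative numbers only).
demo_root = tempfile.mkdtemp()
write_word_tree(demo_root, {"bridal": {"frequency": 3, "collocates": {"dress": 2, "cake": 1}}})
```

    The trade-off is the one described above: all the CPU work happens once, offline, and each request afterwards is just a static file read.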
    While these tasks are essentially simple, they are quite computationally intensive. In a typical Dickens novel, for instance, there will be as many as 20,000 unique words, and every one of them must be searched for across the whole text, with the frequency of its collocates computed over all the places it appears. The whole process takes about six hours and produces a roughly 20 MB tree of thousands of tiny JSON files, which are then uploaded to the web server.
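    The collocate counting can be sketched as follows. Rather than searching for each word across the whole text, this version makes a single pass and credits each token's neighbours to it, which yields the same counts for a fixed window. The tokenising regular expression and the window of four words either side are assumptions for illustration, not the parameters used for the Text Analyser.

```python
import re
from collections import Counter, defaultdict

# Assumption: a "word" is a run of letters, optionally with one apostrophe (e.g. "pip's").
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lower-case the text and split it into word tokens, dropping punctuation."""
    return TOKEN.findall(text.lower())


def collocates(tokens, window=4):
    """For every word, count the words within `window` positions either side of each occurrence."""
    table = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                table[word][tokens[j]] += 1
    return table


# Usage on a toy sentence.
tokens = tokenize("The bridal dress, the bridal cake.")
frequencies = Counter(tokens)          # word frequencies
table = collocates(tokens, window=1)   # collocates one word either side
```

    Here table["bridal"] counts "the" twice and "dress" and "cake" once each.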
    So that's the surprisingly low-tech back end for the feature. What about the front end?
    The core functionality of jQuery makes coding an application like this very simple. The three main parts of the feature live in hidden DIVs, which are swapped in and out using jQuery's show() and hide() functions. The JSON data is pulled in using jQuery's getJSON() function. Collocates are shown in a word cloud courtesy of the excellent jQCloud plugin, example sentences are simply loaded into an unordered list, and frequency graphs are created using Google's Image Chart API.
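    On the page itself this shaping happens in JavaScript; the Python sketch below only shows what the data might look like on its way to the two display components. It assumes the {text, weight} object shape that jQCloud accepts for its word list, and builds a line-chart URL from the Image Charts parameters cht (chart type), chs (size), chd (text-encoded data) and chds (data scaling). The per-section counts passed to chart_url are hypothetical, and Google has since deprecated the Image Charts API, so the URL is illustrative.

```python
from collections import Counter
from urllib.parse import urlencode


def cloud_words(collocate_counts, limit=30):
    """Top collocates as {text, weight} objects, the shape assumed for jQCloud's word list."""
    return [{"text": word, "weight": count}
            for word, count in collocate_counts.most_common(limit)]


def chart_url(values, width=400, height=150):
    """Line-chart URL for Google's Image Charts service (text-encoded data, scaled 0..max)."""
    top = max(values) or 1  # avoid a 0,0 scale when every value is zero
    params = {
        "cht": "lc",
        "chs": f"{width}x{height}",
        "chd": "t:" + ",".join(str(v) for v in values),
        "chds": f"0,{top}",
    }
    return "https://chart.googleapis.com/chart?" + urlencode(params)


# Usage with made-up numbers.
words = cloud_words(Counter({"the": 9, "dress": 5, "cake": 2}), limit=2)
url = chart_url([1, 4, 2])
```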
    With these plugins and our own code we had a working feature. But to make a single-page jQuery application like this one feel like a proper application, one further component was required. Users expect to be able to use the back button, and to return to a particular part of the application by URL alone. Miss Havisham's dress needs to be readily conjured up by linking directly to the word "bridal".
    We therefore used the jQuery-BBQ plugin to provide URL and history support through in-page anchors. Because the page is never reloaded, the plugin appends the current word to the end of the URL, after a # symbol, whenever it triggers a change to what is displayed.
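    The plugin does this in the browser; as a language-neutral illustration, this Python sketch mirrors only the URL logic: writing the word into the fragment to make a deep link, and reading it back when the page loads or the hash changes. The page URL is hypothetical, and percent-encoding the word is an assumption about how multi-word or non-ASCII entries would be handled.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def link_to_word(page_url, word):
    """Deep link: the page URL with the word as its #fragment, so no reload is needed."""
    return urlunsplit(urlsplit(page_url)._replace(fragment=quote(word)))


def word_from_url(url):
    """Word to display on load or on a hash change; None means show the default view."""
    return unquote(urlsplit(url).fragment) or None


page = "http://example.com/text-analyser"  # hypothetical page URL
link = link_to_word(page, "bridal")         # -> http://example.com/text-analyser#bridal
```

    Because only the fragment changes, the browser records each word as a new history entry without fetching the page again, which is what makes the back button behave.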
    In summary, the use of precomputed JSON files made the server-side component of this feature very simple, at the expense of more filesystem space, and the use of jQuery and its plugins made client-side development much faster and more reliable across browsers. Libraries like jQuery are sometimes used for the sake of it when plain JavaScript would have sufficed, but in this case I believe it made for a far better application.

    Writing an HTML placement for third-party sites used to be so easy. Back in the bad old days of tables and frames, you simply created a little table with all those nasty width, height, cellpadding and cellspacing attributes and called it good. You knew it had a pretty good chance of working as intended on almost any site it was placed on.
    CSS has changed all that. Our HTML is much cleaner and easier to understand because all the styling and layout has moved into the style sheet, but placing outside code into a CSS-styled page is now fraught with danger. The Cascading part of Cascading Style Sheets means the developer has to be extremely careful to leave no stone unturned and no opening for one of the host page's global style declarations to affect the placement's styling. All of which leads to excessive amounts of inline styling, which rather defeats the CSS goal of cleaner code.
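    One way to keep that inline styling manageable is to generate it. The Python sketch below merges a hypothetical list of "reset" properties with element-specific ones into each style attribute. Inline declarations beat the host page's normal stylesheet rules for the properties they set, but any property left out is still open to the host's rules, and host rules marked !important can still win, which is why such lists grow long. The property list and values are assumptions, not the ones used in the placement.

```python
from html import escape

# Hypothetical reset: properties a host page's global rules commonly change.
RESET = {
    "margin": "0",
    "padding": "0",
    "border": "0",
    "font-family": "Arial, sans-serif",
    "font-size": "12px",
    "line-height": "normal",
    "text-align": "left",
    "background": "transparent",
}


def style_attr(**overrides):
    """Merge the reset with element-specific styles (font_size -> font-size) into one string."""
    rules = dict(RESET)
    rules.update({name.replace("_", "-"): value for name, value in overrides.items()})
    return "; ".join(f"{prop}: {value}" for prop, value in rules.items())


def element(tag, content="", **styles):
    """Render a single element whose every reset property is pinned inline."""
    return f'<{tag} style="{escape(style_attr(**styles))}">{content}</{tag}>'
```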
The placement below is an example. Created as a search box for the Oxford Dictionaries Online site, it should work in all modern browsers and degrade gracefully in older ones. But as all developers know, you can test something on everything you have and there will still be a platform out there that trips you up.
    And true to form, on first load something changed the formatting. Watch this space, CSS tweaking at work... The culprit: the WYSIWYG editor had replaced carriage returns with <br> tags. Lose the carriage returns and there it is.
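    "Losing the carriage returns" is easy to automate before pasting a snippet into a blog editor. This minimal Python sketch replaces each line break, with the indentation around it, by a single space. Outside elements such as pre and textarea a line break renders as a space in HTML anyway, so the layout should not change; the function name is hypothetical.

```python
import re


def single_line(html):
    """Join a multi-line HTML snippet into one line, so an editor has no line breaks to turn into <br> tags."""
    return re.sub(r"\s*[\r\n]+\s*", " ", html).strip()


snippet = '<div>\n  <input type="text">\r\n</div>\n'
flat = single_line(snippet)  # '<div> <input type="text"> </div>'
```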
    Oh well, that's Blogger ticked off the list of platforms it's been tested on.


Wednesday, 15 February 2012

An AJAX and jQuery driven web feature, the OxfordWords Text Analyser

Monday, 29 November 2010

Testing an HTML placement for third party sites