Website Publishing Specification

Lance Hendrix


In the course of working to develop a personal web site for the publishing of technical information on the internet, it has become apparent that documenting how this site should work is critical to objectively evaluating potential technical solutions. This document describes a method of organizing and publishing a website from a functional (rather than technical/technology) point of view.

1. Purpose

The purpose of this document is to capture my thoughts on how I would like to organize my web-site and to determine how I want this site to “work”. That is, what, based on my preferred method of creating content, would I consider the optimal way of managing this content within the site? This document, therefore, will describe a method of creating and adding content to a website such that management of the site is “optimal”.

This document will describe my requirements, desired functionality, and use cases for the evaluation of technologies required to implement the site, whether those technologies currently exist in whole, in part but requiring integration, or do not currently exist but must be developed.

2. Background

After having decided to finally create a presence on the World Wide Web (WWW/Internet), I began the process of “creating my site”. Having some experience with development and management of web-sites for enterprise organizations, I took a little time to evaluate how I wanted my site to work and what technologies I might want to leverage in creating my site.

I began by first “googling” the terms opensource web templates, which led me to a number of sites that provided Cascading Style Sheets (CSS) and one or two basic HTML pages demonstrating the use of the CSS. All fine and good, as I prefer to refine the design of the wheel for my specific needs, rather than undertaking to reinvent the whole damn thing. The purpose of the exercise is to get my (hopefully) useful information into the hands of others, rather than to develop what I assumed would be yet another content management system.

However, while the CSS and templates were well made and interesting, I quickly realized that the manual editing and management of “straight” HTML pages for building the entire site would require significant duplication of content and carry significant potential for issues regarding management of links, RSS feeds, cross-references, and especially the addition of content. In my experience, the task of managing and working with the site would quickly require a significantly greater investment of time than the creation of the content for the site. In my opinion, this is unacceptable, as the goal of the site is to capture and make available technical information that is of potential interest to others. That is, while site management is expected to consume some amount of the time required for the overall objective, it should be a small to insignificant percentage of the overall time invested in the site, as the value of the site should be invested in the information on the site (that is, the creation of the content) rather than the management of said information.

So, at this point in the endeavor, I set out to evaluate what I could find that currently existed to assist with the management of content on the site. Having some experience with enterprise (read expensive) content management systems and implementation of these systems, I first turned back to Google in order to evaluate the currently available open source content management systems. However, as I expected, while these systems are in various states of development and maturity (I was actually impressed that there are a number of rather solid, that is feature rich and stable, open source content management systems available), none of them seemed to meet my needs. That is, what I required, while in concept a system to manage content, in practice seems to be considerably divergent from mainstream content management systems.

It is at that point that I decided to create this document in order to capture my thoughts on what it is that I need and how I would expect the system to work, so that I can objectively determine how to move forward with the creation and management of my site. I can (potentially over)simplify my expectations as the following, in order of preference:

  1. An existing system that meets most or all of my needs

  2. A set of existing technologies that meets most or all of my needs, but requiring integration

  3. A set of existing technologies that meet some or most of my needs, with only development of some components and then requiring integration

  4. The need to develop a new system (hopefully leveraging existing code where possible)

3. Site organization

In the process of creating my site, it seems logical to first describe how I would like the site organized in order to optimally present my information to a potentially interested user. As user interfaces and developing systems for user interaction are not my strongest skills, I tend to rely heavily on the work of others and the evaluation of existing systems to leverage what seems (to me) to be good in order to piece a system together that I feel combines the best of those systems into what hopefully results in an overall stronger solution (most assuredly a stronger solution than if I developed the interface from scratch).

As a result, I will not attempt to define a “revolutionary” method or site organization (at least not at this time), but rather the intention of this section of the document is to describe and define the major sections of the site in order to:

  1. Provide explicit understanding of each component of the site

  2. Ensure focus and clarity of the content within each section of the site

  3. Provide a basis for understanding the challenges of maintaining the site’s complete content

  4. Ensure consistency across all elements of the site

More plainly, this section is a documentation of what I seem to have deduced as the best practices for web site organization based on my experiences as both a creator and consumer of content, specifically focused on the internet.

Having done a preliminary search for concise and focused documentation on this topic (you should begin to notice a pattern in this regard, that is I prefer not to reinvent the wheel), I was not able to find anything that explicitly described the best practices for site organization. It may well be assumed to be understood, given the significant number of generally similar sites on the web that there is no need for the documentation of a preferred site organization; however, at a minimum, for my own benefit and to ensure some measure of organization and focus, this section will attempt to do just that.

3.1. Hierarchical Organization

I would also like to point out one obvious item regarding web site organization. Specifically, that web sites are typically (potentially as a result of the nature of the technology) hierarchical in nature. Why this is the case may seem rudimentary, but further consideration may provide insight that this is not necessarily true; however, as human beings, we seem to be hierarchical by nature and most of our organizational structures (at least in their base or initial instantiations) are hierarchical in nature. The determination of whether a hierarchical organization of a website is optimal or not will be left to others or another time, as we will explicitly state that at least the site as described by this document will be presented hierarchically at its core. That is not to say that there will not be cross-referenced information, but rather that the presentation on the site will at its core be hierarchical. One caveat to this claim is that the information may be classified under more than one hierarchical structure. That is, the site overall will have a basis in one hierarchical structure, but it is also likely that the site will have one or more additional hierarchical structures superimposed (if you will) on top of the main hierarchical structure. The point being that the site is organized hierarchically.

3.2. Top Level Site Components

The initial structure of this site will be as follows:

  1. Home or landing page (the page presented when requesting the site itself, for example http://lancehendrix.com)

  2. BLOG

  3. Technical Articles

  4. About Me

  5. Contact Me

  6. Links

Additional sections or modifications to the structure will most likely be needed as the site specification matures or as the volume of content on the site increases. It is assumed that site organization (that is, information organization) should be regularly reviewed and assessed in order to ensure that users are able to quickly find the specific information they are looking for and can efficiently browse and navigate the site to find information that may be of interest to them.

3.3. Page Level Components

It is assumed (and remains to be validated) that each page on the site should have a consistent look and feel. That is, there are a number of standard components that while their specific content or presentation may and/or will likely change from page to page, should nonetheless be consistently applied across the site.

At a high level, each page should contain the following items:

  1. Page header

  2. Site Navigation Functionality

  3. Cross-references

  4. Page footer

More specifically, each page at a minimum should contain:

  1. Link back to the home page

  2. Navigational menu

  3. “Bread crumbs”

Each “piece” of information, which we will call an article for simplicity and which could be a blog entry, a technical document, or a section of information on a page, should generally contain the following attributes in addition to the base content of the article:

  1. Title

  2. Date of last revision

  3. Categorical classification

  4. Author

Additionally, any significant article (for the sake of definition/argument, let’s say anything that “stands alone” on one page or more) should contain the following attributes:

  1. Abstract

  2. Purpose

  3. Date of initial publication/creation

Additional suggested attributes for “significant articles” would include:

  1. Background

  2. Intended Audience

  3. Table of Contents
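
The attribute lists above could be captured in a single record type. The following is only a minimal sketch in Python; the class name `Article`, the field names, and the helper `missing_significant_attributes` are my own illustrative assumptions and are not part of any existing system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Article:
    """One "piece" of information: blog entry, technical document, or page section."""

    # Attributes every article should carry.
    title: str
    last_revised: date
    categories: List[str]
    author: str
    body: str = ""

    # Attributes expected only of "significant" (stand-alone) articles.
    abstract: Optional[str] = None
    purpose: Optional[str] = None
    first_published: Optional[date] = None

    # Suggested extras for significant articles.
    background: Optional[str] = None
    intended_audience: Optional[str] = None
    table_of_contents: List[str] = field(default_factory=list)

    def missing_significant_attributes(self) -> List[str]:
        """Names of the significant-article attributes that are not yet filled in."""
        required = {
            "abstract": self.abstract,
            "purpose": self.purpose,
            "first_published": self.first_published,
        }
        return [name for name, value in required.items() if value is None]
```

A publishing tool could call `missing_significant_attributes()` before accepting a stand-alone article and report what the author still needs to supply, rather than silently publishing an incomplete page.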

4. Top Level Site Components

This section describes each of the top level site components that provide the top level structure for the information available on the site. This section is provided in order to ensure that the first level of organization on the site remains clear, focused, and rational. As the amount of content on the site grows, it is assumed that this structure will be re-evaluated and potentially revised as it makes sense to do so.

4.1. Home Page

It is assumed that the average person, upon either attempting to find my presence, or after being directed to some specific information on my site (most likely from a cross-reference/link from another site or from a search engine), will desire to “start from the beginning” of my site, which is normally considered the home or landing page. That is, this is the location on the site from which all other information can be accessed. It is often called “index.html”, as it is generally considered a best practice that upon a user entering a URL (Uniform Resource Locator, such as http://lancehendrix.com) the user is presented with a page of content, rather than a listing of the contents of the location. In layman’s terms, it is the difference between seeing content that describes the site and provides the user a friendly means of navigating and finding content rather than just a list of documents and folders generated by the web server (which is also considered a potential security issue).

4.2. BLOG

This section of the site will contain less structured information from the author. That is, the information in this section is more informal, both in the nature and tone of the content as well as in the overall structure of the content. The author will most likely use this section of the site to introduce new ideas or as yet incomplete concepts. Additionally, the typical nature of a blog allows community comment and as such, the author can leverage this input from the community in order to achieve the following:

  1. Assess the level of value/interest in a concept

  2. Gain additional insights into the concept

  3. Assess the priority of a concept in relation to other concepts/projects under development

  4. Determine whether to move forward or abandon a concept

  5. Incrementally refine a concept in a less structured/formal venue before formalization

4.3. Technical Articles

This section is envisioned to contain the most significant information on the site and it is further assumed that this section will contain the bulk of the content. *** This may be debated, as the BLOG, by its unstructured nature, may (unintentionally) prove to have a larger volume of information and additionally, as it also (most likely) contains the evolution or history of a concept, may prove of greater interest than even the final “formalization and publication” of the concept. We shall see… ***

Initially, this section will not need further sub-division, as the initial amount of content will be (well, you have to start with one) small. As this section grows, it will require further consideration as to whether this section is formally “sub-divided” or whether this section should be divided and made into several top level components/sections.

4.4. About Me

This section will be used to convey more information regarding me; however, it should be focused in nature and of relevance to the rest of the site. That is, as this is intended to initially be a technically focused site, this section should focus on aspects of me and my activities as they relate to technical interests and activities.

4.5. Contact Me

This section will provide the user a means of contacting me in a less public forum (less public than through feedback on a blog or on a technical article, which will be included on the site for general public consumption by anyone using the site). It should provide the user a means for more direct contact (email, IM, etc.) as well as indirect contact (comment card, or “local” email).

It may also provide information regarding public appearances and schedule so that people interested in direct interaction know when this is a possibility.

It must also be noted that care should be taken to consider the amount of information that is made publicly available in the interest of personal security and in the interest of security of family and friends. This concern should not be taken too lightly in our current age of spam, identity theft, and “stalking”. It is these concerns that have kept me off of the web in the past; however, as my situation has changed and the prevalence of this type of presence has increased, while the level of concern has not changed, it seems that the issues are manageable and one might even argue that these issues have just become a part of “life with the Internet”.

4.6. Links

The value of this section of the site is still under consideration. While links are interesting and of value, I have found a links section on other sites, presented without the context of a blog entry or an article, to be of minimal value, especially for technical sites. However, one potential mitigation would be better organization of the links, along with better documentation/description of the nature and value of each link, which is usually lacking on other sites.

5. Site Functionality

While the previous sections were focused on the primary hierarchical organization of the site, this section will discuss cross sectional and overarching functionality required for the site from two perspectives, the perspective of the user and the perspective of the content creator and site manager.

5.1. User Perspective

From the perspective of the user, the site as a basic principle should adhere to the ideal of “no surprises”. That is, the site should operate in a way (at least for now, for better or for worse) that is typical of the vast majority of the web sites on the internet. A couple of requirements along this line would be:

  1. In general, all content that is intended for access by users should be navigable from the home page

  2. The site should enable the user to create bookmarks to specific pages of content

  3. The site should provide navigation from any depth within the site hierarchy to any other level of the site hierarchy within the given hierarchical “path” from the current content to the home page

  4. A page should have a link from the bottom of the page back to the top of the page

It is intended to further elaborate on these items later in this document or with a more formal requirements document.
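
As an illustration of requirement 3 above, the navigation path from a page back to the home page can be derived from the page’s location rather than coded by hand. This is a minimal sketch in Python that assumes URL paths mirror the site hierarchy and that labels can be derived from hyphenated path segments; the function name `breadcrumbs` and both assumptions are mine, not a description of any particular product.

```python
from typing import List, Tuple


def breadcrumbs(page_path: str, home_label: str = "Home") -> List[Tuple[str, str]]:
    """Return (label, url) pairs from the home page down to the given page."""
    crumbs = [(home_label, "/")]
    url = ""
    for segment in page_path.strip("/").split("/"):
        if not segment:
            continue  # an empty path means the home page itself
        url += "/" + segment
        # Hypothetical labeling rule: "technical-articles" becomes "Technical Articles".
        label = segment.replace("-", " ").title()
        crumbs.append((label, url + "/"))
    return crumbs
```

Each pair in the result links to one ancestor level, so a page template can render the whole trail without the administrator touching it.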

In addition to general usability and “no surprises”, the site also intends to expose the following cross-cutting functionality from a user perspective:

  1. RSS feed(s)

  2. Summary of new/revised/significant items on the home page

  3. Categorical indexing of information (another hierarchical organization imposed on-top of the main hierarchical system)

  4. Site search

    1. Category based

    2. Title based

    3. Full document search
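
The three search modes listed above can be sketched over a simple in-memory catalog. This is only a rough illustration in Python: the function name `search`, the dictionary keys, and the matching rules (exact category match, substring match for titles and full text, all case-insensitive) are my assumptions, and a real site would more likely rely on an existing search engine or index.

```python
from typing import Dict, List


def search(catalog: List[Dict], term: str, mode: str = "full") -> List[str]:
    """Return the titles of matching articles.

    mode is one of "category", "title", or "full" (title plus body text).
    Each catalog entry is assumed to have "title", "categories", and "body" keys.
    """
    needle = term.lower()
    hits = []
    for article in catalog:
        if mode == "category":
            # Category search matches a whole category name, ignoring case.
            matched = needle in (c.lower() for c in article["categories"])
        elif mode == "title":
            matched = needle in article["title"].lower()
        elif mode == "full":
            matched = needle in (article["title"] + " " + article["body"]).lower()
        else:
            raise ValueError(f"unknown search mode: {mode!r}")
        if matched:
            hits.append(article["title"])
    return hits
```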

More narrowly defined user functionality should include the following:

  1. Ability for users to leave comments with regard to specific blog entries

  2. Ability for users to leave comments with regard to specific technical articles

5.2. Content Manager Perspective

The needs of the content creator and site manager are significantly different from those of a user of the site and, as such, it is assumed that the bulk of the functionality provided by the system is focused on assisting the content creator(s) and site manager(s) with providing the site users with the “no surprises” experience and, to take the ideal further, a consistent site experience. That is, it is intended that this site be rather static in its day to day use (it is a content publishing site much more so than an application, from the site user’s or site visitor’s perspective).

This document will also break down the content manager’s perspective into two sections, as the activities associated with this component of the system, while initially envisioned to be filled by a single person, naturally fall into two major tasks. The first is the task of creating the content to be published, which will be covered in the second section of this perspective. The second major activity of the content manager’s role is that of publishing the content and ensuring overall site consistency. This will be the focus of the next section of this document.

5.2.1. Publishing

The activity of publishing finished or completed content to the site should require as little interaction from the site manager as is possible. That is, ideally, the site manager should notify the system that content is required to be added to its “catalog” as well as the site manager specifying where the content for publishing is currently located and the system should handle the rest. However, at least in the initial implementation of this system, it is also assumed that this level of automation is not practical and additional design, detailed documentation, and review will be required to determine if this is ever even possible, as the current level of specification only allows what seems to be a very naive and generalized answer of “well, sure it can”.

More information on the activity of publishing can be found in the use case section of this document; however, for the sake of incremental definition and clarification, some initial assumptions and preliminary requirements desired of a system to facilitate publishing will be enumerated here.

  1. The system should accept the article in a standard format - presumed to be XML marked up with a standard, preferably a published standard, such as “Docbook”

  2. The system should expect to apply or validate the document against a standard template for the site – assume a CSS

  3. The system should expect a categorization of the article against a defined set of categories

    1. The system should add the document to the system such that it can be accessed via the defined category for that article along with other articles associated with the same category

    2. The system should add the document to the appropriate location within the primary site hierarchy

  4. The system should maintain the list of recent/new articles on the home page

  5. The system should provide needed information to the RSS system for revised/new articles for notification to subscribed users
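
To make requirements 3 through 5 a little more concrete, here is a minimal sketch in Python of a catalog that is updated once per published article and can render its recent list as an RSS 2.0 feed. The class name `Catalog`, its method names, and the limit on the recent list are hypothetical choices of mine; only the use of RSS and the category/hierarchy bookkeeping come from the requirements above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


class Catalog:
    """Hypothetical site catalog, updated once per published article."""

    def __init__(self, recent_limit: int = 5) -> None:
        self.by_category: Dict[str, List[str]] = defaultdict(list)
        self.by_section: Dict[str, List[str]] = defaultdict(list)
        self.recent: List[dict] = []
        self.recent_limit = recent_limit

    def publish(self, title: str, url: str, section: str, categories: List[str]) -> None:
        """Register an article everywhere it must appear, with no hand-edited links."""
        for category in categories:
            self.by_category[category].append(url)
        self.by_section[section].append(url)
        # Newest first, trimmed to the length of the home page list.
        self.recent.insert(0, {"title": title, "url": url})
        del self.recent[self.recent_limit:]

    def rss(self, site_title: str, site_url: str) -> str:
        """Render the recent list as a minimal RSS 2.0 document."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = site_title
        ET.SubElement(channel, "link").text = site_url
        ET.SubElement(channel, "description").text = "Recent articles"
        for entry in self.recent:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = entry["title"]
            ET.SubElement(item, "link").text = site_url.rstrip("/") + entry["url"]
        return ET.tostring(rss, encoding="unicode")
```

The point of the sketch is that a single `publish` call feeds the category listing, the primary hierarchy, the home page list, and the feed, so none of them has to be maintained separately.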

While the author understands the challenges associated with “negative” requirements, there are a number of known challenges with publishing in general, drawn from having worked on sites or with other systems, that at least at the present time can only be expressed in terms of what the system should not require the site administrator to do. These items should be carefully considered and revised such that they are no longer expressed in the “negative”.

  1. The site administrator should not have to cut and paste information into the article in order to integrate it into the existing site and meet the site standards as generally indicated above

  2. The administrator should not have to “hand code” the “bread crumbs”

  3. The administrator should not have to add a specific link to the document in the primary site hierarchy – the administrator should only have to indicate to the system where in the site hierarchy the content should be located and the system should update whatever links (list(s) of content) are required in order to access the document

  4. The list of content by category should be updated by the system so that the new content can be accessed (searched ?) by category

  5. The administrator should not have to specify or apply styles to the document – the document should already be marked up with either existing styles or with styles based on a published standard that can be customized through changes to the CSS

5.2.2. Content Creation

The activity of content creation should ideally be completely decoupled from the final presentation of the content to the user. That is, the process of creating content should be (again ideally) completely independent of the final publication. Specifically, the tools, process, and constraints of creating the content should in no way be derived, based on, or a dependency, of the fact that this information will be published to the web site (or any web site, or even HTML, or even published at all).

However, it is also clear that this ideal is not possible in reality. Further details on why this ideal is not possible will not be covered at this time, but for the sake of moving forward, this document will make that assumption. As a result of this assumption, we will attempt, based on best practices, to limit the number and scope of items that are required dependencies between the activities of content creation and content publishing.

To this end, the author’s understanding of a number of current or recent (or recent for the author at least) technologies comes into play to assist with the decoupling. However, let us first examine a potential minimal set of information conveyed between the two activities that requires a common understanding in order for the publisher to render the information to a user in a way that minimally reflects the content creator’s intention.

First, let’s state the obvious and most significant (and yet the least coupled) data that must be communicated: the content. The raw content created by the author is (let’s at least assume) words. These words are expressed as letters, and while it is possible to get more detailed, further technical description of the model provides little value because the communication of content is *generally* considered (again, let’s just make the assumption at this time) decoupled.

So, other than raw content, letters, spacing between those letters, and basic punctuation, the remaining information that is generally communicated between content creator(s) and publisher(s) can in general be considered meta-data. That is information that is not part of the core narrative or ideal of the information, but information that either aids in the organization, categorization, or presentation of the information that has been created.

For instance, one piece of such meta-data would be paragraph delimiters. In most word processing systems (traditionally, or if you want to get esoteric, consider a traditional typewriter) paragraphs are delimited by a carriage return in the first column of a row of text. While this definition relies on a number of undefined concepts at this point (rows, columns, characters, carriage return, et al.), it should be relatively straightforward to understand that while the construct of a carriage return aids in the comprehension of information, at its root, it is simply a method of organizing information for presentation.

As such, the following list attempts to provide the minimal list of information that is useful to communicate from the content creator to the publisher, along with specific requirements as dictated by how we are organizing information, while still attempting to limit the number and scope of these items in order to limit the coupling (dependencies) between these two activities:

  1. Potentially “hidden” meta-data that is not (in general) critical to the idea/ideal presented

    1. Author

    2. Categorization/Keywords

  2. Meta-data identifying unique or special information

    1. Title

    2. Purpose

    3. Abstract

  3. Meta-data that can potentially be derived by the publisher

    1. Date of creation (publication)

    2. Date of last revision (last published change)

    3. Table of contents

    4. Index

  4. Meta-data regarding presentation and organization

    1. Hierarchical designations

      1. Heading/Section designation and hierarchy (heading level 1, heading level 2, etc.)

      2. Book, Section, Chapter

      3. Etc.

    2. Lists and list items

    3. Emphasized items

    4. Italicized items (cites, references)

    5. Paragraph delimiters

    6. External items to be rendered with the document (external images)

    7. Links to external information (hyperlinks, as an example)

  5. Additional items that I am unsure of at present

    1. Structured items (tables, for instance ???)

    2. Subscript

    3. Superscript

This list represents the minimal items that seem to be required for my specific publishing purposes; the final solution may offer additional capabilities, but it must provide at least these.
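
To show how little needs to pass between content creation and publishing, the sketch below reads the explicit meta-data from a small XML article and derives a table of contents from its section headings. It is written in Python and assumes DocBook 4 style, namespace-free markup (`articleinfo`, `keywordset`, nested `section` elements); DocBook 5 uses a namespace and an `info` element instead and would need adjusted paths. The sample document and the function names are purely illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Illustrative sample only; element names follow DocBook 4 conventions.
SAMPLE = """\
<article>
  <articleinfo>
    <title>Website Publishing Specification</title>
    <author><firstname>Lance</firstname><surname>Hendrix</surname></author>
    <keywordset><keyword>publishing</keyword><keyword>web</keyword></keywordset>
    <abstract><para>How the site should work.</para></abstract>
  </articleinfo>
  <section><title>Purpose</title><para>...</para></section>
  <section><title>Background</title><para>...</para>
    <section><title>History</title><para>...</para></section>
  </section>
</article>
"""


def table_of_contents(root: ET.Element, depth: int = 1) -> List[str]:
    """Derive an indented table of contents from nested <section> titles."""
    entries = []
    for section in root.findall("section"):
        entries.append("  " * (depth - 1) + section.findtext("title", ""))
        entries.extend(table_of_contents(section, depth + 1))
    return entries


def extract_metadata(docbook_xml: str) -> Dict[str, object]:
    """Collect the meta-data the author supplies and the part the publisher derives."""
    root = ET.fromstring(docbook_xml)
    info = root.find("articleinfo")
    if info is None:
        raise ValueError("expected an <articleinfo> element")
    author = info.find("author")
    name_parts = [] if author is None else [
        author.findtext("firstname", ""),
        author.findtext("surname", ""),
    ]
    paragraphs = info.findall("abstract/para")
    return {
        "title": info.findtext("title", ""),
        "author": " ".join(part for part in name_parts if part),
        "keywords": [k.text or "" for k in info.findall("keywordset/keyword")],
        "abstract": " ".join("".join(p.itertext()).strip() for p in paragraphs),
        "toc": table_of_contents(root),  # derived by the publisher, not authored
    }
```

Calling `extract_metadata(SAMPLE)` returns the title, author, keywords, and abstract exactly as authored, while the table of contents is computed from the headings, which matches the split between supplied and derivable meta-data in the list above.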


After having written this section and then doing some additional research, it seems apparent to me that I should leverage the Docbook standard as the standard format for creating content for my site. I have also begun working with a number of editors for Docbook content and, while still not optimal, either a WYSIWYG or XML editor should do the trick. Additionally, the Docbook format seems well supported with regard to transformation into final form for various publishing needs.

6. Use Cases

This section covers a number of use cases that should provide a more narrative description of the requirements and functionality expected for creation and management of content on my site.

7. Solutions Explored

This section describes a number of solutions that were explored in order to meet the requirements as set forth in the previous sections of this document.


My decision to use or not to use any specific solution should not be used to infer the quality of the solution; rather, my decisions were based on how well the specific solution/product met my specific needs. For instance, I think that Lenya is a pretty good CMS; however, it does not quite meet my requirements as stated above.

7.1. Content Management Systems (CMS)

It would seem obvious that leveraging a typical content management system should provide the required functionality based on the described requirements; however, in reviewing the state of current content management systems, this does not, at least to the level of review done by the author, appear to be as obvious as one would hope.

Specifically, as is generally expected and encountered, the broader the scope of application functionality provided by a system, the more complex the system becomes from the perspective of any single user’s application. It is also true that in order for a solution to be viable within a market segment, it must cover a generally broader set of requirements than would typically be useful to any specific application of the product; that is just the nature of packaged products, as opposed to custom application development. However, it is also possible that the broader set of requirements, and thus functionality, provided by a given solution can serve to limit its appeal for any specifically defined application. Thus it seems to be with CMS and my specific needs.

Specifically, there are a number of items that add significant complexity to CMS solutions (as expected and required by the broader set of content management needs of an organization) that do not seem to be required by a simple web site publishing solution such as described for my needs. That is not to say that my needs cannot be met by a CMS solution, just that it would seem that there would be less complex solutions that could be implemented that would have a lower learning curve. As a matter of admission, I may come back to this solution, even with the steep learning curve, as these solutions may end up being easier to manage and maintain over the longer term than other solutions that I am currently contemplating.

A number of items that are generally part of a CMS solution that I don't think are needed for my site (at least at this time) are the following:

  1. Workflow

    Since there is only a single author and even with a limited number of people, the overhead of a workflow component does not appear necessary. This does not seem to add too much complexity, but adds a touch.

  2. Ability to author documents outside of the CMS and easily publish them online

    Most of the solutions I looked at seemed to prefer online editing of content. I would prefer to work outside of a browser, preferably in a word processor of some sort, and then publish my articles by allowing the system to apply rendering specifics.

  3. Ability to easily apply CSS to change appearance of the site

    Having decided to leverage a number of CSS documents available on the web (from some of the open source web template sites), I am quite taken by the idea of applying them to the XML documents (Docbook) that I am creating. Again, the application of custom style sheets within a CMS solution seems to be possible, but it also seems these systems add significant complexity to the task of using custom CSS. This may only be due to a lack of knowledge on my part, but this is probably the single most significant barrier to my attempted use of these systems, when coupled with item two above.

During my assessment of CMS as potential solutions, I reviewed (without significant depth, I must add) the following systems:

  1. Drupal

  2. Joomla!

  3. Apache Lenya

7.2. Apache Cocoon

The Apache Software Foundation has created an interesting framework for content publishing, and this may become a component of the final solution that I use to build my site; however, it should also be noted that, as with the CMS solutions above, Cocoon provides significant functionality that is not needed and that brings significant complexity to the small site maintainer and content creator.

Cocoon is an interesting framework and I would encourage anyone interested to visit the Apache Cocoon website.


Cocoon did become a component of the solution I am using to build and publish my website, by virtue of the fact that I am using Apache Forrest, which is built on top of Cocoon.

7.3. Apache Forrest

I have settled on Apache Forrest as the solution I am using for publishing the content that I am creating. While there are some changes that have to be made to Forrest in order to be able to consume and publish DocBook XML documents, the changes are minor and the support provided by the DocBook community is quite significant.

As a note, it should be understood that Apache Forrest is built on top of Apache Cocoon and therefore, Cocoon has become a part of the solution I am using. For more information, please see the Apache Forrest website.