[Repost] Manifest and spine management in C#

16 Feb 2010


This post presents the requirements for C# classes that manage the manifest and the spine – both of which are elements in an epub package. The manifest identifies all of the files that are part of a publication while the spine specifies the linear reading order of its content documents.

A reading system only needs to read and parse these elements; with one exception, it doesn’t modify them. An online wysiwyg epub editor, on the other hand, needs the ability to insert, remove, and rearrange files in both the manifest and the spine.

Manifest Items
The <manifest> element of an epub <package> contains <item> elements, one item for each file that is referenced from anywhere in the publication. A manifest item has the attributes shown in Table 1.

| Attribute Name | Attribute Description |
| --- | --- |
| id | Mandatory, unique identifier of the file within the manifest. |
| href | Mandatory, URI of the file for this item. |
| media-type | Mandatory, MIME media-type for this item. |
| fallback | id of the manifest item to which a reading system should fall back if it is unable to process the namespace of the current item. Mandatory when the current document is an Out-Of-Line XML Island. |
| fallback-style | id of the manifest item which holds a CSS stylesheet with which the contents of the current item may be rendered. |
| required-namespace | When the current document is an Out-Of-Line XML Island, this attribute must be present and it should be set to the namespace of the document. |
| required-modules | A comma-separated list of Extended Modules, which might belong to XHTML or to the namespace of an Out-Of-Line XML Island. This list of modules helps the reading system decide whether it has the capabilities to process the current item. |

Table 1. manifest item attributes

In the context of a C# class designed to read and write manifest items, these attributes are simply strings to be accessed through the methods and properties of the class.

Attribute Handling Methods
Extracting the attributes of an XML node is a common activity in epub code. The most succinct code to access an attribute value is:

XmlNode targetAttribute = node.Attributes.GetNamedItem(attributeName);

However, many attributes are optional, and the variable targetAttribute will be set to null if the attribute is not present. Therefore, I prefer to wrap this statement up with some defensive programming which checks for a missing attribute and also distinguishes the case where the attribute is present but is set to an empty string. I use an overloaded TryGetAttribute method which offers a few ways of handling these situations. One example of the method is shown below.

 

public static bool TryGetAttribute(XmlNode node
   ,string attributeName
   ,out string attributeValue) {

  // initialise the results
  bool result = false;
  attributeValue = string.Empty;

  // try to get the named attribute
  XmlNode targetAttribute = node.Attributes.GetNamedItem(attributeName);

  // if the attribute was found
  if (targetAttribute != null) {
    // extract the value and set the result to true
    attributeValue = targetAttribute.InnerText;
    result = true;
  }
  return result;
}//TryGetAttribute
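
To make the three cases concrete (attribute set, attribute present but empty, attribute missing), here is a minimal sketch. The AttributeDemo class, the Describe helper, and the sample item are made up for illustration; TryGetAttribute is repeated in compact form only so that the sketch compiles on its own.

```csharp
using System.Xml;

public static class AttributeDemo {
  // Same logic as the TryGetAttribute listing above, in compact form.
  public static bool TryGetAttribute(XmlNode node, string attributeName,
      out string attributeValue) {
    attributeValue = string.Empty;
    XmlNode targetAttribute = node.Attributes.GetNamedItem(attributeName);
    if (targetAttribute == null) return false;
    attributeValue = targetAttribute.InnerText;
    return true;
  }

  // Hypothetical helper: classify an attribute as "missing", "empty" or "set".
  public static string Describe(XmlNode node, string attributeName) {
    string value;
    if (!TryGetAttribute(node, attributeName, out value)) return "missing";
    return value.Length == 0 ? "empty" : "set";
  }

  // Build a made-up manifest item: href has a value, fallback is empty,
  // and media-type is absent.
  public static XmlNode SampleItem() {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<item id=\"ch1\" href=\"ch1.xhtml\" fallback=\"\" />");
    return doc.DocumentElement;
  }
}
```

With the sample item, Describe reports href as "set", fallback as "empty", and media-type as "missing". The bool return value is what lets the caller tell the last two cases apart, because the out parameter is an empty string in both.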


The converse of reading a potentially missing attribute occurs when we want to set the value of an attribute that may or may not be present in the target XmlNode. Again, this happens often enough to make it worth creating a method to handle it. I call this SetOrAddAttribute and a listing is shown below.

 

public static void SetOrAddAttribute(XmlNode node,
   string attributeName, string attributeValue){
  // look the attribute up directly; the indexer returns null when it is absent
  XmlAttribute targetAttribute = node.Attributes[attributeName];
  // if the attribute is not present in the given node
  if (targetAttribute == null){
    // create and add an empty attribute
    targetAttribute = node.OwnerDocument.CreateAttribute(attributeName);
    node.Attributes.Append(targetAttribute);
  }
  // set the attribute value
  targetAttribute.Value = attributeValue;
}


The manifestitem class
With attribute handling in place, it’s straightforward to create a manifestitem class in C#. The constructor is given a reference to an XmlNode which it stores in a private variable:

 

private XmlNode _node;

public manifestitem(XmlNode node){
  _node = node;
}


Each attribute of the manifest item is then provided with a property which can be used to get and set the attribute value. For example, the following snippet handles the href attribute:

 

public string href {
  get {
    string _href;
    utilities.TryGetAttribute(_node, "href", out _href);
    return _href;
  }
  set {
    utilities.SetOrAddAttribute(_node, "href", value);
  }
}


The get accessor returns the attribute value if it is present, or an empty string otherwise. The set accessor replaces the value of any existing href attribute, or adds an href attribute with the given value if the attribute is not present in the item node.

This pattern is repeated for each attribute.
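
As a rough sketch of what that repetition could look like, the class below exposes the three mandatory attributes. It is not the original implementation: the private Get and Set helpers are stand-ins for the utilities methods, and Set assumes the wrapped node is an XmlElement. Because media-type is not a valid C# identifier, the property is named media_type here, an assumption borrowed from the parameter name used in Table 2.

```csharp
using System.Xml;

public class manifestitem {
  private readonly XmlNode _node;

  public manifestitem(XmlNode node) {
    _node = node;
  }

  // Read an attribute, returning an empty string when it is absent.
  private string Get(string name) {
    XmlNode attribute = _node.Attributes.GetNamedItem(name);
    return attribute == null ? string.Empty : attribute.Value;
  }

  // Overwrite the attribute, creating it first when it is absent.
  // Assumption: the wrapped node is an element, so the cast is safe.
  private void Set(string name, string value) {
    ((XmlElement)_node).SetAttribute(name, value);
  }

  public string id {
    get { return Get("id"); }
    set { Set("id", value); }
  }

  public string href {
    get { return Get("href"); }
    set { Set("href", value); }
  }

  // The XML attribute is "media-type"; the hyphen cannot appear in a C# name.
  public string media_type {
    get { return Get("media-type"); }
    set { Set("media-type", value); }
  }
}
```

The optional attributes (fallback, fallback-style, required-namespace, required-modules) would follow the same shape.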

The manifest class
A C# class to handle an epub’s manifest is concerned with the manifest’s <item> elements. It needs to find them, add them, and remove them. To that end, the methods in Table 2 make up the manifest class, which is part of the project to develop an online wysiwyg epub editor.

| Method | Description |
| --- | --- |
| manifest(XmlDocument package) | Constructor which receives the epub package as an XmlDocument. |
| ManifestNode() | A method which returns the <manifest> as an XmlNode. |
| ManifestItems() | Method returning the manifest item elements as an XmlNodeList. |
| Add(manifestitem item) | Method to add the given instance of a manifestitem to the manifest. |
| Add(string id, string href, string media_type) | Method to add an item to the manifest, assigning it the given mandatory values for id, href, and media-type. |
| Remove(string id, string packagePath) | Remove the item with the given id from the manifest. Also, delete the file from the file system using the physical path in the packagePath argument. |
| GetManifestItemById(string id) | Return the item element with the given id as a manifestitem instance. |
| CreateManifestItem() | Create a new manifestitem instance which can be adorned with attribute values and inserted in the manifest using the Add method. |

Table 2. properties and methods of the manifest class

Note that the node order in the manifest is not important, unlike in the spine. Therefore, the Add methods simply append new items at the end of the manifest.
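
The core of such a class might look like the sketch below. It is an illustration under stated assumptions rather than the editor's actual code: the class name ManifestSketch and the method GetItemNodeById are hypothetical, the methods work on raw XmlNode values instead of manifestitem instances, and Remove only touches the XML (deleting the file from disk is left out). The package document is assumed to use the OPF namespace http://www.idpf.org/2007/opf.

```csharp
using System;
using System.Xml;

public class ManifestSketch {
  private const string OpfNamespace = "http://www.idpf.org/2007/opf";
  private readonly XmlDocument _package;
  private readonly XmlNamespaceManager _ns;

  public ManifestSketch(XmlDocument package) {
    _package = package;
    // XPath queries need a prefix bound to the OPF default namespace.
    _ns = new XmlNamespaceManager(package.NameTable);
    _ns.AddNamespace("opf", OpfNamespace);
  }

  public XmlNode ManifestNode() {
    return _package.SelectSingleNode("/opf:package/opf:manifest", _ns);
  }

  public XmlNodeList ManifestItems() {
    return ManifestNode().SelectNodes("opf:item", _ns);
  }

  // Hypothetical lookup returning the raw node, or null when no item matches.
  public XmlNode GetItemNodeById(string id) {
    foreach (XmlNode item in ManifestItems()) {
      XmlAttribute idAttribute = item.Attributes["id"];
      if (idAttribute != null && idAttribute.Value == id) return item;
    }
    return null;
  }

  // Append a new item; order is irrelevant in the manifest.
  public XmlNode Add(string id, string href, string mediaType) {
    if (GetItemNodeById(id) != null)
      throw new ArgumentException("Duplicate manifest id: " + id);
    XmlElement item = _package.CreateElement("item", OpfNamespace);
    item.SetAttribute("id", id);
    item.SetAttribute("href", href);
    item.SetAttribute("media-type", mediaType);
    return ManifestNode().AppendChild(item);
  }

  // Remove the item from the XML only; returns false when the id is unknown.
  public bool Remove(string id) {
    XmlNode item = GetItemNodeById(id);
    if (item == null) return false;
    ManifestNode().RemoveChild(item);
    return true;
  }
}
```

Rejecting a duplicate id in Add is a design choice of this sketch; it follows from the requirement in Table 1 that the id be unique within the manifest.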

Spine
In some ways the <spine> is easier to handle than the <manifest> because there are fewer attributes to work with, but it does have a few complications. Firstly, the <spine> element includes the toc attribute, which holds the id of the manifest item for the publication’s NCX document. That attribute has to be accessible so the reading software can find and open the NCX.

Secondly, the spine provides the reading system with the linear reading order of the content documents. Therefore, the order of the nodes in the spine is important.

Spine nodes are called <itemref> because they refer to items in the manifest; the idref attribute of each itemref element is the id of a manifest <item>. Each item id may appear only once in the spine.

The only other attribute that the Open Packaging Format schema allows is the linear attribute. This distinguishes primary content documents (value="yes") from auxiliary content documents (value="no"). "yes" is the default, so this attribute can be omitted.

Useful Enumerations
Before presenting the spine class, it’s worth introducing two enumerations that support the code. The first of these describes the position where a new spine itemref should be inserted. The InsertPosition enumeration is shown below.

 

public enum InsertPosition {
   after
   ,before
   ,bottom
   ,top
}


This provides options to insert a new itemref at the top or bottom of the reading order, or to insert it before or after a given other itemref node.

The second enumeration allows the code to specify the value of the linear attribute without passing a string. The Linear enumeration is shown below.

 

public enum Linear {
   yes
   ,no
}


The spine class
A C# class to provide basic handling for the <spine> element could have the methods and properties shown in Table 3.

| Method | Description |
| --- | --- |
| spine(XmlDocument package) | Constructor which receives the epub package as an XmlDocument. |
| tocId | Return the id of the NCX manifest item from the toc attribute of the spine element. |
| itemrefs | Method returning the itemref elements as an XmlNodeList. |
| Add(string idref, InsertPosition ip, string refNodeId, Linear linear) | Add an itemref instance to the spine. The new itemref will have the given idref and linear values, and the position will be determined by the InsertPosition value relative to the itemref element which has the idref value in the refNodeId argument. |
| Remove(string id) | Remove the itemref with the given id from the spine. |

Table 3. properties and methods of the spine class
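
One possible shape for these members is sketched below. The class name SpineSketch and the private FindItemref helper are hypothetical, the two enumerations are repeated so that the sketch is self-contained, and the package is assumed to use the OPF namespace http://www.idpf.org/2007/opf. Omitting the linear attribute when the value is yes is a choice made here, relying on yes being the default.

```csharp
using System;
using System.Xml;

public enum InsertPosition { after, before, bottom, top }
public enum Linear { yes, no }

public class SpineSketch {
  private const string OpfNamespace = "http://www.idpf.org/2007/opf";
  private readonly XmlDocument _package;
  private readonly XmlNamespaceManager _ns;

  public SpineSketch(XmlDocument package) {
    _package = package;
    _ns = new XmlNamespaceManager(package.NameTable);
    _ns.AddNamespace("opf", OpfNamespace);
  }

  private XmlNode SpineNode() {
    return _package.SelectSingleNode("/opf:package/opf:spine", _ns);
  }

  // id of the NCX manifest item, or an empty string when toc is absent.
  public string tocId {
    get {
      XmlAttribute toc = SpineNode().Attributes["toc"];
      return toc == null ? string.Empty : toc.Value;
    }
  }

  public XmlNodeList itemrefs() {
    return SpineNode().SelectNodes("opf:itemref", _ns);
  }

  // Hypothetical helper: find the itemref with the given idref, or null.
  private XmlNode FindItemref(string idref) {
    foreach (XmlNode itemref in itemrefs()) {
      XmlAttribute attribute = itemref.Attributes["idref"];
      if (attribute != null && attribute.Value == idref) return itemref;
    }
    return null;
  }

  public XmlNode Add(string idref, InsertPosition ip, string refNodeId, Linear linear) {
    // An item id may appear only once in the spine.
    if (FindItemref(idref) != null)
      throw new ArgumentException("idref already in spine: " + idref);

    XmlNode spine = SpineNode();
    XmlElement itemref = _package.CreateElement("itemref", OpfNamespace);
    itemref.SetAttribute("idref", idref);
    // "yes" is the default, so the attribute is written only for "no".
    if (linear == Linear.no) itemref.SetAttribute("linear", "no");

    switch (ip) {
      case InsertPosition.top:
        return spine.PrependChild(itemref);
      case InsertPosition.bottom:
        return spine.AppendChild(itemref);
      default:
        // before/after need the reference itemref named by refNodeId.
        XmlNode reference = FindItemref(refNodeId);
        if (reference == null)
          throw new ArgumentException("No itemref with idref: " + refNodeId);
        return ip == InsertPosition.before
          ? spine.InsertBefore(itemref, reference)
          : spine.InsertAfter(itemref, reference);
    }
  }

  // Returns false when no itemref has the given idref.
  public bool Remove(string idref) {
    XmlNode itemref = FindItemref(idref);
    if (itemref == null) return false;
    SpineNode().RemoveChild(itemref);
    return true;
  }
}
```

For the top and bottom positions the refNodeId argument is ignored, so a caller can pass null there.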

Earlier I mentioned that, with one exception, a reading system does not modify the manifest or the spine. The Open Packaging Format says that any part of the publication that can be referenced during processing of an epub must be included in the spine. However, if the reading system encounters content that is not present in the spine:

the Reading System should add it to the spine (the placement at the discretion of the Reading System) and assign a value of ‘no’ to the linear attribute.

So, a reading system can add itemrefs to the spine. I interpret this to mean that the in-memory representation of the spine is modified and not the package file in the file system nor the compressed version of the package held in the .epub file. Please contradict me if you know this to be false.
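
Under that interpretation, a reading system could patch its in-memory copy of the package along the lines of the sketch below. The ReadingSystemSketch class and the EnsureInSpine method are hypothetical names, the OPF namespace http://www.idpf.org/2007/opf is assumed, and appending at the end of the spine is just one choice, since the placement is left to the reading system. Nothing is saved back to disk.

```csharp
using System.Xml;

public static class ReadingSystemSketch {
  private const string OpfNamespace = "http://www.idpf.org/2007/opf";

  // Make sure the manifest item with the given id has an itemref.
  // Returns true when an itemref was added to the in-memory spine.
  public static bool EnsureInSpine(XmlDocument package, string manifestId) {
    XmlNamespaceManager ns = new XmlNamespaceManager(package.NameTable);
    ns.AddNamespace("opf", OpfNamespace);
    XmlNode spine = package.SelectSingleNode("/opf:package/opf:spine", ns);

    // Nothing to do when the item is already referenced.
    foreach (XmlNode itemref in spine.SelectNodes("opf:itemref", ns)) {
      XmlAttribute idref = itemref.Attributes["idref"];
      if (idref != null && idref.Value == manifestId) return false;
    }

    // Add it as auxiliary content; the XmlDocument is changed in memory only.
    XmlElement added = package.CreateElement("itemref", OpfNamespace);
    added.SetAttribute("idref", manifestId);
    added.SetAttribute("linear", "no");
    spine.AppendChild(added);
    return true;
  }
}
```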