mgaip

Last Published: 2011-07-26 | Version: 1.2-SNAPSHOT
AngocA. Creative Commons Attribution 3.0 Unported License.

DOCTYPE

HTML files declare the DOCTYPE on the first line, and that declaration points to a remote DTD hosted at w3.org. As a result, each time a file is processed, the parser tries to download the DTD, which takes a long time.

To deal with this issue, it is necessary to define an entity resolver. One possible way is to use the CatalogResolver from Apache (an EntityResolver implementation), which requires a CatalogManager.properties file that indicates where the catalog is. This setup is heavy, but it avoids searching for the DTDs on the Internet when they are already available locally (as copies of xhtml1-transitional.dtd and the three .ent files). However, there is a range of different DTDs, so I decided to ignore the DOCTYPE: I created an implementation of the EntityResolver interface that simply ignores them.

This approach is simple, requires no additional files, and works with different DOCTYPEs. I do not need to validate the document against its DTD anyway; all I need is to replace one HTML tag with another.

Resolving entities locally.

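// CatalogResolver comes from Apache XML Commons Resolver (org.apache.xml.resolver.tools).
// It reads the catalog location from CatalogManager.properties.
CatalogResolver resolver = new CatalogResolver();
saxb.setEntityResolver(resolver);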

Ignoring entities.

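// Resolver that skips every external entity, including the DTD named in the DOCTYPE.
saxb.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        System.out.println("Ignoring " + publicId + ", " + systemId);
        // An empty source stops the parser from fetching the remote file.
        return new InputSource(new StringReader(""));
    }
});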
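
The snippet above depends on a builder object (saxb) that is created elsewhere in the plugin. As a minimal, self-contained sketch of the same idea, the following example uses only the parser that ships with the JDK (javax.xml.parsers) instead of the plugin's builder. The class name IgnoreDoctypeExample and the helper rootName are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of the plugin.
class IgnoreDoctypeExample {

    /** Resolver that answers every external entity request with an empty stream. */
    static final EntityResolver IGNORE_ALL = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            return new InputSource(new StringReader(""));
        }
    };

    /** Parses XHTML text without fetching the DTD and returns the root element name. */
    static String rootName(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(IGNORE_ALL);
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html><head><title>t</title></head><body><p>hello</p></body></html>";
        System.out.println(rootName(page)); // prints "html"
    }
}
```

Because the resolver never opens a network connection, the parse time no longer depends on how fast the remote DTD can be retrieved.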

Namespace

XPath expressions do not match elements in the XHTML namespace unless the namespace is declared to the XPath engine, so we decided to delete the namespace when reading the file. However, this is not the best practice, so the new version changes this behavior to reduce the modifications made to the HTML files.
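
One common way to query namespaced XHTML without deleting the namespace is to bind a prefix to it in the XPath engine. The sketch below shows this with the JDK's javax.xml.xpath API. It is only an illustration under assumptions, not necessarily what the plugin does: the class name XhtmlXPathExample, the helper evaluate, and the prefix "h" are hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the plugin.
class XhtmlXPathExample {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Binds the arbitrary prefix "h" to the XHTML namespace. */
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "h" : null;
        }
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    /** Evaluates an XPath expression against XHTML text and returns its string value. */
    static String evaluate(String xhtml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required, otherwise namespaces are dropped
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(XHTML_CONTEXT);
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>hi</p></body></html>";
        System.out.println(evaluate(page, "/h:html/h:body/h:p")); // prints "hi"
        // Unprefixed names match only elements in no namespace, so this finds nothing.
        System.out.println(evaluate(page, "/html/body/p").isEmpty()); // prints "true"
    }
}
```

With this approach the HTML file is read as is, and only the XPath expressions carry the prefix.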


Copyright © 2011 AngocA. All Rights Reserved.