HTML files declare their DOCTYPE on the first line, and the declaration points to a remote DTD hosted on the W3C site (w3.org). As a result, every time a file is processed the parser tries to download the DTD, which takes a long time.
To deal with this, you need to define an entity resolver. One option is the catalog-based EntityResolver from Apache, which requires a CatalogManager.properties file that indicates where the catalog is. This setup is heavy, but it avoids fetching the DTDs from the internet when local copies exist (a copy of xhtml1-transitional.dtd and the three .ent files). However, documents use a range of different DTDs, so I decided to ignore the DOCTYPE altogether: I created an implementation of the EntityResolver interface that simply ignores them.
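A minimal sketch of such a resolver, assuming the standard org.xml.sax.EntityResolver interface and a JAXP DocumentBuilder. The class name IgnoreDtdExample and the helper parse are made up for illustration, and the original implementation may differ. The idea is to answer every request for an external entity with an empty stream, so the parser never opens a network connection.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class IgnoreDtdExample {

    // Resolves every external entity (including the DTD named in the
    // DOCTYPE) to an empty stream instead of fetching it.
    static final EntityResolver IGNORE_ALL =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // Hypothetical helper: parses a document without downloading its DTD.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // no validation against the DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(IGNORE_ALL);
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String page =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><body><p>hello</p></body></html>";
        Document doc = parse(page);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

One caveat with this approach: because the DTD is replaced by an empty one, named entities that the DTD would have declared (for example &nbsp;) are unknown to the parser, so documents that use them will likely fail to parse.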
This is simple, requires no extra files, and works with any DOCTYPE. After all, I do not need to validate the document; all I need is to replace one HTML tag with another.
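For the tag replacement itself, one possible approach is sketched below, assuming the document is loaded as a DOM tree and using the standard Document.renameNode method. The class ReplaceTagExample, the helper replaceTag, and the tag names b and strong are hypothetical choices for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReplaceTagExample {

    // Hypothetical helper: renames every <from> element to <to>
    // and returns how many elements were renamed.
    static int replaceTag(Document doc, String from, String to) {
        NodeList live = doc.getElementsByTagName(from);
        // The NodeList is live, so copy it before renaming anything.
        List<Node> targets = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            targets.add(live.item(i));
        }
        for (Node node : targets) {
            // null namespace: the document was parsed without namespace awareness.
            doc.renameNode(node, null, to);
        }
        return targets.size();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><b>one</b><p><b>two</b></p></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(page)));
        int renamed = replaceTag(doc, "b", "strong");
        System.out.println(renamed + " elements renamed");
    }
}
```

Renaming keeps the children and attributes of each element in place, so only the tag name changes.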