The XHTML transformation included in LibreOffice can be used to transform ODF documents into XHTML. With ODFXSLTRunner, it is not even necessary to extract the ODF package.
The XHTML style sheet runs only with the SAXON XSLT Processor. Using version 9.1 (or higher) is recommended.
The following command converts an ODF text document <ODT> into an XHTML document <XHTML>:
java -cp odfxsltrunner.jar:<saxon.jar> org.odftoolkit.odfxsltrunner.Main \
  -f net.sf.saxon.TransformerFactoryImpl -x Pictures/ \
  <ooo-xslt>/export/xhtml/opendoc2xhtml.xsl <ODT> -o <XHTML>
<saxon.jar> is the jar file of the SAXON XSLT Processor. <ooo-xslt> is the location of the LibreOffice XSLT style sheets within a LibreOffice installation. In a typical LibreOffice installation, this is <BASEDIR>/share/xslt (on Linux and Windows). On Windows, the class path separator is ; rather than :.
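To illustrate why no extraction is needed, here is a minimal Java sketch that uses only the standard JAXP API: it reads content.xml directly from the ODF ZIP package and optionally selects a TransformerFactory by class name, similar to the -f option. The class OdfTransformSketch and its command line are hypothetical, not part of ODFXSLTRunner. Applying the style sheet to content.xml alone is a simplifying assumption; the sketch neither resolves references from the style sheet to other files in the package nor extracts images as -x does. Passing null selects the JDK's built-in processor, which, as noted above, cannot run the XHTML style sheet.

```java
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OdfTransformSketch {

    /**
     * Applies an XSLT style sheet to the content.xml stream of an ODF package,
     * reading it straight from the ZIP file instead of extracting the package.
     *
     * @param factoryClass TransformerFactory implementation class name
     *        (e.g. "net.sf.saxon.TransformerFactoryImpl"), or null for the JDK default
     */
    public static void transform(String odfPath, String xslPath, String factoryClass,
                                 OutputStream out) throws Exception {
        TransformerFactory factory = (factoryClass == null)
                ? TransformerFactory.newInstance()
                : TransformerFactory.newInstance(factoryClass,
                        OdfTransformSketch.class.getClassLoader());
        Transformer transformer = factory.newTransformer(new StreamSource(new File(xslPath)));

        try (ZipFile zip = new ZipFile(odfPath)) {
            ZipEntry content = zip.getEntry("content.xml");
            if (content == null) {
                throw new IllegalArgumentException("no content.xml in " + odfPath);
            }
            // The entry is streamed from the archive; nothing is written to disk.
            try (InputStream in = zip.getInputStream(content)) {
                transformer.transform(new StreamSource(in), new StreamResult(out));
            }
        }
    }

    // Hypothetical command line: OdfTransformSketch <style sheet> <odf> [factory class]
    public static void main(String[] args) throws Exception {
        transform(args[1], args[0], args.length > 2 ? args[2] : null, System.out);
    }
}
```

Assuming Saxon is on the class path, it could be invoked as java -cp .:<saxon.jar> OdfTransformSketch <ooo-xslt>/export/xhtml/opendoc2xhtml.xsl <ODT> net.sf.saxon.TransformerFactoryImpl > <XHTML>.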
Note: A few changes were necessary to use LibreOffice's XHTML transformation with ODFXSLTRunner.
The XHTML transformation in the above example extracts all images from the ODF document, regardless of whether they are referenced in the XHTML document. To extract only the referenced images, a list of them can be created with the create-html-img-list.xsl style sheet, which is contained in the sample-xslt folder of ODFXSLTRunner. The style sheet is applied to the XHTML file, and its output is a text file that lists the referenced images. Although the input file is an XHTML file rather than an ODF file, ODFXSLTRunner can be used to apply the style sheet. The listed images can then be extracted from the ODF package with unzip:
java -jar odfxsltrunner.jar create-html-img-list.xsl -i <XHTML> -o <img-list>
unzip <ODT> $(cat <img-list>)
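The two commands above can also be sketched in plain Java, which avoids relying on unzip and shell substitution. This is not the content of create-html-img-list.xsl; it is a hypothetical equivalent (the class ImageListSketch is invented for illustration) that assumes the src values of the img elements in the XHTML output are paths inside the ODF package, such as Pictures/image.png. It collects the src attributes of XHTML img elements and then extracts only those entries. Loading of the external DTD is switched off so the parser does not fetch it over the network; as a consequence, named entities defined only in the XHTML DTD would not be resolved.

```java
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ImageListSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Collects the src attributes of all XHTML img elements (duplicates removed, order kept). */
    public static Set<String> referencedImages(File xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Do not download the XHTML DTD referenced by a DOCTYPE declaration.
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = dbf.newDocumentBuilder().parse(xhtml);

        NodeList imgs = doc.getElementsByTagNameNS(XHTML_NS, "img");
        Set<String> srcs = new LinkedHashSet<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            String src = ((Element) imgs.item(i)).getAttribute("src");
            if (!src.isEmpty()) {
                srcs.add(src);
            }
        }
        return srcs;
    }

    /** Extracts only the listed entries of the ODF package into targetDir; returns how many were extracted. */
    public static int extract(File odf, Set<String> entries, Path targetDir) throws Exception {
        Path base = targetDir.toAbsolutePath().normalize();
        int count = 0;
        try (ZipFile zip = new ZipFile(odf)) {
            for (String name : entries) {
                ZipEntry entry = zip.getEntry(name);
                if (entry == null) {
                    continue; // not in the package, e.g. an external http:// link
                }
                Path target = base.resolve(name).normalize();
                if (!target.startsWith(base)) {
                    continue; // skip entries that would escape targetDir via "../"
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                count++;
            }
        }
        return count;
    }
}
```

Compared with the shell version, this sketch tolerates file names containing spaces and silently skips references that are not entries of the package.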
Unlike LibreOffice's HTML filter, the XSLT-based XHTML filter does not convert embedded objects into bitmap images. However, the embedded objects of an ODF document can be replaced with the images that the HTML filter has exported by applying the replace-objects.xsl style sheet, which is contained in the sample-xslt folder of ODFXSLTRunner, to the ODF document.
The following command applies the style sheet:
java -jar odfxsltrunner.jar replace-objects.xsl <input odf> <output odf>