Hermes description
Romeo Anghelache
Golm, 26 Oct 2004
- authoring + archiving + publishing documents
- documents:
- generic structure
- natural language
- controlled vocabularies
- administrative metadata (author, title, editor, comment author)
- sciences (mathematics, chemistry...)
- arts (music...)
- archiving:
- preservation of long term meaning:
- semantics oriented XML (postponing rendering issues)
- vocabularies: MathML content, OpenMath
- deep structure: DocBook, OmDoc
- publishing:
- preserve / adapt looks
- long term meaning on classical media
- short term meaning on digital media
- typesetting oriented authoring: TeX, MsWord
- presentation oriented XML: SMIL, SVG, MathML presentation
- de facto authoring standards:
- TeX, MsWord ...
- macros: TeX is evolving towards semantics-preserving rendering, and (almost) the same holds for MsWord
- neither was designed to preserve meaning at the semantic level
- both are a muddy mixture of mostly presentation primitives and a few semantic ones
- semantic layers:
- 1st layer: presentation
- can see: by way of screens / printers, just enumeration
- 2nd layer: document structure
- can administer: sort, search, index, answer
- 3rd layer: data structure (document content)
- can check consistency, validation
- can exchange
- can creatively reuse, infer
- TeX
- source deflated into a set of rendering instructions in the dvi
- + \special -> possibly semantic dvi
- dvi:
- the path of least resistance to export to other standards
- 128 instructions:
- to move a pencil position
- to choose a font and pick a glyph from it
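To make the dvi model above concrete, here is a minimal Python sketch that walks a fragment of a dvi page body and reports glyphs, font changes and \special payloads. It assumes the standard DVI opcode numbering (0 to 127 set a character, 171 to 234 select a font, 239 to 242 carry a \special) and handles only that subset plus push, pop and horizontal moves. The function name scan_dvi_body and the "section Intro" payload are illustrative assumptions, not Hermes's own names or \special syntax.

```python
def scan_dvi_body(data: bytes):
    """Decode a small subset of DVI opcodes into (kind, value) events."""
    events = []
    i = 0
    while i < len(data):
        op = data[i]
        i += 1
        if op <= 127:                  # set_char_0..127: typeset glyph number op
            events.append(("char", op))
        elif 171 <= op <= 234:         # fnt_num_0..63: select a font
            events.append(("font", op - 171))
        elif 239 <= op <= 242:         # xxx1..xxx4: \special, 1..4 byte length field
            n = op - 238
            k = int.from_bytes(data[i:i + n], "big")
            i += n
            events.append(("special", data[i:i + k].decode("latin-1")))
            i += k
        elif op == 141:                # push the current position
            events.append(("push", None))
        elif op == 142:                # pop the saved position
            events.append(("pop", None))
        elif 143 <= op <= 146:         # right1..right4: signed horizontal move
            n = op - 142
            move = int.from_bytes(data[i:i + n], "big", signed=True)
            events.append(("right", move))
            i += n
        else:
            raise ValueError(f"opcode {op} is outside this sketch")
    return events


# Hypothetical fragment: select font 0, one \special, then the glyphs "Hi".
fragment = bytes([171, 239, 13]) + b"section Intro" + b"Hi"
for event in scan_dvi_body(fragment):
    print(event)
```

A real dvi file also has a preamble, page markers, font definitions and a postamble, all of which this sketch leaves out.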
- Hermes
- define/use controlled vocabularies
- reuse the old ones
- mathematics
- document structure
- generic text formatting
- parse semantic dvi
- generate XML
- document structure
- administrative metadata
- controlled vocabularies
- MathML (OpenMath)
- more:
- scientific metadata (references)
- PhysML...
- author keywords
- editor keywords
- general aims:
- help a potential search engine (where publishers should compete)
- help a confused reader (where publishers should compete)
- presentation hints (legacy)
- good enough for archiving (validation possible)
- not fit for direct authoring (too verbose)
- prepare the XML for rendering
- reuse legacy presentation hints
- use rendering industry
- use renderers for controlled XML vocabularies:
- MathML Player
- Mozilla MathML engine
- ...(where publishers should compete)
- Hermes@work
- presentation layer, recover basic TeX info: fonts
- map fonts in text region to Unicode
- one to one
- use Unicode precombined accented characters where available
- handle the remaining accents with the Unicode combining mechanism
- map font codes in math region to Unicode+MathML presentation
- delimiters, identifiers, operators:
- arbitrary (meaningful) combinations
- nonneutral (left, right) delimiters should be in pairs
- recover the original presentation hints (only font names)
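The accent handling for text regions can be illustrated with the standard Unicode normalization routine. The sketch below is an assumption about how such a mapping might be written, not Hermes's actual code: the table COMBINING and the function accent are hypothetical names, and only four TeX accents are covered. NFC normalization yields a precombined character when Unicode has one and otherwise leaves the base letter followed by the combining mark.

```python
import unicodedata

# Combining marks for a few TeX accent macros (an illustrative subset).
COMBINING = {
    "'": "\u0301",   # \'  acute
    "`": "\u0300",   # \`  grave
    '"': "\u0308",   # \"  diaeresis
    "~": "\u0303",   # \~  tilde
}


def accent(tex_accent: str, base: str) -> str:
    """Apply a TeX accent to base, precombined whenever Unicode allows it."""
    return unicodedata.normalize("NFC", base + COMBINING[tex_accent])


# \'e has a precombined form, \'q does not.
print([hex(ord(c)) for c in accent("'", "e")])
print([hex(ord(c)) for c in accent("'", "q")])
```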
- annoyances (all of them):
- unbalanced parentheses inside math regions (blocking)
- those undetected by TeX are caught by Hermes: (test $a)$
- a low-level bra-ket |a><b| is not accepted; use the semantic macros \ket{a}\bra{b} instead
- math operations on neutral delimiters (nonblocking):
- they should look bad and should not validate
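The pairing rule for nonneutral delimiters can be sketched as a stack check over each math region. This is a deliberately naive illustration, not the check Hermes performs on the dvi: it works on the TeX source, looks only at inline $...$ regions, treats only ( ) and [ ] as nonneutral delimiters, and ignores \$, $$ and \left/\right. The function name unbalanced_math_regions is hypothetical.

```python
import re

# Closing delimiter -> the opening delimiter it must match.
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())


def unbalanced_math_regions(source: str):
    """Return the inline math regions whose ( ) and [ ] do not pair up."""
    bad = []
    for region in re.findall(r"\$([^$]*)\$", source):
        stack = []
        ok = True
        for ch in region:
            if ch in OPENERS:
                stack.append(ch)
            elif ch in PAIRS:
                # A closer must match the most recent unclosed opener.
                if not stack or stack.pop() != PAIRS[ch]:
                    ok = False
                    break
        if not ok or stack:
            bad.append(region)
    return bad


# The example from the outline: TeX accepts it, the math region is unbalanced.
print(unbalanced_math_regions("(test $a)$"))
```

The parentheses of "(test $a)$" balance over the whole line, which is why TeX does not complain, but the math region on its own contains a closing parenthesis with no opener.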
- semantic layer, enrich TeX source using controlled vocabularies
- map de facto (La)TeX macros into math functions:
- map \cos to presentation cosine (just looks like cos)
- map \Cos to content cosine (means cosine and looks like cos)
- define new math structures:
- a \bra and force the author to use it
- a \binom optional (rendered correctly anyway) but preferable for accurate meaning (archiving)
- let the author define keywords
- transparent to rendering but hinting the search
- refines the answer to the level of paragraphs and math formulas
- recover the document structure
- leave \special traces for sections, paragraphs, images...
- recover the administrative metadata
- leave \special traces for author, title...
- recover the scientific metadata
- leave \special traces for labels, citations...
- annoyances (all of them):
- by design:
- TeX primitives which backparse the source give garbage output
- e.g. $a+b\over c+d$, which Hermes understands as $a+b/c+d$
- solutions:
- use \frac instead
- add nonneutral (even transparent) delimiters:
- $\left.a+b\right.\over{c+d}$
- $(a+b)\over{c+d}$
- $(a+b)\over{(c+d)}$
- temporary, i.e. fixable bugs:
- embedding \special may still generate conflicts with some packages (currently mostly amsmath in LaTeX)
- due to independent stacks of macro layers
- still a minimal disadvantage compared to the alternatives:
- defining new formats and convincing authors to use them
- hacking the TeX engine
- modifying extensively the macro layers themselves
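The distinction drawn in the semantic layer between \cos (presentation) and \Cos (content) can be shown with a small lookup table. The macro names come from the outline; the table layout, the function to_mathml and the exact markup emitted are assumptions of this sketch, and the XML Hermes really produces may differ. The elements used are standard MathML: mi, mo and mrow on the presentation side, apply, cos and ci on the content side.

```python
# Hypothetical macro table: which vocabulary each TeX macro maps to.
MACROS = {
    r"\cos": ("presentation", "<mi>cos</mi>"),
    r"\Cos": ("content", "<cos/>"),
}


def to_mathml(macro: str, arg: str) -> str:
    """Render macro applied to a single identifier as a MathML fragment."""
    kind, markup = MACROS[macro]
    if kind == "content":
        # Content MathML: an explicit function application, the meaning is recorded.
        return f"<apply>{markup}<ci>{arg}</ci></apply>"
    # Presentation MathML: only the looks are recorded
    # (&#x2061; is the invisible function-application operator).
    return f"<mrow>{markup}<mo>&#x2061;</mo><mi>{arg}</mi></mrow>"


print(to_mathml(r"\cos", "x"))
print(to_mathml(r"\Cos", "x"))
```

Both fragments look like "cos x" once rendered, but only the second tells a search engine or a validator that a cosine is being applied.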
- Hermes user
- \input the Hermes macro file (dlt, da, dalt or dt; starting with version 0.8.8):
- LaTeX:
- \input dlt
- \begin{document}
- AMSTeX:
- \documentstyle{ppt} (optional)
- \input da
- AMSLaTeX:
- \usepackage{amsmath}
- \input dalt
- \begin{document}
- run TeX on it to produce the dvi
- run hermes with the dvi result as argument:
- ./hermes document.dvi >document.xml
- send document.xml to your library
- render it on your personal website:
- use stylesheets (Hermes and MathML) to get XHTML+MathML
- xsltproc pre.xsl document.xml >render.xml
- tell your colleagues to print/copy/paste it directly from the browser
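The user workflow above can be wrapped in a short driver script. The sketch below only strings together the commands quoted in the outline (./hermes and xsltproc with pre.xsl); the use of the latex command for the TeX run, the file names and the function names pipeline_commands and run_pipeline are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def pipeline_commands(tex_file: str, stylesheet: str = "pre.xsl"):
    """Build the workflow as (argv, stdout_path) pairs; None means no redirect."""
    stem = Path(tex_file).stem
    return [
        (["latex", tex_file], None),                      # assumed TeX invocation
        (["./hermes", f"{stem}.dvi"], f"{stem}.xml"),     # dvi -> XML
        (["xsltproc", stylesheet, f"{stem}.xml"], "render.xml"),  # XML -> XHTML+MathML
    ]


def run_pipeline(tex_file: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv, out_path in pipeline_commands(tex_file):
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# List the steps without executing anything.
for argv, out_path in pipeline_commands("document.tex"):
    print(" ".join(argv), "->", out_path)
```

Calling run_pipeline("document.tex") would execute the three steps, provided the Hermes macro file has already been \input in the source as described above.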
- Hermes current status:
- covered:
- LaTeX and AMSTeX math
- accents and presentation hints
- internal crossreferences (sections, equations)
- not implemented yet (for reasons of priority only):
- math:
- AMSLaTeX (LaTeX with amsmath package)
- document structure:
- lists, tabbing, url, image, bibtex, keywords...
- no fundamental limit in conversion from arbitrary TeX to MathML
- no guarantee that the text regions will look exactly the same
- boxed text, hyphenations, paragraph layout...
Hermes is covered by the GNU GPL and is being developed by Romeo Anghelache
as part of the involvement of LivingReviews, from the Max Planck Institute for the Physics of Gravitation, Golm, Germany, in the EU-funded MoWGLI IST project.