[IPOL discuss] Typographic problem

Nicolas Limare nicolas.limare at cmla.ens-cachan.fr
Thu Aug 18 13:13:29 CEST 2011


Hi everyone,

In this (long) mail, I try to summarize all the ideas that were
floating in my head about IPOL, HTML, LaTeX, PDF, etc... Your opinion
is welcome, your help is needed.

TeX formula alignment in current HTML pages
-------------------------------------------

> 1- Has anyone a solution for aligning .tex formulas with text,  or
> is it a desesperate situation?

Pascal explained some tricks to improve the appearance of mixed text
(standard text and math formulas). This is only a partial solution: the
root of the problem is the technique currently used in IPOL, rendering
math as images. Because an image is not text, there will never be a
very good way to set math inline with text as long as it is rendered
as images.

The only potential native rendering technique would be MathML, but it
is unusable because of poor support from major browsers[1], and this is
not likely to improve soon, given the history (MathML dates from
1998!!!).

MathJax
-------

Lots of external solutions exist, based on images[2] or
javascript[3]. The best of these solutions is probably MathJax, due to
its large browser support and strong academic backing. I think it is
possible to write an ikiwiki plugin to correctly handle TeX formulas
for rendering with MathJax, and a very partial result is available
here
    http://dev.ipol.im/~nil/tmp/ipol/rendering/iki_mathjax/
to be compared with the current situation
    http://dev.ipol.im/~nil/tmp/ipol/rendering/iki_teximg/
Note that the ikiwiki directive syntax will still be needed, something
like,
    [[!tex "-\Delta_du(i,j)=F(i,j)"]]
because MathJax needs '$' delimiters to recognize TeX, and the
markdown syntax used for the IPOL documents will not let these '$'
appear as such.
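
A minimal sketch of the rewriting such a plugin would have to do,
written in Python for illustration only (ikiwiki plugins are usually
written in Perl). The function name, the regular expression and the
delimiter parameters are assumptions, not existing IPOL code. The
default output uses \( \), MathJax's default inline delimiters; '$'
has to be enabled in the MathJax configuration.

```python
import re

# Matches the directive form shown above: [[!tex "..."]]
# Assumption: the formula itself contains no double quote.
TEX_DIRECTIVE = re.compile(r'\[\[!tex\s+"([^"]*)"\]\]')


def tex_directives_to_mathjax(text, opening=r"\(", closing=r"\)"):
    """Replace each [[!tex "..."]] directive with a delimited inline formula."""
    # A function is used as the replacement so that backslashes in the
    # formula are copied literally instead of being read as regex escapes.
    return TEX_DIRECTIVE.sub(lambda m: opening + m.group(1) + closing, text)


if __name__ == "__main__":
    sample = r'We solve [[!tex "-\Delta_du(i,j)=F(i,j)"]] on the grid.'
    print(tex_directives_to_mathjax(sample))
```

Passing opening="$" and closing="$" gives the '$' form instead.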

[1] http://en.wikipedia.org/wiki/MathML#Web_browsers
[2] gladTeX, mimeTeX, Google Chart API
[3] LaTeXMathML, jsmath, MathJax, MatHTML

File formats and interface
--------------------------

There is more than the formula layout:
a. The typesetting quality of IPOL. MathJax is only a very partial
   possible solution to the larger problem: IPOL www articles don't
   look as nice and can not be comfortably read like the usual PDF
   files generated from TeX and usually distributed in our research
   community.
b. The editing interface and workflow. How should authors send their
   article, and in which language or format?
I think these 2 points are connected to Jean-Michel's second question:

> 2-Back to .tex + .pdf. My proposition would be to furnish like other
> journals (simple) style files in .tex to authors. They would retain
> most form features of the current web pages, but would of course
> lose some functionality. That's as it is!

PDF vs HTML as an output format
-------------------------------

TeX has 2 advantages: high-quality typesetting and very good
penetration in the community.

Indeed, we could receive articles as LaTeX files and attachments,
produce a PDF, and distribute this PDF file. This file would look
good, would be storable, printable and shareable, and would be no
different from the articles of all the other journals. But a PDF file
only contains text and images[4], so I want to list the things we
won't be able to do in TeX/PDF files:

* No more attached files, like the source code. The source code can be
  available from the web page, but the authors lose the possibility
  to present the attached files as they want, with comments and
  explanations next to the download links. This applies to any
  attached file: datasets, compiled code, etc.
* No more source code documentation in the article. Here again, it can
  be made available with the source code on the web page, but not from
  the article PDF file.
* No more videos. No sound (I know some people are thinking about
  sound processing algorithms).
* No more "galleries" (multiple superposed images).
* No more "optional sections" (show/hide parts).
* No more thumbnail images linked to the full-resolution version.
* Difficult to have reliable links between the PDF article and the
  demo and archive www pages.
* Probably other things too...

If we are ok to lose these features of the current IPOL in exchange
for better typesetting and familiar editing tools, then we can switch
to LaTeX/PDF as our primary article format, instead of markdown/HTML.

[4] The other media types, like video and 3D meshes, can not be
    seriously used without Adobe Acrobat Reader, and I don't think
    being tied to one reader is acceptable. For a long time, there was
    no Acrobat Reader for 64-bit Linux, and Adobe Reader 10 is
    currently only available for Windows and Mac.

Publishing in LaTeX/PDF
-----------------------

Some things will have to be decided for LaTeX/PDF articles:
* We need a LaTeX style. Who can design one for IPOL?
* We need a list of allowed packages. As far as I understand it, TeX
  is designed to be very independent of its computing environment, so
  there should be no concern about the availability of TeX in the
  future, but LaTeX packages are not designed with the same concern.
  Which packages should we allow? Won't some packages change their
  syntax in the future? Is it possible to convert a LaTeX file to
  plain TeX? Including the BibTeX parts?
* We need to decide how to render LaTeX into PDF. pdflatex is the
  usual solution, but not the only one. And among other differences,
  pdflatex accepts PNG and PDF embedded graphics, while
  (plain-old) latex accepts EPS. What will be accepted by the
  next-generation LaTeX renderer, when pdflatex is obsolete? Or maybe
  we should not worry, and be confident about our ability to adapt
  current LaTeX documents to future LaTeX compilers.
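
To make the "allowed packages" point concrete, here is a rough Python
sketch of how a submission could be checked against such a list. The
package names in ALLOWED_PACKAGES are placeholders (the real list is
still to be decided), and the function names are made up for this
example. It only looks at \usepackage lines in the file it is given,
so packages loaded by the document class or by \RequirePackage are
not seen.

```python
import re

# Hypothetical allowlist; the real one is still to be decided.
ALLOWED_PACKAGES = {"amsmath", "amssymb", "graphicx", "hyperref"}

# Matches \usepackage{a,b} and \usepackage[options]{a,b}.
USEPACKAGE = re.compile(r"\\usepackage\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")


def strip_comments(source):
    """Remove LaTeX comments: a % not preceded by a backslash, to end of line."""
    return re.sub(r"(?<!\\)%.*", "", source)


def disallowed_packages(source, allowed=ALLOWED_PACKAGES):
    """Return the sorted names of packages loaded but not in the allowlist."""
    loaded = set()
    for match in USEPACKAGE.finditer(strip_comments(source)):
        for name in match.group(1).split(","):
            name = name.strip()
            if name:
                loaded.add(name)
    return sorted(loaded - set(allowed))


if __name__ == "__main__":
    with open("article.tex") as handle:  # hypothetical file name
        print(disallowed_packages(handle.read()))
```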
 
I can't answer any of these questions, I only know the bare minimum
about TeX and LaTeX. Rafael, maybe...?

Publishing in LaTeX/PDF+HTML
----------------------------

Some of the HTML advantages could be maintained if we could receive
LaTeX from the authors, and produce a PDF and an HTML version of the
article. It won't solve the lack of video, sound, galleries,
thumbnails in the article, because HTML and PDF versions will come
from the same source, but it will integrate better in our www-based
demos and archives.

The problem here is the poor quality of the LaTeX/HTML converters. All
these tools (latex2html, hevea, tex4ht, pandoc) work by parsing the
LaTeX to produce some HTML, but they only understand a subset of the
LaTeX language, and you will usually have problems with unknown
packages or local definitions, for example. The only very good
solution would be another TeX variant, like TeX/LaTeX and
PDFTeX/PDFLaTeX, which would read the LaTeX file and interpret it as a
programming language, with HTML output instead of dvi or PDF.

But such a tool doesn't exist. Meanwhile, I think the best LaTeX/HTML
converter currently available is pandoc[5], and we can try using it
with the following workflow, based on the proposal from Pascal and Juan:

1. IPOL receives from the contact author
   - a LaTeX document
   - all the files (images, graphs, ...) needed by this LaTeX document
   - a source code archive
   - (optional) some other attachments: videos, data sets, etc.
2. IPOL produces a PDF file from the LaTeX document, with pdflatex
3. IPOL produces an HTML file from the LaTeX document, with pandoc
4. The PDF file, the HTML version, the source code, the attachments
   and the LaTeX source are used to compose a single web page on
   http://ipol.im/; this page is the reference address (DOI URL) for
   the article

I omitted the review process, because I don't know how this should be
handled, which files sent and when, in this new context.

There would be no more wiki access for the users, no more web
editing, no more username/password. ikiwiki (or anything else) will
only be used to compose the final web page of the article, and this is
good because our ikiwiki is getting very heavy and slow.

Steps 2 and 3 will be done by hand at first, and partially automated
later, once some things are defined (accepted LaTeX packages,
standard filenames, pdflatex/pandoc calling conventions, etc.).
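
As a rough idea of what this automation could look like, here is a
Python sketch that only builds the two command lines for steps 2 and
3, without running them. The function name, the "build" output
directory and the choice of options are assumptions, since none of
the calling conventions are decided yet; the pandoc options are a
subset of the ones in the command given further down.

```python
from pathlib import Path


def build_commands(tex_file, out_dir="build"):
    """Return the pdflatex and pandoc command lines for one article.

    The commands are returned as argument lists; nothing is executed here.
    """
    tex = Path(tex_file)
    out = Path(out_dir)
    pdflatex = [
        "pdflatex",
        "-interaction=nonstopmode",  # never stop to prompt the user
        "-halt-on-error",            # fail at the first error
        "-output-directory=" + str(out),
        str(tex),
    ]
    pandoc = [
        "pandoc",
        "--from", "latex", "--to", "html",
        "--standalone", "--mathml", "--toc",
        "--output", str(out / (tex.stem + ".html")),
        str(tex),
    ]
    return pdflatex, pandoc


if __name__ == "__main__":
    for command in build_commands("article.tex"):
        print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) once
the output directory exists. pdflatex usually has to be run more than
once to resolve cross-references and the table of contents.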

If this looks good to you, we need to try before any firm decision.
Who volunteers to provide to IPOL the first article written in LaTeX?
I can handle steps 2, 3 and 4 (but would happily let anyone else do
them) if you make the effort to provide a very clean and standard
LaTeX document. And you have to accept the possibility that we end up
going back to the wiki system, in which case you would have to
rewrite your article.

Another thing to do is to check the quality of the pandoc LaTeX/HTML
conversion. Could you install pandoc on your machine (or use the
fuchsia development server, where pandoc is installed), pick some of
your typical LaTeX articles, convert them with

    pandoc --from latex --to html --standalone --normalize --smart --mathml --toc < article.tex > article.html

and comment on the quality of the result? You can also try other
pandoc options instead of --mathml, and try the --parse-raw option.
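
To compare several math rendering options on the same article, a
small Python helper like the following could generate one pandoc
command per option. This is only a sketch: the function name and the
output file naming are made up, and which of these math options are
available depends on the installed pandoc version.

```python
# Math rendering options to compare; adjust to what your pandoc supports.
MATH_OPTIONS = ["--mathml", "--mathjax", "--webtex"]


def comparison_commands(tex_file, math_options=MATH_OPTIONS):
    """Return one pandoc command per math option, each with its own HTML file."""
    stem = tex_file[:-4] if tex_file.endswith(".tex") else tex_file
    commands = []
    for option in math_options:
        # article.tex with --mathml is written to article.mathml.html
        output = stem + "." + option.lstrip("-") + ".html"
        commands.append([
            "pandoc", "--from", "latex", "--to", "html",
            "--standalone", "--toc", option,
            "--output", output, tex_file,
        ])
    return commands


if __name__ == "__main__":
    for command in comparison_commands("article.tex"):
        print(" ".join(command))
```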

[5] http://johnmacfarlane.net/pandoc/
    and available in most distributions


8<----------8<----------8<----------8<----------8<----------8<----------
The remaining part of the message is not relevant for the IPOL
typesetting decisions; these are answers to some points from the
other messages on the thread.

@Jean-Michel:
> Clearly, most online scientific journals have opted for .pdf, so
> far.

Scientific journals have not opted for .pdf: they come from the paper
world, the printed journals era, and when they wondered how to
distribute files instead of paper, bits instead of atoms, they chose
PDF because it was the only format available to distribute a file that
can be seen on a screen and printed like a paper article.

The only alternatives I know are PS, dvi or DjVu. DjVu only came in
1998 (PDF in 1993), and only contains the visual information of a
document, not its semantic content. dvi can't embed the fonts, which
is a problem for the distribution of the file. PS came before PDF, and
PDF is built on PS, but if I remember correctly, the advantage of PDF
in the 1990s was the better compression, and it was important when the
Internet experience was mostly slow dial-up connections. Moreover, PDF
was strongly pushed by Adobe, who was already present in the printing
and publishing industry.

And after a few years PDF became a de-facto standard, just because it
could be used to faithfully reproduce the visual appearance of a
printed document.

> Thus why should we innovate at the point where our papers either
> look unprofessional, or request a lot of editing tricks to look decent?

All the publishers still distribute e-paper as PDF, and they still
handle exactly the same content as 50 years ago: text and images. All
the journals that allow "additional material" disconnect the "article"
(PDF file) from these materials, stored and available separately. IPOL
needs another solution because we include source code and demos in the
publishing process. This is not an excuse for looking bad,
unprofessional and uncomfortable to read.

@Daniel:
> there could even be an online pdf-viewer as e.g. is the case at
> Springer

What is the online PDF viewer? Is it like a browser plug-in to display
a PDF in an HTML page? I don't use this thing, I don't know how useful
it can be.

@Miguel:
> The IPOL case is just the inverse. The articles must be review and
> once accepted they're freezed, so there's no need, in my opinion, to
> keep or force using a dynamic editor.

I think the issue is not the web interface or the wiki system. You
could completely ignore the wiki part and decide to send your markdown
file once for HTML conversion and publication, and never edit it
anymore. And I don't like the web interface either. I think the issue
is the format we want to use to:
- receive the articles from the authors
- publish the articles
This format question exists with or without a wiki-like web editing
interface.

@Juan:
> At the moment the wiki is more a problem than a solution for the
> author, and I think we need to urgently diminish the workload
> involved in preparing a demo. 

I don't see the link between the wiki/formats and the demos, I think
these are different problems. And any solution for easy (one-click?)
demos is welcome. Agustin's effort may help us reduce the workload,
other ideas can be proposed and tried too.

-- 
Nicolas LIMARE - CMLA - ENS Cachan    http://www.cmla.ens-cachan.fr/~limare/
IPOL - image processing on line                          http://www.ipol.im/