Monday, May 4, 2009

Easy TeX parsing (or: TeXpp and plain.tex)

Some time ago I have blogged about TeXpp. Today it have reached a stage when it loads plain.tex (i.e. the source of the plainTeX format) with only one warning (about unimplemented \input command which is not fatal in this case).

But what do I mean by "loads" ? That is: "parses the file and executes each command in it gathering information about all macro definitions, variable assignments, etc.". With this information TeXpp is able to parse any document typed in plain.tex format (yes, I know, you don't have such documents, neither do I - LaTeX support is coming in a near future).

To check correctness of parsing, I have enabled all possible trance information in TeX, parsed the document using TeXpp and Knuth's TeX and compared the log files. Actually I have 55 unit tests that works exactly the same and proves the TeXpp compatibility with TeX in many corner cases. Another 1437 unit tests are based on real-life documents from arxiv.org (but currently is this case a test scenario is a bit different).

That's all for today. Later I will probably blog about TeXpp abilities and how could it be useful for KDE-related projects.

Friday, May 1, 2009

Translating XML data files: a solution

Some time ago I was asked about translations of example files that are bundled with Step. These files are in XML-based format specific to Step and they do contain user-visible strings (notes, user-visible object names). The use of runtime translation mechanisms (for example as described here) was not an option because the files should be user-editable.

So the solution was to make a copies of the files for each language and install them to $DATADIR/step/$LANG/examples. Despite being simple, this solution has serious problems:
  • as there are no .pot files, translators simply don't know that the files are translatable
  • translators should deal with strange unfamiliar format, they can't use convenient tools like Lokalize
  • keeping translations in sync is really hard
A better solution should obviously be based on .po files. That is:
  1. Extract strings from XML file to .po file (in Messages.sh script)
  2. The .po files will be handled by translators as usual
  3. Merge strings back to XML files when building l10n module
The idea is not new and several tools implementing it already exist (namely extractrc script from KDE and intltool from Gnome). However these tools are tailored for a specific set of formats and can't be easily configured to work with new formats.

Instead of implementing something just for Step, I have written a more generic solution: extractxml. This is a rather simple python script that can be used to translate a variety of XML-based formats. The usage is simple, for example a command:
$ extractxml --context='%(tag)s' --tag=name --tag=text \
--extract test*.xml --xgettext --output=test.po
will extract the content of "name" and "text" tags from all test*.xml files into test.po file. The command:
$ extractxml --context='%(tag)s' --tag=name --tag=text \
--translate --po-file=test.po test*.xml --output-dir=i18n
will merge the translated strings back and save the translated files into i18n subdirectory.


The extractxml has some more features, just run "extractxml --help" to see them all. For example it is capable to match tags by regular expressions, strip and unquote the strings, recursively handle embedded XML fragments (for example rich text generated by Qt Designer).

A complete example of incorporating extractxml info a KDE l10n subsystem is available in trunk/KDE/kdeedu/step/step/data/ (take a look at Messages.sh, CMakeLists.txt and */CMakeLists.txt files).

Currently extractxml lives in trunk/kdeedu/step/step/data, but in case there will be some interest in it, I will be happy to move it to a more prominent location.

Saturday, March 7, 2009

TeX parsing: TeXpp

In the project I'm currently working on we need to do some automatic transformation of LaTeX documents. The most important requirement is simple: never break neither meaning nor formatting of the document. Unfortunately this requirement is also the most difficult to fulfill.

In spite of big popularity of TeX there are not so many libraries for parsing it and none of that libraries fulfill stated requirement. The reason for it is simple: TeX format is extremely hard to parse because its grammar is context-dependent. You can never be sure about meaning of any given character without parsing and executing all commands in the document before that character.

The first attempt was to use LaTeX::TOM perl module and it somehow worked, but with many bugs and limitations. Then there was an idea to modify original tex program to extract information we need but it turned out to be non-manageable task in reasonable timeframe. It seems that not so many people are dare enough to touch that code: not only it is written in not-so-popular and somehow cryptic pseudo pascal, but the whole organization of the code is very different from modern programs. I'm not saying its bad, but its very unfamiliar and hard to work with for modern programmers.

So I've decided to implement my own solution. For now I have a lexer and a parser that builds basic document tree for tex documents. Currently only a few TeX commands are supported (actually the whole list is: \relax, \par, \show and \let) but the framework is ready and new commands can (and will) be added very easily. The resulting document tree allows reconstruction of original source document as well as modifying parts of it.

Today I've released the whole code at http://code.google.com/p/texpp/. How that code can be used ? Think about full intelligent TeX code completion, online error detection, TeX debugging, etc. Kile developers ?

Monday, June 30, 2008

Converting any window into plasmoid

Take a look at the following screenshots:

Besides familiar Clock and Calculator you can see several new plasmoids here:
  • Konsole
  • MPlayer
  • Speedcrunch calculator
  • Step physical simulator
No, I have not rewrote all that apps as plasmoids. Instead I've wrote an applet that allows you to convert any running application into plasmoid. Currently it works using XEmbed protocol and therefore have lots of problems:
  • You can't rotate these plasmoids and they don't work correctly with ZUI (however the latter can still be improved a bit)
  • Z-order of plasmoids is not respected (i.e. in case you have several overlapping plasmoids)
  • Switching between Desktop and Dashboard is not very smooth: embedded applications are only repainted after several-seconds delay
All these problems can be fixed by using Composite extension (and input redirection) instead of XEmbed but this will probably require some cooperation with KWin and will only work when Composite is enabled.

There is another problem: currently you have to manually re-configure all window-embedding plasmoids after restarting plasma. For KDE applications this can probably be partly fixed by using KDE session management, for other applications one can allow the user to setup command line of the application to start.

Despite all current limitation, I think this approach still has its own use-cases, for example when you want quick access to some of your applications from Dashboard (for example Konsole) or if you want something not-yet-available as plasmoid on your desktop (for example video player, advanced calculator, web-browser with your webmail, a game from kdegames, googleearth, ...).

If you still want to try it, the code is available in playground/base/plasma/applets/embed-win.

Thursday, June 28, 2007

Exams are over

Last two weeks I've spend almost all time preparing to my last exam and writing my diploma. Yesterday I've defended my diploma and got bachelor degree in physics. So now I have a lot of time and motivation to work on Step.

Monday, June 11, 2007

Annotations for objects

I've just implemented basic text annotations for Step. Now you can add text items to the scene, very soon you will be able to associate annotations with the objects. What I still can't decide is the appearance of the annotations. I see two possible options:
- text in a frame (like KNotes) resizeable by mouse.
- free-running text without a frame dynamically resizeable as you type

Wednesday, June 6, 2007

Editing and threading

Recently I've implemented threading in Step. This is (still) not parallel computation, but doing all calculations in one dedicated thread to keep GUI responsive and allow the user to abort calculations at any time.

In Step the user can edit the scene while simulation is running but this can't be done in the middle of timestep. So I need to use mutex to serialize access to the scene: the mutex is locked by calculating thread while performing timestep and by GUI thread while altering objects. The problem arises when calculation of one timestep becomes too long (for example if user sets too low tolerance): if GUI tries to lock the mutex it becomes unresponsive till the end of the timestep and the user can't abort the calculation !

The obvious solution is to abort current timestep as soon as the user starts altering the scene and restart it later. But this can slow down the simulation and make it not smooth. My solution is to first try locking with timeout and abort the timestep only if locking fails:

// Try to lock but do not wait mode than one frame
if(!mutex->tryLock(1000/_simulationFps)) {
abortCurrentFrame();
mutex->lock();
}

With such approach simulation will be slowed down only if it is already slow (i.e. when simulation of one frame simulation takes longer then 1/FPS). In the other case simulation will proceed smoothly and user actions will be delayed only slightly.