Joys and rants of a Python programmer | Pow! Wham, bam, kapow!

Nov/09

29

Testing your translations for bugs

The problem

We released the Polish version of Ututi a week ago, and that taught me a couple of lessons:

  1. I18n support in Pylons is lacking
  2. You must have tests for your translations just as you test your code

There are some flaws in Pylons I18n that got me longing for Zope3 I18n.

  • There is no ‘default’ translation. With Zope3 I could say _('ok-button-text', default='OK'). With Pylons I need an English-to-English translation, so translators cannot see the default text, which leads to mistakes like 'ok-mygtuko-tekstas' instead of 'Gerai' or 'OK'.
  • Using Python formatting directives leads to tracebacks if there are bugs in the translations. If someone translates 'Hi %(fullname)s!' into 'Labas %(fullname)', all the pages that try showing this message will end up as error pages, because of the missing 's'.
  • Mako templates are very very very translation unfriendly. Simple translatable texts look like: ${_('Hi!')} and more complex texts in our templates end up looking like:
    ${_('A new file %(link_to_file)s was uploaded for the subject %(subject_title)s') % dict(
            link_to_file=h.link_to(c.filename, c.file_url),
            subject_title=c.subject_title)}
    
    It already looks like Perl, and I am not even talking about what happens when you have something like an email, with multiple lines of text and multiple embedded links, to translate. Though the Zope’ish
    <span i18n:translate="">A new file
    <a tal:attributes="href view/file_url" i18n:name="link_to_file" tal:content="view/filename" />
    was uploaded for subject
    <tal:block i18n:name="subject_title" tal:content="view/subject_title" />
    </span>
    
    is just as ugly for short pieces of text, I would surely prefer it for anything longer than two lines. An Emacs macro that wraps any selected text in ${_('<text here>')} helps to reduce the strain, but I’d prefer dedicated markers for translatable text, ones that would be easier on the Shift-pressing hand than what we have now.
  • Babel sometimes has problems extracting strings from Mako templates. The workaround:
    • run tests (yay almost 100% coverage),
    • copy template cache into your ’src’,
    • extract the translations,
    • remove the copy.
    • And then remove all the fuzzy markers from plural strings that Babel marks as #, python-format, which, as you can guess, is all the plural strings. We don’t want to guess the position of the hours in a sentence like 'uploaded %(hours)s ago'.
    I still haven’t managed to find the time to reduce the problematic templates to something that I can fit into a bug report. (Update: it seems the bug that is causing some, if not all, of the problems was reported 8 months ago in the Mako bug tracker.)
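The traceback caused by a broken formatting directive is easy to reproduce outside Pylons. Here is a minimal sketch in plain Python; the helper name formats_cleanly and the idea of interpolating dummy values are my own illustration, not something Pylons or Babel provides. It takes the names used in the original string and tries to format the translation with them:

```python
import re

# Hypothetical helper for illustration; not part of Pylons or Babel.
PLACEHOLDER = re.compile(r"%\((\w+)\)")


def formats_cleanly(msgid, msgstr):
    """Return True if msgstr can be %-formatted with the names used in msgid."""
    # Use 1 as a dummy value: it is accepted by %s, %d and %f alike.
    values = dict((name, 1) for name in PLACEHOLDER.findall(msgid))
    try:
        msgstr % values
    except KeyError:
        # The translation refers to a name the original does not have.
        return False
    except ValueError:
        # A directive is malformed, e.g. the trailing 's' is missing.
        return False
    return True
```

With the example above, formats_cleanly('Hi %(fullname)s!', 'Labas %(fullname)') is False, because Python raises ValueError for the incomplete directive.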

The solution

Now that we’re done explaining what’s wrong, let’s talk about something more constructive: making sure that tracebacks do not happen because of typos in translations. First, we need a nice translation tester. Candidates:

  1. potest
  2. gettext-lint
  3. pofilter from translate-toolkit

potest

Last commit 78 weeks ago. Can’t parse plural forms. Verdict — unusable.

gettext-lint

Seems to be written in Python, but the packaging uses autoconf (!?), which generates a Makefile that does nothing. It seems cumbersome to use, can’t check HTML tags (some translators get the idea of translating <strong> into the target language), and does not handle the %(foo)s syntax.

pofilter

The tarball has all parts that are needed to package this tool as an egg, but it is not easy_installable. So what do we do? The same thing we do every night,

  • extract,
  • python setup.py sdist,
  • scp dist/translate-toolkit.tar.gz pow.lt:~/www/eggs/

Now we just make a new virtualenv and easy_install translate-toolkit in it:

virtualenv translations
cd translations
bin/easy_install translate-toolkit --find-links=http://pow.lt/eggs

And test it on one of the projects in my src that has a lot of translations — SchoolTool:

bin/pofilter ../trunk/schooltool/src/schooltool/locales \
               -o ./out -t printf -t xmltags -t variables --openoffice

(I pass the --openoffice parameter so that it recognizes Zope3 translation markers, like ${calendar_title}, as variables.)

This results in a bunch of PO files in ./out, each file containing the errors for the corresponding translation file:

# (pofilter) variables: do not translate: ${event_title}
#: /src/schooltool/app/browser/templates/recevent_delete.pt:4
#: /src/schooltool/app/browser/templates/recevent_delete.pt:14
msgid "Deleting a repeating event (${event_title})"
msgstr "Šalinamas pasikartojantis įvykis (${event title}"
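To make the idea behind that report concrete, here is a rough sketch of a variables-style check. This is not pofilter's implementation; the regular expression and the function name missing_variables are assumptions for illustration only. It lists the ${...} markers from the original that do not survive verbatim in the translation:

```python
import re

# Matches Zope3-style markers such as ${event_title}. The pattern is an
# assumption for illustration, not what pofilter actually uses.
VARIABLE = re.compile(r"\$\{[^}]*\}")


def missing_variables(msgid, msgstr):
    """Return the markers from msgid that do not appear verbatim in msgstr."""
    translated = set(VARIABLE.findall(msgstr))
    return [var for var in VARIABLE.findall(msgid) if var not in translated]
```

For the entry above it returns the ${event_title} marker, since the translation contains ${event title} with a space instead of the underscore.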

Pretty cool, eh? pofilter has most of the functions I need; I just have to integrate it into my sandbox and extend it a little bit. So first I add:

[test_translations]
find-links = http://pow.lt/eggs/
recipe = zc.recipe.egg
eggs = translate-toolkit
       lxml
entry-points = pofilter=translate.filters.pofilter:main

to my buildout.cfg.

(I added an entry point to the [test_translations] section because all the translate-toolkit scripts seem to be defined as plain scripts and not registered as console_scripts entry points.)

Customizing pofilter is slightly difficult. I could not find any defined hooks that would allow me to customize the functionality, and “xmltags” seems to be picking up all the translated “title” attributes on links, which is annoying. So after reporting this as a bug, I just found the “main” function in translate.filters.pofilter, copied it, and produced this:

from translate.filters.checks import CheckerConfig, StandardChecker
from translate.filters.pofilter import cmdlineparser

# Tell the tag checks that the "title" attribute of <a> may be translated.
ututiconfig = CheckerConfig(
    canchangetags=[("a", "title", None)]
    )


class UtutiChecker(StandardChecker):
    """StandardChecker with the Ututi-specific configuration merged in."""

    def __init__(self, **kwargs):
        checkerconfig = kwargs.get("checkerconfig", None)
        if checkerconfig is None:
            checkerconfig = CheckerConfig()
            kwargs["checkerconfig"] = checkerconfig
        checkerconfig.update(ututiconfig)
        StandardChecker.__init__(self, **kwargs)


def main():
    parser = cmdlineparser()
    # --ututi selects UtutiChecker as the filter class.
    parser.add_option("", "--ututi", dest="filterclass",
        action="store_const", default=None, const=UtutiChecker,
        help="use the standard checks for Ututi translations")

    parser.run()

Then I registered this new function as an entry point instead of the old one:

[test_translations]
find-links = http://pow.lt/eggs/
recipe = zc.recipe.egg
eggs = ututi
entry-points = pofilter=ututi.tests.translations:main

Now if I pass “--ututi” to pofilter, it no longer raises warnings for title attributes.

Icing on the cake

Tests are pretty useless if they are not run, and we want to run our tests after every modification to the code and after every commit to our git server. As I am using make as my tool to run everything, I just added these two targets to the Makefile:

.PHONY: test_translations
test_translations: bin/pofilter
	bin/pofilter --progress=none -t xmltags -t printf --ututi src/ututi/i18n/ -o parts/test_translations/
	diff -r -u src/ututi/tests/expected_i18n_errors/ parts/test_translations/

.PHONY: update_expected_translations
update_expected_translations: bin/pofilter
	bin/pofilter --progress=none -t xmltags -t printf --ututi src/ututi/i18n/ -o parts/test_translations/
	rm -rf src/ututi/tests/expected_i18n_errors/
	mv parts/test_translations/ src/ututi/tests/expected_i18n_errors/

Even after these changes pofilter still reports 3-4 false positives that I will have to resolve with our translators, so instead of expecting absolutely no output, I just ask for the output to be identical to the old one. If it is a known/accepted failure, we let it be.
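The same "output must match the accepted baseline" comparison could also be done from Python, for example inside a regular test case. What follows is only a sketch of that idea, not what the Makefile above runs; the function names relative_files and reports_match are made up for illustration, and unlike diff -r it ignores empty directories:

```python
import filecmp
import os


def relative_files(root):
    """Collect the paths of all files under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def reports_match(expected_dir, actual_dir):
    """Return True if both trees contain the same files with identical content."""
    expected = relative_files(expected_dir)
    if expected != relative_files(actual_dir):
        return False  # a report file appeared or disappeared
    # shallow=False compares file contents instead of trusting os.stat().
    _match, mismatch, errors = filecmp.cmpfiles(
        expected_dir, actual_dir, sorted(expected), shallow=False)
    return not mismatch and not errors
```

A test could then assert reports_match('src/ututi/tests/expected_i18n_errors', 'parts/test_translations') after pofilter has run.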

And of course, I made our dear Hudson run this after every commit, for when I forget to do it myself.

Fin!
