Tuesday, April 15, 2014

Europarl corpus v.7 en-fr word-aligned with GIZA++

Finally, I finished aligning the Europarl corpus with GIZA++. Since this took me several days, I thought some people would be happy to find the word-aligned version directly online (saving processor power at the same time!). So here it is, along with the config file that produced it. The source language is English, the target language is French. I basically followed the instructions given here (many thanks to the author!).
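For anyone who wants to load the alignments programmatically, here is a minimal Python sketch. It assumes the file is in GIZA++'s usual A3 layout, which may not match the actual download: three lines per sentence pair, namely a `# Sentence pair ...` header, one sentence as plain tokens, and the other sentence with each token followed by `({ ... })` listing 1-based positions in the plain sentence. The function names and the sample sentence are hypothetical, made up for illustration.

```python
import re

# Matches "token ({ 1 2 })" groups on the annotated line of an A3 record.
TOKEN_RE = re.compile(r"(\S+) \(\{((?: \d+)*) \}\)")


def parse_a3_record(plain_line, annotated_line):
    """Parse one sentence pair (hypothetical helper, assumes A3 layout).

    Returns (annotated_tokens, plain_tokens, links), where links is a list
    of 0-based (annotated_index, plain_index) pairs. The leading NULL
    token, which collects unaligned words, is skipped.
    """
    plain_tokens = plain_line.split()
    annotated_tokens = []
    links = []
    for n, (word, positions) in enumerate(TOKEN_RE.findall(annotated_line)):
        if n == 0 and word == "NULL":
            continue  # the first entry is the artificial NULL word
        index = len(annotated_tokens)
        annotated_tokens.append(word)
        for pos in positions.split():
            links.append((index, int(pos) - 1))  # file positions are 1-based
    return annotated_tokens, plain_tokens, links


def iter_a3(lines):
    """Yield parsed records from an iterable of lines, three lines at a time."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 2, 3):
        # stripped[i] is the "# Sentence pair ..." header; it is ignored here.
        yield parse_a3_record(stripped[i + 1], stripped[i + 2])


# Made-up sample record, not taken from the corpus.
sample = [
    "# Sentence pair (1) source length 3 target length 3 alignment score : 0.01",
    "la maison bleue",
    "NULL ({ }) the ({ 1 }) blue ({ 3 }) house ({ 2 })",
]
records = list(iter_a3(sample))
```

To read a real file, pass an open file object to `iter_a3`; which language ends up on the annotated line depends on the direction GIZA++ was run in, so check a few records by eye first.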

3 comments:

  1. Thank you for this, Catherine! It was extremely useful.

    Did you happen to find ready-aligned versions of Europarl for en-fr as well, or other languages?

    1. Correction: I meant the reverse, so fr-en I guess.

      Thank you! :)

    2. In the end I managed to do them myself; I will upload them soon too, following your good example.
