Download Pages

Software for Natural Language Processing (with source code)

A recent snapshot of my source code can be found here. These are working versions with the latest features and bug fixes, but they are not systematically tested and may at times fail to compile. For more stable versions, see the releases below.

RegAligner - A tool for regularized word alignment

RegAligner is a tool for word alignment that augments the traditional maximum likelihood criterion for single-word-based models with regularity terms. Right now it handles the models IBM 1-4 and HMM, with (optional) slight variations. In the future we will also go beyond these models.

A recent snapshot is available in the git repository. Get the latest release, version 1.21, now with IBM-5 and a nondeficient mode for IBM-3 and IBM-4.

VI3 - Computation of IBM-3 Viterbi Alignments (patch for GIZA++)

Computing IBM-3 Viterbi alignments has been shown to be NP-hard, and the popular toolkit GIZA++ uses a suboptimal hill-climbing strategy. However, we have shown that, using a combination of Integer Linear Programming (ILP) and a prior reasoning stage, computing exact Viterbi alignments is efficient enough for reasonably large corpora. This patch enables GIZA++ to compute the exact alignments.

A recent snapshot is available in the git repository. This patch requires Coin-OR CBC.

Software for Computer Vision (with source code)

Regioncurv: Region-based Curvature Regularity for Segmentation, Inpainting and Denoising

This is a toolkit for curvature regularity approaches to image segmentation, inpainting and denoising. The core principle of all approaches is a linear program with surface continuation constraints. A number of different solvers are supported. Regioncurv requires Coin-OR CBC.

A recent snapshot can be found here. Download the latest release 1.1, now with message passing solvers.

Toolboxes for Various Fields (with source code)

C++-Toolbox

The C++-Toolbox is meant to ease software development.
It does not provide executable binaries. Instead, it offers the following classes:

Optimization-Toolbox

This is a collection of topics related to optimization problems that is meant as a library (and hence offers no executables). At present it covers the following areas:

Data

Gold Alignments for Europarl De-En

Annotations for 300 sentences of length up to 80. Provided are sure and possible alignments in the (machine-readable) standard format. We also provide visualizations in PDF and PNG formats. Get the new version 2 with corrections and twice as many sentence pairs here. The associated corpus (version 6) can be found here.