  - constituent structure
  - dependency structure
Constituent structures are available in two formats: an extended PTB bracketed style (e.g. Penn Treebank with morphological features expressed at the POS or non-terminal levels, see below) and, if available, the Tiger 2 format. The latter can represent trees with crossing branches, allowing the use of parsing models that are more powerful (in terms of expressivity) than pure PCFG-based parsers.
Participants can choose either one of these frameworks, or both, or one obtained by conversion from the other.
=== Input scenarios ===
  * Note that if one wants to deliver a lattice in which segmentation is ambiguous, they can do so by adding lines for alternative spans or alternative tags of spans. These lines need not be sorted. See the (real-world) example segmentation lattice here: [[http://cl.indiana.edu/~skuebler/multi.pdf|multi.pdf]]
  * or the German morphology lattice file (predicted from the SMOR analyser):
  * 0 1 Der PRELS gender=fem|case=dat|number=sg| 1
The format of Form/Lemma/CPos/FPos/Feats is exactly the same as in the CoNLL format, including vertical bars separating morphemes and = separating feature values. The only column in addition to the CoNLL ones is the original token ID, in the last column.
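As a sketch of how such a lattice line could be consumed (the field layout is inferred from the example above; the full format also carries Lemma and CPos columns, which the toy line omits, and `parse_lattice_line` is a hypothetical helper, not part of the shared-task tools):

```python
# Sketch: parse one line of the morphology lattice format described above.
# Assumption: whitespace-separated fields in the order
#   start-node  end-node  Form  POS  Feats  original-token-ID
# as in the example line; real lattice files may carry more columns
# (Lemma, CPos) between Form and Feats.
def parse_lattice_line(line):
    fields = line.split()
    start, end = int(fields[0]), int(fields[1])
    form, pos = fields[2], fields[3]
    # CoNLL-style features: '|' separates morphemes, '=' separates a
    # feature name from its value; a trailing '|' yields an empty chunk.
    feats = dict(f.split("=", 1) for f in fields[4].split("|") if f)
    token_id = int(fields[-1])  # the extra last column: original token ID
    return start, end, form, pos, feats, token_id

print(parse_lattice_line("0 1 Der PRELS gender=fem|case=dat|number=sg| 1"))
```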
=== Evaluating All Scenarios ===
  * For constituent evaluation on gold word segmentation of bracketed output (e.g. PTB), we will use a modified version of Parseval's evalb: [[http://pauillac.inria.fr/~seddah/evalb_spmrl2013.tar.gz]]. Add -fPIC to the gcc flags to compile it on Linux.
**Update February 2014: the evalb package that was previously available on Djame's site was not the correct one; if your version doesn't have the -X switch, it's the buggy one.**
+ | |||
+ | * | ||
  * For dependency evaluation on gold word segmentation, we will use the CoNLL 2007 evaluation: [[http://nextens.uvt.nl/depparse-wiki/SoftwarePage#eval07.pl]]
  * Output from parsing models trained on trees with crossing branches will need to be converted to the Negra format: [[http://cl.indiana.edu/~skuebler/exformat3.pdf|exformat.pdf]]. For this scenario, we will use Wolfgang Maier's evalb-lcfrs: [[http://wolfgang-maier.net/evalb-lcfrs]]
+ | |||
+ | |||
  * For the fully raw scenario, we will use tedeval in the unlabeled condition: [[http://www.tsarfaty.com/unipar/index.html]]. A wrapper is available here: {{:dldata:tedeval_wrapper_08192013.tar.bz2}}
  * French MWE Evaluation
On top of the classical evalb and eval07.pl evaluation, we will also provide results on multiword expressions.
Thanks to Marie Candito, the evaluator for dependency output is provided in the tools package (see test/tools/do_eval_dep_mwe.pl).
In the next few days, we'll provide the same script for MWE evaluation of constituency parses; in the meantime, here is the README of the current tool.
+ | |||
+ | SPMRL 2013 shared task dependency evaluation script for French. | ||
+ | |||
+ | EXPECTED FORMAT for marking MWEs: | ||
+ | |||
+ | The script supposes that all MWEs are flat, with one component governing | ||
+ | all the other components of the MWE with dependencies labeled <MWE_LABEL>. | ||
+ | If provided, the additional information of the part-of-speech of the MWE | ||
+ | is expected to be given as value of a <MWE_POS_FEAT> feature, on the head token | ||
+ | of the MWE. | ||
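As an illustration of this convention (the token IDs, columns and attachments below are invented for the example, not taken from the shared task data), a flat MWE such as French *pomme de terre* could be marked with its first component heading the others via dep_cpd dependencies (the default <MWE_LABEL>) and carrying the mwehead feature (the default <MWE_POS_FEAT>):

```
3   pomme   pomme   N   NC   mwehead=NC   2   obj
4   de      de      P   P    _            3   dep_cpd
5   terre   terre   N   NC   _            3   dep_cpd
```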
+ | |||
+ | OUTPUT: | ||
+ | |||
+ | The script outputs in any case two evaluations, and possibly a third one : | ||
+ | |||
+ | - precision/recall/Fmeas on components of MWEs (excluding heads of MWEs) | ||
+ | A component of MWE is counted as correct if it is attached to the same | ||
+ | token as in the gold file, with label <MWE_LABEL> | ||
+ | |||
+ | - precision/recall/Fmeas on full MWEs | ||
+ | A MWE is counted as correct if its sequence of tokens also forms | ||
+ | a MWE in gold file | ||
+ | |||
+ | - if both the gold file and the system files do contain at least one <MWE_POS_FEAT> feature, | ||
+ | then a third evaluation is also provided, which uses a stricter criteria | ||
+ | for full MWEs : they have to be composed of the same tokens as in gold file AND the gold | ||
+ | and predicted part-of-speech for the MWE have to match. | ||
+ | |||
+ | |||
+ | USAGE: perl do_eval_dep_mwe.pl [OPTIONS] -g <gold standard conll> -s <system output conll> | ||
+ | [ -mwe_label <MWE_LABEL> ] label used for components of MWEs. Default = dep_cpd | ||
+ | [ -mwe_pos_feat <MWE_POS_FEAT> ] use to define the feature name that marks heads of MWEs. Default = mwehead | ||
+ | [ -help ] |