shared_task_description

  
  - constituent structure
  - dependency structure
  
Constituent structures are available in two formats: an extended PTB bracketed style (e.g. Penn Treebank with morphological features expressed at the POS or non-terminal levels, see below) and, if available, the Tiger 2 format. The latter can represent trees with crossing branches, allowing the use of parsing models that are more powerful (in terms of expressivity) than pure PCFG-based parsers.
  
  
Participants can choose either one of these frameworks, or both, or one by conversion from the other.
  
=== Input scenarios ===
  
  
Our data set contains treebanks of different sizes (from 6k to 50k sentences). To allow for a fair comparison between treebank/parsing model pairs, we also provide training sets with a common size of 5,000 sentences. Participants should thus also provide results from parsing models trained on the small data set.
  
  
----
  
The input format is a variant of the CoNLL format for dependencies. This is necessary to represent word segmentation issues, and it makes it easy to include morphological features and alternative analyses.
We mark the beginning and the end of words, which do not have to correspond to what we call tokens; a token can consist of more than one word.
  
  * This file shows an English example: [[http://cl.indiana.edu/~skuebler/englishIn.pdf|englishIn.pdf]]
  
  * In Hebrew, where several (syntactically important) morphemes form a word, we will have something like the following: [[http://cl.indiana.edu/~skuebler/hebrewIn.pdf|hebrewIn.pdf]]
  
  * Note that if you want to deliver a lattice in which segmentation is ambiguous, you can do so by adding lines for alternative spans or alternative tags of spans. These lines need not be sorted. See the (real-world) example segmentation lattice here: [[http://cl.indiana.edu/~skuebler/multi.pdf|multi.pdf]]

  * or the German morphology lattice file (predicted from the SMOR analyser):
  
  * 0             Der     PRELS   gender=fem|case=dat|number=sg|  1
    
The format of Form/Lemma/CPos/FPos/Feats is exactly the same as in the CoNLL format, including vertical bars separating morphemes and = signs separating feature names from values. The only column in addition to the CoNLL ones is the original token ID in the last column.
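
To illustrate how such a lattice file can be consumed, here is a minimal reading sketch (not an official tool). The column order (start, end, Form, Lemma, CPos, FPos, Feats, original token ID) and the tab separation are assumptions based on the description above and may differ slightly from the released data.

<code python>
# Minimal sketch of a lattice reader (not an official tool).
# Assumed tab-separated columns: start, end, form, lemma, cpos, fpos, feats, token_id.
from collections import namedtuple

Edge = namedtuple("Edge", "start end form lemma cpos fpos feats token_id")

def read_lattices(path):
    """Yield one list of Edge objects per sentence (sentences are blank-line separated)."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():              # blank line = sentence boundary
                if sentence:
                    yield sentence
                sentence = []
                continue
            cols = line.split("\t")
            feats = {}
            # "gender=fem|case=dat|number=sg|" -> {"gender": "fem", ...}
            for kv in cols[6].strip("|").split("|"):
                if "=" in kv:
                    key, value = kv.split("=", 1)
                    feats[key] = value
            sentence.append(Edge(int(cols[0]), int(cols[1]), cols[2], cols[3],
                                 cols[4], cols[5], feats, cols[7]))
    if sentence:
        yield sentence

# Ambiguous segmentations simply show up as several edges sharing the same
# original token_id but with different spans or tags.
</code>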
  
  
=== Evaluating All Scenarios ===
  
  * For constituent evaluation on gold word segmentation of bracketed output (e.g. PTB style), we will use a modified version of Parseval's evalb: [[http://pauillac.inria.fr/~seddah/evalb_spmrl2013.tar.gz]] or {{:dldata:evalb_spmrl2013.tar.gz}}. Add -fPIC to the gcc flags to compile it on Linux.
  * For dependency evaluation on gold word segmentation, we will use the CoNLL 2007 evaluation script: [[http://nextens.uvt.nl/depparse-wiki/SoftwarePage#eval07.pl]]
  
  * Output from parsing models trained on trees with crossing branches needs to be converted to the Negra export format: [[http://cl.indiana.edu/~skuebler/exformat3.pdf|exformat.pdf]]. For this scenario, we will use Wolfgang Maier's evalb-lcfrs: [[http://wolfgang-maier.net/evalb-lcfrs]].

  * For the fully raw scenario, we will use tedeval in the unlabeled condition ([[http://www.tsarfaty.com/unipar/index.html]]); a wrapper is available here: {{:dldata:tedeval_wrapper_08192013.tar.bz2}}

  * French MWE Evaluation
On top of the classical evalb and eval07.pl evaluations, we will also provide results on multiword expressions.
Thanks to Marie Candito, the evaluator for dependency output is provided in the tools package (see test/tools/do_eval_dep_mwe.pl).
In the next few days, we will provide the same script for MWE evaluation of constituency parses; in the meantime, here is the README of the current tool.

SPMRL 2013 shared task dependency evaluation script for French.

EXPECTED FORMAT for marking MWEs:

   The script assumes that all MWEs are flat, with one component governing
   all the other components of the MWE through dependencies labeled <MWE_LABEL>.
   If provided, the part-of-speech of the MWE is expected to be given as the
   value of a <MWE_POS_FEAT> feature on the head token of the MWE.

OUTPUT:

   The script always outputs two evaluations, and possibly a third one:

   - precision/recall/F-measure on components of MWEs (excluding heads of MWEs).
     A component of an MWE is counted as correct if it is attached to the same
     token as in the gold file, with label <MWE_LABEL>.

   - precision/recall/F-measure on full MWEs.
     An MWE is counted as correct if its sequence of tokens also forms
     an MWE in the gold file.

   - if both the gold file and the system file contain at least one <MWE_POS_FEAT> feature,
     a third evaluation is also provided, which uses a stricter criterion
     for full MWEs: they have to be composed of the same tokens as in the gold file AND the gold
     and predicted parts-of-speech for the MWE have to match.

USAGE: perl do_eval_dep_mwe.pl [OPTIONS] -g <gold standard conll> -s <system output conll>

   [ -mwe_label <MWE_LABEL> ] label used for components of MWEs. Default = dep_cpd
   [ -mwe_pos_feat <MWE_POS_FEAT> ] used to define the feature name that marks heads of MWEs. Default = mwehead
   [ -help ]
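
An invocation following the usage above could look like ''perl do_eval_dep_mwe.pl -g gold.conll -s system.conll'' (file names here are just placeholders). To make the component-level metric concrete, here is a minimal sketch in Python (not the official do_eval_dep_mwe.pl script); it assumes plain tab-separated CoNLL-X column order (ID, Form, Lemma, CPos, FPos, Feats, Head, Deprel, ...) and the default dep_cpd label, and it only covers the first of the three evaluations listed above.

<code python>
# Minimal sketch of the component-level MWE evaluation (not the official
# do_eval_dep_mwe.pl). Assumes tab-separated CoNLL-X columns
# (ID, FORM, LEMMA, CPOS, FPOS, FEATS, HEAD, DEPREL, ...) and the default
# "dep_cpd" label for non-head MWE components.

def mwe_components(path, mwe_label="dep_cpd"):
    """Return the set of (sentence_no, token_id, head_id) triples for tokens
    whose dependency label marks them as non-head MWE components."""
    components, sent_no = set(), 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line = sentence boundary
                sent_no += 1
                continue
            cols = line.split("\t")
            if cols[7] == mwe_label:
                components.add((sent_no, cols[0], cols[6]))
    return components

def precision_recall_f(gold, predicted):
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# gold.conll / system.conll are placeholder file names.
gold = mwe_components("gold.conll")
predicted = mwe_components("system.conll")
print("MWE components: P=%.4f R=%.4f F=%.4f" % precision_recall_f(gold, predicted))
</code>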