shared_task_description

  
  - constituent structure
  - dependency structure
  
Constituent structures are available in two formats: an extended PTB bracketed style (e.g. Penn Treebank with morphological features expressed at the POS or non-terminal levels, see below) and, if available, the Tiger 2 format. The latter can represent trees with crossing branches, allowing the use of parsing models that are more powerful (in terms of expressivity) than pure PCFG-based parsers.
  
  
Participants can choose either one of these frameworks, or both, or one obtained by conversion from the other.
  
=== Input scenarios ===
  * In Hebrew, where several (syntactically important) morphemes create a word, we will have something like the following: [[http://cl.indiana.edu/~skuebler/hebrewIn.pdf|hebrewIn.pdf]]
  
  * Note that if one wants to deliver a lattice in which segmentation is ambiguous, this can be done by adding lines for alternative spans or alternative tags of spans. These lines need not be sorted. See the (real-world) example segmentation lattice here: [[http://cl.indiana.edu/~skuebler/multi.pdf|multi.pdf]]

  * or the German morphology lattice file (predicted from the SMOR analyser):
  
  * 0       1       Der     PRELS   gender=fem|case=dat|number=sg|  1
    
The format of Form/Lemma/CPos/FPos/Feats is exactly the same as in the CoNLL format, including vertical bars separating morphemes and = separating feature values. The only value added on top of the CoNLL ones is the original token ID in the last column.
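For illustration only, two lattice lines for an ambiguous analysis of the same span might look as follows, assuming the column order start / end / Form / Lemma / CPos / FPos / Feats / original token ID; the lemma, tags and feature values here are invented for the example, following the German line above:

  0   1   Der   der   ART     ART     gender=fem|case=dat|number=sg|   1
  0   1   Der   der   PRELS   PRELS   gender=fem|case=dat|number=sg|   1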
  
  
=== Evaluating All Scenarios ===
  
  * For constituent evaluation on gold word segmentation of bracketed output (e.g. PTB), we will use a modified version of Parseval's evalb: [[http://pauillac.inria.fr/~seddah/evalb_spmrl2013.tar.gz]]. Add -fPIC to the gcc flags to compile it for Linux. **Update (February 2014): the evalb package that was previously available on Djame's site was not the correct one; if your version does not have the -X switch, it is the buggy one.**
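As a rough sketch of the build step (the directory layout, source file name and parameter/input file names are assumptions based on the standard evalb distribution, not verified against the SPMRL archive), compiling and running on Linux could look like this:

  tar xzf evalb_spmrl2013.tar.gz
  cd evalb_spmrl2013/
  # build with -fPIC, as noted above; evalb.c is the usual evalb source file
  gcc -Wall -O2 -fPIC -o evalb evalb.c
  # usual evalb call pattern (parameter and input file names are placeholders);
  # the correct SPMRL build also recognises the -X switch mentioned above
  ./evalb -p spmrl.prm gold.ptb system.ptb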
  * For dependency evaluation on gold word segmentation, we will use the CoNLL 2007 evaluation: [[http://nextens.uvt.nl/depparse-wiki/SoftwarePage#eval07.pl]]
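For reference, and assuming the standard interface of the CoNLL 2007 script (gold and system files passed with -g and -s; the file names are placeholders), a typical call looks like:

  perl eval07.pl -g gold.conll -s system.conll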
  
  * Output from parsing models that use trees with crossing branches will need to be converted to the Negra export format: [[http://cl.indiana.edu/~skuebler/exformat3.pdf|exformat.pdf]]. For this scenario, we will use Wolfgang Maier's evalb-lcfrs: [[http://wolfgang-maier.net/evalb-lcfrs]].

  * For the fully raw scenario, we will use tedeval in the unlabeled condition ([[http://www.tsarfaty.com/unipar/index.html]]); a wrapper is available here: {{:dldata:tedeval_wrapper_08192013.tar.bz2}}

  * French MWE Evaluation

On top of the classical evalb and eval07.pl evaluation, we will also provide results on multiword expressions (MWEs).
Thanks to Marie Candito, the evaluator for dependency output is provided with the tools (see test/tools/do_eval_dep_mwe.pl).
In the very next days, we will provide the same script for MWE evaluation of constituency parses; in the meantime, here is the readme of the current tool (an illustrative example and a sample invocation follow after the readme).

SPMRL 2013 shared task dependency evaluation script for French.

EXPECTED FORMAT for marking MWEs:

   The script supposes that all MWEs are flat, with one component governing
   all the other components of the MWE with dependencies labeled <MWE_LABEL>.
   If provided, the additional information of the part-of-speech of the MWE
   is expected to be given as value of a <MWE_POS_FEAT> feature, on the head token
   of the MWE.

OUTPUT:

   The script outputs in any case two evaluations, and possibly a third one:

   - precision/recall/Fmeas on components of MWEs (excluding heads of MWEs)
     A component of an MWE is counted as correct if it is attached to the same
     token as in the gold file, with label <MWE_LABEL>

   - precision/recall/Fmeas on full MWEs
     An MWE is counted as correct if its sequence of tokens also forms
     an MWE in the gold file

   - if both the gold file and the system file contain at least one <MWE_POS_FEAT> feature,
     then a third evaluation is also provided, which uses a stricter criterion
     for full MWEs: they have to be composed of the same tokens as in the gold file AND the gold
     and predicted part-of-speech for the MWE have to match.

USAGE: perl do_eval_dep_mwe.pl [OPTIONS] -g <gold standard conll> -s <system output conll>

   [ -mwe_label <MWE_LABEL> ] label used for components of MWEs. Default = dep_cpd
   [ -mwe_pos_feat <MWE_POS_FEAT> ] use to define the feature name that marks heads of MWEs. Default = mwehead
   [ -help ]
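To make the expected MWE encoding more concrete, here is a small hand-made CoNLL fragment (not part of the readme above; the tokens, feature values and the attachment of the head to the root are purely illustrative): a three-token MWE is flat, its non-head components attach to the head with the default dep_cpd label, and the head carries the default mwehead feature.

  1   pomme   pomme   N   NC   mwehead=N|g=f|n=s   0   root      _   _
  2   de      de      P   P    _                   1   dep_cpd   _   _
  3   terre   terre   N   NC   g=f|n=s             1   dep_cpd   _   _

A typical invocation, with the file names as placeholders, would then be:

  perl do_eval_dep_mwe.pl -g gold.conll -s system.conll
  # equivalently, with the default label and feature names spelled out:
  perl do_eval_dep_mwe.pl -mwe_label dep_cpd -mwe_pos_feat mwehead -g gold.conll -s system.conll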