frequently_asked_questions

Differences

This shows you the differences between two versions of the page.

frequently_asked_questions [2013/07/29 19:31]
dseddah
frequently_asked_questions [2013/08/01 23:57] (current)
dseddah
Line 121:
 Details: \\
 Actually, some languages (Hebrew and Arabic) do not come with gold tokenization + predicted morphology, so they're only available with predicted tokenization + predicted morphology or gold tokenization + gold morphology (so no task 2 for those two). <del>In addition, providing gold tokenization + predicted morphology would ruin the
-task 3 as the crucial missing token info will be available).</del> We received comments on the difficulty of dealing with the tokenization of Semitic languages. Of course, we are aware of that point; it was even
+task 3 as the crucial missing token info will be available).</del> **We received comments on the difficulty of dealing with the tokenization of Semitic languages. Of course, we are aware of that point; it was even
 one of our main concerns about this shared task: the entry cost for "newcomers" could be quite high.
 Nevertheless, we discussed this at length and decided that being able to compare gold token + pred morph vs. pred token + pred morph, and to investigate the bottlenecks more closely, is more important than trying to avoid at all costs any potential, though very unlikely, misplaced curiosity about the test data.\\
Line 127:
 So, this is why teams that would prefer to submit results on Arabic and Hebrew with gold token and pred morph are now allowed to do so, with the restriction that they must
 also submit results on the pred token + pred morph data. The idea here is to see how those models compare across those two scenarios.\\
-\\
+**\\
 For Hebrew, the gold token + pred morph data set is described in the README.spmrl file (using Morfette (Chrupała et al., 2008) trained on FORM \t CPOS+FEAT; no lemma available).\\
 For Arabic, we trained Morfette directly on a subset of the feature field (the atbpos=feature, namely the original treebank Buckwalter tagset), then we converted it to the CATIB tagset (CPOS).\\
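As an illustration of the FORM \t CPOS+FEAT layout mentioned above, here is a minimal sketch of how such tagger training lines might be derived from a tab-separated treebank file. Everything in it is an assumption rather than the organizers' actual preprocessing: the CoNLL-X style column order, the function name `to_form_tag_lines`, and the literal "+" used to join CPOS and the feature string are all hypothetical, so check README.spmrl for the real layout.

```python
from typing import Iterable, Iterator

# Assumed 0-based column positions in a CoNLL-X style row
# (ID, FORM, LEMMA, CPOS, POS, FEATS, ...). Hypothetical: verify against README.spmrl.
FORM_COL, CPOS_COL, FEATS_COL = 1, 3, 5


def to_form_tag_lines(conll_lines: Iterable[str], joiner: str = "+") -> Iterator[str]:
    """Yield 'FORM<TAB>CPOS<joiner>FEATS' lines.

    Blank input lines (sentence boundaries) are passed through as empty strings.
    """
    for raw in conll_lines:
        line = raw.rstrip("\n")
        if not line.strip():
            yield ""
            continue
        cols = line.split("\t")
        if len(cols) <= FEATS_COL:
            raise ValueError(f"expected at least {FEATS_COL + 1} columns: {line!r}")
        yield f"{cols[FORM_COL]}\t{cols[CPOS_COL]}{joiner}{cols[FEATS_COL]}"


if __name__ == "__main__":
    # Made-up sample row, not taken from the shared task data.
    sample = [
        "1\tform1\t_\tNN\tNN\tgen=F|num=S\t2\tsubj\t_\t_\n",
        "\n",
    ]
    for out in to_form_tag_lines(sample):
        print(out)
```

The lemma column is deliberately ignored, matching the "no lemma available" note for Hebrew; the `joiner` parameter is exposed because the source does not say how the two fields are concatenated.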
frequently_asked_questions.txt · Last modified: 2013/08/01 23:57 by dseddah