SPMRL 2014 software and data

These are the settings, programs and scripts we used for our paper "Parsing Morphologically Rich Languages with (Mostly) Off-The-Shelf Software and Word Vectors". If you have questions regarding the setup, send me an email.

The Word Vectors

The unlabeled data was converted into "one sentence per line" format with cleanup.py.txt (remove the .txt extension; it's an artifact of this wiki).

The word vectors were created as follows:

./word2vec -train $inputfile -output ${vector}-200-cbow.vectors -cbow 1 -size 200 -window 5 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 0
./word2vec -train $inputfile -output ${vector}-200-skipgram-5neg.vectors -cbow 0 -size 200 -window 5 -negative 1 -hs 1 -sample 1e-3 -threads 12 -binary 0
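
Because both commands use -binary 0, the resulting .vectors files are plain text: a header line with the vocabulary size and the dimensionality, then one line per word with its space-separated values. The following is a minimal sketch of how such a file can be read in Python; the function name load_word2vec_text and the toy data are illustrative assumptions, not part of our setup.

```python
import io


def load_word2vec_text(stream):
    """Parse word2vec text output (-binary 0) into a dict: word -> list of floats."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        # word2vec ends each line with a trailing space, so strip before splitting.
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vocab_size, dim, vectors


# Toy data in the same layout (hypothetical words and values).
sample = io.StringIO("2 3\nHaus 0.1 0.2 0.3 \nBaum -0.4 0.5 0.6 \n")
vocab_size, dim, vectors = load_word2vec_text(sample)
print(vocab_size, dim, vectors["Baum"])
```

For a real file, pass an open file handle instead of the StringIO object, e.g. open(path, encoding="utf-8").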

The 400-dimensional combined word vectors were created with this script: make_composite_vectors.sh
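
The attached script is the authoritative version and is not reproduced here. As a rough illustration only, and assuming that the 400 dimensions result from concatenating the 200-dimensional CBOW and skip-gram vectors of each word (200 + 200 = 400), the combination could be sketched in Python as follows. The function names, the restriction to words present in both tables, and the toy data are assumptions made for this example.

```python
import io


def concatenate_vectors(first, second):
    """Concatenate the vectors of every word that occurs in both tables."""
    combined = {}
    for word, vec in first.items():
        if word in second:
            combined[word] = vec + second[word]
    return combined


def write_word2vec_text(vectors, stream):
    """Write vectors in a word2vec-like text layout (header, then one word per line)."""
    dim = len(next(iter(vectors.values()))) if vectors else 0
    stream.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        stream.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


# Toy 2-dimensional tables standing in for the 200-dimensional ones.
cbow = {"Haus": [0.1, 0.2], "Baum": [0.3, 0.4]}
skipgram = {"Haus": [0.5, 0.6], "Wald": [0.7, 0.8]}

combined = concatenate_vectors(cbow, skipgram)
out = io.StringIO()
write_word2vec_text(combined, out)
print(out.getvalue())
```

Only "Haus" appears in both toy tables, so it is the only word in the combined output, with a vector of length 4.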

If you are interested in the word vectors we used, please send me an email. Note that they are about 4 GB in total.

The Parsers

TurboParser was trained as follows:
/path/to/TurboParser/TurboParser -train -file_train $inputfile -file_model $modelfile

The Mate parser was trained as follows:
java -Xmx25000M -cp anna-3.61.jar is2.parser.Parser -i 20 -train train5k.$language.gold.conll9 -model mate-$language.model

RBGParser was trained as follows:
java -classpath "bin:lib/trove.jar" -Xmx35000m parser.DependencyParser train train-file:../${language^^}_SPMRL/gold/conll/train5k/train5k.$language.gold.conll model-file:$language-$vector.model thread:8 label:true model:full word-vector:../Unlabeled.$language.pred.-$vector.vectors

The Relabeler

The relabeler uses megam.

The Lattice Chooser

(Remove the .txt extension from the script names; it's an artifact of this wiki.)

Topic attachments
| Attachment | Size | Date | Who |
| beam.py.txt | 9.0 K | 10 Nov 2014 - 14:57 | ArneKoehn |
| cleanup.py.txt | 0.2 K | 06 Nov 2014 - 13:09 | ArneKoehn |
| greedy.py.txt | 8.8 K | 10 Nov 2014 - 16:13 | ArneKoehn |
| labeler.pl | 7.6 K | 07 Nov 2014 - 09:13 | ArneKoehn |
| make_composite_vectors.sh | 0.5 K | 06 Nov 2014 - 13:08 | ArneKoehn |
| oracleBest.py.txt | 5.4 K | 10 Nov 2014 - 16:13 | ArneKoehn |
| pickBest.py.txt | 3.2 K | 10 Nov 2014 - 16:13 | ArneKoehn |
| run_relabeler.sh | 0.3 K | 06 Nov 2014 - 13:10 | ArneKoehn |
| trainLR.py.txt | 5.5 K | 10 Nov 2014 - 16:12 | ArneKoehn |
| train_relabeler.sh | 0.7 K | 06 Nov 2014 - 13:09 | ArneKoehn |
Topic revision: 10 Nov 2014, ArneKoehn
 