Phrase2vec

This is an extension of word2vec to learn n-gram (phrase) embeddings as described in the following paper (Section 3.1):

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. Unsupervised Statistical Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).

If you use this software for academic research, please cite the paper:

@inproceedings{artetxe2018emnlp,
  author    = {Artetxe, Mikel  and  Labaka, Gorka  and  Agirre, Eneko},
  title     = {Unsupervised Statistical Machine Translation},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  month     = {November},
  year      = {2018},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics}
}

Usage is the same as for word2vec, with one additional optional parameter, -phrases <file>, which specifies the set of phrases (one per line) to learn embeddings for. For best results, we recommend disabling subsampling (i.e. -sample 0). Here is an example call with the hyperparameters used in our experiments:

./word2vec -cbow 0 -hs 0 -sample 0 -size 300 -window 5 -negative 10 -iter 5 \
           -train CORPUS.TXT \
           -phrases PHRASES.TXT \
           -output OUTPUT.TXT
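
The phrase file passed with -phrases lists one phrase per line. As a rough illustration of how such a file could be produced, the sketch below counts the bigrams in a toy corpus and keeps the most frequent ones. The file names toy-corpus.txt and toy-phrases.txt, the cutoff of 2, and the assumption that the tokens of a phrase are separated by single spaces are all hypothetical; none of this is taken from the paper or from the tool itself.

```shell
# Hypothetical toy corpus: one tokenized sentence per line.
printf '%s\n' \
  'i flew to new york' \
  'new york is large' \
  'she moved to new york' > toy-corpus.txt

# Print every bigram, count how often each occurs, and keep the 2 most
# frequent ones (ties broken alphabetically). The cutoff and the
# space-separated phrase format are assumptions made for illustration.
awk '{ for (i = 1; i < NF; i++) print $i, $(i + 1) }' toy-corpus.txt \
  | sort | uniq -c | sort -k1,1nr -k2 \
  | head -n 2 \
  | sed 's/^ *[0-9][0-9]* //' > toy-phrases.txt

cat toy-phrases.txt
```

The resulting file can then be supplied in place of PHRASES.TXT in the call above. For a real corpus, the cutoff and the n-gram orders would need to be chosen to suit the data.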

For more details on word2vec, please refer to the original README at README-original.txt.
