tvecs.bilingual_generator package
Submodules
tvecs.bilingual_generator.bilingual_generator module
Module used to generate bilingual dictionary.
tvecs.bilingual_generator.bilingual_generator.build_sparse_bilingual_dictionary(bilingual_dictionary_path, model, encoding='utf-8', output_path='data/bilingual_dictionary', output_fname='sparse_bd', topn=5000, sample_size=1)
Create Sparse Bilingual Dictionary.
Cluster a pre-existing bilingual dictionary and sample words from each cluster.
- API Documentation
- param bilingual_dictionary_path
Path for Bilingual Dictionary.
- param model
Word2Vec Model for obtaining vectors.
- param encoding
Encoding of the bilingual dictionary.
- param output_fname
Output Filename for sparse bilingual dictionary.
- param output_path
Output file path for bilingual dictionary.
- param topn
Number of words considered from bilingual dictionary.
- param sample_size
Number of samples from each cluster.
- type bilingual_dictionary_path
String
- type encoding
String
- type model
gensim.models.Word2Vec
- type output_fname
String
- type output_path
String
- type topn
Integer
- type sample_size
Integer
See also
tvecs.bilingual_generator.cluster
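The sampling half of the cluster-then-sample idea described above can be sketched as follows. This is a minimal illustration and not the tvecs implementation: the helper name `sample_sparse_dictionary`, the `{label: [words]}` cluster format, and the `seed` argument are all assumptions made for the example. Only `sample_size` mirrors a documented parameter.

```python
import random


def sample_sparse_dictionary(clusters, sample_size=1, seed=None):
    """Pick up to `sample_size` words from every cluster.

    `clusters` is assumed to map a cluster label to a list of words
    (a hypothetical format; the structure used by tvecs may differ).
    """
    rng = random.Random(seed)
    sampled = []
    for label in sorted(clusters):
        words = clusters[label]
        # Never ask for more samples than the cluster holds.
        count = min(sample_size, len(words))
        sampled.extend(rng.sample(words, count))
    return sampled


# Toy clusters standing in for the output of a clustering step.
toy_clusters = {
    "animals": ["cat", "dog", "horse"],
    "colours": ["red", "blue"],
    "misc": ["table"],
}
sparse = sample_sparse_dictionary(toy_clusters, sample_size=2, seed=0)
print(sparse)
```

Sampling a fixed number of words per cluster keeps the resulting dictionary small while still covering every cluster, which is presumably the sense in which the output is "sparse".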
tvecs.bilingual_generator.bilingual_generator.load_bilingual_dictionary(bilingual_dictionary_path, encoding='utf-8')
Load bilingual dictionary from the specified bilingual_dictionary_path.
- API Documentation
- param bilingual_dictionary_path
Path for Bilingual Dictionary.
- param encoding
Encoding of the bilingual dictionary.
- type bilingual_dictionary_path
String
- type encoding
String
- return
Bilingual Dictionary loaded.
- rtype
List
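A loader of this shape might look like the sketch below. The on-disk format is not documented here, so the example assumes one whitespace-separated word pair per line. That format, the helper name `load_pairs`, and the sample file are hypothetical.

```python
import io
import os
import tempfile


def load_pairs(path, encoding="utf-8"):
    """Read one whitespace-separated word pair per line into a list."""
    pairs = []
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


# Write a tiny sample file (assumed format) and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_dictionary.txt")
    with io.open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(u"house maison\nschool \u00e9cole\n\n")
    loaded = load_pairs(sample_path)
print(loaded)
```

Passing the encoding explicitly, as the documented `encoding` parameter suggests, avoids depending on the platform default when the dictionary contains non-ASCII words.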
tvecs.bilingual_generator.cluster module
Module used to cluster word lists and write the clusters to a file.
tvecs.bilingual_generator.cluster.build_clusters(entire_word_list, model, damping_factor=0.5)
Cluster word_list using Affinity Propagation.
Clustering based on the vectors from the Word2Vec model.
- API Documentation
- param entire_word_list
Word List provided to cluster.
- param model
Model to obtain the vectors for the word_list.
- param damping_factor
Damping factor for the affinity propagation.
- type entire_word_list
List
- type model
gensim.models.Word2Vec
- type damping_factor
Float
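To show what the damping factor does, here is a pure-Python sketch of generic Affinity Propagation run on toy vectors. It is not the code used by `build_clusters`, which may well delegate to a library. The function names, the negative squared distance similarity, the toy data, and the `preference` value are all assumptions chosen for illustration.

```python
def affinity_propagation(similarity, damping=0.5, max_iter=200):
    """Return, for each point, the index of the exemplar it selects.

    `similarity[i][k]` says how well point k suits point i as exemplar;
    the diagonal holds each point's "preference" for being an exemplar.
    """
    n = len(similarity)
    resp = [[0.0] * n for _ in range(n)]   # responsibilities r(i, k)
    avail = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # Responsibility update: s(i, k) minus the strongest rival score.
        for i in range(n):
            scores = [avail[i][k] + similarity[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                new = similarity[i][k] - rival
                resp[i][k] = damping * resp[i][k] + (1 - damping) * new
        # Availability update: support gathered by candidate exemplar k.
        for k in range(n):
            support = [max(0.0, resp[i][k]) for i in range(n)]
            support[k] = resp[k][k]  # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - support[k]
                else:
                    new = min(0.0, total - support[i])
                avail[i][k] = damping * avail[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: avail[i][k] + resp[i][k])
            for i in range(n)]


def negative_squared_distance(u, v):
    return -sum((a - b) ** 2 for a, b in zip(u, v))


# Toy 1-D "vectors" forming two well-separated groups (stand-ins for
# Word2Vec vectors, which would have many more dimensions).
vectors = [[0.0], [1.0], [2.0], [100.0], [101.0], [102.0]]
preference = -81.0  # hypothetical value; lower values give fewer clusters
sim = [[preference if i == k else negative_squared_distance(u, v)
        for k, v in enumerate(vectors)] for i, u in enumerate(vectors)]
exemplars = affinity_propagation(sim, damping=0.5)
print(exemplars)
```

Each message update is blended with its previous value, weighted by the damping factor. Values closer to 1 change the messages more slowly, which reduces oscillation at the cost of more iterations.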
tvecs.bilingual_generator.cluster.write_clusters(word_list, model, encoding='utf-8', output_path='.', output_fname='clusters.json')
Write Clusters to the specified file as JSON.
- API Documentation
- param word_list
Word List provided to cluster.
- param model
Model to obtain the vectors for the word_list.
- param encoding
Encoding of the file written.
- param output_fname
Filename of the output file.
- param output_path
File path of the output file.
- type word_list
List
- type model
gensim.models.Word2Vec
- type encoding
String
- type output_fname
String
- type output_path
String
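Writing clusters as JSON with an explicit encoding could be sketched as below. The `{label: [words]}` layout and the helper name `write_clusters_json` are assumptions, since the actual structure written by `write_clusters` is not documented here. The keyword names mirror the documented `output_path`, `output_fname`, and `encoding` parameters.

```python
import io
import json
import os
import tempfile


def write_clusters_json(clusters, output_path=".",
                        output_fname="clusters.json", encoding="utf-8"):
    """Serialise a {label: [words]} mapping to JSON and return the file path."""
    target = os.path.join(output_path, output_fname)
    with io.open(target, "w", encoding=encoding) as handle:
        # ensure_ascii=False keeps non-ASCII words readable in the file.
        handle.write(json.dumps(clusters, ensure_ascii=False, indent=2,
                                sort_keys=True))
    return target


# Round-trip a toy mapping through a temporary directory.
toy = {"0": [u"\u00e9cole", u"lyc\u00e9e"], "1": [u"chat"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_clusters_json(toy, output_path=tmp_dir)
    with io.open(written, "r", encoding="utf-8") as handle:
        round_trip = json.load(handle)
print(round_trip)
```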