tvecs.bilingual_generator package

Submodules

tvecs.bilingual_generator.bilingual_generator module

Module used to generate a bilingual dictionary.

tvecs.bilingual_generator.bilingual_generator.build_sparse_bilingual_dictionary(bilingual_dictionary_path, model, encoding='utf-8', output_path='data/bilingual_dictionary', output_fname='sparse_bd', topn=5000, sample_size=1)

Create Sparse Bilingual Dictionary.

  • Cluster the pre-existing Bilingual Dictionary and sample from each cluster.

API Documentation

Parameters:

  • bilingual_dictionary_path (String): Path for Bilingual Dictionary.
  • model (gensim.models.Word2Vec): Word2Vec Model for obtaining vectors.
  • encoding (String): Encoding of the bilingual dictionary.
  • output_path (String): Output file path for bilingual dictionary.
  • output_fname (String): Output Filename for sparse bilingual dictionary.
  • topn (Integer): Number of words considered from bilingual dictionary.
  • sample_size (Integer): Number of samples from each cluster.

See also

  • tvecs.bilingual_generator.cluster
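
The "cluster, then sample" idea can be pictured with a minimal sketch. It assumes the clusters are already available as a mapping from a cluster label to its words. The helper name `sample_from_clusters` and that data layout are hypothetical and are not taken from tvecs.

```python
import random


def sample_from_clusters(clusters, sample_size=1, seed=None):
    """Pick up to `sample_size` words from every cluster.

    `clusters` uses a hypothetical layout: {cluster_label: [word, ...]}.
    """
    rng = random.Random(seed)
    sampled = []
    for label in sorted(clusters):
        words = clusters[label]
        # Never ask for more words than the cluster holds.
        k = min(sample_size, len(words))
        sampled.extend(rng.sample(words, k))
    return sampled


clusters = {
    0: ["house", "home", "building"],
    1: ["dog", "cat"],
    2: ["river"],
}
sparse_words = sample_from_clusters(clusters, sample_size=1, seed=0)
print(sparse_words)  # one word per cluster
```

With `sample_size=1` the result holds one representative per cluster, which is what makes the resulting dictionary sparse.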

tvecs.bilingual_generator.bilingual_generator.load_bilingual_dictionary(bilingual_dictionary_path, encoding='utf-8')

Load bilingual dictionary from the specified bilingual_dictionary_path.

API Documentation

Parameters:

  • bilingual_dictionary_path (String): Path for Bilingual Dictionary.
  • encoding (String): Encoding of the bilingual dictionary.

Returns:

  • List: Bilingual Dictionary loaded.
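
As a rough illustration of loading word pairs into a list, the sketch below reads a text file with an explicit encoding. The file format (one entry per line, two whitespace-separated words) is an assumption, since the documentation above does not specify it, and `read_word_pairs` is a hypothetical stand-in rather than the tvecs implementation.

```python
import io
import os
import tempfile


def read_word_pairs(path, encoding="utf-8"):
    """Read word pairs from a text file into a list of tuples.

    Assumed format (not confirmed by the tvecs docs): one entry per line,
    source word and target word separated by whitespace.
    """
    pairs = []
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) == 2:  # skip blank or malformed lines
                pairs.append((fields[0], fields[1]))
    return pairs


# Build a tiny throwaway dictionary file to exercise the reader.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_bd.txt")
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(u"house \u0918\u0930\nwater \u092a\u093e\u0928\u0940\n\n")

demo_pairs = read_word_pairs(demo_path, encoding="utf-8")
print(len(demo_pairs))  # 2
```

Passing the encoding explicitly matters here because the second word of each pair is written in a non-ASCII script.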

tvecs.bilingual_generator.cluster module

Module used to cluster words using Affinity Propagation.

tvecs.bilingual_generator.cluster.build_clusters(entire_word_list, model, damping_factor=0.5)

Cluster word_list using Affinity Propagation.

  • Clustering based on the vectors from the Word2Vec model.

API Documentation

Parameters:

  • entire_word_list (List): Word List provided to cluster.
  • model (gensim.models.Word2Vec): Model to obtain the vectors for the word_list.
  • damping_factor (Float): Damping factor for the affinity propagation.
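
In Affinity Propagation, the damping factor controls how much of a message's previous value is kept at each iteration. The sketch below shows that damped update rule in isolation. It is not tvecs code, and `damped_update` is an illustrative name. The range check mirrors the one applied by scikit-learn's `AffinityPropagation`; whether tvecs delegates to scikit-learn is an assumption.

```python
def damped_update(old_value, new_value, damping_factor=0.5):
    """Blend a freshly computed message with its previous value."""
    # Range accepted by scikit-learn's AffinityPropagation (assumed relevant here).
    if not 0.5 <= damping_factor < 1.0:
        raise ValueError("damping_factor is expected in [0.5, 1.0)")
    return damping_factor * old_value + (1.0 - damping_factor) * new_value


print(damped_update(0.0, 1.0, damping_factor=0.5))  # 0.5
print(damped_update(0.0, 1.0, damping_factor=0.9))  # moves much more slowly
```

A larger damping factor makes each update more conservative, which tends to suppress oscillation at the cost of slower convergence.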

tvecs.bilingual_generator.cluster.write_clusters(word_list, model, encoding='utf-8', output_path='.', output_fname='clusters.json')

Write Clusters to the specified file as JSON.

API Documentation

Parameters:

  • word_list (List): Word List provided to cluster.
  • model (gensim.models.Word2Vec): Model to obtain the vectors for the word_list.
  • encoding (String): Encoding of the file written.
  • output_path (String): File path of the output file.
  • output_fname (String): Filename of the output file.
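
The output side of this function can be sketched as follows, assuming the clusters have already been computed and are held as a mapping from label to words. The helper `dump_clusters` and the JSON layout are hypothetical; only the `encoding`, `output_path` and `output_fname` parameter names and defaults come from the signature above.

```python
import io
import json
import os
import tempfile


def dump_clusters(clusters, encoding="utf-8", output_path=".",
                  output_fname="clusters.json"):
    """Serialise a {label: [words]} mapping to a JSON file; return its path."""
    target = os.path.join(output_path, output_fname)
    with io.open(target, "w", encoding=encoding) as handle:
        # ensure_ascii=False keeps non-ASCII words readable in the file.
        handle.write(json.dumps(clusters, ensure_ascii=False, indent=2))
    return target


out_dir = tempfile.mkdtemp()
written = dump_clusters({"0": ["dog", "cat"], "1": ["river"]},
                        output_path=out_dir)
with io.open(written, "r", encoding="utf-8") as handle:
    restored = json.load(handle)
print(restored["1"])  # ['river']
```

Reading the file back with the same encoding returns the original mapping, which is a quick way to check the written output.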

Module contents