tvecs.vector_space_mapper package

Submodules

tvecs.vector_space_mapper.vector_space_mapper module

Module to map two Vector Spaces using a bilingual dictionary.

class tvecs.vector_space_mapper.vector_space_mapper.VectorSpaceMapper(model_1, model_2, bilingual_dict, encoding='utf-8')

Bases: object

Class to map two vector spaces together.

  • Vector spaces are obtained from the two Word2Vec models.

  • A bilingual dictionary is used to map semantic embeddings between the vector spaces.

  • Linear regression from sklearn.linear_model is used for the mapping.

API Documentation:
param model_1

Model for Language 1, built using tvecs.model_generator.model_generator.

param model_2

Model for Language 2, built using tvecs.model_generator.model_generator.

param bilingual_dict

Bilingual dictionary of word pairs between Language 1 and Language 2.

param encoding

Encoding used in the corpora.

type encoding

String

type model_1

gensim.models.Word2Vec

type model_2

gensim.models.Word2Vec

type bilingual_dict

List[Tuple[str, str]], i.e. a list of (lang1, lang2) word pairs
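For illustration, the bilingual_dict argument is simply a list of (lang1, lang2) tuples. The sketch below assumes a hypothetical tab-separated file with one word pair per line; parse_bilingual_dict and filter_in_vocabulary are illustrative helpers, not part of tvecs, and dropping pairs with out-of-vocabulary words is an assumption (a pair can only contribute to the mapping if both words have vectors).

```python
from typing import Iterable, List, Set, Tuple


def parse_bilingual_dict(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'lang1_word<TAB>lang2_word' lines into (lang1, lang2) tuples.

    The tab-separated format is a hypothetical choice for this sketch.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word_1, word_2 = line.split("\t", 1)
        pairs.append((word_1.strip(), word_2.strip()))
    return pairs


def filter_in_vocabulary(
    pairs: List[Tuple[str, str]], vocab_1: Set[str], vocab_2: Set[str]
) -> List[Tuple[str, str]]:
    """Keep only pairs whose words exist in both vocabularies (an assumption)."""
    return [(w1, w2) for w1, w2 in pairs if w1 in vocab_1 and w2 in vocab_2]


if __name__ == "__main__":
    raw = ["dog\tperro\n", "cat\tgato\n", "\n", "house\tcasa\n"]
    pairs = parse_bilingual_dict(raw)
    print(pairs)  # [('dog', 'perro'), ('cat', 'gato'), ('house', 'casa')]
    print(filter_in_vocabulary(pairs, {"dog", "cat"}, {"perro", "casa"}))  # [('dog', 'perro')]
```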


get_recommendations_from_vec(vector, topn=10)

Get the topn most similar words from Model 2 [Language 2].

  • The vector of a word in Model 1 [Language 1] should be provided.

API Documentation:
param vector

A vector from Model 1; recommendations are drawn from Model 2.

param topn

Number of recommendations to be provided.

type vector

List or numpy.ndarray

type topn

Integer

return

The topn recommendations from Model 2.

rtype

List
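Finding the topn words closest to a vector amounts to ranking the target vocabulary by cosine similarity. The following is a minimal NumPy sketch, assuming Model 2's vocabulary is given as a word list plus a matrix of row vectors; top_n_similar is a hypothetical helper, not the tvecs implementation. (gensim's KeyedVectors also provides similar_by_vector for this kind of lookup.)

```python
import numpy as np


def top_n_similar(vector, words, matrix, topn=10):
    """Return the topn (word, cosine_similarity) pairs closest to `vector`.

    `words[i]` is the word whose vector is `matrix[i]`; together they stand in
    for Model 2's vocabulary. Zero vectors are not handled in this sketch.
    """
    vector = np.asarray(vector, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Dot products divided by the product of norms give cosine similarities.
    sims = matrix @ vector / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector))
    # argsort is ascending, so reverse it to put the most similar words first.
    best = np.argsort(sims)[::-1][:topn]
    return [(words[i], float(sims[i])) for i in best]


if __name__ == "__main__":
    words = ["perro", "gato", "casa"]
    matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
    # Ranks 'perro' first, then 'gato'.
    print(top_n_similar([1.0, 0.0], words, matrix, topn=2))
```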

get_recommendations_from_word(word, topn=10, pretty_print=False)

Get the topn most similar words from Model 2 [Language 2].

  • A word from Model 1 [Language 1] should be provided.

API Documentation:
param word

A word from Model 1; recommendations are drawn from Model 2.

param topn

Number of recommendations to be provided.

param pretty_print

Whether to pretty-print the recommendations.

type pretty_print

Boolean

type word

String (Unicode preferred)

type topn

Integer

return

The topn recommendations from Model 2.

rtype

List
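Conceptually, recommending from a word chains three steps: look up the word's vector in Model 1, project it into Model 2's space with the learned linear map, and rank Model 2's vocabulary by cosine similarity. A minimal sketch, assuming the map is an affine transform x @ W + b and both models are plain in-memory structures; recommend_from_word and its parameters are hypothetical, and pretty printing is omitted.

```python
import numpy as np


def recommend_from_word(word, vocab_1, weights, bias, words_2, matrix_2, topn=10):
    """Map a Language 1 word into Model 2's space and rank Model 2's words.

    vocab_1 maps words to Model 1 vectors; words_2[i] owns row matrix_2[i].
    """
    if word not in vocab_1:
        raise KeyError(f"{word!r} is not in Model 1's vocabulary")
    # Project the Model 1 vector with the (hypothetical) affine map x @ W + b.
    predicted = np.asarray(vocab_1[word], dtype=float) @ weights + bias
    matrix_2 = np.asarray(matrix_2, dtype=float)
    # Cosine similarity between the projection and every Model 2 vector.
    sims = matrix_2 @ predicted / (
        np.linalg.norm(matrix_2, axis=1) * np.linalg.norm(predicted)
    )
    best = np.argsort(sims)[::-1][:topn]
    return [(words_2[i], float(sims[i])) for i in best]


if __name__ == "__main__":
    vocab_1 = {"dog": [1.0, 0.0], "house": [0.0, 1.0]}
    swap = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy map: swaps the two axes
    words_2 = ["perro", "casa"]
    matrix_2 = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(recommend_from_word("dog", vocab_1, swap, np.zeros(2), words_2, matrix_2, topn=1))
    # [('perro', 1.0)]
```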

map_vector_spaces()

Perform linear regression on the semantic embeddings.

  • For each bilingual word pair, the semantic embeddings are taken from the vector space of the corresponding language.
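The regression step can be pictured as fitting an affine map from Language 1 vectors to Language 2 vectors over the dictionary pairs. The sketch below swaps in NumPy's least-squares solver (np.linalg.lstsq) for sklearn's LinearRegression; both minimise the same ordinary-least-squares objective when an intercept is fitted. fit_linear_map is a hypothetical helper, not the tvecs code.

```python
import numpy as np


def fit_linear_map(X, Y):
    """Fit Y ≈ X @ W + b by ordinary least squares and return (W, b).

    Row i of X (Language 1 vector) and row i of Y (Language 2 vector) must
    come from the same bilingual dictionary pair.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Append a column of ones so the last row of the solution is the intercept b.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return solution[:-1], solution[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))        # 50 pairs, 3-d Language 1 vectors
    W_true = rng.normal(size=(3, 2))    # 2-d Language 2 space
    b_true = np.array([0.5, -1.0])
    Y = X @ W_true + b_true             # noise-free synthetic targets
    W, b = fit_linear_map(X, Y)
    print(np.allclose(W, W_true), np.allclose(b, b_true))  # True True
```

With scikit-learn, the equivalent fit is LinearRegression().fit(X, Y), whose coef_ has shape (d2, d1), so W corresponds to coef_.T and b to intercept_.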

obtain_cosine_similarity(word_1, word_2)

Obtain cosine similarity.

  • Cosine similarity between word_2 and the vector predicted from word_1.

API Documentation:
param word_1

Word from Model 1, used to predict the corresponding vector in Model 2's space.

param word_2

Word from Model 2, compared against the predicted vector.

type word_1

String

type word_2

String

return

Cosine similarity between the predicted vector and the vector of word_2.

rtype

Float
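Cosine similarity between vectors u and v is cos(u, v) = (u · v) / (‖u‖ ‖v‖), ranging from -1 to 1. A minimal sketch of the computation this method describes, assuming the learned mapping is an affine transform x @ W + b; the function names and the weights/bias parameters are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); undefined for zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def predicted_cosine_similarity(vec_1, vec_2, weights, bias):
    """Cosine similarity between vec_2 and the vector predicted from vec_1.

    vec_1 is the Model 1 vector of word_1, vec_2 the Model 2 vector of word_2;
    the affine map (weights, bias) is a hypothetical stand-in for the fit.
    """
    predicted = np.asarray(vec_1, dtype=float) @ weights + bias
    return cosine_similarity(predicted, vec_2)


if __name__ == "__main__":
    swap = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy map that swaps the two axes
    print(predicted_cosine_similarity([1.0, 0.0], [0.0, 1.0], swap, np.zeros(2)))  # 1.0
```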

obtain_mean_square_error_from_dataset(dataset_path)

Obtain the mean squared error (MSE) on a bilingual dataset.

API Documentation:
param dataset_path

Path to the test bilingual dictionary.

type dataset_path

String

return

Percentage reduction in mean squared error after the transformation.

rtype

Float
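One plausible reading of "percentage reduction" is 100 × (MSE_before - MSE_after) / MSE_before. The sketch below assumes, hypothetically, that the baseline MSE is measured between the untransformed Language 1 vectors and their Language 2 counterparts (which requires equal dimensions); the source does not specify the baseline, and mse_reduction_percent is an illustrative name.

```python
import numpy as np


def mse(A, B):
    """Mean of squared differences over every coordinate of every row."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return float(np.mean((A - B) ** 2))


def mse_reduction_percent(X, Y, weights, bias):
    """Percentage reduction in MSE after applying the affine map x @ W + b.

    Hypothetical baseline: MSE between the untransformed Language 1 vectors X
    and the Language 2 vectors Y, so X and Y must have the same width.
    """
    before = mse(X, Y)
    after = mse(np.asarray(X, dtype=float) @ weights + bias, Y)
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    Y = [[0.0, 1.0], [1.0, 0.0]]
    swap = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy map that maps X exactly onto Y
    print(mse_reduction_percent(X, Y, swap, np.zeros(2)))  # 100.0
```

Averaging over every coordinate matches sklearn.metrics.mean_squared_error with its default multioutput='uniform_average' for 2-D inputs.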

Module contents