tvecs.model_generator package

Submodules

tvecs.model_generator.model_generator module

Generates Word2Vec models for individual languages from preprocessed corpora.

  • Preprocessing Corpus - Implementation of BasePreprocessor module
    • HcCorpusPreprocessor

  • Word2Vec Model Building
    • Gensim Word2Vec SkipGram implementation
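As a rough illustration of what the skip-gram variant trains on, the sketch below enumerates (center, context) word pairs from one tokenized sentence. It is not taken from tvecs or Gensim: the function name `skipgram_pairs` and the fixed `window` parameter are assumptions made for this example, and a real implementation adds further steps on top of this pairing.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                # left edge of the context window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["the", "cat", "sat"], window=1)
```

With a window of 1, each word is paired only with its immediate neighbours; a larger window yields more pairs per sentence.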

tvecs.model_generator.model_generator.construct_model(preprocessed_corpus, language, output_dir_path='.', output_fname=None, iterations=5)[source]

Construct a Word2Vec model from the preprocessed corpus.

API Documentation
param preprocessed_corpus

Instance of a subclass of BasePreprocessor.

type preprocessed_corpus

Any class that inherits from tvecs.preprocessor.base_preprocessor.BasePreprocessor

param language

Language for which model is generated.

type language

String

param output_dir_path

Directory path where the model is stored. [ Default Current Directory ]

type output_dir_path

String

param output_fname

File name under which the model is stored. [ Default None ]

type output_fname

String

param iterations

Number of iterations for Word2Vec. [ Default value 5 ]

type iterations

Integer

return

Constructed Model based on the provided specifications.

rtype

gensim.models.Word2Vec
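Because `iterations` implies several passes over the data, the corpus object has to be iterable more than once; a one-shot generator would be exhausted after the first pass. The sketch below shows that contract with a hypothetical stand-in class, `ToyCorpus`. Whether BasePreprocessor subclasses expose their sentences exactly this way is an assumption, not something stated in this reference.

```python
class ToyCorpus(object):
    """Hypothetical stand-in for a preprocessed corpus (not part of tvecs)."""

    def __init__(self, lines):
        self.lines = list(lines)

    def __iter__(self):
        # A fresh generator is created on every call, so the corpus can be
        # traversed once per training iteration.
        for line in self.lines:
            tokens = line.lower().split()
            if tokens:  # skip empty lines
                yield tokens


corpus = ToyCorpus(["The cat sat", "", "Dogs bark"])
first_pass = list(corpus)
second_pass = list(corpus)
```

Both passes yield the same token lists, which is the property a multi-iteration trainer relies on.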

See also

tvecs.model_generator.model_generator.generate_model(preprocessor_type, language, corpus_fname, corpus_dir_path='.', output_fname=None, output_dir_path='data/models', need_preprocessing=True, iterations=5)[source]

Preprocess the corpus and generate a Word2Vec model from it.

API Documentation
param preprocessor_type

Class Name for preprocessor.

type preprocessor_type

String

param language

Language for which model is generated.

type language

String

param corpus_fname

File name of the corpus.

type corpus_fname

String

param corpus_dir_path

Directory path where the corpus is located. [ Default Current Directory ]

type corpus_dir_path

String

param output_dir_path

Directory path where the model is stored. [ Default data/models ]

type output_dir_path

String

param output_fname

File name for the generated model. [ Default None ]

type output_fname

String

param need_preprocessing

Flag passed on to the preprocessor to indicate whether the corpus needs preprocessing. [ Default True ]

type need_preprocessing

Boolean

param iterations

Number of iterations for Word2Vec. [ Default value 5 ]

type iterations

Integer

return

Constructed Model based on the provided specifications.

rtype

gensim.models.Word2Vec
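A call to generate_model might look like the sketch below, which collects the documented parameters and defaults in one dictionary. The language value and the corpus file name are placeholders chosen for illustration, passing "HcCorpusPreprocessor" as the class name is an assumption based on the preprocessor listed for this module, and the wrapper `run_generate_model` is hypothetical. The tvecs import is deferred so the snippet can be loaded without the package installed.

```python
# Keyword arguments mirroring the documented signature of generate_model.
GENERATE_MODEL_KWARGS = {
    "preprocessor_type": "HcCorpusPreprocessor",  # assumed class-name string
    "language": "english",                        # placeholder value
    "corpus_fname": "corpus.txt",                 # placeholder file name
    "corpus_dir_path": ".",
    "output_fname": None,
    "output_dir_path": "data/models",
    "need_preprocessing": True,
    "iterations": 5,
}


def run_generate_model(**overrides):
    """Call generate_model with the documented defaults plus any overrides."""
    # Imported lazily: requires tvecs (and its dependencies) to be installed.
    from tvecs.model_generator import model_generator

    kwargs = dict(GENERATE_MODEL_KWARGS)
    kwargs.update(overrides)
    return model_generator.generate_model(**kwargs)
```

For example, `run_generate_model(iterations=10)` would request ten training iterations while leaving the other arguments at their documented defaults.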

Module contents