In the R text2vec package, can the LDA model show the topic distribution for each token in a document?

April 15, 2013

library(text2vec)
library(parallel)
library(doParallel)

# Register a parallel backend with one worker per core
n <- parallel::detectCores()
cl <- makeCluster(n)
registerDoParallel(cl)

ky_young <- read.csv("./ky_young.csv")

# Token iterator over the text column
it <- itoken_parallel(ky_young$textinfo,
                      ids         = ky_young$id,
                      tokenizer   = word_tokenizer,
                      progressbar = FALSE)

## stopwords
stop_words <- readLines("./stopwrd1.txt", encoding = "UTF-8")

vocab <- create_vocabulary(it,
                           stopwords = stop_words,
                           ngram     = c(1, 1)) %>%
  prune_vocabulary(term_count_min = 5)

# Vocabulary sorted by descending term count
vocab.order <- vocab[order(vocab$term_count, decreasing = TRUE), ]

vectorizer <- vocab_vectorizer(vocab)

dtm <- create_dtm(it, vectorizer, distributed = FALSE)

lda_model <- LatentDirichletAllocation$new(n_topics         = 200,
                                           # vocabulary     = vocab,  <= error
                                           doc_topic_prior  = 0.1,
                                           topic_word_prior = 0.01)

## topic-document distribution
lda_fit <- lda_model$fit_transform(x                   = dtm,
                                   n_iter              = 50,
                                   convergence_tol     = -1,
                                   n_check_convergence = 10)

## topic-word distribution
topic_word_prior <- lda_model$topic_word_distribution

I created test LDA code in text2vec, and I can get the word-topic distribution and the document-topic distribution. (And it is crazy fast.)

By the way, I am wondering whether it is possible to get the topic distribution for each token in a document from text2vec's LDA model?

As I understand it, the LDA analysis assigns each token in a document to a specific topic, and as a result each document has a distribution over topics.

If I can get each token's topic distribution, I would like to check how each topic's top words change across classified groups of documents (for example, by period). Is that possible?

If there is a way, I would be grateful if you could let me know.

Unfortunately, it is impossible to get the distribution of topics for each token in a given document. Document-topic counts are calculated/aggregated "on the fly", so the document-token-topic distribution is not stored anywhere.
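
Since the per-token assignments are not kept, one possible workaround is to approximate them afterwards from the two matrices the model does return. The sketch below is not a text2vec feature. It uses the usual approximation p(topic | token, document) ∝ p(topic | document) × p(token | topic), renormalised over topics. The helper `token_topic_posterior` and the toy matrices are hypothetical. `doc_topic` stands in for `lda_fit` and `topic_word` for `lda_model$topic_word_distribution`, assuming documents are in rows of the former and topics in rows of the latter.

```r
# Toy stand-ins with made-up values:
# doc_topic  ~ lda_fit (documents x topics)
# topic_word ~ lda_model$topic_word_distribution (topics x terms)
doc_topic <- matrix(c(0.9, 0.1,
                      0.2, 0.8),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("doc1", "doc2"), c("topic1", "topic2")))
topic_word <- matrix(c(0.7, 0.2, 0.1,
                       0.1, 0.3, 0.6),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("topic1", "topic2"),
                                     c("apple", "bank", "loan")))

# Hypothetical helper: approximate p(topic | token, document) as
# p(topic | document) * p(token | topic), renormalised over topics.
token_topic_posterior <- function(doc_id, tokens, doc_topic, topic_word) {
  # Drop tokens that are not in the model vocabulary
  tokens <- tokens[tokens %in% colnames(topic_word)]
  # p(token | topic), arranged as tokens x topics
  word_given_topic <- t(topic_word[, tokens, drop = FALSE])
  # p(topic | document), repeated once per token
  topic_given_doc <- matrix(doc_topic[doc_id, ],
                            nrow = length(tokens),
                            ncol = ncol(doc_topic),
                            byrow = TRUE)
  unnorm <- word_given_topic * topic_given_doc
  # Normalise each row so it sums to 1
  unnorm / rowSums(unnorm)
}

post <- token_topic_posterior("doc1", c("apple", "loan", "unknown"),
                              doc_topic, topic_word)
print(round(post, 3))
```

Each row of `post` is an approximate topic distribution for one token of the chosen document. This is only an estimate computed after fitting, not the assignments the sampler actually drew, so it should be treated as a rough guide when comparing top words across groups of documents.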
