Tidying DFM objects

R/TextMining 2019. 10. 30. 23:37

# Install the required packages (only needed once)
install.packages("tm")
install.packages("topicmodels")
install.packages("tidyverse")
install.packages("tidytext")
install.packages("quanteda")
install.packages("scales")

# Load the packages
library(tm)
library(topicmodels)
library(tidyverse)
library(tidytext)
library(quanteda)
library(scales)

# Load the corpus of US presidential inaugural addresses bundled with quanteda
data("data_corpus_inaugural")
data_corpus_inaugural

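# Build the document-feature matrix (DFM).
# Tokenize first: recent quanteda releases no longer accept a corpus directly in dfm().
inaug_dfm <- quanteda::dfm(quanteda::tokens(data_corpus_inaugural), verbose = FALSE)
inaug_dfm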

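# Convert the DFM to a tidy tibble: one row per document-term pair
# (columns: document, term, count)
inaug_tidy <- tidytext::tidy(inaug_dfm)
inaug_tidy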
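
To see what the tidying step produces, here is a minimal sketch on a two-document toy corpus. The texts and the names `toy_dfm` and `toy_tidy` are made up for illustration, and the sketch assumes quanteda and tidytext are installed.

```r
library(quanteda)
library(tidytext)

# Two tiny made-up documents (not part of the inaugural corpus)
toy_texts <- c(d1 = "God bless America", d2 = "freedom and America")

# Tokenize, then build the document-feature matrix (terms are lowercased by default)
toy_dfm <- quanteda::dfm(quanteda::tokens(toy_texts))

# Tidy form: one row per non-zero document-term pair
toy_tidy <- tidytext::tidy(toy_dfm)
toy_tidy
```

Only the non-zero cells of the matrix appear as rows, so the tidy table is usually much shorter than documents times terms.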

# Compute tf-idf for each term within each address, highest scores first
inaug_tidy %>%
  tidytext::bind_tf_idf(term, document, count) %>%
  dplyr::arrange(dplyr::desc(tf_idf)) -> inaug_tf_idf
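
The tf-idf score can be checked by hand on a small example. The sketch below uses base R only and assumes the usual definitions that `bind_tf_idf()` follows: tf is a term's count divided by the total count in its document, and idf is the natural log of the number of documents divided by the number of documents containing the term. The data frame `toy` and its values are made up for illustration.

```r
# Toy counts (made up): one row per document-term pair
toy <- data.frame(
  document = c("d1", "d1", "d2", "d2"),
  term     = c("freedom", "the", "union", "the"),
  count    = c(1, 3, 2, 2),
  stringsAsFactors = FALSE
)

# tf: a term's count divided by the total count of its document
doc_total <- tapply(toy$count, toy$document, sum)
toy$tf <- toy$count / as.numeric(doc_total[toy$document])

# idf: natural log of (number of documents / documents containing the term)
n_docs   <- length(unique(toy$document))
doc_freq <- table(toy$term)
toy$idf  <- log(n_docs / as.numeric(doc_freq[toy$term]))

# tf-idf is the product; sort with the highest scores first
toy$tf_idf <- toy$tf * toy$idf
toy[order(-toy$tf_idf), ]
```

A term such as "the" that appears in every document gets an idf of zero, so its tf-idf is zero however often it occurs.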

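# Extract the year from each document name, fill in zero counts for
# year-term pairs that never occur, and total the words per year
inaug_tidy %>%
  tidyr::extract(document, "year", "(\\d+)", convert = TRUE) %>%
  tidyr::complete(year, term, fill = list(count = 0)) %>%
  dplyr::group_by(year) %>%
  dplyr::mutate(year_total = sum(count)) -> year_term_counts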
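
The year-extraction and zero-filling steps are easier to follow on a few rows. This is a minimal sketch assuming tidyr and tibble are installed. The rows in `toy_counts` are made up, with document names following the "year-President" pattern.

```r
library(tidyr)
library(tibble)

# Toy tidy counts (made up); "freedom" is missing for 1793
toy_counts <- tibble::tibble(
  document = c("1789-Washington", "1789-Washington", "1793-Washington"),
  term     = c("god", "freedom", "god"),
  count    = c(1, 2, 3)
)

# Pull the first run of digits out of the document name; convert = TRUE makes it an integer
toy_years <- tidyr::extract(toy_counts, document, "year", "(\\d+)", convert = TRUE)

# Add a row for every year-term combination, with count = 0 where none was observed
toy_full <- tidyr::complete(toy_years, year, term, fill = list(count = 0))
toy_full
```

Without the `complete()` step, years in which a word never appears would be absent from the data instead of showing a frequency of zero, which would distort the smoothed trend in the plot.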

# Plot how the relative frequency of four words changes over time
year_term_counts %>%
  dplyr::filter(term %in% c("god", "america", "foreign", "freedom")) %>%
  ggplot2::ggplot(mapping = ggplot2::aes(x = year, y = count / year_total)) +
  ggplot2::geom_point() +
  ggplot2::geom_smooth() +
  ggplot2::facet_wrap(~term, scales = "free_y") +
  ggplot2::scale_y_continuous(labels = scales::percent_format()) +
  ggplot2::ylab("% frequency of word in inaugural address")

[Source] R로 배우는 텍스트마이닝 (Text Mining with R), by Julia Silge and David Robinson, translated by 박진수, 제이펍, pp. 89-92

