  • Dictionary-Based Sentiment Analysis
    Python 2020. 3. 3. 14:37

    # Install packages (nltk is needed for the tokenizer and stop words below)
    !pip install modin
    !pip install afinn
    !pip install nltk

     

    # Load packages
    import modin.pandas as pd
    import numpy as np
    import glob
    import matplotlib.pyplot as plt
    import nltk
    from afinn import Afinn
    from nltk.stem.porter import PorterStemmer
    from nltk.corpus import stopwords
    from nltk.tokenize import RegexpTokenizer

    # One-time download of the stop word corpus used by stopwords.words()
    nltk.download("stopwords")

     

     

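    # Read the data: take the positive review file at index 20 and keep its first line.
    # sorted() makes the choice reproducible, since glob.glob() returns files in arbitrary order.
    pos_review = sorted(glob.glob("d:/deeplearning/textmining/pos/*.txt"))[20]
    with open(pos_review, "r") as pos_file:
        pos_lines1 = pos_file.readlines()[0]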

     

    # Same for the negative review file at index 20
    neg_review = sorted(glob.glob("d:/deeplearning/textmining/neg/*.txt"))[20]
    with open(neg_review, "r") as neg_file:
        neg_lines1 = neg_file.readlines()[0]

     

     

    # Create the AFINN sentiment lexicon
    afinn = Afinn()

     

     

    # Score the positive review with the AFINN lexicon
    afinn.score(pos_lines1)

     

     

    # Score the negative review with the AFINN lexicon
    afinn.score(neg_lines1)
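
For intuition, a lexicon-based score of this kind is essentially a sum of per-word valences. The sketch below illustrates the idea with a tiny hypothetical lexicon; `toy_lexicon` and `toy_score` are made-up names with made-up values, not part of the afinn package, and the real library's matching rules may differ.

```python
import re

# Hypothetical toy lexicon: word -> valence (illustrative values only)
toy_lexicon = {"good": 3, "great": 3, "bad": -3, "terrible": -3, "boring": -2}


def toy_score(text):
    """Sum the valences of all lexicon words found in the text."""
    words = re.findall(r"\w+", text.lower())
    return sum(toy_lexicon.get(w, 0) for w in words)


print(toy_score("A great cast, but a boring and terrible script."))
```

A positive total suggests a positive text and a negative total a negative one; words missing from the lexicon contribute nothing.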

     

     

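    # Build the EmoLex (NRC) sentiment lexicon
    ncr = pd.read_table("d:/deeplearning/textmining/NRC.txt", engine = "python", header = None, sep = "\t")
    # Drop every row that contains a 0, keeping only word-emotion pairs flagged as associated
    ncr = ncr[(ncr != 0).all(axis = 1)]
    ncr = ncr.reset_index(drop = True)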
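
The filter above only makes sense for a particular file layout. The sketch below assumes each line of NRC.txt has the form word, emotion, flag (tab-separated), where flag 1 means the word is associated with that emotion. That layout is an assumption, and the sample rows are invented for illustration.

```python
import csv
import io

# Illustrative sample in the assumed word<TAB>emotion<TAB>flag layout
sample = (
    "happy\tjoy\t1\n"
    "happy\tanger\t0\n"
    "happy\tpositive\t1\n"
    "awful\tfear\t1\n"
    "awful\tjoy\t0\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# Keep only rows whose flag is not 0, i.e. the word is associated with the emotion
kept = [(word, emotion) for word, emotion, flag in rows if flag != "0"]
print(kept)
```

After filtering, column 0 holds the word and column 1 the emotion label, which is how the later steps use `ncr[0]` and column 1.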

     

     

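    # Tokenizer, stop words, and stemmer
    tokenizer = RegexpTokenizer(r"[\w]+")  # raw string avoids an invalid-escape warning
    stop_words = stopwords.words("english")
    p_stemmer = PorterStemmer()  # created here but not applied in the steps below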
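
To see what this preprocessing does, here is a rough standard-library equivalent of the tokenize and stop-word steps. `toy_stop_words` is a small hypothetical set used only for illustration; NLTK's English list is much longer.

```python
import re

# Small hypothetical stop word set (illustrative only)
toy_stop_words = {"the", "a", "was", "and", "it"}


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


tokens = tokenize("The movie was funny, and it was moving!")
content_tokens = [t for t in tokens if t not in toy_stop_words]
print(content_tokens)
```

Punctuation disappears because the pattern only matches word characters, and the stop word filter removes frequent words that carry little sentiment.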

     

     

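    # Evaluate the positive review: lowercase, tokenize, drop stop words,
    # then keep only the words that appear in the lexicon
    pos_raw = pos_lines1.lower()
    pos_tokens = tokenizer.tokenize(pos_raw)
    stopped_pos_tokens = [i for i in pos_tokens if i not in stop_words]
    ncr_words = set(ncr[0])  # build once instead of rebuilding a list per token
    match_pos_words = [x for x in stopped_pos_tokens if x in ncr_words]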


    # Collect the emotion labels associated with each matched word
    pos_emotions = []
    for i in match_pos_words:
        temp = list(ncr.iloc[np.where(ncr[0] == i)[0], 1])
        pos_emotions.extend(temp)

     

    # Count how often each emotion label occurs and plot the counts as a bar chart
    pos_sentiment_result1 = pd.Series(pos_emotions).value_counts()
    pos_sentiment_result1.plot.bar()
    plt.show()
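
The lookup-and-count step can also be written without DataFrame indexing. The following is a minimal sketch using only the standard library: build a word-to-emotions index once, then count labels with `Counter`. The `pairs` and `tokens` values are hypothetical stand-ins for the filtered lexicon and the review tokens.

```python
from collections import Counter, defaultdict

# Hypothetical (word, emotion) pairs standing in for the filtered lexicon
pairs = [
    ("happy", "joy"),
    ("happy", "positive"),
    ("awful", "fear"),
    ("awful", "negative"),
]

# Build the word -> emotions index once
word_to_emotions = defaultdict(list)
for word, emotion in pairs:
    word_to_emotions[word].append(emotion)

# Hypothetical tokens after stop word removal
tokens = ["happy", "movie", "awful", "happy"]

# Each occurrence of a word contributes all of its emotion labels
emotion_counts = Counter(e for t in tokens for e in word_to_emotions.get(t, []))
print(emotion_counts.most_common())
```

This plays the same role as `value_counts()` above; the dictionary index avoids scanning the whole lexicon for every token.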

     

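    # Evaluate the negative review with the same steps
    neg_raw = neg_lines1.lower()
    neg_tokens = tokenizer.tokenize(neg_raw)
    neg_stopped_tokens = [i for i in neg_tokens if i not in stop_words]
    ncr_words = set(ncr[0])  # build once instead of rebuilding a list per token
    neg_match_words = [x for x in neg_stopped_tokens if x in ncr_words]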

     

    # Collect the emotion labels associated with each matched word
    neg_emotions = []
    for i in neg_match_words:
        temp = list(ncr.iloc[np.where(ncr[0] == i)[0], 1])
        neg_emotions.extend(temp)

     

    # Count how often each emotion label occurs and plot the counts as a bar chart
    neg_sentiment_result1 = pd.Series(neg_emotions).value_counts()
    neg_sentiment_result1.plot.bar()
    plt.show()

     

    [Source] 잡아라! 텍스트마이닝 with 파이썬, by 서대호, BJ, pp. 100-107
