Add light_model and improve README

김민수
Commit 70a5847e74206174d6cd9e9efe01c422807a90d3 (1 parent: 3c341a41)
Showing 19 changed files with 1451 additions and 0 deletions
Light_model/.gitignore
Light_model/Dockerfile
Light_model/README.md
Light_model/Styling.py
Light_model/app.js
Light_model/app.py
Light_model/chat.css
Light_model/generation.py
Light_model/light_chatbot.py
Light_model/main.html
Light_model/model.py
Light_model/requirements.txt
Light_model/sorted_model-rough.pth
Light_model/sorted_model-soft.pth
Light_model/static/app.js
Light_model/static/chat.css
Light_model/static/favicon.ico
Light_model/static/main.html
README.md
--- a/Light_model/.gitignore 0 → 100644
+++ b/Light_model/.gitignore 0 → 100644
+ *.zip
+ venv/
--- a/Light_model/Dockerfile 0 → 100644
+++ b/Light_model/Dockerfile 0 → 100644
+ FROM ufoym/deepo:pytorch-cpu
+ # https://github.com/Beomi/deepo-nlp/blob/master/Dockerfile
+ # Install JVM for Konlpy
+ RUN apt-get update && \
+     apt-get upgrade -y && \
+     apt-get install -y \
+     openjdk-8-jdk wget curl git python3-dev \
+     language-pack-ko
+ 
+ RUN locale-gen en_US.UTF-8 && \
+     update-locale LANG=en_US.UTF-8
+ 
+ # Install zsh
+ RUN apt-get install -y zsh && \
+     sh -c "$(curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"
+ 
+ # Install other packages
+ RUN pip install --upgrade pip
+ RUN pip install autopep8
+ RUN pip install konlpy
+ RUN pip install torchtext pytorch_pretrained_bert
+ # Install dependency of styling chatbot
+ RUN pip install hgtk chatspace
+ 
+ # Add Mecab-Ko
+ RUN curl -L https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh | bash
+ # install styling chatbot by BM-K
+ RUN git clone https://github.com/km19809/light_model.git
+ RUN pip install -r light_model/requirements.txt
+ 
+ # Add non-root user
+ RUN adduser --disabled-password --gecos "" user
+ 
+ # Reset Workdir
+ WORKDIR /light_model
\ No newline at end of file
--- a/Light_model/README.md 0 → 100644
+++ b/Light_model/README.md 0 → 100644
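+ # Lightweight model of the styling chatbot
+ This repository hosts a lightweight version of the model for web deployment.\
+ The original repository is available [here](https://github.com/km19809/Styling-Chatbot-with-Transformer).
+ 
+ ## Requirements
+ 
+ The versions below may change during development, so refer to requirements.txt for the current list.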
+ ```
+ torch~=1.4.0
+ Flask~=1.1.2
+ torchtext~=0.6.0
+ hgtk~=0.1.3
+ konlpy~=0.5.2
+ chatspace~=1.0.1
+ ```
+ 
+ ## Usage
+ `light_chatbot.py [--train] [--per_soft|--per_rough]`
+ 
+ * train: trains and saves a model.\
+ Without this flag, a saved model is loaded and tested.
+ * per_soft: trains or tests the soft speaking style.\
+ With per_rough, the rough speaking style is trained or tested instead.\
+ The two options are mutually exclusive.
+ 
+ `app.py`
+ 
+ A simple Flask server for trying out the chatbot.
\ No newline at end of file
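The README usage line describes one optional `--train` flag and two style flags that cannot be combined. The argument parsing itself is not visible in this part of the diff, so the following is only a sketch of how such an interface could be declared with `argparse`. The function name `build_parser` and the help strings are illustrative assumptions, not the repository's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the README usage line."""
    parser = argparse.ArgumentParser(prog="light_chatbot.py")
    parser.add_argument("--train", action="store_true",
                        help="train a model instead of loading one for testing")
    # argparse rejects a command line that sets both flags of an exclusive group.
    style = parser.add_mutually_exclusive_group()
    style.add_argument("--per_soft", action="store_true", help="soft speaking style")
    style.add_argument("--per_rough", action="store_true", help="rough speaking style")
    return parser
```

With this sketch, `build_parser().parse_args(["--train", "--per_soft"])` yields a namespace with `train` and `per_soft` set, while passing both style flags makes `argparse` exit with a usage error.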
--- a/Light_model/Styling.py 0 → 100644
+++ b/Light_model/Styling.py 0 → 100644
+ import torch
+ import csv
+ import hgtk
+ from konlpy.tag import Mecab
+ import random
+ 
+ mecab = Mecab()
+ empty_list = []
+ positive_emo = ['ㅎㅎ', '~']
+ negative_emo = ['...', 'ㅠㅠ']
+ asdf = []
+ 
+ 
+ # Morphological analysis with mecab; returns 'surface/POS' strings.
+ def mecab_token_pos_flat_fn(string: str):
+     tokens_ko = mecab.pos(string)
+     return [str(pos[0]) + '/' + str(pos[1]) for pos in tokens_ko]
+ 
+ 
+ # Used for the rough style: find the first pronoun (NP) and turn 저/제 into 나/내.
+ def exchange_NP(target: str):
+     keyword = []
+     ko_sp = mecab_token_pos_flat_fn(target)
+     _idx = -1  # default when no pronoun is found
+     for idx, word in enumerate(ko_sp):
+         if word.find('NP') > 0:
+             keyword.append(word.split('/'))
+             _idx = idx
+             break
+     if not keyword:  # no pronoun in the sentence
+         return '', _idx, False
+ 
+     if keyword[0][0] == '저':
+         keyword[0][0] = '나'
+     elif keyword[0][0] == '제':
+         keyword[0][0] = '내'
+     else:
+         # Some other pronoun: return it unchanged and report that nothing was exchanged.
+         return keyword[0][0], _idx, False
+ 
+     return keyword[0][0], _idx, True
+ 
+ 
+ # Convert the sentence-final ending into the soft or rough style.
+ def make_special_word(target: str, per_rough: bool, search_ec: bool):
+     # Split the sentence with mecab (example output: ['오늘/MAG', '날씨/NNG', '좋/VA', '다/EF', './SF'])
+     ko_sp = mecab_token_pos_flat_fn(target)
+ 
+     keyword = []
+     _idx = -1  # default when no ending is found
+     # Extract the index and keyword of the first token tagged as a final ending ('EF'),
+     # or of a connective ending ('EC') in second-to-last position when search_ec is set.
+     for idx, word in enumerate(ko_sp):
+         if word.find('EF') > 0:
+             keyword.append(word.split('/'))
+             _idx = idx
+             break
+         # Guard the [-2] lookup so a one-token sentence cannot raise IndexError.
+         if search_ec and len(ko_sp) >= 2 and ko_sp[-2].find('EC') > 0:
+             keyword.append(ko_sp[-2].split('/'))
+             _idx = len(ko_sp) - 1
+             break
+ 
+     # Return early when no ending was found.
+     if not keyword:
+         return '', _idx
+ 
+     _keyword = keyword[0]
+ 
+     if per_rough:
+         return _keyword[0], _idx
+ 
+     # Decompose the keyword into jamo with hgtk (example output: ㅎㅏᴥㅅㅔᴥㅇㅛᴥ)
+     total_word = hgtk.text.decompose(_keyword[0])
+ 
+     # Style the ending by attaching the final consonant 'ㅇ' to its last syllable.
+     total_word = replace_right(total_word, "ᴥ", "ㅇᴥ", 1)
+ 
+     # Recompose the jamo, e.g. '하세요' -> '하세용'.
+     h_combine = hgtk.text.compose(total_word)
+ 
+     return h_combine, _idx
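The soft style turns an ending such as '하세요' into '하세용' by attaching the final consonant 'ㅇ' to the last syllable. The commit does this with `hgtk` decomposition and recomposition. As an illustration of the same idea without that dependency, the sketch below uses the arithmetic layout of precomposed Hangul syllables in Unicode. The name `add_final_ieung` is hypothetical, and unlike the `hgtk` path this version deliberately leaves syllables that already have a final consonant untouched.

```python
HANGUL_BASE = 0xAC00   # code point of '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of '힣', the last precomposed syllable
FINAL_COUNT = 28       # final-consonant slots per syllable (slot 0 means none)
FINAL_IEUNG = 21       # slot index of the final consonant 'ㅇ'


def add_final_ieung(word: str) -> str:
    """Attach final 'ㅇ' to the last syllable if it is an open Hangul syllable."""
    if not word:
        return word
    code = ord(word[-1])
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return word  # not a precomposed Hangul syllable
    if (code - HANGUL_BASE) % FINAL_COUNT != 0:
        return word  # the syllable already has a final consonant
    return word[:-1] + chr(code + FINAL_IEUNG)
```

For example, `add_final_ieung("하세요")` gives `"하세용"`, while a word ending in a closed syllable such as `"한"` is returned unchanged.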
+ 
+ 
+ # Build the special tokens.
+ def make_special_token(per_rough: bool):
+     # Special tokens that express sentiment
+     target_special_voca = []
+ 
+     banmal_dict = get_rough_dic()
+ 
+     # Take the 'EF' ending of every chatbot answer in the training set and build its styled special token (final 'ㅇ' attached)
+     with open('chatbot_0325_ALLLABEL_train.txt', 'r', encoding='utf-8') as f:
+         rdr = csv.reader(f, delimiter='\t')
+         for idx, line in enumerate(rdr):
+             target = line[2]  # chatbot answer
+             exchange_word, _ = make_special_word(target, per_rough, False)
+             target_special_voca.append(str(exchange_word))
+     # Sort so the special-token order, and hence the vocabulary indices, are identical on every run.
+     target_special_voca = sorted(set(target_special_voca))
+ 
+     banmal_special_voca = []
+     for i in range(len(target_special_voca)):
+         try:
+             banmal_special_voca.append(banmal_dict[target_special_voca[i]])
+         except KeyError:
+             if per_rough:
+                 print("not include banmal dictionary")
+             pass
+ 
+     # Add the emoticons used as sentiment markers.
+     target_special_voca.append('ㅎㅎ')
+     target_special_voca.append('~')
+     target_special_voca.append('ㅠㅠ')
+     target_special_voca.append('...')
+     target_special_voca = target_special_voca + banmal_special_voca
+ 
+     # '<posi>' marks positive sentiment and '<nega>' marks negative sentiment.
+     return ['<posi>', '<nega>'], target_special_voca
+ 
+ 
+ # Like str.replace, but replaces occurrences starting from the right.
+ def replace_right(original: str, old: str, new: str, count_right: int):
+     if count_right <= 0:
+         return original
+     # rsplit cuts at most count_right occurrences of old from the right (all of them if there are fewer);
+     # joining the pieces with new replaces exactly those occurrences, whatever the length of old.
+     return new.join(original.rsplit(old, count_right))
+ 
+ 
+ # Apply the style to the tensors fed to the transformer as input and output.
+ def styling(enc_input, dec_input, dec_output, dec_outputs, enc_label, max_len: int, per_soft: bool, per_rough: bool,  TEXT, LABEL):
+     device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+ 
+     pad_tensor = torch.tensor([LABEL.vocab.stoi['<pad>']]).type(dtype=torch.int32).to(device)
+ 
+     temp_enc = enc_input.data.cpu().numpy()
+     batch_sentiment_list = []
+ 
+     # Soft personality
+     if per_soft:
+         # Rewrite the encoder input as: 나는 너를 좋아해 <posi> <pad> <pad> ...
+         for i in range(len(temp_enc)):
+             for j in range(max_len):
+                 if temp_enc[i][j] == 1 and enc_label[i] == 0:
+                     temp_enc[i][j] = TEXT.vocab.stoi["<nega>"]
+                     batch_sentiment_list.append(0)
+                     break
+                 elif temp_enc[i][j] == 1 and enc_label[i] == 1:
+                     temp_enc[i][j] = TEXT.vocab.stoi["<posi>"]
+                     batch_sentiment_list.append(1)
+                     break
+ 
+         enc_input = torch.tensor(temp_enc, dtype=torch.int32).to(device)
+ 
+         for i in range(len(dec_outputs)):
+             dec_outputs[i] = torch.cat([dec_output[i], pad_tensor], dim=-1)
+ 
+         temp_dec = dec_outputs.data.cpu().numpy()
+ 
+         dec_outputs_sentiment_list = []  # sentiment emoticon inserted into each decoder sequence
+ 
+         # Rewrite the decoder outputs as: 저도 좋아용 ㅎㅎ <eos> <pad> <pad> ...
+         for i in range(len(temp_dec)):  # i = batch size
+             temp_sentence = ''
+             sa_ = batch_sentiment_list[i]
+             if sa_ == 0:
+                 sa_ = random.choice(negative_emo)
+             elif sa_ == 1:
+                 sa_ = random.choice(positive_emo)
+             dec_outputs_sentiment_list.append(sa_)
+ 
+             for ix, token_i in enumerate(temp_dec[i]):
+                 if LABEL.vocab.itos[token_i] in ['<sos>', '<eos>', '<pad>']:
+                     continue
+                 temp_sentence = temp_sentence + LABEL.vocab.itos[token_i]
+             temp_sentence = temp_sentence + '.'  # the morphological analysis differs depending on whether a period is present
+             exchange_word, idx = make_special_word(temp_sentence, per_rough, True)
+ 
+             if exchange_word == '':
+                 for j in range(len(temp_dec[i])):
+                     if temp_dec[i][j] == LABEL.vocab.stoi['<eos>']:
+                         temp_dec[i][j] = LABEL.vocab.stoi[sa_]
+                         temp_dec[i][j + 1] = LABEL.vocab.stoi['<eos>']
+                         break
+                 continue
+ 
+             for j in range(len(temp_dec[i])):
+                 if LABEL.vocab.itos[temp_dec[i][j]] == '<eos>':
+                     temp_dec[i][j - 1] = LABEL.vocab.stoi[exchange_word]
+                     temp_dec[i][j] = LABEL.vocab.stoi[dec_outputs_sentiment_list[i]]
+                     temp_dec[i][j + 1] = LABEL.vocab.stoi['<eos>']
+                     break
+                 elif temp_dec[i][j] != LABEL.vocab.stoi['<eos>'] and j + 1 == len(temp_dec[i]):
+                     raise ValueError("No <eos> token found in decoder outputs")
+ 
+         dec_outputs = torch.tensor(temp_dec, dtype=torch.int32).to(device)
+ 
+         temp_dec_input = dec_input.data.cpu().numpy()
+         # Rewrite the decoder input as: <sos> 저도 좋아용 ㅎㅎ <eos> <pad> <pad> ...
+         for i in range(len(temp_dec_input)):
+             temp_sentence = ''
+             for ix, token_i in enumerate(temp_dec_input[i]):
+                 if LABEL.vocab.itos[token_i] in ['<sos>', '<eos>', '<pad>']:
+                     continue
+                 temp_sentence = temp_sentence + LABEL.vocab.itos[token_i]
+             temp_sentence = temp_sentence + '.'  # the morphological analysis differs depending on whether a period is present
+             exchange_word, idx = make_special_word(temp_sentence, per_rough, True)
+ 
+             if exchange_word == '':
+                 for j in range(len(temp_dec_input[i])):
+                     if temp_dec_input[i][j] == LABEL.vocab.stoi['<eos>']:
+                         temp_dec_input[i][j] = LABEL.vocab.stoi[dec_outputs_sentiment_list[i]]
+                         temp_dec_input[i][j + 1] = LABEL.vocab.stoi['<eos>']
+                         break
+                 continue
+ 
+             for j in range(len(temp_dec_input[i])):
+                 if LABEL.vocab.itos[temp_dec_input[i][j]] == '<eos>':
+                     temp_dec_input[i][j - 1] = LABEL.vocab.stoi[exchange_word]
+                     temp_dec_input[i][j] = LABEL.vocab.stoi[dec_outputs_sentiment_list[i]]
+                     temp_dec_input[i][j + 1] = LABEL.vocab.stoi['<eos>']
+                     break
+                 elif temp_dec_input[i][j] != LABEL.vocab.stoi['<eos>'] and j + 1 == len(temp_dec_input[i]):
+                     raise ValueError("No <eos> token found in decoder input")
+ 
+         dec_input = torch.tensor(temp_dec_input, dtype=torch.int32).to(device)
+ 
+     # Rough personality
+     elif per_rough:
+         banmal_dic = get_rough_dic()
+ 
+         for i in range(len(dec_outputs)):
+             dec_outputs[i] = torch.cat([dec_output[i], pad_tensor], dim=-1)
+ 
+         temp_dec = dec_outputs.data.cpu().numpy()
+ 
+         # Rewrite the decoder outputs as: 나도 좋아 <eos> <pad> <pad> ...
+         for i in range(len(temp_dec)):  # i = batch size
+             temp_sentence = ''
+             for ix, token_i in enumerate(temp_dec[i]):
+                 if LABEL.vocab.itos[token_i] == '<eos>':
+                     break
+                 temp_sentence = temp_sentence + LABEL.vocab.itos[token_i]
+             temp_sentence = temp_sentence + '.'  # the morphological analysis differs depending on whether a period is present
+             exchange_word, idx = make_special_word(temp_sentence, per_rough, True)
+             exchange_NP_word, NP_idx, exist = exchange_NP(temp_sentence)
+ 
+             if exist:
+                 temp_dec[i][NP_idx] = LABEL.vocab.stoi[exchange_NP_word]
+ 
+             if exchange_word == '':
+                 continue
+             try:
+                 exchange_word = banmal_dic[exchange_word]
+             except KeyError:
+                 asdf.append(exchange_word)
+                 print("not include banmal dictionary")
+                 pass
+ 
+             temp_dec[i][idx] = LABEL.vocab.stoi[exchange_word]
+             temp_dec[i][idx + 1] = LABEL.vocab.stoi['<eos>']
+             for k in range(idx + 2, max_len):
+                 temp_dec[i][k] = LABEL.vocab.stoi['<pad>']
+ 
+             # for j in range(len(temp_dec[i])):
+             #     if LABEL.vocab.itos[temp_dec[i][j]]=='<eos>':
+             #         break
+             #     print(LABEL.vocab.itos[temp_dec[i][j]], end='')
+             # print()
+ 
+         dec_outputs = torch.tensor(temp_dec, dtype=torch.int32).to(device)
+ 
+         temp_dec_input = dec_input.data.cpu().numpy()
+         # Rewrite the decoder input as: <sos> 나도 좋아 <eos> <pad> <pad> ...
+         for i in range(len(temp_dec_input)):
+             temp_sentence = ''
+             for ix, token_i in enumerate(temp_dec_input[i]):
+                 if ix == 0:
+                     continue  # because of token <sos>
+                 if LABEL.vocab.itos[token_i] == '<eos>':
+                     break
+                 temp_sentence = temp_sentence + LABEL.vocab.itos[token_i]
+             temp_sentence = temp_sentence + '.'  # the morphological analysis differs depending on whether a period is present
+             exchange_word, idx = make_special_word(temp_sentence, per_rough, True)
+             exchange_NP_word, NP_idx, exist = exchange_NP(temp_sentence)
+             idx = idx + 1  # because of token <sos>
+             NP_idx = NP_idx + 1
+ 
+             if exist:
+                 temp_dec_input[i][NP_idx] = LABEL.vocab.stoi[exchange_NP_word]
+ 
+             if exchange_word == '':
+                 continue
+ 
+             try:
+                 exchange_word = banmal_dic[exchange_word]
+             except KeyError:
+                 print("not include banmal dictionary")
+                 pass
+ 
+             temp_dec_input[i][idx] = LABEL.vocab.stoi[exchange_word]
+             temp_dec_input[i][idx + 1] = LABEL.vocab.stoi['<eos>']
+ 
+             for k in range(idx + 2, max_len):
+                 temp_dec_input[i][k] = LABEL.vocab.stoi['<pad>']
+ 
+             # for j in range(len(temp_dec_input[i])):
+             #     if LABEL.vocab.itos[temp_dec_input[i][j]]=='<eos>':
+             #         break
+             #     print(LABEL.vocab.itos[temp_dec_input[i][j]], end='')
+             # print()
+ 
+         dec_input = torch.tensor(temp_dec_input, dtype=torch.int32).to(device)
+ 
+     return enc_input, dec_input, dec_outputs
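For the soft style, `styling` rewrites each decoder sequence in place: the token before `<eos>` becomes the styled ending, the emoticon takes the old `<eos>` slot, and `<eos>` moves one slot to the right over a `<pad>`. The sketch below shows that slot shuffle on plain token strings instead of vocabulary indices and tensors, which is a simplification made for illustration. The helper name `style_soft` is hypothetical.

```python
from typing import List


def style_soft(tokens: List[str], styled_ending: str, emoticon: str) -> List[str]:
    """Return a copy where the token before <eos> is restyled and an emoticon is inserted."""
    out = list(tokens)
    eos = out.index("<eos>")        # raises ValueError if there is no <eos>
    if eos == 0 or eos + 1 >= len(out):
        raise ValueError("need a token before <eos> and a free slot after it")
    out[eos - 1] = styled_ending    # e.g. '아요' -> '아용'
    out[eos] = emoticon             # the emoticon takes the old <eos> slot
    out[eos + 1] = "<eos>"          # <eos> moves one slot right, overwriting a <pad>
    return out
```

The sequence length stays fixed, which matches the `fix_length` padding used for the fields. The explicit bounds check is an addition of this sketch for the case where `<eos>` sits in the last slot.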
+ 
+ 
+ # Dictionary for converting polite endings to casual speech (banmal)
+ def get_rough_dic():
+     my_exword = {
+         '돌아와요': '돌아와',
+         '으세요': '으셈',
+         '잊어버려요': '잊어버려',
+         '나온대요': '나온대',
+         '될까요': '될까',
+         '할텐데': '할텐데',
+         '옵니다': '온다',
+         '봅니다': '본다',
+         '네요': '네',
+         '된답니다': '된대',
+         '데요': '데',
+         '봐요': '봐',
+         '부러워요': '부러워',
+         '바랄게요': '바랄게',
+         '지나갑니다': "지나간다",
+         '이뻐요': "이뻐",
+         '지요': "지",
+         '사세요': "사라",
+         '던가요': "던가",
+         '모릅니다': "몰라",
+         '은가요': "은가",
+         '심해요': "심해",
+         '몰라요': "몰라",
+         '라요': "라",
+         '더라고요': '더라고',
+         '입니다': '이라고',
+         '는다면요': '는다면',
+         '멋져요': '멋져',
+         '다면요': '다면',
+         '다니': '다나',
+         '져요': '져',
+         '만드세요': '만들어',
+         '야죠': '야지',
+         '죠': '지',
+         '해줄게요': '해줄게',
+         '대요': '대',
+         '돌아갑시다': '돌아가자',
+         '해보여요': '해봐',
+         '라뇨': '라니',
+         '편합니다': '편해',
+         '합시다': '하자',
+         '드세요': '먹어',
+         '아름다워요': '아름답네',
+         '드립니다': '줄게',
+         '받아들여요': '받아들여',
+         '건가요': '건가',
+         '쏟아진다': '쏟아지네',
+         '슬퍼요': '슬퍼',
+         '해서요': '해서',
+         '다릅니다': '다르다',
+         '니다': '니',
+         '내려요': '내려',
+         '마셔요': '마셔',
+         '아세요': '아냐',
+         '변해요': '변해',
+         '드려요': '드려',
+         '아요': '아',
+         '어서요': '어서',
+         '뜁니다': '뛴다',
+         '속상해요': '속상해',
+         '래요': '래',
+         '까요': '까',
+         '어야죠': '어야지',
+         '라니': '라니',
+         '해집니다': '해진다',
+         '으련만': '으련만',
+         '지워져요': '지워져',
+         '잘라요': '잘라',
+         '고요': '고',
+         '셔야죠': '셔야지',
+         '다쳐요': '다쳐',
+         '는구나': '는구만',
+         '은데요': '은데',
+         '일까요': '일까',
+         '인가요': '인가',
+         '아닐까요': '아닐까',
+         '텐데요': '텐데',
+         '할게요': '할게',
+         '보입니다': '보이네',
+         '에요': '야',
+         '걸요': '걸',
+         '한답니다': '한대',
+         '을까요': '을까',
+         '못해요': '못해',
+         '베푸세요': '베풀어',
+         '어때요': '어때',
+         '더라구요': '더라구',
+         '노라': '노라',
+         '반가워요': '반가워',
+         '군요': '군',
+         '만납시다': '만나자',
+         '어떠세요': '어때',
+         '달라져요': '달라져',
+         '예뻐요': '예뻐',
+         '됩니다': '된다',
+         '봅시다': '보자',
+         '한대요': '한대',
+         '싸워요': '싸워',
+         '와요': '와',
+         '인데요': '인데',
+         '야': '야',
+         '줄게요': '줄게',
+         '기에요': '기',
+         '던데요': '던데',
+         '걸까요': '걸까',
+         '신가요': '신가',
+         '어요': '어',
+         '따져요': '따져',
+         '갈게요': '갈게',
+         '봐': '봐',
+         '나요': '나',
+         '니까요': '니까',
+         '마요': '마',
+         '씁니다': '쓴다',
+         '집니다': '진다',
+         '건데요': '건데',
+         '지웁시다': '지우자',
+         '바랍니다': '바래',
+         '는데요': '는데',
+         '으니까요': '으니까',
+         '셔요': '셔',
+         '네여': '네',
+         '달라요': '달라',
+         '거려요': '거려',
+         '보여요': '보여',
+         '겁니다': '껄',
+         '다': '다',
+         '그래요': '그래',
+         '한가요': '한가',
+         '잖아요': '잖아',
+         '한데요': '한데',
+         '우세요': '우셈',
+         '해야죠': '해야지',
+         '세요': '셈',
+         '걸려요': '걸려',
+         '텐데': '텐데',
+         '어딘가': '어딘가',
+         '요': '',
+         '흘러갑니다': '흘러간다',
+         '줘요': '줘',
+         '편해요': '편해',
+         '거예요': '거야',
+         '예요': '야',
+         '습니다': '어',
+         '아닌가요': '아닌가',
+         '합니다': '한다',
+         '사라집니다': '사라져',
+         '드릴게요': '줄게',
+         '다면': '다면',
+         '그럴까요': '그럴까',
+         '해요': '해',
+         '답니다': '다',
+         '주무세요': '자라',
+         '마세요': '마라',
+         '아픈가요': '아프냐',
+         '그런가요': '그런가',
+         '했잖아요': '했잖아',
+         '버려요': '버려',
+         '갑니다': '간다',
+         '가요': '가',
+         '라면요': '라면',
+         '아야죠': '아야지',
+         '살펴봐요': '살펴봐',
+         '남겨요': '남겨',
+         '내려놔요': '내려놔',
+         '떨려요': '떨려',
+         '랍니다': '란다',
+         '돼요': '돼',
+         '버텨요': '버텨',
+         '만나': '만나',
+         '일러요': '일러',
+         '을게요': '을게',
+         '갑시다': '가자',
+         '나아요': '나아',
+         '어려요': '어려',
+         '온대요': '온대',
+         '다고요': '다고',
+         '할래요': '할래',
+         '된대요': '된대',
+         '어울려요': '어울려',
+         '는군요': '는군',
+         '볼까요': '볼까',
+         '드릴까요': '줄까',
+         '라던데요': '라던데',
+         '올게요': '올게',
+         '기뻐요': '기뻐',
+         '아닙니다': '아냐',
+         '둬요': '둬',
+         '십니다': '십',
+         '아파요': '아파',
+         '생겨요': '생겨',
+         '해줘요': '해줘',
+         '로군요': '로군',
+         '시켜요': '시켜',
+         '느껴져요': '느껴져',
+         '가재요': '가재',
+         '어 ': ' ',
+         '느려요': '느려',
+         '볼게요': '볼게',
+         '쉬워요': '쉬워',
+         '나빠요': '나빠',
+         '불러줄게요': '불러줄게',
+         '살쪄요': '살쪄',
+         '봐야겠어요': '봐야겠어',
+         '네': '네',
+         '어': '어',
+         '든지요': '든지',
+         '드신다': '드심',
+         '가져요': '가져',
+         '할까요': '할까',
+         '졸려요': '졸려',
+         '그럴게요': '그럴게',
+         '': '',
+         '어린가': '어린가',
+         '나와요': '나와',
+         '빨라요': '빨라',
+         '겠죠': '겠지',
+         '졌어요': '졌어',
+         '해봐요': '해봐',
+         '게요': '게',
+         '해드릴까요': '해줄까',
+         '인걸요': '인걸',
+         '했어요': '했어',
+         '원해요': '원해',
+         '는걸요': '는걸',
+         '좋아합니다': '좋아해',
+         '했으면': '했으면',
+         '나갑니다': '나간다',
+         '왔어요': '왔어',
+         '해봅시다': '해보자',
+         '물어봐요': '물어봐',
+         '생겼어요': '생겼어',
+         '해': '해',
+         '다녀올게요': '다녀올게',
+         '납시다': '나자'
+     }
+     return my_exword
\ No newline at end of file
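In the rough branch of `styling`, an ending that is missing from the dictionary triggers a `KeyError`, a message is printed, and the original polite ending is kept. A minimal sketch of that lookup-with-fallback using `dict.get` follows. The helper name `to_banmal` is hypothetical, and the three-entry table is a small excerpt of `get_rough_dic()` used only for the example.

```python
from typing import Dict

# Excerpt of the polite-to-casual table from get_rough_dic(), for illustration only.
SAMPLE_TABLE: Dict[str, str] = {
    "해요": "해",
    "합니다": "한다",
    "세요": "셈",
}


def to_banmal(ending: str, table: Dict[str, str]) -> str:
    """Return the casual form of an ending, or the ending itself if it is unknown."""
    return table.get(ending, ending)
```

`to_banmal("합니다", SAMPLE_TABLE)` returns `"한다"`, and an unknown ending such as `"구나"` comes back unchanged instead of raising.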
--- a/Light_model/app.js 0 → 100644
+++ b/Light_model/app.js 0 → 100644
+ function send() {
+   /* client side */
+   var chat = document.createElement("li");
+   var chat_input = document.getElementById("chat_input");
+   var chat_text = chat_input.value;
+   chat.className = "chat-bubble mine";
+   chat.innerText = chat_text;
+   document.getElementById("chat_list").appendChild(chat);
+   chat_input.value = "";
+ 
+   /* ajax request */
+   var request = new XMLHttpRequest();
+   // A root-relative path keeps the page's scheme and host; a bare host without a scheme would resolve as a relative path.
+   request.open("POST", "/api/soft", true);
+   request.onreadystatechange = function() {
+     if (request.readyState !== 4 || Math.floor(request.status / 100) !== 2) return;
+     var bot_chat = document.createElement("li");
+     bot_chat.className = "chat-bubble bots";
+     bot_chat.innerText = JSON.parse(request.responseText).data;
+     document.getElementById("chat_list").appendChild(bot_chat);
+   };
+   request.setRequestHeader("Content-Type", "application/json;charset=UTF-8");
+   request.send(JSON.stringify({"data": chat_text}));
+ }
+ 
+ function setDefault() {
+   document.getElementById("chat_input").addEventListener("keyup", function(event) {
+     let input = document.getElementById("chat_input").value;
+     let button = document.getElementById("send_button");
+     if(input.length>0)
+     {
+       button.removeAttribute("disabled");
+     }
+     else
+     {
+       button.setAttribute("disabled", "true");
+     }
+     // Number 13 is the "Enter" key on the keyboard
+     if (event.keyCode === 13) {
+       // Cancel the default action, if needed
+       event.preventDefault();
+       // Trigger the button element with a click
+       button.click();
+     }
+   });
+ }
--- a/Light_model/app.py 0 → 100644
+++ b/Light_model/app.py 0 → 100644
+ from flask import Flask, request, jsonify, send_from_directory
+ import torch
+ from torchtext import data
+ from generation import inference, tokenizer1
+ from Styling import make_special_token
+ from model import Transformer
+ 
+ app = Flask(__name__,
+             static_url_path='', 
+             static_folder='static',)
+ app.config['JSON_AS_ASCII'] = False
+ device = torch.device('cpu')
+ max_len = 40
+ ID = data.Field(sequential=False,
+                 use_vocab=False)
+ SA = data.Field(sequential=False,
+                 use_vocab=False)
+ TEXT = data.Field(sequential=True,
+                   use_vocab=True,
+                   tokenize=tokenizer1,
+                   batch_first=True,
+                   fix_length=max_len,
+                   dtype=torch.int32
+                   )
+ 
+ LABEL = data.Field(sequential=True,
+                    use_vocab=True,
+                    tokenize=tokenizer1,
+                    batch_first=True,
+                    fix_length=max_len,
+                    init_token='<sos>',
+                    eos_token='<eos>',
+                    dtype=torch.int32
+                    )
+ text_specials, label_specials = make_special_token(False)
+ train_data, _ = data.TabularDataset.splits(
+     path='.', train='chatbot_0325_ALLLABEL_train.txt', test='chatbot_0325_ALLLABEL_test.txt', format='tsv',
+     fields=[('id', ID), ('text', TEXT), ('target_text', LABEL), ('SA', SA)], skip_header=True
+ )
+ TEXT.build_vocab(train_data, max_size=15000, specials=text_specials)
+ LABEL.build_vocab(train_data, max_size=15000, specials=label_specials)
+ soft_model = Transformer(160, 2, 2, 0.1, TEXT, LABEL)
+ # rough_model = Transformer(args, TEXT, LABEL)
+ soft_model.to(device)
+ # rough_model.to(device)
+ soft_model.load_state_dict(torch.load('sorted_model-soft.pth', map_location=device)['model_state_dict'])
+ 
+ 
+ # rough_model.load_state_dict(torch.load('sorted_model-rough.pth', map_location=device)['model_state_dict'])
+ 
+ 
+ @app.route('/api/soft', methods=['POST'])
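+ def soft():
+     # silent=True returns None instead of raising when the body is not valid JSON.
+     payload = request.get_json(silent=True)
+     if isinstance(payload, dict) and isinstance(payload.get("data"), str):
+         sentence = payload["data"]
+         return jsonify({"data": inference(device, max_len, TEXT, LABEL, soft_model, sentence)}), 200
+     else:
+         return jsonify({"data": "잘못된 요청입니다. Bad Request."}), 400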
+ 
+ # @app.route('/rough', methods=['POST'])
+ # def rough():
+ #     return inference(device, max_len, TEXT, LABEL, rough_model, ), 200
+ 
+ @app.route('/', methods=['GET'])
+ def main_page():
+     return send_from_directory('static','main.html')
+ 
+ if __name__ == '__main__':
+     app.run(host='0.0.0.0', port=8080)
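The `/api/soft` route accepts a JSON body of the form `{"data": "<sentence>"}` and answers with the same shape. The sketch below shows one way a Python client could talk to it using only the standard library. The helper names `build_soft_request` and `ask`, and the base URL in the usage note, are assumptions for illustration; only the route, port, and JSON shape come from `app.py`.

```python
import json
import urllib.request


def build_soft_request(base_url: str, sentence: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /api/soft route."""
    body = json.dumps({"data": sentence}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/soft",
        data=body,
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )


def ask(base_url: str, sentence: str) -> str:
    """Send the request and return the "data" field of the JSON reply."""
    with urllib.request.urlopen(build_soft_request(base_url, sentence)) as resp:
        return json.loads(resp.read().decode("utf-8"))["data"]
```

With the server from `app.py` running locally, `ask("http://localhost:8080", "안녕")` would return the chatbot's reply as a string. Splitting request construction from sending keeps the first helper testable without a running server.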
--- a/Light_model/chat.css 0 → 100644
+++ b/Light_model/chat.css 0 → 100644
+ ul.no-bullets {
+     list-style-type: none; /* Remove bullets */
+     padding: 0; /* Remove padding */
+     margin: 0; /* Remove margins */
+   }
+ 
+ .chat-bubble {
+   position: relative;
+   padding: 0.5em;
+   margin-top: 0.25em;
+   margin-bottom: 0.25em;
+   border-radius: 0.4em;
+   color: white;
+ }
+ .mine {
+   background: #00aabb;
+ }
+ .bots {
+   background: #cc78c5;
+ }
+ 
+ .chat-bubble:after {
+   content: "";
+   position: absolute;
+   top: 50%;
+   width: 0;
+   height: 0;
+   border: 0.625em solid transparent;
+   border-top: 0;
+   margin-top: -0.312em;
+   
+ }
+ .chat-bubble.mine:after {
+   right: 0;
+ 
+   border-left-color: #00aabb;
+   border-right: 0;
+   margin-right: -0.625em;
+ }
+ 
+ .chat-bubble.bots:after {
+   left: 0;
+ 
+   border-right-color: #cc78c5;
+   border-left: 0;
+   margin-left: -0.625em;
+ }
+ 
+ #chat_input {
+     width: 90%;
+ }
+ 
+ #send_button {
+ 
+     width: 5%;
+     border-radius: 0.4em;
+     color: white;
+     background-color: rgb(15, 145, 138);
+ }
+ 
+ .input-holder {
+     position: fixed;
+     left: 0;
+     right: 0;
+     bottom: 0;
+     padding: 0.25em;
+     background-color: lightseagreen;
+ }
\ No newline at end of file
--- a/Light_model/generation.py 0 → 100644
+++ b/Light_model/generation.py 0 → 100644
+ import torch
+ from konlpy.tag import Mecab
+ from torch.autograd import Variable
+ from chatspace import ChatSpace
+ 
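+ spacer = ChatSpace()
+ mecab = Mecab()  # create the tagger once; building it on every call is slow
+ 
+ 
+ def tokenizer1(text: str):
+     # Keep only letters and digits (drops whitespace and punctuation), then split into morphemes.
+     result_text = ''.join(c for c in text if c.isalnum())
+     return mecab.morphs(result_text)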
+ 
+ 
+ def inference(device: torch.device, max_len: int, TEXT, LABEL, model: torch.nn.Module, sentence: str):
+ 
+     enc_input = tokenizer1(sentence)
+     enc_input_index = []
+ 
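+     for tok in enc_input:
+         enc_input_index.append(TEXT.vocab.stoi[tok])
+ 
+     # Truncate overlong input so the encoder never receives more than max_len tokens, then pad.
+     enc_input_index = enc_input_index[:max_len]
+     for j in range(max_len - len(enc_input_index)):
+         enc_input_index.append(TEXT.vocab.stoi['<pad>'])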
+ 
+     enc_input_index = Variable(torch.LongTensor([enc_input_index]))
+ 
+     dec_input = torch.LongTensor([[LABEL.vocab.stoi['<sos>']]])
+ 
+     model.eval()
+     pred = []
+     for i in range(max_len):
+         y_pred = model(enc_input_index.to(device), dec_input.to(device))
+         y_pred_ids = y_pred.max(dim=-1)[1]
+         if y_pred_ids[0, -1] == LABEL.vocab.stoi['<eos>']:
+             y_pred_ids = y_pred_ids.squeeze(0)
+             print(">", end=" ")
+             for idx in range(len(y_pred_ids)):
+                 if LABEL.vocab.itos[y_pred_ids[idx]] == '<eos>':
+                     pred_sentence = "".join(pred)
+                     pred_str = spacer.space(pred_sentence)
+                     return pred_str
+                 else:
+                     pred.append(LABEL.vocab.itos[y_pred_ids[idx]])
+             return 'Error: Sentence is not end'
+ 
+         dec_input = torch.cat(
+             [dec_input.to(torch.device('cpu')),
+              y_pred_ids[0, -1].unsqueeze(0).unsqueeze(0).to(torch.device('cpu'))], dim=-1)
+     return 'Error: Sentence is not predicted'
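`inference` decodes greedily: it starts from `<sos>`, asks the model for the most likely next token, appends it to the decoder input, and stops at `<eos>` or after `max_len` steps. The sketch below isolates that loop from PyTorch. The `next_token` callable is a stand-in assumption for "run the model and take the argmax at the last position", and unlike the commit's code, which re-reads the predicted ids at every position once `<eos>` appears, this version simply returns the accumulated prefix.

```python
from typing import Callable, List, Optional


def greedy_decode(next_token: Callable[[List[int]], int],
                  sos_id: int, eos_id: int, max_len: int) -> Optional[List[int]]:
    """Grow the decoder prefix one token at a time until <eos> or max_len steps.

    Returns the generated ids without <sos> and <eos>, or None if <eos> never appeared.
    """
    prefix = [sos_id]
    for _ in range(max_len):
        token = next_token(prefix)
        if token == eos_id:
            return prefix[1:]
        prefix.append(token)
    return None
```

A toy `next_token` that replays a fixed script is enough to exercise the loop; returning `None` corresponds to the "not predicted" error string in `inference`.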
--- a/Light_model/light_chatbot.py 0 → 100644
+++ b/Light_model/light_chatbot.py 0 → 100644
+ import argparse
+ import time
+ import torch
+ from torch import nn
+ from torchtext import data
+ from torchtext.data import BucketIterator
+ from torchtext.data import TabularDataset
+ 
+ from Styling import styling, make_special_token
+ from generation import inference, tokenizer1
+ from model import Transformer, GradualWarmupScheduler
+ 
+ SEED = 1234
+ 
+ 
+ 
+ 
+ def acc(yhat: torch.Tensor, y: torch.Tensor):
+     with torch.no_grad():
+         yhat = yhat.max(dim=-1)[1]  # [0]: max value, [1]: index of max value
+         _acc = (yhat == y).float()[y != 1].mean()  # exclude padding (<pad> == 1) from the accuracy
+     return _acc
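`acc` measures token-level accuracy while ignoring padded positions, relying on `<pad>` having index 1 as the comments in this file state. A framework-free sketch of the same computation on plain integer sequences follows. The name `masked_accuracy` and the explicit error for an all-padding target are assumptions of this sketch, not part of the commit.

```python
from typing import Sequence

PAD_ID = 1  # the comments in this file state that <pad> == 1


def masked_accuracy(predicted: Sequence[int], target: Sequence[int],
                    pad_id: int = PAD_ID) -> float:
    """Fraction of non-padding target positions where the prediction matches."""
    pairs = [(p, t) for p, t in zip(predicted, target) if t != pad_id]
    if not pairs:
        raise ValueError("target contains only padding")
    return sum(p == t for p, t in pairs) / len(pairs)
```

Positions are masked by the target, not by the prediction, so a model that predicts `<pad>` where a real token was expected is still counted as wrong.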
+ 
+ 
+ def train(model: Transformer, iterator, optimizer, criterion: nn.CrossEntropyLoss, max_len: int, per_soft: bool, per_rough: bool):
+     total_loss = 0
+     iter_num = 0
+     tr_acc = 0
+     model.train()
+ 
+     for step, batch in enumerate(iterator):
+         optimizer.zero_grad()
+ 
+         enc_input, dec_input, enc_label = batch.text, batch.target_text, batch.SA
+         dec_output = dec_input[:, 1:]
+         dec_outputs = torch.zeros(dec_output.size(0), max_len).type_as(dec_input.data)
+ 
+         # Apply the sentiment and the speech style
+         enc_input, dec_input, dec_outputs = \
+             styling(enc_input, dec_input, dec_output, dec_outputs, enc_label, max_len, per_soft, per_rough, TEXT, LABEL)
+ 
+         y_pred = model(enc_input, dec_input)
+ 
+         y_pred = y_pred.reshape(-1, y_pred.size(-1))
+         dec_output = dec_outputs.view(-1).long()
+ 
+         # Boolean mask of the non-padding positions
+         real_value_index = dec_output != 1  # <pad> == 1
+ 
+         # Exclude padding from the loss
+         loss = criterion(y_pred[real_value_index], dec_output[real_value_index])
+         loss.backward()
+         optimizer.step()
+ 
+         with torch.no_grad():
+             train_acc = acc(y_pred, dec_output)
+ 
+         # Accumulate Python floats; summing the loss tensors keeps every step's autograd history alive.
+         total_loss += loss.item()
+         iter_num += 1
+         tr_acc += train_acc.item()
+ 
+     return total_loss / iter_num, tr_acc / iter_num
+ 
+ 
+ def test(model: Transformer, iterator, criterion: nn.CrossEntropyLoss):
+     total_loss = 0
+     iter_num = 0
+     te_acc = 0
+     model.eval()
+ 
+     with torch.no_grad():
+         for batch in iterator:
+             enc_input, dec_input, enc_label = batch.text, batch.target_text, batch.SA
+             dec_output = dec_input[:, 1:]
+             dec_outputs = torch.zeros(dec_output.size(0), args.max_len).type_as(dec_input.data)
+ 
+             # Apply the sentiment and the speech style
+             enc_input, dec_input, dec_outputs = \
+                 styling(enc_input, dec_input, dec_output, dec_outputs, enc_label, args.max_len, args.per_soft, args.per_rough, TEXT, LABEL)
+ 
+             y_pred = model(enc_input, dec_input)
+ 
+             y_pred = y_pred.reshape(-1, y_pred.size(-1))
+             dec_output = dec_outputs.view(-1).long()
+ 
+             real_value_index = dec_output != 1  # <pad> == 1
+ 
+             loss = criterion(y_pred[real_value_index], dec_output[real_value_index])
+ 
+             # already inside torch.no_grad(), so no nested context is needed
+             test_acc = acc(y_pred, dec_output)
+             total_loss += loss
+             iter_num += 1
+             te_acc += test_acc
+ 
+     return total_loss.data.cpu().numpy() / iter_num, te_acc.data.cpu().numpy() / iter_num
+ 
+ 
+ # Preprocess the data and return the loaders
+ def data_preprocessing(args, device):
+     # ID is unused. SA is the sentiment-analysis label (0 or 1).
+     ID = data.Field(sequential=False,
+                     use_vocab=False)
+ 
+     TEXT = data.Field(sequential=True,
+                       use_vocab=True,
+                       tokenize=tokenizer1,
+                       batch_first=True,
+                       fix_length=args.max_len,
+                       dtype=torch.int32
+                       )
+ 
+     LABEL = data.Field(sequential=True,
+                        use_vocab=True,
+                        tokenize=tokenizer1,
+                        batch_first=True,
+                        fix_length=args.max_len,
+                        init_token='<sos>',
+                        eos_token='<eos>',
+                        dtype=torch.int32
+                        )
+ 
+     SA = data.Field(sequential=False,
+                     use_vocab=False)
+ 
+     train_data, test_data = TabularDataset.splits(
+         path='.', train='chatbot_0325_ALLLABEL_train.txt', test='chatbot_0325_ALLLABEL_test.txt', format='tsv',
+         fields=[('id', ID), ('text', TEXT), ('target_text', LABEL), ('SA', SA)], skip_header=True
+     )
+ 
+     # Build the special tokens needed by TEXT and LABEL.
+     text_specials, label_specials = make_special_token(args.per_rough)
+ 
+     TEXT.build_vocab(train_data, max_size=15000, specials=text_specials)
+     LABEL.build_vocab(train_data, max_size=15000, specials=label_specials)
+ 
+     train_loader = BucketIterator(dataset=train_data, batch_size=args.batch_size, device=device, shuffle=True)
+     test_loader = BucketIterator(dataset=test_data, batch_size=args.batch_size, device=device, shuffle=True)
+ 
+     return TEXT, LABEL, train_loader, test_loader
+ 
+ 
+ def main(TEXT, LABEL, arguments):
+ 
+     # print argparse
+     for idx, (key, value) in enumerate(args.__dict__.items()):
+         if idx == 0:
+             print("\nargparse{\n", "\t", key, ":", value)
+         elif idx == len(args.__dict__) - 1:
+             print("\t", key, ":", value, "\n}")
+         else:
+             print("\t", key, ":", value)
+ 
+     model = Transformer(args.embedding_dim, args.nhead, args.nlayers, args.dropout, TEXT, LABEL)
+     criterion = nn.CrossEntropyLoss(ignore_index=LABEL.vocab.stoi['<pad>'])
+     optimizer = torch.optim.Adam(params=model.parameters(), lr=arguments.lr)
+     scheduler = GradualWarmupScheduler(optimizer, multiplier=8, total_epoch=arguments.num_epochs)
+     if args.per_soft:
+         sorted_path = 'sorted_model-soft.pth'
+     else:
+         sorted_path = 'sorted_model-rough.pth'
+     model.to(device)
+     if arguments.train:
+         best_valid_loss = float('inf')
+         for epoch in range(arguments.num_epochs):
+             torch.manual_seed(SEED)
+             start_time = time.time()
+ 
+             # train, validation
+             train_loss, train_acc = \
+                 train(model, train_loader, optimizer, criterion, arguments.max_len, arguments.per_soft,
+                       arguments.per_rough)
+             valid_loss, valid_acc = test(model, test_loader, criterion)
+ 
+             scheduler.step(epoch)
+             # time cal
+             end_time = time.time()
+             elapsed_time = end_time - start_time
+             epoch_mins = int(elapsed_time / 60)
+             epoch_secs = int(elapsed_time - (epoch_mins * 60))
+ 
+             # torch.save(model.state_dict(), sorted_path) # for some overfitting
+             # Save the model when the current validation loss is lower than the best so far.
+             if valid_loss < best_valid_loss:
+                 best_valid_loss = valid_loss
+                 torch.save({
+                     'epoch': epoch,
+                     'model_state_dict': model.state_dict(),
+                     'optimizer_state_dict': optimizer.state_dict(),
+                     'loss': valid_loss},
+                     sorted_path)
+                 print(f'\t## SAVE valid_loss: {valid_loss:.3f} | valid_acc: {valid_acc:.3f} ##')
+ 
+             # print loss and acc
+             print(f'\n\t==Epoch: {epoch + 1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s==')
+             print(f'\t==Train Loss: {train_loss:.3f} | Train_acc: {train_acc:.3f}==')
+             print(f'\t==Valid Loss: {valid_loss:.3f} | Valid_acc: {valid_acc:.3f}==\n')
+ 
+ 
+ 
+     checkpoint = torch.load(sorted_path, map_location=device)
+     model.load_state_dict(checkpoint['model_state_dict'])
+ 
+     test_loss, test_acc = test(model, test_loader, criterion)
+     print(f'==test_loss : {test_loss:.3f} | test_acc: {test_acc:.3f}==')
+     print("\t-----------------------------")
+     while True:
+         sentence = input("Enter a sentence : ")
+         print(inference(device, args.max_len, TEXT, LABEL, model, sentence))
+         print("\n")
+ 
+ 
+ if __name__ == '__main__':
+     # argparse 정의
+     parser = argparse.ArgumentParser()
+     parser.add_argument('--max_len', type=int, default=40)  # max_len must be large enough to avoid errors.
+     parser.add_argument('--batch_size', type=int, default=256)
+     parser.add_argument('--num_epochs', type=int, default=22)
+     parser.add_argument('--warming_up_epochs', type=int, default=5)
+     parser.add_argument('--lr', type=float, default=0.0002)
+     parser.add_argument('--embedding_dim', type=int, default=160)
+     parser.add_argument('--nlayers', type=int, default=2)
+     parser.add_argument('--nhead', type=int, default=2)
+     parser.add_argument('--dropout', type=float, default=0.1)
+     parser.add_argument('--train', action="store_true")
+     group = parser.add_mutually_exclusive_group()
+     group.add_argument('--per_soft', action="store_true")
+     group.add_argument('--per_rough', action="store_true")
+     args = parser.parse_args()
+     print("-Preparing-")
+     device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+     TEXT, LABEL, train_loader, test_loader = data_preprocessing(args, device)
+     main(TEXT, LABEL, args)
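The flag handling in the `__main__` block above can be tried in isolation. The following is a minimal sketch that assumes only the standard library; `build_parser` and `checkpoint_path` are hypothetical helper names that do not appear in the commit. It mirrors how `--per_soft` and `--per_rough` are declared as mutually exclusive and how `main` chooses the checkpoint file.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Subset of the light_chatbot.py flags (hypothetical helper, not in the commit)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--max_len', type=int, default=40)
    parser.add_argument('--train', action='store_true')
    # argparse rejects a command line that sets both flags of this group
    group = parser.add_mutually_exclusive_group()
    group.add_argument('--per_soft', action='store_true')
    group.add_argument('--per_rough', action='store_true')
    return parser


def checkpoint_path(args: argparse.Namespace) -> str:
    """Same selection rule as main(): the soft file only when --per_soft is set."""
    return 'sorted_model-soft.pth' if args.per_soft else 'sorted_model-rough.pth'


if __name__ == '__main__':
    parsed = build_parser().parse_args(['--train', '--per_soft'])
    print(checkpoint_path(parsed))
```

Note that, under this rule, running with neither style flag falls through to the rough checkpoint.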
--- a/Light_model/main.html 0 → 100644
+++ b/Light_model/main.html 0 → 100644
+ <!DOCTYPE html>
+ <html>
+     <head>
+         <meta charset="UTF-8">
+          <meta name="viewport" content="width=device-width, initial-scale=1">
+         <title>Emotional Chatbot with Styler</title>
+         <script src="app.js"></script>
+         <link rel="stylesheet" type="text/css" href="chat.css" />
+     </head>
+     <body onload="setDefault()">
+         <ul id="chat_list" class="list no-bullets">
+ <li class="chat-bubble mine">(some suitable line)</li>
+ <li class="chat-bubble bots">(some fitting answer)</li>
+         </ul>
+         <div class="input-holder">
+         <input type="text" id="chat_input" autofocus/>
+         <input type="button" id="send_button" class="button" value="↵" onclick="send()" disabled>
+     </div>
+     </body>
+ </html>
\ No newline at end of file
--- a/Light_model/model.py 0 → 100644
+++ b/Light_model/model.py 0 → 100644
+ import torch
+ import torch.nn as nn
+ import math
+ 
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+ 
+ 
+ class Transformer(nn.Module):
+     def __init__(self, embedding_dim: int, nhead: int, nlayers: int, dropout: float, SRC_vocab, TRG_vocab):
+         super(Transformer, self).__init__()
+         self.d_model = embedding_dim
+         self.n_head = nhead
+         self.num_encoder_layers = nlayers
+         self.num_decoder_layers = nlayers
+         self.dim_feedforward = embedding_dim
+         self.dropout = dropout
+ 
+         self.SRC_vo = SRC_vocab
+         self.TRG_vo = TRG_vocab
+ 
+         self.pos_encoder = PositionalEncoding(self.d_model, self.dropout)
+ 
+         self.src_embedding = nn.Embedding(len(self.SRC_vo.vocab), self.d_model)
+         self.trg_embedding = nn.Embedding(len(self.TRG_vo.vocab), self.d_model)
+ 
+         self.transformer = nn.Transformer(d_model=self.d_model,
+                                                 nhead=self.n_head,
+                                                 num_encoder_layers=self.num_encoder_layers,
+                                                 num_decoder_layers=self.num_decoder_layers,
+                                                 dim_feedforward=self.dim_feedforward,
+                                                 dropout=self.dropout)
+         self.proj_vocab_layer = nn.Linear(
+             in_features=self.d_model, out_features=len(self.TRG_vo.vocab))
+ 
+ 
+     def forward(self, en_input, de_input):
+         x_en_embed = self.src_embedding(en_input.long()) * math.sqrt(self.d_model)
+         x_de_embed = self.trg_embedding(de_input.long()) * math.sqrt(self.d_model)
+ 
+         # Masking
+         src_key_padding_mask = en_input == self.SRC_vo.vocab.stoi['<pad>']
+         tgt_key_padding_mask = de_input == self.TRG_vo.vocab.stoi['<pad>']
+         memory_key_padding_mask = src_key_padding_mask
+         tgt_mask = self.transformer.generate_square_subsequent_mask(de_input.size(1))
+ 
+         # (batch, seq, d_model) -> (seq, batch, d_model), the layout nn.Transformer expects
+         x_en_embed = torch.einsum('ijk->jik', x_en_embed)
+         x_de_embed = torch.einsum('ijk->jik', x_de_embed)
+ 
+         # PositionalEncoding slices pe along dim 0, so apply it once dim 0 is the sequence axis
+         x_en_embed = self.pos_encoder(x_en_embed)
+         x_de_embed = self.pos_encoder(x_de_embed)
+ 
+         feature = self.transformer(src=x_en_embed,
+                                    tgt=x_de_embed,
+                                    src_key_padding_mask=src_key_padding_mask,
+                                    tgt_key_padding_mask=tgt_key_padding_mask,
+                                    memory_key_padding_mask=memory_key_padding_mask,
+                                    tgt_mask=tgt_mask.to(de_input.device))
+ 
+         logits = self.proj_vocab_layer(feature)
+         logits = torch.einsum('ijk->jik', logits)
+ 
+         return logits
+ 
+ 
+ class PositionalEncoding(nn.Module):
+ 
+     def __init__(self, d_model, dropout, max_len=15000):
+         super(PositionalEncoding, self).__init__()
+         self.dropout = nn.Dropout(p=dropout)
+ 
+         pe = torch.zeros(max_len, d_model)
+         position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
+         div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
+         pe[:, 0::2] = torch.sin(position * div_term)
+         pe[:, 1::2] = torch.cos(position * div_term)
+         pe = pe.unsqueeze(0).transpose(0, 1)
+         self.register_buffer('pe', pe)
+ 
+     def forward(self, x):
+         x = x + self.pe[:x.size(0), :]
+         return self.dropout(x)
+ 
+ 
+ from torch.optim.lr_scheduler import _LRScheduler
+ from torch.optim.lr_scheduler import ReduceLROnPlateau
+ 
+ 
+ class GradualWarmupScheduler(_LRScheduler):
+     """ Gradually warm-up(increasing) learning rate in optimizer.
+     Proposed in 'Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour'.
+     Args:
+         optimizer (Optimizer): Wrapped optimizer.
+         multiplier: target learning rate = base lr * multiplier
+         total_epoch: target learning rate is reached at total_epoch, gradually
+         after_scheduler: after target_epoch, use this scheduler(eg. ReduceLROnPlateau)
+     """
+ 
+     def __init__(self, optimizer, multiplier, total_epoch, after_scheduler=None):
+         self.last_epoch =  1  # ReduceLROnPlateau is called at the end of epoch, whereas others are called at beginning
+         self.multiplier = multiplier
+         if self.multiplier <= 1.:
+             raise ValueError('multiplier should be greater than 1.')
+         self.total_epoch = total_epoch
+         self.after_scheduler = after_scheduler
+         self.finished = False
+         super().__init__(optimizer)
+ 
+     def get_lr(self):
+         if self.last_epoch > self.total_epoch:
+             if self.after_scheduler:
+                 if not self.finished:
+                     self.after_scheduler.base_lrs = [base_lr * self.multiplier for base_lr in self.base_lrs]
+                     self.finished = True
+                 return self.after_scheduler.get_lr()
+             return [base_lr * self.multiplier for base_lr in self.base_lrs]
+ 
+         return [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in
+                 self.base_lrs]
+ 
+     def step_ReduceLROnPlateau(self, metrics, epoch=None):
+         if epoch is None:
+             epoch = self.last_epoch + 1
+         self.last_epoch = epoch if epoch != 0 else 1
+         if self.last_epoch <= self.total_epoch:
+             warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in
+                          self.base_lrs]
+             for param_group, lr in zip(self.optimizer.param_groups, warmup_lr):
+                 param_group['lr'] = lr
+         else:
+             if epoch is None:
+                 self.after_scheduler.step(metrics, None)
+             else:
+                 self.after_scheduler.step(metrics, epoch - self.total_epoch)
+ 
+     def step(self, epoch=None, metrics=None):
+         if type(self.after_scheduler) != ReduceLROnPlateau:
+             if self.finished and self.after_scheduler:
+                 if epoch is None:
+                     self.after_scheduler.step(None)
+                 else:
+                     self.after_scheduler.step(epoch - self.total_epoch)
+             else:
+                 return super(GradualWarmupScheduler, self).step(epoch)
+         else:
+             self.step_ReduceLROnPlateau(metrics, epoch)
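The warm-up rule in `get_lr` above can be checked by hand. The sketch below restates that formula as a plain function for the case where no `after_scheduler` is given; `warmup_lr` is a hypothetical name used only for illustration, and the sample values are the defaults from `light_chatbot.py` (`lr=0.0002`, `multiplier=8`, `total_epoch` set to `num_epochs=22`).

```python
def warmup_lr(base_lr: float, multiplier: float, total_epoch: int, epoch: int) -> float:
    """Learning rate at `epoch`, following get_lr() when after_scheduler is None."""
    if epoch > total_epoch:
        # past the warm-up window the rate stays at the target value
        return base_lr * multiplier
    # linear ramp from base_lr (epoch 0) to base_lr * multiplier (epoch == total_epoch)
    return base_lr * ((multiplier - 1.0) * epoch / total_epoch + 1.0)


if __name__ == '__main__':
    for epoch in (0, 11, 22):
        print(epoch, warmup_lr(0.0002, 8, 22, epoch))
```

Because `main` passes `total_epoch=arguments.num_epochs`, the rate rises linearly over the whole training run under these defaults and reaches `base_lr * multiplier` only at the final epoch.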
--- a/Light_model/requirements.txt 0 → 100644
+++ b/Light_model/requirements.txt 0 → 100644
+ torch~=1.4.0
+ Flask~=1.1.2
+ torchtext~=0.6.0
+ hgtk~=0.1.3
+ konlpy~=0.5.2
+ chatspace~=1.0.1
\ No newline at end of file
--- a/Light_model/sorted_model-rough.pth 0 → 100644
+++ b/Light_model/sorted_model-rough.pth 0 → 100644
--- a/Light_model/sorted_model-soft.pth 0 → 100644
+++ b/Light_model/sorted_model-soft.pth 0 → 100644
--- a/Light_model/static/app.js 0 → 100644
+++ b/Light_model/static/app.js 0 → 100644
+ function send() {
+   /* client side */
+   var chat = document.createElement("li");
+   var chat_input = document.getElementById("chat_input");
+   var chat_text = chat_input.value;
+   chat.className = "chat-bubble mine";
+   chat.innerText = chat_text;
+   document.getElementById("chat_list").appendChild(chat);
+   chat_input.value = "";
+ 
+   /* ajax request */
+   var request = new XMLHttpRequest();
+   request.open("POST", `${window.location.protocol}//${window.location.host}/api/soft`, true);
+   request.onreadystatechange = function() {
+     if (request.readyState !== 4 || Math.floor(request.status / 100) !== 2) return;
+     var bot_chat = document.createElement("li");
+     bot_chat.className = "chat-bubble bots";
+     bot_chat.innerText = JSON.parse(request.responseText).data;
+     document.getElementById("chat_list").appendChild(bot_chat);
+   };
+   request.setRequestHeader("Content-Type", "application/json;charset=UTF-8");
+   request.send(JSON.stringify({"data": chat_text}));
+ }
+ 
+ function setDefault() {
+   document.getElementById("chat_input").addEventListener("keyup", function(event) {
+     let input = document.getElementById("chat_input").value;
+     let button = document.getElementById("send_button");
+     if(input.length>0)
+     {
+       button.removeAttribute("disabled");
+     }
+     else
+     {
+       button.setAttribute("disabled", "true");
+     }
+     // Number 13 is the "Enter" key on the keyboard
+     if (event.keyCode === 13) {
+       // Cancel the default action, if needed
+       event.preventDefault();
+       // Trigger the button element with a click
+       button.click();
+     }
+   });
+ }
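The request that `send()` issues can also be reproduced outside the browser. The following is a rough sketch using only the Python standard library. It relies on the request shape visible in `app.js` (a JSON body with a `data` field posted to `/api/soft`, and a JSON reply whose `data` field holds the answer). The helper names and the `http://localhost:5000` base URL are assumptions for illustration, not part of the commit.

```python
import json
import urllib.request


def build_soft_request(text: str, base_url: str = 'http://localhost:5000') -> urllib.request.Request:
    """Build the POST /api/soft request; base_url is an assumed local address."""
    body = json.dumps({'data': text}).encode('utf-8')
    return urllib.request.Request(
        url=f'{base_url}/api/soft',
        data=body,
        headers={'Content-Type': 'application/json;charset=UTF-8'},
        method='POST',
    )


def parse_reply(raw: bytes) -> str:
    """Extract the bot answer from the JSON reply, as app.js does with `.data`."""
    return json.loads(raw.decode('utf-8'))['data']


def ask(text: str, base_url: str = 'http://localhost:5000') -> str:
    """Send one message to a running server and return its answer."""
    with urllib.request.urlopen(build_soft_request(text, base_url)) as response:
        return parse_reply(response.read())
```

With the Flask server running, `ask("안녕")` would return the reply string; the two helper functions can be exercised without any server.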
--- a/Light_model/static/chat.css 0 → 100644
+++ b/Light_model/static/chat.css 0 → 100644
+ ul.no-bullets {
+     list-style-type: none; /* Remove bullets */
+     padding: 0; /* Remove padding */
+     margin: 0; /* Remove margins */
+   }
+ 
+ .chat-bubble {
+   position: relative;
+   padding: 0.5em;
+   margin-top: 0.25em;
+   margin-bottom: 0.25em;
+   border-radius: 0.4em;
+   color: white;
+ }
+ .mine {
+   background: #00aabb;
+ }
+ .bots {
+   background: #cc78c5;
+ }
+ 
+ .chat-bubble:after {
+   content: "";
+   position: absolute;
+   top: 50%;
+   width: 0;
+   height: 0;
+   border: 0.625em solid transparent;
+   border-top: 0;
+   margin-top: -0.312em;
+   
+ }
+ .chat-bubble.mine:after {
+   right: 0;
+ 
+   border-left-color: #00aabb;
+   border-right: 0;
+   margin-right: -0.625em;
+ }
+ 
+ .chat-bubble.bots:after {
+   left: 0;
+ 
+   border-right-color: #cc78c5;
+   border-left: 0;
+   margin-left: -0.625em;
+ }
+ 
+ #chat_input {
+     width: 90%;
+ }
+ 
+ #send_button {
+ 
+     width: 5%;
+     border-radius: 0.4em;
+     color: white;
+     background-color: rgb(15, 145, 138);
+ }
+ 
+ .input-holder {
+     position: fixed;
+     left: 0;
+     right: 0;
+     bottom: 0;
+     padding: 0.25em;
+     background-color: lightseagreen;
+ }
\ No newline at end of file
--- a/Light_model/static/favicon.ico 0 → 100644
+++ b/Light_model/static/favicon.ico 0 → 100644
--- a/Light_model/static/main.html 0 → 100644
+++ b/Light_model/static/main.html 0 → 100644
+ <!DOCTYPE html>
+ <html>
+     <head>
+         <meta charset="UTF-8">
+          <meta name="viewport" content="width=device-width, initial-scale=1">
+         <title>Emotional Chatbot with Styler</title>
+         <script src="app.js"></script>
+         <link rel="stylesheet" type="text/css" href="chat.css" />
+     </head>
+     <body onload="setDefault()">
+         <ul id="chat_list" class="list no-bullets">
+ <li class="chat-bubble mine">Ask a question like this...</li>
+ <li class="chat-bubble bots">...and the answer comes back like this!</li>
+         </ul>
+         <div class="input-holder">
+         <input type="text" id="chat_input" autofocus/>
+         <input type="button" id="send_button" class="button" value="↵" onclick="send()" disabled>
+     </div>
+     </body>
+ </html>
\ No newline at end of file
--- a/README.md
+++ b/README.md
@@ -10,3 +10,51 @@ Language Style과 감정 분석에 따른 챗봇 답변 변화 모델 :
 - Force RTX 2080 Ti
 - Python 3.6.8
 - Pytorch 1.2.0
+ 
+ # Code
+ ## Chatbot
+ 
+ ### Chatbot_main.py
+ Main file used to train and test the chatbot.
+ ### model.py
+ Class file for the Transformer model used by the chatbot.
+ ### generation.py
+ Runs inference with beam search and greedy search.
+ ### metric.py
+ Module for measuring training performance.\
+ `acc(yhat, y)`
+ ### Styling.py
+ Changes the speech style of a response according to the personality.
+ ### get_data.py
+ Preprocesses and loads the dataset.\
+ `tokenizer1(text)`\
+ * text: the string to tokenize
+ Filters out special characters, then tokenizes with Mecab.\
+ `data_preprocessing(args, device)`\
+ * args: the `argparse.Namespace` of parsed arguments
+ * device: the PyTorch device
+ Tokenizes the text and builds the dataset from four fields: id, text, label, and sentiment-analysis result.
+ 
+ ## KoBERT
+ [SKTBrain KoBERT](https://github.com/SKTBrain/KoBERT)\
+ A model built by SKT Brain by applying BERT to Korean.\
+ It learned sentiment analysis from Naver movie reviews and is used for the chatbot's sentiment analysis.
+ ## Light_model
+ A lightweight model for web hosting. It does not support KoBERT.
+ ### light_chatbot.py
+ Console program for training and testing the chatbot model.
+ `light_chatbot.py [--train] [--per_soft|--per_rough]`
+ 
+ * train: use this to train and build a model.
+ Without it, the saved model is loaded and tested.
+ * per_soft: trains or tests the soft speech style.
+ * per_rough: trains or tests the rough speech style.
+ The two options are mutually exclusive.
+ ### app.py
+ A simple HTTP server built with Flask for web hosting.\
+ `POST /api/soft`\
+ Provides an API that runs inference with the soft model and returns the result as JSON.\
+ `GET /`\
+ Serves the HTML, CSS, and JS in the static folder as static files.
+ ### Other
+ generation.py, Styling.py, and model.py play the same roles as in Chatbot.
\ No newline at end of file