김민수

Add light_model and expand the README

FROM ufoym/deepo:pytorch-cpu
# https://github.com/Beomi/deepo-nlp/blob/master/Dockerfile
# Install JVM for Konlpy
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y \
openjdk-8-jdk wget curl git python3-dev \
language-pack-ko
RUN locale-gen en_US.UTF-8 && \
update-locale LANG=en_US.UTF-8
# Install zsh
RUN apt-get install -y zsh && \
sh -c "$(curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"
# Install additional packages
RUN pip install --upgrade pip
RUN pip install autopep8
RUN pip install konlpy
RUN pip install torchtext pytorch_pretrained_bert
# Install dependency of styling chatbot
RUN pip install hgtk chatspace
# Add Mecab-Ko
RUN curl -L https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh | bash
# Install styling chatbot by BM-K
RUN git clone https://github.com/km19809/light_model.git
RUN pip install -r light_model/requirements.txt
# Add non-root user
RUN adduser --disabled-password --gecos "" user
# Set workdir to the cloned repo
WORKDIR /light_model
# Lightweight model of the styling chatbot
This repository is for web-hosting a lightweight model.\
The original repository is here: [link](https://github.com/km19809/Styling-Chatbot-with-Transformer)
## Requirements
The list below may change during development, so please refer to requirements.txt.
```
torch~=1.4.0
Flask~=1.1.2
torchtext~=0.6.0
hgtk~=0.1.3
konlpy~=0.5.2
chatspace~=1.0.1
```
## Usage
`light_chatbot.py [--train] [--per_soft|--per_rough]`
* `--train`: trains and saves a new model. \
If omitted, the saved model is loaded and tested instead.
* `--per_soft`: trains or tests the soft speech style.\
With `--per_rough`, the rough speech style is trained or tested instead.\
The two options are mutually exclusive.

`app.py`
A simple Flask server for trying out the chatbot. Run `python app.py` and open `http://localhost:8080/`.
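Once the server is running, you can exercise the API from any HTTP client. A minimal sketch in Python, using the third-party `requests` package (not in requirements.txt, shown for illustration only):
```
import requests

resp = requests.post(
    "http://localhost:8080/api/soft",  # app.py listens on port 8080
    json={"data": "안녕하세요"},  # the API expects {"data": <sentence>}
)
print(resp.json()["data"])  # the styled reply
```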
function send() {
/*client side */
var chat = document.createElement("li");
var chat_input = document.getElementById("chat_input");
var chat_text = chat_input.value;
chat.className = "chat-bubble mine";
chat.innerText = chat_text;
document.getElementById("chat_list").appendChild(chat);
chat_input.value = "";
/* ajax request */
var request = new XMLHttpRequest();
request.open("POST", `${window.location.protocol}//${window.location.host}/api/soft`, true);
request.onreadystatechange = function() {
if (request.readyState !== 4 || Math.floor(request.status / 100) !== 2) return;
var bot_chat = document.createElement("li");
bot_chat.className = "chat-bubble bots";
bot_chat.innerText = JSON.parse(request.responseText).data;
document.getElementById("chat_list").appendChild(bot_chat);
};
request.setRequestHeader("Content-Type", "application/json;charset=UTF-8");
request.send(JSON.stringify({"data":chat_text}));
}
function setDefault() {
document.getElementById("chat_input").addEventListener("keyup", function(event) {
let input = document.getElementById("chat_input").value;
let button = document.getElementById("send_button");
if(input.length>0)
{
button.removeAttribute("disabled");
}
else
{
button.setAttribute("disabled", "true");
}
// Number 13 is the "Enter" key on the keyboard
if (event.keyCode === 13) {
// Cancel the default action, if needed
event.preventDefault();
// Trigger the button element with a click
button.click();
}
});
}
from flask import Flask, request, jsonify, send_from_directory
import torch
from torchtext import data
from generation import inference, tokenizer1
from Styling import make_special_token
from model import Transformer
app = Flask(__name__,
            static_url_path='',
            static_folder='static')
app.config['JSON_AS_ASCII'] = False  # allow non-ASCII (Korean) characters in JSON responses

device = torch.device('cpu')
max_len = 40

ID = data.Field(sequential=False,
                use_vocab=False)
SA = data.Field(sequential=False,
                use_vocab=False)
TEXT = data.Field(sequential=True,
                  use_vocab=True,
                  tokenize=tokenizer1,
                  batch_first=True,
                  fix_length=max_len,
                  dtype=torch.int32)
LABEL = data.Field(sequential=True,
                   use_vocab=True,
                   tokenize=tokenizer1,
                   batch_first=True,
                   fix_length=max_len,
                   init_token='<sos>',
                   eos_token='<eos>',
                   dtype=torch.int32)
text_specials, label_specials = make_special_token(False)
train_data, _ = data.TabularDataset.splits(
path='.', train='chatbot_0325_ALLLABEL_train.txt', test='chatbot_0325_ALLLABEL_test.txt', format='tsv',
fields=[('id', ID), ('text', TEXT), ('target_text', LABEL), ('SA', SA)], skip_header=True
)
TEXT.build_vocab(train_data, max_size=15000, specials=text_specials)
LABEL.build_vocab(train_data, max_size=15000, specials=label_specials)
soft_model = Transformer(160, 2, 2, 0.1, TEXT, LABEL)
# rough_model = Transformer(args, TEXT, LABEL)
soft_model.to(device)
# rough_model.to(device)
soft_model.load_state_dict(torch.load('sorted_model-soft.pth', map_location=device)['model_state_dict'])
# rough_model.load_state_dict(torch.load('sorted_model-rough.pth', map_location=device)['model_state_dict'])
@app.route('/api/soft', methods=['POST'])
def soft():
    if request.is_json:
        sentence = request.json["data"]
        return jsonify({"data": inference(device, max_len, TEXT, LABEL, soft_model, sentence)}), 200
    else:
        return jsonify({"data": "Bad Request."}), 400

# @app.route('/rough', methods=['POST'])
# def rough():
#     return inference(device, max_len, TEXT, LABEL, rough_model, ), 200

@app.route('/', methods=['GET'])
def main_page():
    return send_from_directory('static', 'main.html')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
ul.no-bullets {
list-style-type: none; /* Remove bullets */
padding: 0; /* Remove padding */
margin: 0; /* Remove margins */
}
.chat-bubble {
position: relative;
padding: 0.5em;
margin-top: 0.25em;
margin-bottom: 0.25em;
border-radius: 0.4em;
color: white;
}
.mine {
background: #00aabb;
}
.bots {
background: #cc78c5;
}
.chat-bubble:after {
content: "";
position: absolute;
top: 50%;
width: 0;
height: 0;
border: 0.625em solid transparent;
border-top: 0;
margin-top: -0.312em;
}
.chat-bubble.mine:after {
right: 0;
border-left-color: #00aabb;
border-right: 0;
margin-right: -0.625em;
}
.chat-bubble.bots:after {
left: 0;
border-right-color: #cc78c5;
border-left: 0;
margin-left: -0.625em;
}
#chat_input {
width: 90%;
}
#send_button {
width: 5%;
border-radius: 0.4em;
color: white;
background-color: rgb(15, 145, 138);
}
.input-holder {
position: fixed;
left: 0;
right: 0;
bottom: 0;
padding: 0.25em;
background-color: lightseagreen;
}
import torch
from konlpy.tag import Mecab
from torch.autograd import Variable
from chatspace import ChatSpace

spacer = ChatSpace()


def tokenizer1(text: str):
    # Strip non-alphanumeric characters, then tokenize into morphemes with Mecab.
    result_text = ''.join(c for c in text if c.isalnum())
    return Mecab().morphs(result_text)


def inference(device: torch.device, max_len: int, TEXT, LABEL, model: torch.nn.Module, sentence: str):
    # Encode the input sentence and pad it to max_len.
    enc_input = tokenizer1(sentence)
    enc_input_index = []
    for tok in enc_input:
        enc_input_index.append(TEXT.vocab.stoi[tok])
    for j in range(max_len - len(enc_input_index)):
        enc_input_index.append(TEXT.vocab.stoi['<pad>'])
    enc_input_index = Variable(torch.LongTensor([enc_input_index]))

    # Greedy decoding: start from <sos> and append the argmax token each step.
    dec_input = torch.LongTensor([[LABEL.vocab.stoi['<sos>']]])

    model.eval()
    pred = []
    for i in range(max_len):
        y_pred = model(enc_input_index.to(device), dec_input.to(device))
        y_pred_ids = y_pred.max(dim=-1)[1]
        if y_pred_ids[0, -1] == LABEL.vocab.stoi['<eos>']:
            # Decoding finished: collect tokens up to <eos> and restore word spacing.
            y_pred_ids = y_pred_ids.squeeze(0)
            print(">", end=" ")
            for idx in range(len(y_pred_ids)):
                if LABEL.vocab.itos[y_pred_ids[idx]] == '<eos>':
                    pred_sentence = "".join(pred)
                    pred_str = spacer.space(pred_sentence)
                    return pred_str
                else:
                    pred.append(LABEL.vocab.itos[y_pred_ids[idx]])
            return 'Error: sentence did not end'
        dec_input = torch.cat(
            [dec_input.to(torch.device('cpu')),
             y_pred_ids[0, -1].unsqueeze(0).unsqueeze(0).to(torch.device('cpu'))], dim=-1)
    return 'Error: sentence was not predicted'
import argparse
import time
import torch
from torch import nn
from torchtext import data
from torchtext.data import BucketIterator
from torchtext.data import TabularDataset
from Styling import styling, make_special_token
from generation import inference, tokenizer1
from model import Transformer, GradualWarmupScheduler
SEED = 1234
def acc(yhat: torch.Tensor, y: torch.Tensor):
    with torch.no_grad():
        yhat = yhat.max(dim=-1)[1]  # [0]: max value, [1]: index of max value
        _acc = (yhat == y).float()[y != 1].mean()  # exclude padding from accuracy
    return _acc
def train(model: Transformer, iterator, optimizer, criterion: nn.CrossEntropyLoss, max_len: int, per_soft: bool,
          per_rough: bool):
    total_loss = 0
    iter_num = 0
    tr_acc = 0
    model.train()

    for step, batch in enumerate(iterator):
        optimizer.zero_grad()
        enc_input, dec_input, enc_label = batch.text, batch.target_text, batch.SA
        dec_output = dec_input[:, 1:]
        dec_outputs = torch.zeros(dec_output.size(0), max_len).type_as(dec_input.data)

        # Apply emotion and speech style.
        enc_input, dec_input, dec_outputs = \
            styling(enc_input, dec_input, dec_output, dec_outputs, enc_label, max_len, per_soft, per_rough, TEXT, LABEL)

        y_pred = model(enc_input, dec_input)
        y_pred = y_pred.reshape(-1, y_pred.size(-1))
        dec_output = dec_outputs.view(-1).long()

        # Select the indices of non-padding values; <pad> == 1.
        real_value_index = [dec_output != 1]

        # Exclude padding from the loss.
        loss = criterion(y_pred[real_value_index], dec_output[real_value_index])
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            train_acc = acc(y_pred, dec_output)
        total_loss += loss
        iter_num += 1
        tr_acc += train_acc

    return total_loss.data.cpu().numpy() / iter_num, tr_acc.data.cpu().numpy() / iter_num
def test(model: Transformer, iterator, criterion: nn.CrossEntropyLoss):
    total_loss = 0
    iter_num = 0
    te_acc = 0
    model.eval()

    with torch.no_grad():
        for batch in iterator:
            enc_input, dec_input, enc_label = batch.text, batch.target_text, batch.SA
            dec_output = dec_input[:, 1:]
            dec_outputs = torch.zeros(dec_output.size(0), args.max_len).type_as(dec_input.data)

            # Apply emotion and speech style.
            enc_input, dec_input, dec_outputs = \
                styling(enc_input, dec_input, dec_output, dec_outputs, enc_label, args.max_len, args.per_soft,
                        args.per_rough, TEXT, LABEL)

            y_pred = model(enc_input, dec_input)
            y_pred = y_pred.reshape(-1, y_pred.size(-1))
            dec_output = dec_outputs.view(-1).long()

            real_value_index = [dec_output != 1]  # <pad> == 1
            loss = criterion(y_pred[real_value_index], dec_output[real_value_index])

            test_acc = acc(y_pred, dec_output)
            total_loss += loss
            iter_num += 1
            te_acc += test_acc

    return total_loss.data.cpu().numpy() / iter_num, te_acc.data.cpu().numpy() / iter_num
# Preprocess the data and return the loaders.
def data_preprocessing(args, device):
    # ID is unused. SA is the sentiment analysis label (0 or 1).
    ID = data.Field(sequential=False,
                    use_vocab=False)
    TEXT = data.Field(sequential=True,
                      use_vocab=True,
                      tokenize=tokenizer1,
                      batch_first=True,
                      fix_length=args.max_len,
                      dtype=torch.int32)
    LABEL = data.Field(sequential=True,
                       use_vocab=True,
                       tokenize=tokenizer1,
                       batch_first=True,
                       fix_length=args.max_len,
                       init_token='<sos>',
                       eos_token='<eos>',
                       dtype=torch.int32)
    SA = data.Field(sequential=False,
                    use_vocab=False)

    train_data, test_data = TabularDataset.splits(
        path='.', train='chatbot_0325_ALLLABEL_train.txt', test='chatbot_0325_ALLLABEL_test.txt', format='tsv',
        fields=[('id', ID), ('text', TEXT), ('target_text', LABEL), ('SA', SA)], skip_header=True
    )

    # Build the special tokens TEXT and LABEL need.
    text_specials, label_specials = make_special_token(args.per_rough)

    TEXT.build_vocab(train_data, max_size=15000, specials=text_specials)
    LABEL.build_vocab(train_data, max_size=15000, specials=label_specials)

    train_loader = BucketIterator(dataset=train_data, batch_size=args.batch_size, device=device, shuffle=True)
    test_loader = BucketIterator(dataset=test_data, batch_size=args.batch_size, device=device, shuffle=True)

    return TEXT, LABEL, train_loader, test_loader
def main(TEXT, LABEL, arguments):
    # Print the parsed arguments.
    for idx, (key, value) in enumerate(arguments.__dict__.items()):
        if idx == 0:
            print("\nargparse{\n", "\t", key, ":", value)
        elif idx == len(arguments.__dict__) - 1:
            print("\t", key, ":", value, "\n}")
        else:
            print("\t", key, ":", value)

    model = Transformer(arguments.embedding_dim, arguments.nhead, arguments.nlayers, arguments.dropout, TEXT, LABEL)
    criterion = nn.CrossEntropyLoss(ignore_index=LABEL.vocab.stoi['<pad>'])
    optimizer = torch.optim.Adam(params=model.parameters(), lr=arguments.lr)
    scheduler = GradualWarmupScheduler(optimizer, multiplier=8, total_epoch=arguments.num_epochs)

    if arguments.per_soft:
        sorted_path = 'sorted_model-soft.pth'
    else:
        sorted_path = 'sorted_model-rough.pth'

    model.to(device)

    if arguments.train:
        best_valid_loss = float('inf')
        for epoch in range(arguments.num_epochs):
            torch.manual_seed(SEED)
            start_time = time.time()

            # Train and validate.
            train_loss, train_acc = \
                train(model, train_loader, optimizer, criterion, arguments.max_len, arguments.per_soft,
                      arguments.per_rough)
            valid_loss, valid_acc = test(model, test_loader, criterion)
            scheduler.step(epoch)

            # Compute elapsed time.
            end_time = time.time()
            elapsed_time = end_time - start_time
            epoch_mins = int(elapsed_time / 60)
            epoch_secs = int(elapsed_time - (epoch_mins * 60))

            # torch.save(model.state_dict(), sorted_path) # for some overfitting
            # Save the model whenever the current loss beats the best loss so far.
            if valid_loss < best_valid_loss:
                best_valid_loss = valid_loss
                torch.save({
                    'epoch': epoch,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'loss': valid_loss},
                    sorted_path)
                print(f'\t## SAVE valid_loss: {valid_loss:.3f} | valid_acc: {valid_acc:.3f} ##')

            # Print loss and accuracy.
            print(f'\n\t==Epoch: {epoch + 1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s==')
            print(f'\t==Train Loss: {train_loss:.3f} | Train_acc: {train_acc:.3f}==')
            print(f'\t==Valid Loss: {valid_loss:.3f} | Valid_acc: {valid_acc:.3f}==\n')

    # Load the best checkpoint and evaluate it.
    checkpoint = torch.load(sorted_path, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])

    test_loss, test_acc = test(model, test_loader, criterion)
    print(f'==test_loss : {test_loss:.3f} | test_acc: {test_acc:.3f}==')
    print("\t-----------------------------")

    # Interactive inference loop.
    while True:
        sentence = input("Enter a sentence: ")
        print(inference(device, arguments.max_len, TEXT, LABEL, model, sentence))
        print("\n")
if __name__ == '__main__':
    # Define the command-line arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('--max_len', type=int, default=40)  # keep max_len large enough to avoid errors
    parser.add_argument('--batch_size', type=int, default=256)
    parser.add_argument('--num_epochs', type=int, default=22)
    parser.add_argument('--warming_up_epochs', type=int, default=5)
    parser.add_argument('--lr', type=float, default=0.0002)
    parser.add_argument('--embedding_dim', type=int, default=160)
    parser.add_argument('--nlayers', type=int, default=2)
    parser.add_argument('--nhead', type=int, default=2)
    parser.add_argument('--dropout', type=float, default=0.1)
    parser.add_argument('--train', action="store_true")
    group = parser.add_mutually_exclusive_group()
    group.add_argument('--per_soft', action="store_true")
    group.add_argument('--per_rough', action="store_true")
    args = parser.parse_args()

    print("Preparing...")

    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    TEXT, LABEL, train_loader, test_loader = data_preprocessing(args, device)
    main(TEXT, LABEL, args)
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Emotional Chatbot with Styler</title>
<script src="app.js"></script>
<link rel="stylesheet" type="text/css" href="chat.css" />
</head>
<body onload="setDefault()">
<ul id="chat_list" class="list no-bullets">
<li class="chat-bubble mine">(대충 적당한 대사)</li>
<li class="chat-bubble bots">(대충 알맞은 답변)</li>
</ul>
<div class="input-holder">
<input type="text" id="chat_input" autofocus/>
<input type="button" id="send_button" class="button" value="↵" onclick="send()" disabled>
</div>
</body>
</html>
import torch
import torch.nn as nn
import math
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
class Transformer(nn.Module):
    def __init__(self, embedding_dim: int, nhead: int, nlayers: int, dropout: float, SRC_vocab, TRG_vocab):
        super(Transformer, self).__init__()
        self.d_model = embedding_dim
        self.n_head = nhead
        self.num_encoder_layers = nlayers
        self.num_decoder_layers = nlayers
        self.dim_feedforward = embedding_dim
        self.dropout = dropout
        self.SRC_vo = SRC_vocab
        self.TRG_vo = TRG_vocab

        self.pos_encoder = PositionalEncoding(self.d_model, self.dropout)

        self.src_embedding = nn.Embedding(len(self.SRC_vo.vocab), self.d_model)
        self.trg_embedding = nn.Embedding(len(self.TRG_vo.vocab), self.d_model)

        self.transformer = nn.Transformer(d_model=self.d_model,
                                          nhead=self.n_head,
                                          num_encoder_layers=self.num_encoder_layers,
                                          num_decoder_layers=self.num_decoder_layers,
                                          dim_feedforward=self.dim_feedforward,
                                          dropout=self.dropout)
        self.proj_vocab_layer = nn.Linear(
            in_features=self.dim_feedforward, out_features=len(self.TRG_vo.vocab))

    def forward(self, en_input, de_input):
        x_en_embed = self.src_embedding(en_input.long()) * math.sqrt(self.d_model)
        x_de_embed = self.trg_embedding(de_input.long()) * math.sqrt(self.d_model)
        x_en_embed = self.pos_encoder(x_en_embed)
        x_de_embed = self.pos_encoder(x_de_embed)

        # Masking: ignore <pad> positions and hide future target tokens.
        src_key_padding_mask = en_input == self.SRC_vo.vocab.stoi['<pad>']
        tgt_key_padding_mask = de_input == self.TRG_vo.vocab.stoi['<pad>']
        memory_key_padding_mask = src_key_padding_mask
        tgt_mask = self.transformer.generate_square_subsequent_mask(de_input.size(1))

        # nn.Transformer expects sequence-first tensors: (batch, seq, dim) -> (seq, batch, dim).
        x_en_embed = torch.einsum('ijk->jik', x_en_embed)
        x_de_embed = torch.einsum('ijk->jik', x_de_embed)

        feature = self.transformer(src=x_en_embed,
                                   tgt=x_de_embed,
                                   src_key_padding_mask=src_key_padding_mask,
                                   tgt_key_padding_mask=tgt_key_padding_mask,
                                   memory_key_padding_mask=memory_key_padding_mask,
                                   tgt_mask=tgt_mask.to(device))
        logits = self.proj_vocab_layer(feature)
        logits = torch.einsum('ijk->jik', logits)  # back to batch-first
        return logits
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout, max_len=15000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)
from torch.optim.lr_scheduler import _LRScheduler
from torch.optim.lr_scheduler import ReduceLROnPlateau


class GradualWarmupScheduler(_LRScheduler):
    """Gradually warms up (increases) the learning rate in the optimizer.
    Proposed in 'Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour'.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        multiplier: target learning rate = base lr * multiplier
        total_epoch: the target learning rate is reached gradually at total_epoch
        after_scheduler: the scheduler to use after total_epoch (e.g. ReduceLROnPlateau)
    """

    def __init__(self, optimizer, multiplier, total_epoch, after_scheduler=None):
        self.last_epoch = 1  # ReduceLROnPlateau is called at the end of an epoch, whereas others are called at the beginning
        self.multiplier = multiplier
        if self.multiplier <= 1.:
            raise ValueError('multiplier should be greater than 1.')
        self.total_epoch = total_epoch
        self.after_scheduler = after_scheduler
        self.finished = False
        super().__init__(optimizer)

    def get_lr(self):
        if self.last_epoch > self.total_epoch:
            if self.after_scheduler:
                if not self.finished:
                    self.after_scheduler.base_lrs = [base_lr * self.multiplier for base_lr in self.base_lrs]
                    self.finished = True
                return self.after_scheduler.get_lr()
            return [base_lr * self.multiplier for base_lr in self.base_lrs]
        return [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in
                self.base_lrs]

    def step_ReduceLROnPlateau(self, metrics, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch if epoch != 0 else 1
        if self.last_epoch <= self.total_epoch:
            warmup_lr = [base_lr * ((self.multiplier - 1.) * self.last_epoch / self.total_epoch + 1.) for base_lr in
                         self.base_lrs]
            for param_group, lr in zip(self.optimizer.param_groups, warmup_lr):
                param_group['lr'] = lr
        else:
            if epoch is None:
                self.after_scheduler.step(metrics, None)
            else:
                self.after_scheduler.step(metrics, epoch - self.total_epoch)

    def step(self, epoch=None, metrics=None):
        if type(self.after_scheduler) != ReduceLROnPlateau:
            if self.finished and self.after_scheduler:
                if epoch is None:
                    self.after_scheduler.step(None)
                else:
                    self.after_scheduler.step(epoch - self.total_epoch)
            else:
                return super(GradualWarmupScheduler, self).step(epoch)
        else:
            self.step_ReduceLROnPlateau(metrics, epoch)
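For reference, a minimal sketch of how `GradualWarmupScheduler` is driven; the learning rate 0.0002, `multiplier=8`, and 22 epochs mirror the defaults in light_chatbot.py, while the one-layer model is hypothetical:
```
import torch

model = torch.nn.Linear(2, 2)  # stand-in for the Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)
scheduler = GradualWarmupScheduler(optimizer, multiplier=8, total_epoch=22)

for epoch in range(22):
    # ... train one epoch ...
    scheduler.step(epoch)
    # The lr ramps linearly from 0.0002 toward multiplier * 0.0002 = 0.0016.
    print(epoch, optimizer.param_groups[0]['lr'])
```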
torch~=1.4.0
Flask~=1.1.2
torchtext~=0.6.0
hgtk~=0.1.3
konlpy~=0.5.2
chatspace~=1.0.1
A chatbot model whose responses vary with language style and sentiment analysis:
- GeForce RTX 2080 Ti
- Python 3.6.8
- PyTorch 1.2.0
# Code
## Chatbot
### Chatbot_main.py
The main file used to train and test the chatbot.
### model.py
The class file for the Transformer model used by the chatbot.
### generation.py
Performs inference, beam search, and greedy search.
### metric.py
A module for measuring training performance.\
`acc(yhat, y)` computes token accuracy while ignoring padded positions; a toy check follows.
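```
import torch
from metric import acc  # in Light_model, acc lives in light_chatbot.py

yhat = torch.tensor([[[0.1, 0.2, 0.9], [0.8, 0.1, 0.0]]])  # logits: argmax 2, then 0
y = torch.tensor([[2, 1]])  # the second position is <pad> (index 1)
print(acc(yhat, y))  # tensor(1.) - the padded position is ignored
```
(The tensors are hypothetical, shown only to illustrate that `<pad>` positions are excluded.)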
### Styling.py
Changes the speech style of a response according to the persona.
### get_data.py
Preprocesses and loads the dataset.\
`tokenizer1(text)`\
* text: the string to tokenize
Filters out special characters, then tokenizes with Mecab; see the sketch after this section.\
`data_preprocessing(args, device)`\
* args: the Namespace parsed by argparse
* device: the PyTorch device
Tokenizes the text and builds a dataset split into id, text, label, and sentiment-analysis fields.
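A rough sketch of the tokenizer (hypothetical input; the exact morpheme split depends on the installed Mecab dictionary):
```
from get_data import tokenizer1  # in Light_model, tokenizer1 lives in generation.py

print(tokenizer1("안녕하세요!"))  # punctuation is stripped first; e.g. ['안녕', '하', '세요']
```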
## KoBERT
[SKTBrain KoBERT](https://github.com/SKTBrain/KoBERT)\
A Korean adaptation of BERT built by SKT Brain.\
It was trained for sentiment analysis on Naver movie reviews and handles the chatbot's sentiment analysis.
## Light_model
A model slimmed down for web hosting. It does not support KoBERT.
### light_chatbot.py
A console program for training and testing the chatbot model.
`light_chatbot.py [--train] [--per_soft|--per_rough]`
For example, `python light_chatbot.py --train --per_soft` trains a model with the soft speech style.
* `--train`: trains and saves a new model.
If omitted, the saved model is loaded and tested instead.
* `--per_soft`: trains or tests the soft speech style.
* `--per_rough`: trains or tests the rough speech style.
The two options are mutually exclusive.
### app.py
A simple HTTP server, built with Flask, for web hosting.\
`POST /api/soft`\
An API that runs inference with the soft model and returns the result as JSON.\
`GET /`\
Serves the HTML, CSS, and JS in the static folder.
### Others
generation.py, styling.py, and model.py play the same roles as in Chatbot.