# ML-based Spacing Corrector
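This project was carried out as an industry-academia collaboration with 미리내 Co., Ltd. Because a separate in-house version control service was used, only the source code and demo code are included here, without the commit history.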
## Introduction
This model is an improved version of [TrainKoSpacing](https://github.com/haven-jeon/TrainKoSpacing "TrainKoSpacing") that uses FastText instead of Word2Vec.
For more details, see our [presentation video](https://drive.google.com/file/d/1f-D3DC8cnrRniLvoAJ4WyreCjoo316Yc/view?usp=sharing "KCC Presentation Video").
## Performances
| Model | Test Accuracy(%) | Encoding Time Cost |
#### Corpus
We mainly use the National Institute of Korean Language 모두의 말뭉치 corpus and the National Information Society Agency AI-Hub data. Due to licensing restrictions, we cannot redistribute these datasets, but you can obtain them through the links below:
- [National Institute of Korean Language 모두의 말뭉치](https://corpus.korean.go.kr/)
- [National Information Society Agency AI-Hub](https://aihub.or.kr/aihub-data/natural-language/about "National Information Society Agency AI-Hub")
#### Data format
Bzip2-compressed file consisting of one sentence per line.
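A corpus file in this format (bzip2-compressed, one sentence per line) can be read with Python's standard `bz2` module. The following is a minimal sketch, not the project's own loader: the function name `read_sentences` and the UTF-8 encoding are assumptions.

```python
import bz2


def read_sentences(path, encoding="utf-8"):
    """Yield non-empty sentences, one per line, from a bzip2-compressed text file.

    `path` is a hypothetical file location; UTF-8 is an assumed encoding.
    """
    # mode="rt" makes bz2.open decompress and decode to str on the fly.
    with bz2.open(path, mode="rt", encoding=encoding) as f:
        for line in f:
            sentence = line.strip()
            if sentence:  # skip blank lines
                yield sentence
```

Because the function is a generator, a large corpus is streamed line by line rather than loaded into memory at once.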
### Word Embedding
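#### Jamo decomposition (자모분해)
To capture the similar shapes of Korean characters, we train the FastText word embedding on jamo-decomposed text: each syllable is split into its initial consonant, vowel, and final consonant, and `–` marks an empty final consonant.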
ex)
자연어처리
ㅈ ㅏ – ㅇ ㅕ ㄴ ㅇ ㅓ – ㅊ ㅓ – ㄹ ㅣ –
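The decomposition shown above can be reproduced with plain Unicode arithmetic on the precomposed Hangul syllable block (U+AC00 to U+D7A3). This is a minimal sketch rather than the project's actual preprocessing code: the function name `decompose_jamo` is hypothetical, and using `–` as the empty-final placeholder is taken from the example above.

```python
# Compatibility jamo in Unicode syllable-composition order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
JONGSEONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 final consonants

HANGUL_BASE = 0xAC00  # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3  # last precomposed syllable, "힣"


def decompose_jamo(text, empty_final="–"):
    """Split each Hangul syllable into initial, vowel, and final jamo.

    Syllables without a final consonant get `empty_final` (assumed placeholder).
    Non-Hangul characters are passed through unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            # code = BASE + (initial * 21 + vowel) * 28 + final
            initial, rest = divmod(code - HANGUL_BASE, 21 * 28)
            vowel, final = divmod(rest, 28)
            out.append(CHOSEONG[initial])
            out.append(JUNGSEONG[vowel])
            # final == 0 means "no final consonant"
            out.append(JONGSEONG[final - 1] if final else empty_final)
        else:
            out.append(ch)
    return out


print(" ".join(decompose_jamo("자연어처리")))
# ㅈ ㅏ – ㅇ ㅕ ㄴ ㅇ ㅓ – ㅊ ㅓ – ㄹ ㅣ –
```

Padding every syllable to exactly three jamo keeps syllable boundaries recoverable from the decomposed sequence.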
#### 2 stage FastText
The middle part of the output distribution is evenly distributed:
![probability_distribution_of_output_vector](img/probability_distribution_of_output_vector.png)
The threshold is therefore chosen using a log transform and the second derivative.
Result:
![Thresholding_result](img/Thresholding_result.png)
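One way to combine a log transform with a second derivative is sketched below. This is an illustration under stated assumptions, not the procedure actually used in this project: it histograms the output probabilities, takes the log of the bin counts, and picks the bin where the discrete second derivative is largest (the sharpest bend in the curve). The function name `knee_threshold`, the bin count, and the "largest second derivative" criterion are all assumptions.

```python
import math


def knee_threshold(probs, bins=50):
    """Return the centre of the histogram bin with the sharpest bend.

    Assumed procedure: histogram -> log transform -> discrete second derivative
    -> pick the bin where that second derivative is largest.
    """
    # Histogram of probabilities over [0, 1].
    counts = [0] * bins
    for p in probs:
        idx = min(max(int(p * bins), 0), bins - 1)  # clamp into a valid bin
        counts[idx] += 1

    # Log transform compresses the large peaks near 0 and 1.
    log_counts = [math.log1p(c) for c in counts]

    # Central second difference for the interior bins 1 .. bins-2.
    second = [
        log_counts[i - 1] - 2.0 * log_counts[i] + log_counts[i + 1]
        for i in range(1, bins - 1)
    ]

    # second[k] belongs to bin k + 1.
    knee_bin = max(range(len(second)), key=second.__getitem__) + 1
    return (knee_bin + 0.5) / bins
```

The returned value is only a candidate threshold; it should be checked against a plot such as the one above before being used.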
Directory guide for embedding model files:
- **kospacing_wv.np**
- **w2idx.dic**
## Demo
![demo_img](img/demo_screenshot.png)
You can watch the demo video [here](https://drive.google.com/file/d/1fYKapmplTmVKVxypj0-bB2TFV_2IiBO_/view?usp=sharing "here").
### How to Run Demo
#### 1. Run the demo server
```bash
cd demo
python server.py
```
#### 2. Open the demo client page
Open the HTML file at `demo/front-client/client_demo.html` in a browser.
Enter a Korean sentence and click submit.
## Reference
- TrainKoSpacing: https://github.com/haven-jeon/TrainKoSpacing
- 딥 러닝을 이용한 자연어 처리 입문 (Introduction to Natural Language Processing Using Deep Learning): https://wikidocs.net/book/2155