yomapi

modify readme

1 # ML base Spacing Correcter 1 # ML base Spacing Correcter
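2 -This model is improved version of [TrainKoSpacing](https://github.com/haven-jeon/TrainKoSpacing "TrainKoSpacing"), using FastText instead of Word2Vec 2 +This project was an industry-academia collaboration carried out at Mirinae Co., Ltd. (미리내). Because it used a separate in-house version control service, only the source code and demo code are included here, without the commit history.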
3 +
4 +## Introduction
5 +This model is an improved version of [TrainKoSpacing](https://github.com/haven-jeon/TrainKoSpacing "TrainKoSpacing"), using FastText instead of Word2Vec.
6 +
7 +For more details, watch our [presentation video](https://drive.google.com/file/d/1f-D3DC8cnrRniLvoAJ4WyreCjoo316Yc/view?usp=sharing "KCC Presentation Video").
8 +
3 9
4 ## Performances 10 ## Performances
5 | Model | Test Accuracy(%) | Encoding Time Cost | 11 | Model | Test Accuracy(%) | Encoding Time Cost |
...@@ -12,7 +18,9 @@ This model is improved version of [TrainKoSpacing](https://github.com/haven-jeon ...@@ -12,7 +18,9 @@ This model is improved version of [TrainKoSpacing](https://github.com/haven-jeon
12 #### Corpus 18 #### Corpus
13 19
14 We mainly focus on the National Institute of Korean Language 모두의 말뭉치 corpus and the National Information Society Agency AI-Hub data. However, due to licensing restrictions, we cannot redistribute these datasets. You should be able to get them through the links below. 20 We mainly focus on the National Institute of Korean Language 모두의 말뭉치 corpus and the National Information Society Agency AI-Hub data. However, due to licensing restrictions, we cannot redistribute these datasets. You should be able to get them through the links below.
21 +
15 [National Institute of Korean Language 모두의 말뭉치](https://corpus.korean.go.kr/). 22 [National Institute of Korean Language 모두의 말뭉치](https://corpus.korean.go.kr/).
23 +
16 [National Information Society Agency AI-Hub](https://aihub.or.kr/aihub-data/natural-language/about "National Information Society Agency AI-Hub") 24 [National Information Society Agency AI-Hub](https://aihub.or.kr/aihub-data/natural-language/about "National Information Society Agency AI-Hub")
17 25
18 #### Data format 26 #### Data format
...@@ -34,8 +42,11 @@ Bziped file consisting of one sentence per line. ...@@ -34,8 +42,11 @@ Bziped file consisting of one sentence per line.
34 ### Word Embedding 42 ### Word Embedding
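35 #### 자모분해 (jamo decomposition) 43 #### 자모분해 (jamo decomposition)
36 To capture the similar shapes of Korean characters, use a 자모분해 (jamo-decomposed) FastText word embedding. 44 To capture the similar shapes of Korean characters, use a 자모분해 (jamo-decomposed) FastText word embedding.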
45 +
37 ex) 46 ex)
47 +
38 자연어처리 48 자연어처리
49 +
39 ㅈ ㅏ – ㅇ ㅕ ㄴ ㅇ ㅓ – ㅊ ㅓ – ㄹ ㅣ – 50 ㅈ ㅏ – ㅇ ㅕ ㄴ ㅇ ㅓ – ㅊ ㅓ – ㄹ ㅣ –
40 51
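The decomposition shown in the example can be sketched with standard Unicode Hangul syllable arithmetic. This is a minimal illustration, not the project's actual preprocessing code: the names `decompose`, `INITIALS`, `MEDIALS`, `FINALS`, and `EMPTY_FINAL` are hypothetical, and the en dash placeholder for a missing final consonant is assumed from the example output.

```python
# Minimal sketch of jamo decomposition; all names here are hypothetical
# and not taken from the project's source code.

HANGUL_BASE = 0xAC00  # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3  # last precomposed syllable, "힣"

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 finals, index 0 = none

# Assumption: an en dash marks an empty final consonant, as in the example.
EMPTY_FINAL = "\u2013"


def decompose(text):
    """Split each Hangul syllable into initial, medial, and final jamo."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            # Each initial spans 21 * 28 = 588 syllables; each medial spans 28.
            initial = INITIALS[offset // 588]
            medial = MEDIALS[(offset % 588) // 28]
            final = FINALS[offset % 28] or EMPTY_FINAL
            out.extend([initial, medial, final])
        else:
            # Non-Hangul characters pass through unchanged.
            out.append(ch)
    return out


print(" ".join(decompose("자연어처리")))
```

Emitting a fixed three symbols per syllable keeps every character the same length after decomposition, which is one plausible reason to use a placeholder rather than dropping the empty final.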
41 #### 2 stage FastText 52 #### 2 stage FastText
...@@ -47,6 +58,7 @@ Because middle part of output distribution are evenly distributed. ...@@ -47,6 +58,7 @@ Because middle part of output distribution are evenly distributed.
47 ![probability_distribution_of_output_vector](img/probability_distribution_of_output_vector.png) 58 ![probability_distribution_of_output_vector](img/probability_distribution_of_output_vector.png)
48 59
49 Use a log transform and the second derivative to find the threshold. 60 Use a log transform and the second derivative to find the threshold.
61 +
50 result: 62 result:
51 ![Thresholding_result](img/Thresholding_result.png) 63 ![Thresholding_result](img/Thresholding_result.png)
52 64
...@@ -110,7 +122,24 @@ Directory guide for embedding model files ...@@ -110,7 +122,24 @@ Directory guide for embedding model files
110 - **kospacing_wv.np** 122 - **kospacing_wv.np**
111 - **w2idx.dic** 123 - **w2idx.dic**
112 124
113 -### Reference 125 +## Demo
126 +![demo_img](img/demo_screenshot.png)
127 +You can watch the demo video [here](https://drive.google.com/file/d/1fYKapmplTmVKVxypj0-bB2TFV_2IiBO_/view?usp=sharing "here").
128 +
129 +### How to Run Demo
130 +#### 1. Run the demo server
131 +```bash
132 +cd demo
133 +python server.py
134 +```
135 +#### 2. Open the demo client page
136 +Open the HTML file at `demo/front-client/client_demo.html`.
137 +
138 +Enter a Korean sentence and click submit.
139 +
140 +
141 +## Reference
114 TrainKoSpacing: https://github.com/haven-jeon/TrainKoSpacing 142 TrainKoSpacing: https://github.com/haven-jeon/TrainKoSpacing
143 +
115 딥 러닝을 이용한 자연어 처리 입문: https://wikidocs.net/book/2155 144 딥 러닝을 이용한 자연어 처리 입문: https://wikidocs.net/book/2155
116 145
......