Showing 2 changed files with 31 additions and 2 deletions
| 1 | # ML base Spacing Correcter | 1 | # ML base Spacing Correcter |
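| 2 | -This model is improved version of [TrainKoSpacing](https://github.com/haven-jeon/TrainKoSpacing "TrainKoSpacing"), using FastText instead of Word2Vec | 2 | +This project was carried out as an industry-academia collaboration at (주) 미리내 (Mirinae Co., Ltd.). Because it used a separate in-house version control service, only the source code and demo code are included here, without the commit history. |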
| 3 | + | ||
| 4 | +## Introduction | ||
| | | 5 | +This model is an improved version of [TrainKoSpacing](https://github.com/haven-jeon/TrainKoSpacing "TrainKoSpacing") that uses FastText instead of Word2Vec. | ||
| 6 | + | ||
| | | 7 | +For more details, watch our [presentation video](https://drive.google.com/file/d/1f-D3DC8cnrRniLvoAJ4WyreCjoo316Yc/view?usp=sharing "KCC Presentation Video"). | ||
| 8 | + | ||
| 3 | 9 | ||
| 4 | ## Performances | 10 | ## Performances |
| 5 | | Model | Test Accuracy(%) | Encoding Time Cost | | 11 | | Model | Test Accuracy(%) | Encoding Time Cost | |
| ... | @@ -12,7 +18,9 @@ This model is improved version of [TrainKoSpacing](https://github.com/haven-jeon | ... | @@ -12,7 +18,9 @@ This model is improved version of [TrainKoSpacing](https://github.com/haven-jeon |
| 12 | #### Corpus | 18 | #### Corpus |
| 13 | 19 | ||
| 14 | We mainly use the National Institute of Korean Language 모두의 말뭉치 corpus and the National Information Society Agency AI-Hub data. However, due to licensing restrictions, we cannot redistribute these datasets. You should be able to obtain them through the links below: | 20 | We mainly use the National Institute of Korean Language 모두의 말뭉치 corpus and the National Information Society Agency AI-Hub data. However, due to licensing restrictions, we cannot redistribute these datasets. You should be able to obtain them through the links below: |
| 21 | + | ||
| 15 | [National Institute of Korean Language 모두의 말뭉치](https://corpus.korean.go.kr/). | 22 | [National Institute of Korean Language 모두의 말뭉치](https://corpus.korean.go.kr/). |
| 23 | + | ||
| 16 | [National Information Society Agency AI-Hub](https://aihub.or.kr/aihub-data/natural-language/about "National Information Society Agency AI-Hub") | 24 | [National Information Society Agency AI-Hub](https://aihub.or.kr/aihub-data/natural-language/about "National Information Society Agency AI-Hub") |
| 17 | 25 | ||
| 18 | #### Data format | 26 | #### Data format |
| ... | @@ -34,8 +42,11 @@ Bziped file consisting of one sentence per line. | ... | @@ -34,8 +42,11 @@ Bziped file consisting of one sentence per line. |
| 34 | ### Word Embedding | 42 | ### Word Embedding |
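| 35 | #### Jamo decomposition (자모분해) | 43 | #### Jamo decomposition (자모분해) |
| 36 | To capture the similar shapes of Korean characters, we use a FastText word embedding trained on jamo-decomposed (자모분해) text. | 44 | To capture the similar shapes of Korean characters, we use a FastText word embedding trained on jamo-decomposed (자모분해) text. |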
| 45 | + | ||
| 37 | Example: | 46 | Example: |
| 47 | + | ||
| 38 | 자연어처리 | 48 | 자연어처리 |
| 49 | + | ||
| 39 | ㅈ ㅏ – ㅇ ㅕ ㄴ ㅇ ㅓ – ㅊ ㅓ – ㄹ ㅣ – | 50 | ㅈ ㅏ – ㅇ ㅕ ㄴ ㅇ ㅓ – ㅊ ㅓ – ㄹ ㅣ – |
| 40 | 51 | ||
| 41 | #### 2 stage FastText | 52 | #### 2 stage FastText |
| ... | @@ -47,6 +58,7 @@ Because middle part of output distribution are evenly distributed. | ... | @@ -47,6 +58,7 @@ Because middle part of output distribution are evenly distributed. |
| 47 |  | 58 |  |
| 48 | 59 | ||
| 49 | Use log transform and second derivative | 60 | Use log transform and second derivative |
| 61 | + | ||
| 50 | result: | 62 | result: |
| 51 |  | 63 |  |
| 52 | 64 | ||
| ... | @@ -110,7 +122,24 @@ Directory guide for embedding model files | ... | @@ -110,7 +122,24 @@ Directory guide for embedding model files |
| 110 | - **kospacing_wv.np** | 122 | - **kospacing_wv.np** |
| 111 | - **w2idx.dic** | 123 | - **w2idx.dic** |
| 112 | 124 | ||
| 113 | -### Reference | 125 | +## Demo |
| 126 | + | ||
| | | 127 | +You can watch the demo video [here](https://drive.google.com/file/d/1fYKapmplTmVKVxypj0-bB2TFV_2IiBO_/view?usp=sharing "here"). | ||
| 128 | + | ||
| 129 | +### How to Run Demo | ||
| | | 130 | +#### 1. Run the demo server | ||
| 131 | +```bash | ||
| 132 | +cd demo | ||
| 133 | +python server.py | ||
| 134 | +``` | ||
| | | 135 | +#### 2. Open the demo client page | ||
| | | 136 | +Open the HTML file at `demo/front-client/client_demo.html` in a browser. | ||
| | | 137 | + | ||
| | | 138 | +Enter a Korean sentence and click Submit. | ||
| 139 | + | ||
| 140 | + | ||
| 141 | +## Reference | ||
| 114 | TrainKoSpacing: https://github.com/haven-jeon/TrainKoSpacing | 142 | TrainKoSpacing: https://github.com/haven-jeon/TrainKoSpacing |
| 143 | + | ||
| 115 | 딥 러닝을 이용한 자연어 처리 입문: https://wikidocs.net/book/2155 | 144 | 딥 러닝을 이용한 자연어 처리 입문: https://wikidocs.net/book/2155 |
| 116 | 145 | ... | ... |
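The jamo decomposition (자모분해) example in the Word Embedding section of this README can be reproduced with standard Unicode arithmetic on precomposed Hangul syllables. The following is a minimal sketch, not the project's actual preprocessing code: the function name `decompose_jamo` and the choice of `–` as the filler for a missing final consonant are assumptions, the latter taken from the example output shown in the README.

```python
# Minimal sketch of jamo decomposition; names here are illustrative assumptions,
# not taken from the repository.

EMPTY = "–"  # filler for a syllable with no final consonant (matches the README example)

CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")        # 19 initial consonants
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")    # 21 vowels
JONGSEONG = [EMPTY] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals

HANGUL_BASE = 0xAC00  # code point of '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3  # code point of '힣', the last precomposed syllable


def decompose_jamo(text: str) -> str:
    """Split each Hangul syllable into initial, vowel, and final jamo.

    Characters outside the precomposed Hangul block are passed through unchanged.
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            # Each initial covers 21 * 28 = 588 syllables; each vowel covers 28.
            pieces.append(CHOSEONG[offset // 588])
            pieces.append(JUNGSEONG[(offset % 588) // 28])
            pieces.append(JONGSEONG[offset % 28])
        else:
            pieces.append(ch)
    return " ".join(pieces)


print(decompose_jamo("자연어처리"))
```

Emitting a fixed three symbols per syllable keeps every character the same width, which is one plausible reason for using a filler symbol; whether the project relies on that property is not stated in the README.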
img/demo_screenshot.png
0 → 100644
60.9 KB