장르 분류 최종.ipynb 85.5 KB
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5699\n",
      "9823\n",
      "14020\n",
      "2727\n",
      "1498\n",
      "1464\n",
      "8286\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\core\\interactiveshell.py:3063: DtypeWarning: Columns (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119) have mixed types.Specify dtype option on import or set low_memory=False.\n",
      "  interactivity=interactivity, compiler=compiler, result=result)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'\\ntrain_data_size = 732*6\\ntest_data_size = 732*6\\n'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from nltk.corpus import stopwords\n",
    "from nltk.tokenize import word_tokenize\n",
    "import re\n",
    "\"\"\"\n",
    "thriller_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/moviedata/moviePlot/thrillerPlot.csv')\n",
    "drama_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/moviedata/moviePlot/dramaPlot.csv')\n",
    "fantasy_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/moviedata/moviePlot/fantasyPlot.csv')\n",
    "history_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/moviedata/moviePlot/historyPlot.csv')\n",
    "social_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/moviedata/moviePlot/socialPlot.csv')\n",
    "romance_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/moviedata/moviePlot/romancePlot.csv')\n",
    "musical_plot = pd.read_csv('/Users/yangyoonji/Documents/2020-1/2020-dataCapstone/data/musicalData/broadMusicalPlot.csv',encoding='cp949')\n",
    "\n",
    "# /Users/김서영/Desktop/datacap/data/moviedata/moviePlot/romancePlot.csv\n",
    "\"\"\"\n",
    "romance_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/romancePlot.csv')\n",
    "thriller_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/thrillerPlot.csv')\n",
    "drama_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/dramaPlot.csv')\n",
    "fantasy_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/fantasyPlot.csv')\n",
    "history_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/historyPlot.csv')\n",
    "social_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/socialPlot.csv')\n",
    "non_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/moviedata/moviePlot/nonPlot.csv')\n",
    "\n",
    "musical_plot = pd.read_csv('/Users/김서영/Desktop/datacap/data/musicalData/broadMusicalPlot.csv',encoding='cp949')\n",
    "\n",
    "\n",
    "# Row counts per genre; the next cell takes a fixed train/test slice from each.\n",
    "print(len(romance_plot))  # 5699 rows; 2000 train / 700 test are used below\n",
    "print(len(thriller_plot))  # 9823 rows; 2000 train / 700 test are used below\n",
    "print(len(drama_plot))\n",
    "print(len(fantasy_plot))  # 2727 rows; 2000 train / 700 test are used below\n",
    "print(len(history_plot))\n",
    "print(len(social_plot))\n",
    "\n",
    "print(len(non_plot))\n",
    "\n",
    "\"\"\"\n",
    "train_data_size = 732*6\n",
    "test_data_size = 732*6\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Labels for all genres: 0 = romance, 1 = thriller, 2 = fantasy, 3 = history\n",
    "each_len = 1350\n",
    "\n",
    "train_labels = []\n",
    "test_labels = []\n",
    "\n",
    "RM_train = [[] for _ in range(2000)]\n",
    "RM_test = [[] for _ in range(700)]\n",
    "\n",
    "for i in range(2700):\n",
    "    if i < 2000:\n",
    "        RM_train[i].append(''.join(romance_plot.줄거리[i]))\n",
    "        train_labels.append(0)\n",
    "    else:\n",
    "        j = i - 2000\n",
    "        RM_test[j].append(''.join(romance_plot.줄거리[i]))\n",
    "        test_labels.append(0)\n",
    "\n",
    "TH_train = [[] for _ in range(2000)]\n",
    "TH_test = [[] for _ in range(700)]\n",
    "for i in range(2700):\n",
    "    if i < 2000:\n",
    "        TH_train[i].append(''.join(thriller_plot.줄거리[i+700]))  # training rows 700-2699\n",
    "        train_labels.append(1)\n",
    "    else:\n",
    "        j = i - 2000\n",
    "        TH_test[j].append(''.join(thriller_plot.줄거리[j]))  # test rows 0-699, disjoint from the training rows\n",
    "        test_labels.append(1)\n",
    "      \n",
    "        \n",
    "FN_train = [[] for _ in range(2000)]\n",
    "FN_test = [[] for _ in range(700)]\n",
    "for i in range(2700):\n",
    "    if i < 2000:\n",
    "        FN_train[i].append(''.join(fantasy_plot.줄거리[i]))\n",
    "        train_labels.append(2)\n",
    "    else:\n",
    "        j = i - 2000 \n",
    "        FN_test[j].append(''.join(fantasy_plot.줄거리[i]))\n",
    "        test_labels.append(2)\n",
    "\n",
    "HS_train = [[] for _ in range(1000)]\n",
    "HS_test = [[] for _ in range(350)]\n",
    "for i in range(each_len):\n",
    "    if i < 1000:\n",
    "        HS_train[i].append(''.join(history_plot.줄거리[i]))\n",
    "        train_labels.append(3)\n",
    "    else:\n",
    "        j = i- 1000\n",
    "        HS_test[j].append(''.join(history_plot.줄거리[i]))\n",
    "        test_labels.append(3)\n",
    "       \n",
    "\"\"\"    \n",
    "SC_train = [[] for _ in range(1000)]\n",
    "SC_test = [[] for _ in range(350)]\n",
    "for i in range(1350):\n",
    "    if i < 1000:\n",
    "        SC_train[i].append(''.join(social_plot.줄거리[i]))\n",
    "        train_labels.append(3)\n",
    "    else:\n",
    "        j = i-1000\n",
    "        SC_test[j].append(''.join(social_plot.줄거리[i]))\n",
    "        test_labels.append(3)\n",
    "  \n",
    "\n",
    "NN_train = [[] for _ in range(2000)]\n",
    "NN_test = [[] for _ in range(700)]\n",
    "for i in range(2700):\n",
    "    if i < 2000:\n",
    "        NN_train[i].append(''.join(non_plot.줄거리[i]))\n",
    "        train_labels.append(3)\n",
    "    else:\n",
    "        j = i - 2000 \n",
    "        NN_test[j].append(''.join(non_plot.줄거리[i]))\n",
    "        test_labels.append(3)        \n",
    "\n",
    "DR_train = [[] for _ in range(732)]\n",
    "DR_test = [[] for _ in range(732)]\n",
    "for i in range(1464):\n",
    "    if i < 732:\n",
    "        DR_train[i].append(''.join(drama_plot.줄거리[i]))\n",
    "        train_labels.append(5)\n",
    "    else:\n",
    "        j = 732 - i\n",
    "        DR_test[j].append(''.join(drama_plot.줄거리[i]))\n",
    "        test_labels.append(5)\n",
    " \"\"\"   \n",
    "Mu = [[] for _ in range(307)]\n",
    "for i in range(307):\n",
    "    Mu[i].append(''.join(musical_plot.muplot[i]))\n",
    "   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#allplot = RM_train+TH_train+FN_train+HS_train+SC_train+DR_train+RM_test+TH_test+FN_test+HS_test+SC_test+DR_test\n",
    "allplot = RM_train+RM_test+HS_train+HS_test+TH_train+FN_train+TH_test+FN_test#+HS_train+SC_train+HS_test+SC_test "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "alltrain = RM_train+TH_train+FN_train+HS_train \n",
    "alltest = RM_test+TH_test+FN_test+HS_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7000\n",
      "2450\n",
      "['In 1983, off-duty policeman Reiden picks up a suspicious couple, Kei and Ai, who claim to be father and daughter but appear to be close in age. Reiden thus becomes involved in a fight between them and the female fighter K2 who is chasing them. Kei draws out his latent ability to the maximum by a Legendary \"Trigger\" and Ai blows away K2. The night sky splits and strange space is shown. Their enemy is the secret society \"Fraud\" of para-psionics, which is commanded by Kuu Ragua Lee. Kei, Ai and Reiden meet the assassins whom Fraud sends out one after another.']\n"
     ]
    }
   ],
   "source": [
    "train_data_size = 7000\n",
    "test_data_size = 2450\n",
    "print(len(alltrain))\n",
    "print(len(alltest))\n",
    "\n",
    "print(HS_test[3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████████████████████| 9450/9450 [01:14<00:00, 127.26it/s]\n"
     ]
    }
   ],
   "source": [
    "from tqdm import tqdm\n",
    "\n",
    "all_vocab = {}      # word -> frequency over every plot in allplot\n",
    "all_sentences = []  # one list of cleaned tokens per plot\n",
    "stop_words = set(stopwords.words('english'))\n",
    "\n",
    "for plot in tqdm(allplot):\n",
    "    tokens = word_tokenize(str(plot))  # tokenize the plot into words\n",
    "    result = []\n",
    "    for word in tokens:\n",
    "        word = word.lower()  # lowercase every word to shrink the vocabulary\n",
    "        if word not in stop_words:  # drop English stopwords\n",
    "            if len(word) > 2:  # also drop words of length 2 or less\n",
    "                result.append(word)\n",
    "                if word not in all_vocab:\n",
    "                    all_vocab[word] = 0\n",
    "                all_vocab[word] += 1\n",
    "    all_sentences.append(result)\n",
    "\n",
    "all_vocab_sorted = sorted(all_vocab.items(), key=lambda x: x[1], reverse=True)\n",
    "\n",
    "# Preprocessing (4): assign an integer index to each word, most frequent first\n",
    "all_word_to_index = {}\n",
    "i = 0\n",
    "for (word, frequency) in all_vocab_sorted:\n",
    "    if frequency > 1:  # as noted in the Cleaning chapter, exclude low-frequency words\n",
    "        i = i + 1\n",
    "        all_word_to_index[word] = i\n",
    "#print(all_word_to_index)\n",
    "\n",
    "vocab_size = 15000  # keep only the 15,000 most frequent words\n",
    "# Collect the words whose index is greater than vocab_size so they can be removed\n",
    "words_frequency = [w for w, c in all_word_to_index.items() if c >= vocab_size + 1]\n",
    "for w in words_frequency:\n",
    "    del all_word_to_index[w]  # delete the index entry for that word\n",
    "\n",
    "# All out-of-vocabulary words share one index, placed right after the last kept word\n",
    "all_word_to_index['OOV'] = len(all_word_to_index) + 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Preparing the training data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████████████████████| 7000/7000 [00:57<00:00, 121.22it/s]\n"
     ]
    }
   ],
   "source": [
    "vocab = {}  # word -> frequency over the training plots\n",
    "sentences = []\n",
    "stop_words = set(stopwords.words('english'))\n",
    "\n",
    "for i in tqdm(alltrain):\n",
    "    sentence = word_tokenize(str(i))  # tokenize the plot into words\n",
    "    result = []\n",
    "\n",
    "    for word in sentence:\n",
    "        word = word.lower()  # lowercase every word to shrink the vocabulary\n",
    "        if word not in stop_words:  # drop English stopwords\n",
    "            if len(word) > 2:  # also drop words of length 2 or less\n",
    "                result.append(word)\n",
    "                if word not in vocab:\n",
    "                    vocab[word] = 0\n",
    "                vocab[word] += 1\n",
    "    sentences.append(result)\n",
    "\n",
    "# Integer-encode each plot with the shared index built from allplot\n",
    "train_encoded = []\n",
    "for s in sentences:\n",
    "    temp = []\n",
    "    for w in s:\n",
    "        try:\n",
    "            temp.append(all_word_to_index[w])\n",
    "        except KeyError:\n",
    "            temp.append(all_word_to_index['OOV'])  # unknown word -> OOV index\n",
    "    train_encoded.append(temp)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = []\n",
    "Y_train = []\n",
    "for i in range(train_data_size):\n",
    "    label = [0,0,0,0]\n",
    "    X_train.append(train_encoded[i])\n",
    "    idx = train_labels[i]\n",
    "    label[idx] = 1\n",
    "    Y_train.append(label)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[692, 9385, 2345, 110, 65, 7, 853, 2276, 4248, 3049, 72, 15001, 252, 8415, 1097, 650, 15001, 102, 2501, 6068, 4184, 15001, 1371, 15001, 15001, 8535, 15001, 6187, 256, 3049, 11382, 2933, 2704, 1655, 638, 3049, 15001, 3443, 5395, 7992, 6877, 4524, 1810, 5157, 5857, 15001, 15001, 7287, 15001, 6816, 15001, 15001, 9320, 15001, 58, 14362, 3443, 15001, 11256, 15001, 30, 3332, 15001, 15001, 80, 15001, 13460, 15001, 15001, 1346, 4880, 647, 3049, 1208, 4407, 15001, 15001, 6326, 15001, 2273, 5330, 507, 1614, 15001, 93, 3147, 2273, 324, 3917, 15001, 8463, 4088, 3917, 15001, 15001, 6192, 132, 14085, 12141, 1985, 3899, 7663, 6634, 15001, 931, 15001, 46, 1710, 2276, 5699, 15001, 4850, 3443, 1104, 2095, 1313, 3407, 15001, 833, 15001, 2456, 221, 4125, 3049, 1451, 436, 15001, 15001, 11221, 3443, 1218, 8415, 436, 13189, 15001, 196, 27, 15001, 252, 3443, 80, 15001, 1363, 13056, 2456, 5536, 3443, 94, 3588, 15001, 7648, 6068, 3999, 7097, 15001, 15001, 2911, 1196, 3917, 15001, 15001, 358, 15001, 199, 2373, 2456, 15001, 108, 15001, 1885, 325, 8256, 9376, 3443, 1371, 2276, 110, 11863, 2345, 2213, 15001, 722, 26, 613, 15001, 1655, 2636, 15001, 103, 1655, 3443, 419, 812, 2276, 535, 3443, 15001, 1394, 15001, 9033, 3978, 1394, 110, 1099, 1399, 306, 15001, 9076, 1150, 4638, 206, 94, 6738, 535, 221, 25, 13189, 1194, 15001, 977, 307, 877, 935, 2345, 3443, 217, 15001, 5118, 1994, 15001, 102, 15001, 1473, 5395, 1371, 3443, 4253, 1473, 192, 15001, 3917, 1093, 15001, 31, 977, 563, 4655, 2386, 2933, 148, 259, 427, 11903, 4751, 573, 15001, 1428, 603, 2933, 1521, 2345, 13264, 110, 1897, 2526, 1874, 1473, 374, 31, 9521, 1056, 116, 5395, 1028, 1473, 46, 260, 5395, 199, 15001, 4677, 563, 10481, 11758, 3166, 1291, 15001, 1473, 10803, 1144, 2847, 94, 188, 199, 549, 5395, 138, 1440, 786, 3772, 199, 148, 1388, 10246, 15001, 409, 9047, 1619, 29, 12550, 5395, 94, 188, 9773, 80, 1, 3166, 1144, 2847, 15001, 31, 1144, 94, 9552, 15001, 700, 15001, 412, 3166, 5395, 175, 31, 1619, 29, 1720, 1028, 1473, 275, 122, 3166, 3443, 80, 3166, 1641, 85, 94, 484, 2456, 11355, 4861, 1119, 15001, 239, 3166, 2605, 4464, 1473, 9436, 3166, 1282, 35, 2890, 1266, 1026, 43, 1619, 29, 61, 967, 4749, 126, 3443, 2194, 7967, 3443, 239, 2933, 13394, 43, 3443, 5501, 1606, 7496, 15001, 9033, 2933, 3272, 920, 477, 1212, 2516, 2824, 3272, 3443, 304, 168, 70, 15001, 7017, 382, 693, 1664]\n"
     ]
    }
   ],
   "source": [
    "print(X_train[6000])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [ Training data ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "줄거리 최대 길이 :  2324\n",
      "줄거리 평균 길이 :  234.355\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "len_result = [len(s) for s in X_train]  # token count of each encoded plot\n",
    "print(\"줄거리 최대 길이 : \",max(len_result))  # maximum plot length\n",
    "print(\"줄거리 평균 길이 : \",sum(len_result)/len(len_result))  # mean plot length\n",
    "\n",
    "plt.subplot(1,2,1)  # left panel: box plot of plot lengths\n",
    "plt.boxplot(len_result)\n",
    "plt.subplot(1,2,2)  # right panel: histogram of plot lengths\n",
    "plt.hist(len_result, bins=50)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 0, 0, 1]\n",
      "[652, 15001, 130, 607, 576, 2448, 1656, 57, 943, 2956, 14, 157, 1539, 11530, 4330, 1539, 809, 652, 5218, 607, 2453, 2751, 9090, 5156, 652, 1196, 8, 4302, 8345, 15001, 339, 607, 7879, 19, 2575, 2720, 32, 1787, 29, 3, 127, 652, 3074, 15001, 544, 9387, 339, 650, 62, 3490, 9839, 1461, 15001, 21, 523, 17, 2, 1870, 146, 3104, 15001, 117, 235, 235, 53, 15001, 2664, 1237, 1832, 15001, 652, 1656, 1076, 607, 2720, 883, 212, 1095, 15001, 1126, 176, 15001, 8851, 15001, 1320, 15001, 15001, 1231, 19, 661, 393, 12959, 1790, 1372, 766, 652, 55, 1237, 50, 98, 6526, 15001, 36, 15001, 3074, 14, 652, 138, 1427, 1480, 58, 1037, 390, 168, 1818, 26, 599, 5601, 296, 420, 15, 9387, 107, 15001, 862, 181, 604, 55, 15001, 8085, 1608, 10714, 607, 1656, 2741, 1076, 685, 1613, 1351, 439, 406, 5398, 2592, 4495, 1320, 2575, 278, 3450, 607, 1769, 57, 439, 3183, 3426, 7573, 15001, 3646, 652, 61, 387, 607, 254, 57, 1790, 3355, 1987, 1237, 1656, 15001, 16, 103, 607, 502, 29, 226, 4621, 14, 176, 1, 241, 303, 29, 607, 1290, 169, 607, 667, 685, 264, 15001, 8, 15001, 1402, 15001, 275, 11236, 15001, 6530, 15001, 8086, 15001, 178, 239, 607, 2837, 17, 8606, 10, 9387, 1406, 652, 2127, 28, 547, 15, 1692, 652, 3445, 81, 685, 427, 2239, 163, 1237, 171, 2720, 273, 416, 607, 175, 13971, 245, 6940, 18, 66, 15001, 2720, 234, 2131, 387, 450, 652, 83, 607, 3346, 36, 4, 2550, 486, 607, 2080, 64, 14, 652, 5099, 7154, 2597, 652, 385, 6125, 56, 2597, 265, 255, 607, 163, 226, 2720, 10273, 461, 2128, 105, 209, 127, 1889, 5354, 105, 2720, 357, 502, 607, 51, 900, 4416, 652, 69, 9387, 2635, 138, 1260, 2, 607, 320, 1487, 2069, 1785, 607, 68, 118, 2862, 3429, 652, 1, 2321, 8332, 1548, 15001, 5470, 12337, 453, 5554, 58, 33, 702, 687, 15001, 1825, 125, 764, 15001, 15001, 6987, 1237, 9036, 22, 607, 895, 652, 306, 22, 512, 400, 297, 746, 4719, 1376, 3133, 4, 633, 42, 24, 312, 9387, 66, 5040, 522, 9387, 1029, 652, 1715, 1238, 1961, 4075, 253]\n"
     ]
    }
   ],
   "source": [
    "print(Y_train[6999])\n",
    "print(X_train[701])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Preparing the test data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████████████████████████████████████| 2450/2450 [00:19<00:00, 123.36it/s]\n"
     ]
    }
   ],
   "source": [
    "vocab1 = {}  # word -> frequency over the test plots\n",
    "sentences1 = []\n",
    "stop_words1 = set(stopwords.words('english'))\n",
    "\n",
    "for i in tqdm(alltest):\n",
    "    sentence1 = word_tokenize(str(i))  # tokenize the plot into words\n",
    "    result1 = []\n",
    "\n",
    "    for word in sentence1:\n",
    "        word = word.lower()  # lowercase every word to shrink the vocabulary\n",
    "        if word not in stop_words1:  # drop English stopwords\n",
    "            if len(word) > 2:  # also drop words of length 2 or less\n",
    "                result1.append(word)\n",
    "                if word not in vocab1:\n",
    "                    vocab1[word] = 0\n",
    "                vocab1[word] += 1\n",
    "    sentences1.append(result1)\n",
    "\n",
    "# Integer-encode each test plot with the same index used for the training data\n",
    "test_encoded = []\n",
    "for s in sentences1:\n",
    "    temp = []\n",
    "    for w in s:\n",
    "        try:\n",
    "            temp.append(all_word_to_index[w])\n",
    "        except KeyError:\n",
    "            temp.append(all_word_to_index['OOV'])  # unknown word -> OOV index\n",
    "    test_encoded.append(temp)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test = []\n",
    "Y_test = []\n",
    "for i in range(test_data_size):\n",
    "    label = [0,0,0,0]\n",
    "    X_test.append(test_encoded[i])\n",
    "    idx = test_labels[i]\n",
    "    label[idx] = 1\n",
    "    Y_test.append(label)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [ Test data ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Max plot length :  2593\n",
      "Mean plot length :  203.25755102040816\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD4CAYAAAAAczaOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAZ1klEQVR4nO3dfYxc1Znn8e/P7bcNefEL5mV5WZMZk23TUgzbAq9ojWjYOIagMSOFFQ0K3tArD1rcInJWG4eWljCjFrCCsCSwSGbaGyNBEyYJwgomjNfTUdRoIDaMB2wqrI3HCR4c7MEOeEFDN+bZP+qUqe6u7q5+q6qu+/tIpbr3uaeqz3WXnz517rnnKCIwM7NsmFXtCpiZWeU46ZuZZYiTvplZhjjpm5lliJO+mVmGzK52BUZz+umnx9KlS6tdDatjL7/88j9HxJJK/1x/tm06jfa5rumkv3TpUnbt2lXtalgdk/TbErEvAT8uCn0R+O/AYym+FDgI/MeIOC5JwIPANcCHwH+KiFdG+7n+bNt0KvW5LnD3jtkQEfFGRKyIiBXAvyOfyJ8GNgI7ImIZsCPtA1wNLEuPdcAjla+1WXmc9M1GdxXwZkT8FlgDbEnxLcB1aXsN8FjkvQgskHR25atqNjYnfbPR3QD0pO0zI+IwQHo+I8XPAd4qes2hFBtE0jpJuyTtOnr06DRW2WxkTvpmI5A0F/hT4K/HKloiNmx+k4jYFBHNEdG8ZEnFrx2bAU76ZqO5GnglIt5J++8Uum3S85EUPwScV/S6c4G3K1ZLs3Fw0q8TPT09NDU10dDQQFNTEz09PWO/yMbSxqddOwBbgbVpey3wTFH8ZuWtBN4rdAOZ1ZqaHrJp5enp6aGzs5Pu7m5aWlro6+ujvb0dgLa2tirXbmaS9BngK8CfF4XvAZ6S1A78Drg+xbeRH665n/xIn29WsKpm46Janlq5ubk5PJZ5bE1NTfzwhz+ktbX1VKy3t5eOjg727NlTxZrVPkkvR0RzpX+uP9s2nUb7XLt7pw7kcjlaWloGxVpaWsjlclWqkZnVKnfv1IHGxkb6+voGtfT7+vpobGysYq1sopZufPbU9sF7vlbFmlg9cku/DnR2dtLe3k5vby8DAwP09vbS3t5OZ2dntatmZjXGLf06ULhY29HRQS6Xo7Gxka6uLl/ENbNhxmzpSzpPUq+knKS9km5P8e9J+idJu9PjmqLXfFfSfklvSPpqUXx1iu2XtLHUz7OJaWtrY8+ePZw8eZI9e/Y44ZtZSeW09D8Gvh0Rr0j6HPCypO3p2AMRcV9xYUnLyd+6fhHwr4H/I+nCdPhh8sPgDgE7JW2NiNen4kTMzGxsYyb9dJNJYb6RE5JylJhXpMga4MmI+Aj4R0n7gUvTsf0RcQBA0pOprJO+mVmFjOtCrqSlwMXASym0XtKrkjZLWphiI00+5UmpzMyqrOykL+mzwE+Bb0XE++TnDP8jYAX5bwL3F4qWeHmMEh8c8KRUZmbTpqzRO5LmkE/4j0fEzwCKJqFC0qPAz9PuaJNPeVIqM7MqKmf0joBuIBcR3y+KFy8S8WdA4X7/rcANkuZJuoD8akK/BnYCyyRdkKasvSGVNTOzCimnpX858A3gNUm7U+wOoE3SCvJdNAdJE1NFxF5JT5G/QPsxcFtEnASQtB54HmgANkfE3ik8FzMzG0M5o3f6KN0fv22U13QBXSXi20Z7nZmZTS9Pw2BmliFO+mZmGeKkb2aWIU76ZmYZ4qRvZpYhTvpmZhnipG9mliFO+mZmGeKkb2aWIU76ZiVIWiDpJ5J+k1aN+/eSFknaLmlfel6YykrSD9KKcK9KuqTa9TcbiZO+WWkPAr+IiH8LfBnIARuBHRGxDNiR9gGuJj+x4DJgHflpx81qkpO+2RCSPg/8CfnZZYmI/oj4A/mV3rakYluA69L2GuCxyHsRWDBkFlqzmuGkbzbcF4Gj
wP+W9PeS/krSacCZafnQwjKiZ6TyXhXOZgwnfbPhZgOXAI9ExMXAB3zalVOKV4WzGcNJ32y4Q8ChiCisBf0T8n8E3il026TnI0XlvSqczQhO+mZDRMTvgbckfSmFriK/KNBWYG2KrQWeSdtbgZvTKJ6VwHuFbiCzWlPWGrlmGdQBPJ6W9jwAfJN8I+kpSe3A74DrU9ltwDXAfuDDVNasJjnpm5UQEbuB5hKHripRNoDbpr1SZlPA3TtmZhnipG9mliFO+mZmGeKkb2aWIU76ZmYZ4qRvZpYhTvpmZhnipG9mliFO+mZmGeKkb2aWIU76ZmYZMmbSl3SepN60TuheSben+LjXC5W0NpXfJ2ntSD/TzMymRzkt/Y+Bb0dEI7ASuE3Scsa5XqikRcCdwGXApcCdhT8UZmZWGWMm/Yg4HBGvpO0T5BeIPofxrxf6VWB7RByLiOPAdmD1lJ6NmZmNalx9+pKWAhcDLzH+9ULLWkfUzMymT9lJX9JngZ8C34qI90crWiIWo8SH/hwvHm1mNk3KSvqS5pBP+I9HxM9SeLzrhZa1jqgXjzYzmz7ljN4R0A3kIuL7RYfGu17o88AqSQvTBdxVKWZmZhVSznKJlwPfAF6TtDvF7gDuYRzrhUbEMUl/CexM5f4iIo5NyVmYmVlZxkz6EdFH6f54GOd6oRGxGdg8ngqamdnU8R25ZmYZ4qRvVoKkg5Jek7Rb0q4UG/dd6Ga1xknfbGStEbEiIprT/rjuQjerRU76ZuUb713oZjXHSd+stAD+RtLLktal2HjvQh/ENx5aLShnyKZZFl0eEW9LOgPYLuk3o5Qt627ziNgEbAJobm4edtysEtzSNyshIt5Oz0eAp8nPDDveu9DNao6TvtkQkk6T9LnCNvm7x/cw/rvQzWqOu3fMhjsTeDo/AwmzgSci4heSdjKOu9DNapGTvtkQEXEA+HKJ+LuM8y50s1rj7h0zswxx0q8TPT09NDU10dDQQFNTEz09PdWukpnVIHfv1IGenh46Ozvp7u6mpaWFvr4+2tvbAWhra6ty7cyslrilXwe6urro7u6mtbWVOXPm0NraSnd3N11dXdWumpnVGCf9OpDL5WhpaRkUa2lpIZfLValGZlarnPTrQGNjI319fYNifX19NDY2VqlGZlarnPTrQGdnJ+3t7fT29jIwMEBvby/t7e10dnZWu2pmVmN8IbcOFC7WdnR0kMvlaGxspKuryxdxzWwYJ/060dbW5iRvZmNy946ZWYY46ZuZZYiTvplZhjjpm5lliC/kmtWwpRufPbV98J6vVbEmVi/c0jczyxAnfTOzDHHSNzPLECd9M7MMcdI3M8sQJ30zswwZM+lL2izpiKQ9RbHvSfonSbvT45qiY9+VtF/SG5K+WhRfnWL7JW2c+lMxM7OxlNPS/xGwukT8gYhYkR7bACQtB24ALkqv+V+SGiQ1AA8DVwPLgbZU1szMKmjMpB8RvwKOlfl+a4AnI+KjiPhHYD9waXrsj4gDEdEPPJnKmtWs1GD5e0k/T/sXSHpJ0j5JP5Y0N8Xnpf396fjSatbbbDST6dNfL+nV1P2zMMXOAd4qKnMoxUaKDyNpnaRdknYdPXp0EtUzm7TbgeI1J+8l/w13GXAcaE/xduB4RPwx8EAqZ1aTJpr0HwH+CFgBHAbuT3GVKBujxIcHIzZFRHNENC9ZsmSC1TObHEnnAl8D/irtC7gS+EkqsgW4Lm2vSfuk41el8mY1Z0JJPyLeiYiTEfEJ8Cj57hvIt+DPKyp6LvD2KHGzWvU/gf8GfJL2FwN/iIiP037xt9VT32TT8fdS+UH8LdZqwYSSvqSzi3b/DCiM7NkK3JD6OC8AlgG/BnYCy1Kf6FzyF3u3TrzaZtNH0rXAkYh4uThcomiUcezTgL/FWg0Yc5ZNST3AFcDpkg4BdwJXSFpB/oN9EPhzgIjYK+kp4HXgY+C2iDiZ3mc98DzQAGyOiL1TfjZmU+Ny4E/TUOT5wOfJt/wX
SJqdWvPF31YL32QPSZoNfIHyBz+YVdSYST8iSi282j1K+S6gq0R8G7BtXLUzq4KI+C7wXQBJVwD/NSJukvTXwNfJjz5bCzyTXrI17f9dOv63EVHympVZtfmOXLPyfQfYIGk/+T77QuOnG1ic4hsA33xoNcuLqJiNIiJ+CfwybR/g00ELxWX+Bbi+ohUzmyC39M3MMsRJ38wsQ5z0zcwyxEnfzCxDnPTNzDLESd/MLEOc9M3MMsRJ38wsQ5z0zcwyxEnfzCxDnPTNzDLESb9O9PT00NTURENDA01NTfT09FS7SmZWgzzhWh3o6emhs7OT7u5uWlpa6Ovro709v3xrW1upmbHNLKvc0q8DXV1ddHd309raypw5c2htbaW7u5uurmHLGphZxjnp14FcLkdLS8ugWEtLC7lcrko1MrNa5aRfBxobG+nr6xsU6+vro7GxsUo1MrNa5aRfBzo7O2lvb6e3t5eBgQF6e3tpb2+ns7Oz2lUzsxrjC7l1oHCxtqOjg1wuR2NjI11dXb6Ia2bDuKVvZpYhbunXAQ/ZNLNyuaVfBzxk08zK5aRfBzxkc2pJmi/p15L+QdJeSXel+AWSXpK0T9KPJc1N8Xlpf386vrSa9TcbjZN+HWhsbOSuu+4aNA3DXXfd5SGbE/cRcGVEfBlYAayWtBK4F3ggIpYBx4H2VL4dOB4Rfww8kMqZ1SQn/TrQ2trKvffeyy233MKJEye45ZZbuPfee2ltba121WakyPt/aXdOegRwJfCTFN8CXJe216R90vGrJKlC1TUbFyf9OtDb28u1117LHXfcwWmnncYdd9zBtddeS29vb7WrNmNJapC0GzgCbAfeBP4QER+nIoeAc9L2OcBbAOn4e8DiEu+5TtIuSbuOHj063adgVpKTfh14/fXX2b17N8899xz9/f0899xz7N69m9dff73aVZuxIuJkRKwAzgUuBUr1lUV6LtWqj2GBiE0R0RwRzUuWLJm6ypqNg5N+HZg7dy4dHR2DRu90dHQwd+7caldtxouIPwC/BFYCCyQVhjmfC7ydtg8B5wGk418AjlW2pmblGTPpS9os6YikPUWxRZK2p1EM2yUtTHFJ+kEaxfCqpEuKXrM2ld8nae30nE429ff389BDDw2ahuGhhx6iv7+/2lWbkSQtkbQgbf8r4D8AOaAX+HoqthZ4Jm1vTfuk438bEcNa+ma1oJyW/o+A1UNiG4EdaRTDjrQPcDWwLD3WAY9A/o8EcCdwGfmvyncW/lDY5C1fvpwbb7yRjo4O5s+fT0dHBzfeeCPLly+vdtVmqrOBXkmvAjuB7RHxc+A7wAZJ+8n32Xen8t3A4hTfwKf/H8xqzph35EbEr0qMO14DXJG2t5D/+vudFH8stXJelLRA0tmp7PaIOAYgaTv5PyRe3mkKdHZ2cvvtt3PaaacB8MEHH7Bp0yYefPDBKtdsZoqIV4GLS8QPkG+0DI3/C3B9BapmNmkTnYbhzIg4DBARhyWdkeKnRjEkhREOI8WHkbSO/LcEzj///AlWL3tOnDhBYUTIwYMHmT9/fpVrZGa1aKov5I40iqGs0Q3gEQ4TsX79egYGBrj//vv54IMPuP/++xkYGGD9+vXVrpqZ1ZiJJv13UrcN6flIip8axZAURjiMFLcpcOzYMe6++242bNjAZz7zGTZs2MDdd9/NsWMeQGJmg0006RePVhg6iuHmNIpnJfBe6gZ6HlglaWG6gLsqxWyKNDU1jbpvZgblDdnsAf4O+JKkQ5LagXuAr0jaB3wl7QNsAw4A+4FHgf8CkC7g/iX5kRA7gb8oXNS1yZs9ezY33XTToCGbN910E7Nne+ZsMxusnNE7I03IflWJsgHcNsL7bAY2j6t2VpZbb72Vhx9+mCuvvPJUTBK33VbyV2FmGeY7cutIoWXvFr6ZjcRJvw48+uij3HfffQwMDBARDAwMcN999/Hoo49Wu2pmVmOc9OvARx99xK233jooduutt/LRRx9VqUZmVquc
9OvAvHnzWLVqFfPnz0cS8+fPZ9WqVcybN6/aVTOzGuOkXwcuvPBCXnjhhVMTrPX39/PCCy9w4YUXVrlmZlZrnPTrQGHe/MLEjoVnz6dvZkM56deBkydPAnDWWWcxa9YszjrrrEFxM7MCJ/060dDQwLvvvssnn3zCu+++S0NDQ7WrZGY1yEm/Tpw8efLUhdt58+a5lV+Hlm58lqUbn612NWyGc9KvIx9++OGgZzOzoZz064ikQc9mZkM56deJefPmMWtW/tc5a9Ysj9E3s5Kc9OvAokWL6O/vZ/HixcyaNYvFixfT39/PokWLql01M6sxnplrhirVhfP73/9+0POxY8dOlSuM3TezbHNLf4aKiEGPJ554gosuugiAiy66iCeeeGLQcTMzcEu/brS1tdHW1oYk9uzZU+3qmFmNckvfbAhJ50nqlZSTtFfS7Sm+SNJ2SfvS88IUl6QfSNov6VVJl1T3DMxG5qRvNtzHwLcjohFYCdwmaTmwEdgREcuAHWkf4GpgWXqsAx6pfJXNyuOkbzZERByOiFfS9gkgB5wDrAG2pGJbgOvS9hrgsch7EVgg6ewKV9usLE76ZqOQtBS4GHgJODMiDkP+DwNwRip2DvBW0csOpdjQ91onaZekXUePHp3OapuNyEnfbASSPgv8FPhWRLw/WtESsWFDpiJiU0Q0R0TzkiVLpqqaZuPipG9WgqQ55BP+4xHxsxR+p9Btk56PpPgh4Lyil58LvF2pupqNh4dsmg2h/B1t3UAuIr5fdGgrsBa4Jz0/UxRfL+lJ4DLgvUI3ULUVz8p58J6vVbEmViuc9M2Guxz4BvCapN0pdgf5ZP+UpHbgd8D16dg24BpgP/Ah8M3KVtesfE76ZkNERB+l++kBripRPoDbprVSZlPEffpmZhnipG9mliFO+mZmGeKkb2aWIZNK+pIOSnpN0m5Ju1LMk1KZmdWoqWjpt0bEiohoTvuelMrMrEZNR/eOJ6UyM6tRk036AfyNpJclrUsxT0plZlajJntz1uUR8bakM4Dtkn4zStmyJ6UCNgE0Nzd7nT8zsyk0qaQfEW+n5yOSngYuJU1KFRGHPSmVWXUUz7ljVmzC3TuSTpP0ucI2sArYw6eTUsHwSaluTqN4VlJDk1KZmWXFZFr6ZwJP5yckZDbwRET8QtJOPCmV2bQZaeZMt+6tHBNO+hFxAPhyifi7eFIqs4pworfx8h25ZmYZ4qRvZpYhTvpmZhnipG9mliFO+mZmGeKkb2aWIU76ZmYZ4qRf4xYtWoSksh9A2WUXLVpU5bMzs0pz0q9xx48fJyKm5XH8+PFqn15NkrRZ0hFJe4piXhzI6oKTvtlwPwJWD4l5cSCrC5OdWtms7kTEryQtHRJeA1yRtrcAvwS+Q9HiQMCLkhYUZpmtTG3LN9KcPZYtbumblWdSiwOBFwiy2uCkbzY5ZS0OBPkFgiKiOSKalyxZMs3VMivNSd+sPO8U1nT24kA2kznpm5XHiwNZXfCFXLMhJPWQv2h7uqRDwJ3APdTR4kC+qJtdTvpmQ0RE2wiHvDiQzXju3jEzyxAnfTOzDHHSNzPLECd9M7MMcdI3M8sQj96pcXHn5+F7X5i+9zazTHHSr3G6633yowKn4b0l4nvT8tZmVqPcvWNmliFO+mZmGeKkb2aWIU76Zhm3dOOzg+bisfrmC7kzQGHB86m2cOHCaXlfM6tdFW/pS1ot6Y20kPTGsV+RbeNd7Hw8rzl27FiVz87MKq2iSV9SA/Aw+cWklwNtkpZXsg5mZllW6Zb+pcD+iDgQEf3Ak+QXljYzswqodJ9+qUWkLysuIGkdsA7g/PPPr1zNZpjR+vlLHZuuG7ysfnhhlWyodEt/zEWkvXh0eSbS129mVumk70WkzcyqqNJJfyewTNIFkuYCN5BfWNrMzCqgon36EfGxpPXA80ADsDki9layDmY2Nvfv16+K35wVEduAbZX+uWZm5jtyzWwMbvXXF8+9YzYFfKe5zRRu6ZtN
UtGd5l8hP0Jtp6StEfF6dWs29UpNzObW/8zipG82eafuNAeQVLjTvO6SfilTMUNn8R+OwvtV4o/JTOy6muy/j2r5xh1JR4HfVrseM8zpwD9XuxIzyL+JiEndBSjp68DqiPjPaf8bwGURsX5IuVN3mwNfAt4Y4S3r8XdYj+cEtXteI36ua7qlP9n/jFkkaVdENFe7Hhkz5p3mkL/bHNg05pvV4e+wHs8JZuZ5+UKu2eT5TnObMZz0zSbPd5rbjFHT3Ts2IWN2H9jUmoY7zevxd1iP5wQz8Lxq+kKumZlNLXfvmJlliJO+mVmGOOnXAUmbJR2RtKfadbGJm2lTOZT63ElaJGm7pH3peWGKS9IP0rm9KumSotesTeX3SVpbjXMpqst5knol5STtlXR7is/o8xpkvCsw+VF7D+BPgEuAPdWuix8T/h02AG8CXwTmAv8ALK92vcao87DPHfA/gI1peyNwb9q+BniO/D0NK4GXUnwRcCA9L0zbC6t4TmcDl6TtzwH/F1g+08+r+OGWfh2IiF8Bx6pdD5uUU1M5REQ/UJjKoWaN8LlbA2xJ21uA64rij0Xei8ACSWcDXwW2R8SxiDgObAdWT3/tS4uIwxHxSto+AeTIr+09o8+rmJO+WW04B3iraP9Qis00Z0bEYcgnUOCMFB/p/Gr2vCUtBS4GXqKOzstJ36w2lDWVwww20vnV5HlL+izwU+BbEfH+aEVLxGr2vMBJ36xW1MtUDu+k7g3S85EUH+n8au68Jc0hn/Afj4ifpfCMP68CJ32z2lAvUzlsBQojVdYCzxTFb06jXVYC76VukueBVZIWphExq1KsKiQJ6AZyEfH9okMz+rwGqfaVZD8m/wB6gMPAAPkWRnu16+THhH6P15AfLfIm0Fnt+pRR32GfO2AxsAPYl54XpbIiv9DMm8BrQHPR+9wC7E+Pb1b5nFrId8O8CuxOj2tm+nkVPzwNg5lZhrh7x8wsQ5z0zcwyxEnfzCxDnPTNzDLESd/MLEOc9M3MMsRJ38wsQ/4/u3CdUPZooCAAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "# Length (in tokens) of each encoded test plot.\n",
    "len_result = [len(s) for s in X_test]\n",
    "print(\"Max plot length : \",max(len_result))\n",
    "print(\"Mean plot length : \",sum(len_result)/len(len_result))\n",
    "\n",
    "# Left: box plot of lengths. Right: histogram of lengths.\n",
    "plt.subplot(1,2,1)\n",
    "plt.boxplot(len_result)\n",
    "plt.subplot(1,2,2)\n",
    "plt.hist(len_result, bins=50)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 0, 0, 0]\n",
      "[15001, 11704, 354, 691, 447, 2738, 2441, 15001, 11, 220, 163, 447, 658, 15001, 996, 261, 792, 313, 329, 40, 45, 69, 2738, 4051, 432, 2157, 4968, 70, 2738, 119, 440, 11, 447, 2674, 119, 8, 2108, 257, 386, 1785, 784, 2735, 102, 467, 688, 379, 13040, 15001, 789, 279, 226, 265, 1640, 91, 9333, 113, 18]\n"
     ]
    }
   ],
   "source": [
    "print(Y_test[1])\n",
    "print(X_test[1])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing the musical data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████████████████████████████████████████████| 307/307 [00:00<00:00, 653.38it/s]\n"
     ]
    }
   ],
   "source": [
    "from tqdm import tqdm\n",
    "Mu_vocab = {}\n",
    "Mu_sentences = []\n",
    "\n",
    "for i in tqdm(Mu):\n",
    "    Mu_sentence = word_tokenize(str(i)) # Tokenize the text into words.\n",
    "    result = []\n",
    "\n",
    "    for word in Mu_sentence:\n",
    "        word = word.lower() # Lowercase every word to reduce the number of distinct words.\n",
    "        if word not in stop_words: # Remove stop words from the tokenized result.\n",
    "            if len(word) > 2: # Also drop words of two characters or fewer.\n",
    "                result.append(word)\n",
    "                if word not in Mu_vocab:\n",
    "                    Mu_vocab[word] = 0\n",
    "                Mu_vocab[word] += 1\n",
    "\n",
    "    Mu_sentences.append(result)\n",
    "\n",
    "\n",
    "# Map each word to its integer index; words missing from all_word_to_index map to the 'OOV' index.\n",
    "Mu_encoded = []\n",
    "for s in Mu_sentences:\n",
    "    temp = []\n",
    "    for w in s:\n",
    "        try:\n",
    "            temp.append(all_word_to_index[w])\n",
    "        except KeyError:\n",
    "            temp.append(all_word_to_index['OOV'])\n",
    "    Mu_encoded.append(temp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. LSTM classification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 7000 samples, validate on 2450 samples\n",
      "Epoch 1/7\n",
      "6976/7000 [============================>.] - ETA: 0s - loss: 1.2966 - acc: 0.4074\n",
      "Epoch 00001: val_acc improved from -inf to 0.46367, saving model to best_model.h5\n",
      "7000/7000 [==============================] - 71s 10ms/sample - loss: 1.2965 - acc: 0.4076 - val_loss: 1.2583 - val_acc: 0.4637\n",
      "Epoch 2/7\n",
      "6976/7000 [============================>.] - ETA: 0s - loss: 0.8872 - acc: 0.6608\n",
      "Epoch 00002: val_acc improved from 0.46367 to 0.58980, saving model to best_model.h5\n",
      "7000/7000 [==============================] - 69s 10ms/sample - loss: 0.8868 - acc: 0.6611 - val_loss: 1.0250 - val_acc: 0.5898\n",
      "Epoch 3/7\n",
      "6976/7000 [============================>.] - ETA: 0s - loss: 0.6036 - acc: 0.7808\n",
      "Epoch 00003: val_acc did not improve from 0.58980\n",
      "7000/7000 [==============================] - 80s 11ms/sample - loss: 0.6033 - acc: 0.7810 - val_loss: 1.2084 - val_acc: 0.5731\n",
      "Epoch 4/7\n",
      " 768/7000 [==>...........................] - ETA: 1:07 - loss: 0.3901 - acc: 0.8736WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss,acc\n",
      "WARNING:tensorflow:Can save best model only with val_acc available, skipping.\n"
     ]
    },
    {
     "ename": "KeyboardInterrupt",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "---------------------------------------------------------------------------",
      "KeyboardInterrupt                         Traceback (most recent call last)",
      "<ipython-input-58-08c0a794745f> in <module>\n     26 \n     27 model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])\n---> 28 model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=7, batch_size=64, callbacks=[es, mc])\n",
      "KeyboardInterrupt: "
     ]
    }
   ],
   "source": [
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, LSTM, Embedding\n",
    "from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "import numpy as np\n",
    "\n",
    "# Keep the musical sequences as a ragged list; they are padded just before prediction.\n",
    "M_test = Mu_encoded\n",
    "# Pad or truncate every sequence to max_len tokens.\n",
    "max_len = 230\n",
    "X_train = pad_sequences(X_train, maxlen=max_len)\n",
    "X_test = pad_sequences(X_test, maxlen=max_len)\n",
    "\n",
    "model = Sequential()\n",
    "model.add(Embedding(15002, 120))  # vocabulary size, embedding dimension\n",
    "model.add(LSTM(128))\n",
    "model.add(Dense(4, activation='softmax'))  # one output per genre class\n",
    "\n",
    "es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=4)\n",
    "mc = ModelCheckpoint('best_model.h5', monitor='val_acc', mode='max', verbose=1, save_best_only=True)\n",
    "\n",
    "X_train = np.array(X_train)\n",
    "Y_train = np.array(Y_train)\n",
    "X_test = np.array(X_test)\n",
    "Y_test = np.array(Y_test)\n",
    "\n",
    "model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])\n",
    "model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=7, batch_size=64, callbacks=[es, mc])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Classifying the musical data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.38870266 0.02433592 0.07328184 0.51367956]\n",
      " [0.6450228  0.05261524 0.04859437 0.25376767]\n",
      " [0.9823596  0.00549819 0.00262331 0.00951902]\n",
      " ...\n",
      " [0.49974144 0.09077708 0.2564418  0.1530396 ]\n",
      " [0.22712483 0.24804659 0.28475055 0.24007803]\n",
      " [0.9531947  0.03475152 0.00676594 0.00528788]]\n"
     ]
    }
   ],
   "source": [
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "import numpy as np\n",
    "\n",
    "M_test = pad_sequences(M_test, maxlen=max_len)\n",
    "predictions = model.predict(M_test)\n",
    "print(predictions)\n",
    "predict_labels = np.argmax(predictions, axis = 1)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "ro = list(predict_labels).count(0)\n",
    "th = list(predict_labels).count(1)\n",
    "fn = list(predict_labels).count(2)\n",
    "his = list(predict_labels).count(3)\n",
    "data = [ro, th, fn,his]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAOcAAADnCAYAAADl9EEgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3deXhTVd4H8O+592Zt2tDSlbIESmvTBSxo3YDCIA4zdRgVd9AiKouKwuhoRhyn41oHGRnU4oILjvjquCE0IgMjlILKbklpUwqSAhXK2nRNm+Te94+0WpGlS5Jzk5zP8/QRy809v9B8e+5yzrlEkiQwDCM/HO0CGIY5OxZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKYF2AUxg2bFjR6wgCEsBZID9cu8OEUCZy+W6Z+TIkce68gIWTqZbBEFYGh8fb4yJiTnNcRwbmN1FoiiS48ePpx09enQpgEldeQ37zcd0V0ZMTEw9C2b3cBwnxcTE2OE54ujaa3xYDxOcOBbMnmn/d+ty5lg4GUam2Dkn0ysGk3mkN/dnK8jd4c39BTLWczIBTRRFuN1u2mX4BAsnE3AqKyuVQ4YMSZ86derA9PT0tMLCwr4pKSlpycnJ6bNnz07s2E6r1WbNnj07MT093XjllVemrF+/XpudnX1R//79M5cvX67v2NfIkSMvSktLM6alpRnXrl0bBgBFRUXh2dnZF02cOHHI4MGD0ydNmjRYFEUAQHFxsTYrKyv1oosuSsvMzDSePn2ac7lcmDlzZv+MjAxjSkpK2oIFC6J7+z5ZOJmAZLPZ1HfdddfJ1atXVz333HP9NmzYsLe8vHzPrl27wv7973/3AYCWlhZu3LhxDXv27KkICwtzP/HEE4klJSV7P/74431PP/10IgD069fPVVJSsre8vLzio48++mHevHkDO9qoqKjQvPrqq4f27du35+DBg6q1a9fqHA4HmTJlStKiRYsOVlZWlhcXF1fqdDpx0aJF0Xq93l1WVlZRWlpasWzZshir1arszXtk55xMQEpISGgbP3580/vvv9/n8ssvb+jXr58LAG655ZZTxcXFujvuuKNOoVBIN954Yz0ApKent6hUKlGlUknZ2dktNTU1SgBoa2sjd99996Dy8nINx3Gorq5WdbSRmZnZlJSU5Gx/ffP+/fuVkZGR7tjYWGdOTk4zAERFRYkAsG7dugir1apduXJlJAA0NDTw5eXl6tTU1LaevkcWTiYgabVaEQDOt0CdIAgSx3kODjmOg0qlkgCA53m43W4CAM8++2xcbGys89NPPz0giiI0Gs1PF7g6tu94jcvlIpIkgRDyq0YlSSILFy48OHny5HpvvUd2WMsEtDFjxjRt2bIl/MiRI4LL5cLHH38cNXbs2Mauvt5ut/MJCQlOnudRWFjY90IXl4YPH+6ora1VFhcXawHg9OnTnNPpxIQJE+xLliyJaW1tJQCwe/duVX19fa/yxXpOpldo3/oYNGiQ88knn6zJyclJkSSJjB8/3j516tS6rr5+7ty5xyZPnpy0YsWKyFGjRjVoNBrxfNur1Wpp+fLl+x988MGBDoeDU6vV4saNG/fOmzfvhM1mU2VmZholSSJRUVHOL7/8cn9v3hth69Yy3VFaWmobPnz4Cdp1BKrS0tLo4cOHG7qyLTusZRiZYuFkGJli4WQYmWIXhGTMYDLzAOIAJACIb/86889xABQA2s74aj3L9+oAHARg6/T1o60g97wXQRg6WDhlwmAy6wBcCuAyAJcDuASe8Pn66MZhMJkrAZR3+iq1FeT26koj03ssnBQYTGYOgBGeEHaEMQ0AT6EcNYDh7V8/MZjMBwGsA7AWwDpbQS67QutnLJx+YjCZEwBcD+CP8IQxgm5FFzQQwPT2L8lgMn8PYN3HNydqRFEiHNc+SiZf79UpY8i3n/e+6YkT
J/ilS5dGmUym40VFReELFy6MW79+/b4L7Xbu3Ln9xo4d23Ddddc1ZGdnX/Tiiy8eGjNmTHNiYmLm9u3bKxISElzeexPewcLpQwaTeRCAye1fVwAgdCvqMQIgC0BWnUPEniP1MRoF3xCuFuxxfi7k5MmT/FtvvRVrMpmOd/U1LpcLixYt+tEb7btcLgiCf2LDwullBpM5AsBNAO4EMBqBG8hzkiSJNLe5IprbXBH+DufDDz/c/9ChQ6rU1NQ0QRAkrVYrTpw4cUhlZaUmMzOzecWKFQc4jkNiYmLmbbfddmL9+vURM2fOPLZmzRr9tddea7/rrrtOn2vfhYWFUUuWLIlzOp1kxIgRTe+99161IAjQarVZM2bMqP36668jFixYcPi3v/1tl4cH9ga7leIlBpN5nMFk/gDAUQBLAYxBEAaTtoULFx4eMGBAq9VqLS8oKDh8tmldHduq1Wpxx44dlTNmzDhnIDvs3LlT/cknn0Rt377darVayzmOk1577bW+gGfqWUZGRsvu3but/gomwHrOXjOYzNcCmA/PeSTjZ2eb1tXxd3feeecFQ9nhq6++Ci8rK9MOHz7cCAAOh4OLjY11AZ4ZKdOmTevyvryFhbMHDCYzAXADPKHMolxOSDvbtK6O/w8PD+/y/VtJkshNN9108tVXX6058++USqXor/PMzthhbTcYTGbeYDJPAVAG4BOwYPqdXq93NzU1ef1zO3HixPqioqLImpoaAQBqa2v5vXv39molg95iPWcXGExmBTwXeEwAhlIuR1Z231Pdre21SqEhQa8+HKYSmnvSXnx8vHvkyJGNycnJ6SqVSoyJiXH2ZD9nGjlypOOJJ56oGT9+fIooilAoFNLixYsPpqSk9Hglg95iU8YuwGAyTwRQCGAw7Vrk4M1JCYgbOKTX+4lQK07F69U1agVP7cNPQ3emjLGe8xwMJnM0gEUAptCuJRjVO5xRDQ5XZGSY4nh8hPpHgeeCc33LXmDhPIv288pFAHq9vCFzbhIkcqqpLbau2dk3NkJVExuu7vLAglDAwtmJwWQeCOA1AL+jXUsoESWJP2p3DGxocfXpH6WxqQTeK+eRgY6FEz8NRJ8D4BkAugtszvhIU5sroqq2MT1Brz7YV6c6Rbse2kL+VorBZDYC2AzPYSwLJmWiJPE1dS2DD5xoGuJyizRm6chGSIfTYDJPArAFbHSP7DQ4nJFVxxrTGltdWtq10BKyh7UGk/kv8BzGhvQvqN6a8r/RXt3f8vElP/3Z6RaVB443pcZGqGriItS1Hd9/5plnYt9+++2YjIyM5pUrVx7ozv5NJlN8QUHBUS+W7DMh98E0mMxqg8m8HMBzCMH3H2gkSKS23tH/h+ONSaIoEQB46623Yr788suq7gYTABYvXpzg/Sp9I6Q+nAaTuR+AjQBup10L0z2Nra4++483ptx++xTD4cOHVZMmTRo6f/78+KysrFSj0ZiWlZWVWlpaqgKAxYsX973mmmuSRo8enTxo0KCMWbNm9QeA++67L7G1tZVLTU1NmzRp0mAAuPrqq5PS09ONQ4cOTX/xxRejAc+czcmTJxuSk5PTU1JS0v7+97/H7tmzR5WWlmbsqMdisajS09ONZ6vVW0LmsNZgMl8KYAWAfrRrYXqmxenW/e2fS/ji4g2u4uLivSqVSszPzz+qUCiwYsWK8EcffbT/mjVr9gNAeXm5trS0tFyj0YhDhw7NeOSRR2oLCwtr3n333Vir1Vresc/ly5fb4uLi3I2NjSQrKytt6tSpp6uqqlRHjhxRVFVV7QE8qy9ER0e7w8PD3d98843myiuvbHn99dejb7/99pO+fL8hEU6DyXw7gLfgWS+HCWCtLlEDjkerS1Q2NZ1y3nLLLYNtNpuaECI5nc6fZqSMGjWqvm/fvm4AGDp0qGP//v2qoUOH/ur+6QsvvBBnNpv7AMDRo0cVe/bsUQ8bNsxx6NAhVV5e3oA//OEP9uuvv74eAKZNm3bizTffjM7O
zj70xRdfRG7btq3Cl+816A9rDSbzfADLwYIZVA7XOZIffcw0KCcnp6GqqmrPqlWr9rW1tf30eVYqlZ2nkv0iuB2KiorCi4uLw7dv326trKwsNxqNLS0tLVxMTIy7rKysfNy4cQ2FhYWxt956qwEA8vLyTq9fv17/4Ycf9snMzGyOj4/36ZDDoO45DSbzEwCepl0H431uCYK9yaGP7de/GQBef/31Lg21FARBam1tJSqVSqqrq+P1er07PDxc3LVrl7q0tDQMAI4cOSKoVCpx2rRpdSkpKa3Tp08fDABarVbKycmx/+lPfxr4yiuv2Hz25jpq9XUDtLT3mCyYPtb51oe/TZv9IJ6Yd1/CkldfjhwzalSXViqYMmXKcaPRmJaRkdH80Ucf2d54442YlJSUtKSkJMfw4cObAMBmsynuvvtugyiKBACeeuqpwx2vv/POO0+tXr068oYbbvDaczjPJSinjLXfw3yOdh3ByFtTxrytXx+NLVqn8ukFGgB48skn4+x2O/+vf/2rR6v5hfSUMYPJ/CBYMEPOkTrHIAXPtek1igZftTFhwoSk6upqVXFx8V5ftdFZUIWz01QvJsRIkMihU81Jipgwq1YpOHzRxtq1a/36iIqguVprMJl/B+AdsOUoQ5YoSXz1yebkNpcYFJ1OUITTYDJfAc+CWwratTB0Od2i0nayKdktSgH/2Q74N2AwmeMAfAYgZGcvML/kcLq11SebhgT6xc6ADmf7JOn34XlWJcP8pLHVpT98umUA7Tp6I9CPzecDuJp2EaFMcfUVXt2fc923PXrd0pcX4p45D//ie6eb22KVAtcaF6E+5o3a/C1ge06DyTwWwN9o18HIw9JXXjrr92vrHQPqHc6AXOEiIHtOg8kcC+AD0HnYLENZ0Wcf4YO334DL2YaMrJHQ6cLR6mjBzb8djaSUVDz/8ptY9cmHWPb6yyCEICU1LXnFJx+V8hzp8uMZ5CDgwtnpPDNgJs0y3vNDVSXWrPocyz7/CgqFAs8+/jCSU9OhUmvwnzWeoYT7Kivw5ssLsezzrxAZ1Rf206e5mrqW/gOjtAcpl98tARdOeM4zJ9AugqFjy+ZiVOwuxZRrfwMAcDgciIqO+cU2W78pwYTfT0JkVF8AgD4yEnXNbTF6jeK0L0cQeVtAhZOdZzKSBPzhplvxkOmXH4Nlr7/SaRsJhPx6LEpNXYshTMXvETguIA5vA+aCkMFk1gBYBnaeGdIuu2oM1plX4uQJz+Lw9tOn8ePhgxAUApxO50/brClagbrTp37aBgBcblFZE0C3VwKp53wUwEDaRTC/1NNbHz2VlJKK+/88H7On3ABRFCEoFHj8mQWYfHsebrpmFIwZw/D8y2/i3jkPY/qNueB5Hqnpw/D0S4UAAHuLM7quua2uj1Zp92vhPRAQU8YMJvMAAJUANLRrCXVynTLWHQLHOVPidHtoPDypO1PGAuWwdgFYMBkvcYmi4nBdi+yPwmQfToPJPArALbTrYIJLfYszSu6rycs6nO33NP9Fuw7mZxIkBMKpUFcctfv34lD7siddvlIs63ACmA5gBO0imJ9V1znhaq4PioA2t7l1dc1ten+0JYoiOX78uB5AWVdfI9sLQgaTOQJAFYBY2rUwP4tQcZhzWSQG9VGABMG8doGDM0rD92g9oG4SAZS5XK57Ro4c2aWB+HK+lfIkWDBlp75VxLMbfb6Olj8pADxhK8h9n3YhZ5Jlz9k+sP0gABXtWpiQYAWQbivIldXIIbmec84ECybjP6kAbqJdxJlk13MaTGYFABvYA4cY/yoDMMxWkCubQMix57wRLJiM/2UAuJ52EZ3JMZwP0i6ACVkP0C6gM1kd1rY/Q3Mr7TqYkCUBGGwryK2mXQggv56T9ZoMTQTAHbSL6CCbcLavP3sz7TqYkJdHu4AOsgkngFkAlLSLYELeUIPJfBXtIgCZhNNgMvPw3NtkGDmQRe8pi3ACGAW2mh4jHze3L4tDlVzCKav7S0zI
0wO4jnYRcgkn9X8IhjkD9UNb6vc5DSbzCAA7qBbBML8mAki0FeQepVWAHHrOP9IugGHOggMwjnYBtE2kXQDDnMMYmo1TDafBZI4EcAnNGhjmPEI3nPA8W5N2DQxzLkaDyRxNq3HawbiGcvsMcz4EwGhajdMO528ot88wF5JDq2Fq4TSYzOEAAntdfyYUUDvvpNlzplFsm2G6anj7Mq1+RzOcGRTbZpiu4uAZ+02lYVrSKbbNMN1BZQoZCyfDXFgyjUZZOBnmwgbRaJRKOA0mcx8AiTTaZpgeoPIsT1o9J+s1mUASZzCZ/f4EAhZOhrkwAgq9J61wsnucTKDx+3knrXBSG0zMMD0UMj1nOKV2GaanQqbn1FFql2F6ioWTYWSKHdYyjEzp/d0g6zkZpmsU/m6Q9ZwM0zUhE84wSu0yTE8Ffzjbh0H5/Y0yTC/5/TMr+LtBsENan1DA1TYj4uX/ofyY4vfbXcMAicbPNmhJhGtAQa5f26TxA3RTaDOoJeL4ka9UplOLwtXa/0wIz9mewpXP/9CtVYgw0K4taEhivb+bpHHOaYfnORSMF1zDbdtVoporhJOW9GKtRgMA5YNI2vR5fMy+BJTQri+I+L1T8Xs4bQW5IjwBZXrpeeHN4tcVLw3jiBTjBty1PJ/a8XetShL2+DRh9JLfc1tF4ATNOoNEm78bpHW19hSldoNCGFoaNyjnfXubsD6HEPAAsEOtsoKQX60St344lz1rDi+d0mG7/ysNKn7/zLJwBpiLyMEDO1Wzag1c7RWdv79SF3bO3rFOR2JmzREuKbqUbJSAFt9XGZT8fvTBwhlApvJrv/tKaYpWEWfSmX+3WaO54Kir967mxzw6nf/RoYDVNxUGtZP+bpCFMwBwEN3vKF7Y8IzincsJ+fWtKCfgPMFzxq7sqzqOJE2fxw/ZNYQUS+zCXHewnpP5pSjYT25V3Vc6ji8de65tvtWoK0CItqv7dPFE+fwtfM4/buR2uwlqvFJo8GPhZH52GSkv36q6vzWa1I8433ardGE9+vfckcxdfM9DvK6mL77pWYUhhR3WMh6PCB+VfKh8JkkgYr8LbfudRh3Z03aaNEQ/b4Zw5b/HcZsldovrfA75u0Fa4TxKqV3ZU8LZukr5eMkDwhejCcEFl2NsJXDUcVzqhba7kFWXc1fNmc03NqhR2tt9BalKfzdIK5zllNqVtYGk9vBO1cz9mZytyw9s3ajRVIAQr6ypeqwPSbxnLp+5IZMUS4DTG/sMEs0IoZ7TCsBFqW1ZyuW+27FB+Setjji6tWzoKl2YVw9FJUK4wmv5nCen8vucPH7w5r4D2F6jtULyd6NUwmkryG0DsI9G2/IjSQsVSza8olicxREpqruv3q5W+2SZ0coBxHjXPD6hMhEbfbH/AEPlvjDNBxmVUWxbFnRori9RPrR1Ml8ylpDu/yyaCGls4EivzzfPpU1BNH+9UxjzyrXcNpHguK/aCQB+P98E6IZzN8W2qUsnB/btVM06OYA7cVlP97Feq7GCEJ9P+9uYyV06cw5PTkRgq6/bkqkKGo3SDOc2im1TdRe/+tsi5fwEJXEN7s1+VunCmrxV04XYw0j0ffcL2SsuJyWS5wJJKNlCo1Gas+VD7rcwD7drmeKFzaP4shxv7O97tSrWG/vpjg/G8aM3ZkgHnn3P3aJpC4ln3hw2WitsNBqm1nPaCnJPIYQuCkWj7vh21ewybwXTzhF7MyEXeWNf3XU4hgyePpdP3pYcEuNzN9FqmOZhLUDpcMHfruLKyraoHnBFksaLvbXPtVqtFYRQ+/m5eaJYcCOf8/zNXJmLw2FadfhByIYz6Md0Pi4s3/i+4rkUnogJ3tyvWRfW6s399dT3Sdywux/iIw7GYDPtWnwkZMO5inL7PqNGa8tXysc2zRDMYwiB0tv7t6iUXg17b7SoScQj9whXvXM1960E1NGux4vsACy0GqcaTltB7iEg+JbPMJAjh3aqZh1M5Q6N8sX+T3LciVZC
hvpi372x+lLuivvv41vsWuyiXYuXFButFdTOqWn3nADwOe0CvOk6btP2r5UPh2tJq88u1qzWaatACPHV/nvjhJ4k3Psgf/G6i0mxRGFRLC+jemQnh3B+RrsA75CklxWLN7ykKBzBEfTxZUurw8LkPSidEPLG7/ic+XfyB9qEgL0iLwEoolkAkSS/j+f9FYPJXAHAZ8PQfC0CjfavVKbKfuRUtj/aG2EYcMBJSK8GMPiLwiU5Hv/QvSXtEMYQQJa9/Tl8Z7RWXHHhzXxHDj0nEMC95zCyv2qHanadv4J5lOePBkowAcApEPXfpwo5iydxO0SCWtr1dMMntAuQSzgD8rxzBl+0+QvlXxMVxO23R5KbdWEBOY1rczp3yb0P8opaPb6jXUsXUQ+nLA5rAcBgMleDwqO9e0KAy7lc+dy3l3HWMf5u+6Z+8ZusKqVPrgL7y80b3Zsmb5YuJvJ9iPIWo7XictpFyKXnBIAVtAvoinicqt2hmlVBI5gAUKVUGGi0603/GcOP+tO9/MlmJfbQruUc3qRdACCvcP4f7QIuZCz3/e7NqjlET5qH0Wi/WhAOuwnpT6Ntb6uJJoOmz+Mv+jaVbJDk9eQ5O4APaRcByCictoLc7wB8S7uOc8kX3t34juIfRp5Ifp8J0qFIF3aAVtu+IHJEeOl6fuwzt3IVLg7VtOtp977RWuG3qXjnI5twtnuRdgFn0qC1ea3ykc3ThP+OIYTuE7nXhml4mu37imUwlzF9Ht/3QBy9caydvE67gA5yC+cKAFW0i+iQRGqqd6pm1iRzP15FuxYAOKBQDKFdg684lET32HRh1NJruO9Eeusaf2u0VlAbS3smWYWz/dmdL9GuAwBu4jdsW6f8cx8NaUumXQsAVCkUB0RC4mnX4Wv/Hcldfv/9fFtdGHZQaP41Cm2ek6zC2e5dgN5iUgSi+JripQ3/EN64hBDoadVxppW6sGCeM/kLJyNI/Iw5/IjVI0mxBDj81Gw1ZHZRUnbhtBXktgAopNF2HzSc/k71wM6J/LaxhMhrqNnXYRqaS8r4HyHknWv4nL9M4w+1CtjrhxafM1orZDVmWXbhbPcK/PyQ1xFkr3Wb6r7GOFJ3iT/b7QoJkA4JgiwOr/3thwSSfNc8ftBuAymWPIPRfeEggHd8tO8ek2U4bQW5JwAs81d7D/Cfb/pUmT9IQdwD/NVmd+xRKvdJhPhk8ehA4BKI6pnb+Jx/Xs/tchMc8UETz8ut1wRkGs52C+Hjm9MKuNo+Uz5Z8oji41GEQOPLtnpjpS7MFx/IgLMllRtx70O85kikV++HHwLwthf35zWyDaetIHcfgFd9tf9+OHFkh2rm3hHcvi4/NIiWDVqNVx5UFAwaNaTPQ7OEK/5vDLdJAhq8sMvnjNYKWU4Kl2042/0VPnhc4ARu+/ebVA8JEaQlw9v79jYREI8IfMDOdfWVz6/iRj00k69rVPXqyQEWyGQc7dnIOpy2gtx6AA97c5/PCUuL31D8M4MjUow39+srO1WqShAim1s6cnI0igy4Zy6fvimNbJB69tS6B4zWCjmN6/0F2UwZOx+Dyfw1gHG92UcYWhpXKedbhnBHqc5u766/RUdt+CxcN5Z2HXKXVi2Vz//QrVWIMHTxJR8YrRVTfFlTb8m65+zkfvTiYa4p5NCBHapZtYEWTAAo0ajlOudRVsoHkbTp8/iYfQko6cLmDQAe8XVNvRUQ4bQV5FYA+GdPXns7v+67NcrH+qqJM8nLZfmcC3Ad59n5Zle1KknY49OE0Ut+z20VgRPn2fRpo7VC9lfAAyKc7Z6C52Zxl3AQ3W8pFmx4Vnj7MkIQ4cO6fGarRm0FIazn7Kb1w7nsWXN46ZTurGsilwFY5O+aeiJgwmkryG0G8FBXto2C/eRW1X2l4/ldshuG1x0rdWHn++3PnEedjsTMmiNcUnQp2Sj9PNrMCeAOOQ44OJuACScA2ApyVwBYeb5tsklF+VbV/Y5oUj/CT2X5zDcaNbtK
20vvXc2PeXQ6/6NDASuAp4zWiu9p19RVARXOdncDZ3+q1cPCf0o+Uj6dJBAx0c81eV0b0Hqa44y06wgG1XEk6e65fC2A52nX0h0BF872cbe3otN9LSWcrSuV80vmCCtGE4KgGE2zSaupACFq2nUECbtTIHlyvqd5NgEXTgCwFeRuBvAEAAwgx2p2qmbuH8YdkP0wvO5YpQuz064hiMyw5FnkskZRlwVkONv9I5f79s1i5Ty1jjiC7vHnW9WqSNo1BImFljzLf2gX0RMBMULonPL1fQBsBRBUcx1bCGnOHtRfAUKoLigWBFYDuNaSZ6H2GL/eCOSeE8i31wG4Dt6ZnSAbGzznmyyYvVMB4NZADSYQ6OEEgHx7OYApkNfCxL2yShfWSLuGAHcKwCRLnqWediG9EfjhBIB8+yoAM2iX4S071aqAmDEjUy4AN1vyLIH6XNCfBEc4ASDf/jaAP9Muo7caCKlvIsRnT8UOciKAPEue5X+0C/GG4AknAOTbXwRQQLuM3lgXpq0EIUG5sruPSQDuteRZPqBdiLcEVzgBIN/+FwBv0C6jp8y6sGbaNQSoBy15FlmuBdRTwRdOj9kAAvLeVqlKmUC7hgD0mCXP8grtIrwtOMOZbxcB3A6Zrqp2Lqc57pSDkKC6Z+sHf7PkWf5BuwhfCM5wAkC+3Y18+90AAuYHtyZMuxeEBOwUNz8TATxgybM8RbsQXwnecHbItz8Gz5IUsh8KZdaFyXKJRhlqhed2ic+WTpWD4A8nAOTbFwK4Cz1boc1vypXKgJ/q5gd1AK6x5Fk+pV2Ir4VGOAEg374MwPWQ6VC/Yzx/rI0jAbfOkZ8dBjDakmfZSLsQfwidcAJAvr0IwKUAymmXcqYvw7QBP6LFx0oAZFvyLGW0C/GX0AonAOTbKwFkA/iQdimdfaXTBs3YYB9YAOA3ljyL7FfM86bAnjLWW/n6BwG8CID6DJAsw4BqFyGDaNchM3YA0yx5lhW0C6Eh9HrOzvLtiwGMBVBDs4zDAl/Dgvkr3wMYGarBBEI9nACQb/8GwDBQfOR4kS7MRqttGXLBsxDX5ZY8y37axdAU2oe1Z8rX3wBgCYBYfzZ7Q2L8piqlcpQ/25SpUgDTLXmWnbQLkQPWc3aWb8tJ8MgAAALJSURBVP8MgBHAu/5s9geFYog/25OhNgB/A3ApC+bPWM95Lvn63wB4DT5en+gHhVD9x/79Qvl8czOAWaF0i6SrWM95Lvn2rwGkA3gAwDFfNbNKF9bl578EmSoAky15llEsmGfHwnk++XYn8u2vAhgKz4OUvL62zzqtNtQmVp8AMAdAuiXP8hntYuSMHdZ2R74+Dp5zo3sBCN7Y5TDDgOMSIaGwZlATgJcBPB/oC2/5CwtnT+TrhwKYByAPQFhPd1OuVOy/JTEh2MfTHoMnlIWWPMsp2sUEEhbO3sjXR8Kz6t8DAPp39+UvRPUpfl8fkeP1uuRhL4CFAN6z5FkctIsJRCyc3pCvFwDcCE9vmt3Vl03s329LjUK4zGd1+Z8E4H8ACgF8EcgLOssBC6e35euzANwG4BYAA8+1mQhIFxsG1EmEBMMzUfYD+DeAdwPxgUFyxcLpK/l6AuBKeB5XeBOAuM5//b1KWXlHv/hAXp/2RwCfAlhuybNsoV1MMGLh9Id8PQ9gHIA/AJgAwPhU38jijyPCA+l80wXPgIHVAFZb8iy7KdcT9Fg4acjX989LiL1yp1o9EcAYAHK8YusCYIHnKW7/BbCO3QLxLxZOGchclpkAYDiATHhmyGTCM8ZX6acSnPCM2NkGYHv7f0vZVVa6WDhlKnNZpgAgBZ7RSf0AJLR/dfw5HoAGngAr8esJ4yI8I5oaOn3VAzgEwAbgQPuXDUANu7IqPyycQSJzWSbBz0EVATRb8izshxvAWDgZRqbYwHeGkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6G
kSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpEpFk6GkSkWToaRKRZOhpGp/wdHX7jnGIUeoQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "plt.pie(data)\n",
    "categories = ['romance', 'thriller', 'fantasy', 'etc']\n",
    "plt.legend(categories)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAa40lEQVR4nO3df4xd5X3n8fd37h1fahMly/UwZsAeM5Wxp61oSK5IUKsIbdstsaLSaGnjyGoI2sqqk3SJtPtHWiSSRrK0u1KjekuzdNqYmsgiSUm29VaOWnaTKIlEKWPWEBLXruti8ALDdFCglGZgPN/94547XB+fn/eeO9f36ecljXzvOc95nu/z43znzjlnPObuiIjI6BsbdgAiIlINJXQRkUAooYuIBEIJXUQkEEroIiKBqA+r4c2bN/v27duH1byIyEg6fvz4P7r7RNK+oSX07du3Mz8/P6zmRURGkpmdS9unSy4iIoFQQhcRCYQSuohIIJTQRUQCoYQuIhIIJXQRkUAooYuIBEIJXUQkELm/WGRmVwDfBhpR+Yfd/dOxMg3gQeDdwBLwIXd/pupgjywscM/Zszy7vMy2RoMDMzPsnZysrPyw4y1S392nT7N04QIAzXqdgzt2AFzUzu5mk2NLS6nt9jOOV9VqYMbLKyuZxw5r7AfRblqdVbVVpJ5BjmfSuvrVq6++ZA0BhWJIWi9LKyvUgAuw9u90gbVaJPay45J2HqUdl9RG0bFYb5b3By7MzIBN7v6amY0D3wXudve/7irzMeBGd/8NM9sDfNDdP5RVb6vV8jK/KXpkYYF9p07x+urq2raNY2PM7dyZuqjKlK9a1e0fWVjgrpMneTO2vQbUzHgjYx67261iHNPqzjpmPcZ+EO2m1Xnnli0cfvHFvtsqEvMgxzNtXcWNAxZbZ0Xnvowy/eplXNL6u8GMQ7t2JX4jjbdRdCwGxcyOu3srcV+Zv1hkZhtpJ/T97v5Y1/a/BD7j7o+aWR14EZjwjMrLJvTtjz7KueXlS7ZPNxo8c8stfZevWtXtp9VXVKfdqsYx69hhjf0g2k2rs/Mps9+2isQ8yPGsal1VVV9SnWl6GZes+JKOK9Of9cotWQm90DV0M6uZ2QngJeCR7mQeuRZ4DsDdV4BXgGZCPfvMbN7M5hcXF8v0gWdTBrWq7VWruv1+4+4cP4jxipcZ1tgPot20Y5OSeS9tFYl5kONZ1bqqqr4ydfQyLmX3lenPeuWWLIUSurtfcPd3AtcBN5vZT8WKWNJhCfXMuXvL3VsTE4n/WViqbY3GQLdXrer2+427c/wgxiteZlhjP4h2046tlSxftv7u7YMcz6rWVVX1lamjl3Epu69Mf9Yrt2Qp9ZSLu/8Q+BZwW2zXeWArQHTJ5e3AyxXEt+bAzAwbxy4Od+PY2NoNin7LV63q9g/MzDCesL1G+/pflu52qxjHvGOHNfaDaDetzn1TU5W0VSTmQY5n2rqKG+fSdVZ07sso069exiWtvxvMEo9LaqPoWAxD7sib2YSZvSN6/WPAzwN/Gyt2FLgzen0H8I2s6+e92Ds5ydzOnUw3Ghjt61VZNyHKlq9a1e3vnZzkgdlZmrW3Phs263UOz85yaNeui9rZPzWV2m6/49is1WjW65nHDmvsB9FuWp2fv+GGStoqEvMgxzNtXcXX0AMJ66zI3HfWC7z1U03n37y1WiT2suOS1t+kG6JpbRQdi2Eo8pTLjcBh2vMwBnzF3T9rZp8F5t39aPRo4xeBm2h/Mt/j7mez6i17U1RERLJviuY+h+7uT9FO1PHt93a9/hHwK/0EKSIi/dFvioqIBEIJXUQkEEroIiKBUEIXEQmEErqISCCU0EVEAqGELiISCCV0EZFAKKGLiARCCV1EJBBK6CIigVBCFxEJhBK6iEgglNBFRAKhhC4iEggldBGRQCihi4gEQgldRCQQSugiIoFQQhcRCYQSuohI
IJTQRUQCoYQuIhKI3IRuZlvN7JtmdtLMvm9mdyeUudXMXjGzE9HXvYMJV0RE0tQLlFkB/pO7P2FmbwOOm9kj7v6DWLnvuPsHqg9RRESKyP2E7u4vuPsT0et/Ak4C1w46MBERKafUNXQz2w7cBDyWsPsWM3vSzL5uZj+Zcvw+M5s3s/nFxcXSwYqISLrCCd3MrgS+CnzS3V+N7X4CmHb3nwZ+H/izpDrcfc7dW+7empiY6DVmERFJUCihm9k47WR+xN2/Ft/v7q+6+2vR62PAuJltrjRSERHJVOQpFwO+AJx098+llNkSlcPMbo7qXaoyUBERyVbkKZefAX4N+J6ZnYi2/TawDcDd7wfuAPab2QrwL8Aed/cBxCsiIilyE7q7fxewnDL3AfdVFZSIiJSn3xQVEQmEErqISCCU0EVEAqGELiISCCV0EZFAKKGLiARCCV1EJBBK6CIigVBCFxEJhBK6iEgglNBFRAKhhC4iEggldBGRQCihi4gEQgldRCQQSugiIoFQQhcRCYQSuohIIJTQRUQCoYQuIhIIJXQRkUAooYuIBEIJXUQkEPW8Ama2FXgQ2AKsAnPufjBWxoCDwG7gdeCj7v5E9eEmO7KwwN2nT7N04QIAm8y4olbj5ZUVtjUa7G42Oba0xLnlZWrABaBZq/Gj1VX+2R2AZr3OwR072Ds5mVj/PWfP8uzyMlfVamDG0srKWl3TjQYHZmbWjs2Lp7tsWjvxckn7gNTyWbF3x5FUR2db93h19zFeZ944ZvUraX9nvuLle+lL1vjl1bm72eQrCwtr85i1RrrbOLe8jAEebY8fl9ePtLFJii++rtPWZNF1ltWvIvH2o+h5nDaHvZ7fvY5LkeMh/xytkrl7dgGza4Br3P0JM3sbcBz4ZXf/QVeZ3cBv0k7o7wEOuvt7supttVo+Pz/fb/wcWVjgrpMnebPvmmCDGYd27brkhNp36hSvr65mHrtxbIy5nTsBcuPplM1rp7vO+L5xwMx4o2v+4vXmxZ5UR9K27vrv3LKFwy++mDke3eOY1a+0/b20mzUecOn4Fe1LVt+65fWjc1xSLPG+lhmbLGXXWVryLxpvP3o5j8vOYdrcddovMy5Fji9yjvbCzI67eytxX15CT6jsz4H73P2Rrm1/CHzL3R+K3p8CbnX3F9LqqSqhb3/0Uc4tL/ddT8d0o8Ezt9zSU/3TjQZAofJF2ylTZ7zeqscG3voEWDSOrH5l7e+13aR2IHn8+qmze+6g2FgXncuyY1Mm1rz5iCvar6Rjy+i1r2XnsGw/i/atbJ7oZ7yyEnruJZdYRduBm4DHYruuBZ7ren8+2nZRQjezfcA+gG3btpVpOtWzFSeseH1l6u+nbNqxZfvXXb7qsYHiJ0+n7bx+FY2xl8SbV3+VdRbpR9G+lh2bMm2WXWdV9msQdZSdw0H0v0y5smXLKnxT1MyuBL4KfNLdX43vTjjkko/+7j7n7i13b01MTJSLNMW26FNPVeL1lal/W6NRuHzRdsrUGa+n6rGB9ieiMnFk9Strf6/tJrWT1kY/dRbZViaWpLqqmL8y66zM9rJlBlVH2TkcRP/LlCtbtqxCCd3Mxmkn8yPu/rWEIueBrV3vrwOe7z+8fAdmZhivqK4NZms3Mrrr3ziWP0wbx8Y4MDNTKJ5O2bx2uuuM7xuP4s2qNy/2pDqStnXXv29qKnc8uscxq19FYizabtZ4pMVQpC9xSWsE8vvROS6vXNmxyVJ2nSUpE28/ejmPy85h2tx12i8zLkWOL3KOVi13JKInWL4AnHT3z6UUOwp8xNreC7ySdf28SnsnJ3lgdpZm7a3v1ZvMaNbrGO3rVfunptauX3ZKNWs1NnUNdrNeT7xhsndykrmdO5luNLDouGa9flFd043G2o2OIvEk3RSJtxOvM77vgdlZDu3alVg+L/asOrq3JfXx8zfccEmdWeOY1a+0/Z35spx28/qSNX5F6tw/NXXRPKatkXg/4OIfWbuPy5uTrLFJ
ii8+T0lrsug6S1Im3n6UOY/T5rC7/0XP717HpcjxRc7RqhV5yuVnge8A36P92CLAbwPbANz9/ijp3wfcRvuxxbvcPfOOZ1U3RUVE/jXp66aou3+X5Gvk3WUc+Hhv4YmISBX0m6IiIoFQQhcRCYQSuohIIJTQRUQCoYQuIhIIJXQRkUAooYuIBEIJXUQkEEroIiKBUEIXEQmEErqISCCU0EVEAqGELiISCCV0EZFAKKGLiARCCV1EJBBK6CIigVBCFxEJhBK6iEgglNBFRAKhhC4iEggldBGRQCihi4gEIjehm9khM3vJzJ5O2X+rmb1iZieir3urD1NERPLUC5T5E+A+4MGMMt9x9w9UEpGIiPQk9xO6u38beHkdYhERkT5UdQ39FjN70sy+bmY/mVbIzPaZ2byZzS8uLlbUtIiIQDUJ/Qlg2t1/Gvh94M/SCrr7nLu33L01MTFRQdMiItLRd0J391fd/bXo9TFg3Mw29x2ZiIiU0ndCN7MtZmbR65ujOpf6rVdERMrJfcrFzB4CbgU2m9l54NPAOIC73w/cAew3sxXgX4A97u4Di1hERBLlJnR3/3DO/vtoP9YoIiJDpN8UFREJhBK6iEgglNBFRAKhhC4iEggldBGRQCihi4gEQgldRCQQSugiIoFQQhcRCYQSuohIIJTQRUQCoYQuIhIIJXQRkUAooYuIBEIJXUQkEEroIiKBUEIXEQmEErqISCCU0EVEAqGELiISCCV0EZFAKKGLiARCCV1EJBD1vAJmdgj4APCSu/9Uwn4DDgK7gdeBj7r7E1UHWtaRhQXuOXuWZ5eX2dZocGBmhr2Tk6XKA6XqKNLukYUF7j59mqULFwDYZMYVtRovr6wUbqPKeMsoO6aDbncQ8XTqPLe8TA24AEw3GuxuNjm2tLQuMfQTd9J6iPell3Mhvobz+py3Pq+q1cCs1Lrv99wpso46cS2trJQas7JjOCjm7tkFzN4HvAY8mJLQdwO/STuhvwc46O7vyWu41Wr5/Px8T0HnObKwwL5Tp3h9dXVt28axMeZ27kwc1KTy44CZ8UbX+GTVUaTdIwsL3HXyJG9mxJ7XRpXxllF2TKuS1u6dW7Zw+MUXK40nqa00g4qhF0XXQ5EYi6zhvHVQNp68mDp19nPulFlHvcRXpK2q1oWZHXf3VuK+vIQeVbAd+IuUhP6HwLfc/aHo/SngVnd/IavOQSb07Y8+yrnl5Uu2TzcaPHPLLYXLJ0mro0i7RdvJaqPKeMsoO6ZVSWu38+mpynjKjOugYuhF2bih/LmQt4a76+slnqyYytRZtl9pc1g2viJtVbUushJ6FdfQrwWe63p/PtqWFMg+M5s3s/nFxcUKmk72bMrEl91etmxe/UXbyStXVbxlVDF2VbabdhL2E0/ZYwcRQy96aa/Xc6TIcb32v5dzq2i5sutoPWOoUhUJ3RK2JX7sd/c5d2+5e2tiYqKCppNtazQq2V62bF79RdvJK1dVvGVUMXZVtlsrWb6fttIMIoZe9NJer+dIkeN67X8v51bRcmXX0XrGUKUqEvp5YGvX++uA5yuot2cHZmbYOHZx1zaOja3dmClSfhzYYBd/r8qqo0i7B2ZmGM+JPa+NKuMto+yYViWt3X1TU5XHk9RWmkHF0Iui66Gj7LkQX8N5fS4bT15MnTr7OXfKrKNe4ivS1nqsiyoS+lHgI9b2XuCVvOvng7Z3cpK5nTuZbjQw2teusm5IJJV/YHaWQ7t2Fa6jSLt7Jyd5YHaWZu2tzwWbzGjW64XbqDLeMsqOaVXS2v38DTdUHk93W/DWp7fpRoP9U1PrEkO/cSeth3hfyp4L8TWc1+ci67NZq5Va9/2eO0XXUSeuMmNWdgwHqchTLg8BtwKbgQXg07S/4eLu90ePLd4H3Eb7scW73D33bucgb4qKiIQq66Zo7nPo7v7hnP0OfLzH2EREpCL6TVERkUAooYuIBEIJXUQkEEroIiKBUEIXEQmEErqISCCU0EVEAqGE
LiISCCV0EZFAKKGLiARCCV1EJBBK6CIigVBCFxEJhBK6iEgglNBFRAKhhC4iEggldBGRQCihi4gEQgldRCQQSugiIoFQQhcRCYQSuohIIJTQRUQCUSihm9ltZnbKzM6Y2acS9n/UzBbN7ET09evVhyoiIlnqeQXMrAb8AfALwHngcTM76u4/iBX9srt/YgAxiohIAUU+od8MnHH3s+7+BvAl4PbBhiUiImUVSejXAs91vT8fbYv792b2lJk9bGZbkyoys31mNm9m84uLiz2EKyIiaYokdEvY5rH3/wvY7u43Av8bOJxUkbvPuXvL3VsTExPlIhURkUxFEvp5oPsT93XA890F3H3J3Zejt38EvLua8EREpKgiCf1xYIeZXW9mG4A9wNHuAmZ2TdfbXwJOVheiiIgUkfuUi7uvmNkngL8EasAhd/++mX0WmHf3o8B/NLNfAlaAl4GPDjBmERFJYO7xy+Hro9Vq+fz8/FDaFhEZVWZ23N1bSfv0m6IiIoFQQhcRCYQSuohIIJTQRUQCoYQuIhIIJXQRkUAooYuIBEIJXUQkEEroIiKBUEIXEQmEErqISCCU0EVEAqGELiISCCV0EZFAKKGLiARCCV1EJBBK6CIigVBCFxEJhBK6iEgglNBFRAKhhC4iEggldBGRQCihi4gEol6kkJndBhwEasAfu/t/ie1vAA8C7waWgA+5+zPVhtp2ZGGBu0+fZunChUv2bTID4J/dAWjW6/zq1VdzbGmJc8vL1IALgAGeckySMWA1dlzStm7Nep2DO3YAcM/Zs5xbXi7dbpIrazV+bXKSY0tLPLu8zFW1GpixtLKy1r+kfub1J0mnXLze6UaD3c0mX1lYWJuHXvvTr6R2+41lkxlX1Gq8vLLCtkaDAzMz7J2cTF178fY649as1fjR6mqhOPqJudNe0rx078+b73gMaf1KayepvitqtYvWZufYfs7JvPM8r660+ene3jmn8sYsT/wcmu5aT4NgnrOAzKwGnAZ+ATgPPA582N1/0FXmY8CN7v4bZrYH+KC7fyir3lar5fPz86WCPbKwwF0nT/JmqaOGpwbUzHhjnZOcVGvj2Bh3btnCHz///MisPbl8bRwbY27nzp6Tupkdd/dW0r4il1xuBs64+1l3fwP4EnB7rMztwOHo9cPAz5lF3xordM/ZsyN1Ql0AJfMAvL66ypySuVTk9dVV7jl7diB1F0no1wLPdb0/H21LLOPuK8ArQDNekZntM7N5M5tfXFwsHeyzy8uljxGpQvqFBZHyBpXLiiT0pE/a8Y+dRcrg7nPu3nL31sTERJH4LrKt0Sh9jEgVasMOQIIyqFxWJKGfB7Z2vb8OeD6tjJnVgbcDL1cRYLcDMzOMV13pANWADdVfeZJ1tnFsjH1TUyO19uTytXFsjAMzMwOpu0hCfxzYYWbXm9kGYA9wNFbmKHBn9PoO4Bued7e1B3snJ3lgdrZ9FzrBJrO1u9rQvvu9f2qK6ei7YecoyzgmSWeQLGdbt2a9zuHZWQ7t2rXWftl2k1xZq631yWjfkW/W2w8rdfqX1M9uebHHy8XrnW402D81ddE89NqffiW1228sm8xo1usY7b7O7dzJ52+4IXXtxdvrjFuzViscRz8xd9pLmpfu/Xm1x2NI61daO0n1xddm59h+zsm88zyvrrT56d7eibvfFR0/hzrraWhPuQCY2W7g96K4Drn7ATP7LDDv7kfN7Argi8BNtD+Z73H3zKv+vTzlIiLyr13WUy6FnkN392PAsdi2e7te/wj4lX6CFBGR/ug3RUVEAqGELiISCCV0EZFAKKGLiARCCV1EJBBK6CIigVBCFxEJRKFfLBpIw2aLwLkeDt0M/GPF4QxDCP1QHy4PIfQBwujHevRh2t0T/zOsoSX0XpnZfNpvSY2SEPqhPlweQugDhNGPYfdBl1xERAKhhC4iEohRTOhzww6gIiH0Q324PITQBwijH0Ptw8hdQxcRkWSj+AldREQSKKGLiARi
pBK6md1mZqfM7IyZfWrY8RRlZs+Y2ffM7ISZzUfbrjKzR8zs76J//82w44wzs0Nm9pKZPd21LTFua/vv0dw8ZWbvGl7kb0npw2fM7P9F83Ei+gMunX2/FfXhlJn94nCivpiZbTWzb5rZSTP7vpndHW0fmbnI6MPIzIWZXWFmf2NmT0Z9+J1o+/Vm9lg0D1+O/rIbZtaI3p+J9m8feJDuPhJftP9a0t8DM8AG4EngJ4YdV8HYnwE2x7b9N+BT0etPAf912HEmxP0+4F3A03lxA7uBr9P+q13vBR4bdvwZffgM8J8Tyv5EtK4awPXReqtdBn24BnhX9PptwOko1pGZi4w+jMxcRON5ZfR6HHgsGt+v0P4rbQD3A/uj1x8D7o9e7wG+POgYR+kT+s3AGXc/6+5vAF8Cbh9yTP24HTgcvT4M/PIQY0nk7t/m0j/2nRb37cCD3vbXwDvM7Jr1iTRdSh/S3A58yd2X3f0fgDO0191QufsL7v5E9PqfgJPAtYzQXGT0Ic1lNxfReL4WvR2Pvhz4t8DD0fb4PHTm52Hg58wG+4d3RymhXws81/X+PNkL4nLiwF+Z2XEz2xdtm3T3F6C92IGrhxZdOWlxj9r8fCK6HHGo63LXZd+H6Mf2m2h/OhzJuYj1AUZoLsysZmYngJeAR2j/5PBDd1+JinTHudaHaP8rQHOQ8Y1SQk/6zjYqz1z+jLu/C3g/8HEze9+wAxqAUZqf/wH8OPBO4AXgd6Ptl3UfzOxK4KvAJ9391ayiCdsui34k9GGk5sLdL7j7O4HraP/EMJtULPp33fswSgn9PLC16/11wPNDiqUUd38++vcl4H/SXggLnR+Do39fGl6EpaTFPTLz4+4L0Ym5CvwRb/0of9n2wczGaSfCI+7+tWjzSM1FUh9GcS4A3P2HwLdoX0N/h5nVo13dca71Idr/dopf/uvJKCX0x4Ed0R3lDbRvMhwdcky5zGyTmb2t8xr4d8DTtGO/Myp2J/Dnw4mwtLS4jwIfiZ6weC/wSudywOUmdj35g7TnA9p92BM9nXA9sAP4m/WOLy667voF4KS7f65r18jMRVofRmkuzGzCzN4Rvf4x4Odp3wv4JnBHVCw+D535uQP4hkd3SAdmmHeNy37Rvnt/mvZ1q3uGHU/BmGdo361/Evh+J27a19L+D/B30b9XDTvWhNgfov1j8Ju0P238h7S4af94+QfR3HwPaA07/ow+fDGK8SnaJ901XeXvifpwCnj/sOOPYvpZ2j+qPwWciL52j9JcZPRhZOYCuBH4v1GsTwP3RttnaH+zOQP8KdCItl8RvT8T7Z8ZdIz61X8RkUCM0iUXERHJoIQuIhIIJXQRkUAooYuIBEIJXUQkEEroIiKBUEIXEQnE/wfPdtcIYl0E3wAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot every predicted label against its 1-based sample index in a single call\n",
    "plt.scatter(range(1, len(M_test) + 1), predict_labels, color='c')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3 0 0 0 0 0 2 2 0 0 2 1 3 0 2 0 2 0 0 2 2 0 0 1 0 0 0 0 1 0 2 0 0 2 0 2 0\n",
      " 0 0 0 0 1 2 2 3 0 2 0 0 1 1 0 0 1 0 0 2 0 0 0 0 0 0 0 2 3 3 0 1 2 0 3 0 1\n",
      " 0 3 0 0 2 0 0 2 0 3 0 0 2 0 0 0 0 2 0 2 0 3 0 1 3 2 0 0 2 2 0 0 0 0 3 0 0\n",
      " 2 0 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 1 0 2 0 0 2 1 0 0 0 2 3 0 0 0 0 0 2 0 0\n",
      " 0 0 0 2 0 0 2 1 0 1 2 2 3 1 0 0 0 0 3 1 0 0 2 0 2 2 0 0 0 0 0 0 2 0 0 2 3\n",
      " 0 0 0 0 0 0 2 0 0 3 2 0 0 0 2 0 2 1 2 0 0 0 1 0 1 0 0 0 2 0 0 0 3 3 0 0 1\n",
      " 3 0 0 2 0 3 2 1 0 0 0 0 0 0 0 3 1 0 3 0 3 1 3 0 1 0 3 0 0 2 2 0 0 2 0 0 0\n",
      " 2 1 1 0 0 1 3 2 0 2 0 0 2 2 0 0 0 1 0 3 3 0 0 1 0 2 1 0 1 0 0 3 0 0 0 0 3\n",
      " 0 0 0 0 2 0 0 1 0 2 0]\n"
     ]
    }
   ],
   "source": [
    "print(predict_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check: The Phantom of the Opera"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['The 1988 Tony Award-winner for Best Musical features the work of Andrew Lloyd Webber and is the longest-running show in Broadway history.Story:It is 1881 and the backdrop in the notoriously haunted Opéra Populaire has just mysteriously fallen during rehearsal frightening the star performer from continuing with the show and forcing young Christine to be recast in the role.  After opening the production the Phantom abducts Christine and brings her to his lair and reveals his love for her.  When the Phantom discovers that Christine is already in love with Raoul he vows to destroy him - a promise that leads both him and Christine to a dramatic discovery of the true power of music and love. ']\n"
     ]
    }
   ],
   "source": [
    "print(Mu[226])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n"
     ]
    }
   ],
   "source": [
    "print(predict_labels[226])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
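    "For entries with a strong genre signature, such as The Phantom of the Opera, the classifier consistently returns the correct label.\n",
    "Unlike the movie data, these descriptions do not cover the full plot, so accuracy drops in some cases."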
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## END"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}