
✒️ Kibwa Voice Phishing Prev Project (12)

[Flask] Building a Python WebApp with the Flask Framework
The process of building a Flask WebApp using Groom
1. Create application.py, the page that creates the Flask application
    # Import the required classes and functions from the Flask module
    from flask import Flask, render_template, redirect, url_for
    # Import boto3 to interact with AWS services
    import boto3
    import json
    import time
    # Set the AWS account credentials and the S3 bucket name
    aws_access_key = 'AWS access key ID'
    aws_secret_key = 'AWS secret access key'
    bucket_name = 'name of the bucket containing the file'
    file_key = 'filename.txt'
    # Create the AWS S3 client..
2023. 9. 26.
[Data Processing] Data Augmentation by Word Deletion
💃 Data Augmentation by Word Deletion
1. Import the os and random modules
    import os
    import random
  os: a module for interacting with the operating system, used for a variety of directory- and file-related tasks
  random: a module for generating random numbers and picking random elements from sequences
2. Implement remove_random_word(), which deletes one randomly chosen word from the file at the given path
    def remove_random_word(file_path):
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
        words = content.split()
        if len(words) > 1:
            # E..
2023. 7. 28.
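The excerpt above is cut off right after the len(words) > 1 check, so the rest of the original function is not known. As a minimal sketch of the word-deletion idea, assuming the input is a string rather than a file path, the step could look like this. The name remove_random_word_from_text and the rng parameter are hypothetical and do not come from the original post.

```python
import random


def remove_random_word_from_text(text, rng=random):
    """Return text with one randomly chosen word removed.

    Texts with fewer than two words keep all of their words, mirroring the
    len(words) > 1 guard in the excerpt.
    """
    words = text.split()
    if len(words) > 1:
        # Pick an index uniformly at random and drop that word.
        index = rng.randrange(len(words))
        del words[index]
    return ' '.join(words)


# Passing a seeded random.Random makes the choice reproducible.
sample = 'please transfer the money to the safe account'
print(remove_random_word_from_text(sample, random.Random(0)))
```

Splitting on whitespace is the same tokenization the excerpt uses; the result is re-joined with single spaces, so the original spacing and line breaks are not preserved.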
[Data Processing] Augmenting Real Voice Phishing Data with the 'Back-Translation' Data Augmentation Script
Augmenting real voice phishing data using the back-translation technique
1. Load the data from the '사칭형_phising_data.csv' file with the pandas library and check the first rows
    # Load the data
    data=pd.read_csv('사칭형_phising_data.csv')
    data.head()
2. Print the first five entries of the comments column of the DataFrame 'data'
    data['comments'].head()
3. Measure the length of each sentence in the 'comments' column and count the sentences longer than 5,000 characters
    # Papago accepts at most 5,000 characters per request
    # Check the maximum character count
    li=[]
    for i in range(len(data['comments'])):
        li.append(len(data.ilo..
2023. 7. 27.
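The length check in step 3 is truncated in the excerpt. A minimal sketch of the same idea, assuming the comments are available as any iterable of strings (a plain list here, though a pandas column such as data['comments'] can also be iterated), might look like this. The names count_over_limit and PAPAGO_CHAR_LIMIT are hypothetical; only the 5,000-character figure comes from the excerpt's comment.

```python
PAPAGO_CHAR_LIMIT = 5000  # per-request limit stated in the excerpt's comment


def count_over_limit(comments, limit=PAPAGO_CHAR_LIMIT):
    """Return (longest_length, number_of_comments_longer_than_limit)."""
    # str() guards against non-string cells such as missing values.
    lengths = [len(str(comment)) for comment in comments]
    longest = max(lengths, default=0)
    over = sum(1 for length in lengths if length > limit)
    return longest, over


longest, over = count_over_limit(['short text', 'x' * 6000])
print(longest, over)
```

Entries counted in the second value would have to be shortened or split before being sent for translation.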
[Data Processing] Writing a Data Augmentation Script Using Synonym Replacement
A data augmentation script using the synonym replacement technique
1️⃣ Install the konlpy package with pip, the Python package manager
    !pip install konlpy
※ What is konlpy? konlpy is a package for Korean natural language processing that provides morphological analysis, part-of-speech tagging, syntactic parsing, and more. Installing konlpy gives you classes and functions that are useful for a wide range of Korean NLP tasks.
2️⃣ Install the nltk package with pip, the Python package manager
    !pip install nltk
※ What is NLTK? NLTK, short for Natural Language Toolkit, is an open-source Python library used for natural language processing (NLP) tasks...
2023. 7. 14.
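The excerpt ends before the replacement logic itself, so the original script's approach is not visible. As a rough illustration of synonym replacement in general, the sketch below swaps words using a small hard-coded table. The SYNONYMS table, the replace_synonyms name, and the prob and rng parameters are all hypothetical; the original post installs konlpy and nltk, which this standard-library sketch does not use.

```python
import random

# Hypothetical toy synonym table; a real script would build this from a
# lexical resource instead of hard-coding it.
SYNONYMS = {
    'money': ['cash', 'funds'],
    'account': ['bank account'],
}


def replace_synonyms(sentence, synonyms=SYNONYMS, prob=1.0, rng=random):
    """Replace each word that has a known synonym with probability prob."""
    result = []
    for word in sentence.split():
        candidates = synonyms.get(word)
        if candidates and rng.random() < prob:
            # Choose one of the listed synonyms at random.
            result.append(rng.choice(candidates))
        else:
            result.append(word)
    return ' '.join(result)


print(replace_synonyms('send money to this account', rng=random.Random(1)))
```

Whitespace splitting is enough for this English toy example; Korean text would normally need morphological analysis first, since particles attach directly to words.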
[Data Processing] Writing a Data Augmentation Script Using Sentence Rearrangement
🌨️ A data augmentation script using the sentence rearrangement technique
1️⃣ Import the random module and implement the sentence rearrangement function sentence_rearrangement
    import random
    def sentence_rearrangement(sentence):
        words = sentence.split()  # Split the sentence into words
        random.shuffle(words)  # Shuffle the word order randomly
        new_sentence = ' '.join(words)  # Join the words back into a sentence
        return new_sentence
※ What is the random module? The random module is part of Python's standard library and provides random number generation and management. With this module you can generate various random numbers, pick random elements, shuffle sequences, and..
2023. 7. 13.
[Data Processing] Writing a Data Augmentation Script Using Back-Translation
💖 A data augmentation script using the back-translation technique
1. Import the Python libraries for data processing and deep learning
    import pandas as pd
    from glob import glob
    import os
    import numpy as np
    from tqdm import tqdm, tqdm_notebook
    import random
    import torch
    import torch.nn.functional as F
  Library   Description and purpose
  pandas    data manipulation
  glob      file search
  os        interaction with the operating system
  numpy     numerical computation
  tqdm      progress visualization
  random    random number generation
  torch     PyTorch deep learning framework
2. Using Papago for KOR -> EN ..
2023. 7. 12.
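Step 2 is truncated, so the way the original script calls Papago is not shown. The round trip that the technique's name implies (source language to a pivot language and back) can be sketched independently of any particular translation service. In the sketch below, back_translate, the translate callback and its (text, src, dst) signature, and fake_translate are all hypothetical; none of them is Papago's API.

```python
def back_translate(text, translate, source='ko', pivot='en'):
    """Translate text to the pivot language and back to the source language.

    translate is a caller-supplied function translate(text, src, dst) -> str;
    it is a placeholder for whatever translation client is used.
    """
    pivot_text = translate(text, source, pivot)
    return translate(pivot_text, pivot, source)


def fake_translate(text, src, dst):
    # Stand-in for a real translation service so the sketch runs offline:
    # it only tags the text with the direction of the call.
    return f'[{src}->{dst}] {text}'


print(back_translate('sample sentence', fake_translate))
```

Passing the translator in as a function keeps the round-trip logic separate from the service client, so the client can be swapped or mocked without touching the augmentation code.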