✒️ Kibwa Voice Phishing Prev Project/Natural Language Processing Model Study

[Natural Language Processing Model] KoBERT Error Log & Summary

by A Lim Han 2023. 7. 1.

🔥 KoBERT code sources and references for running it

-->  https://bbarry-lee.github.io/ai-tech/KoBERT%EB%A5%BC-%ED%99%9C%EC%9A%A9%ED%95%9C-%EA%B0%90%EC%A0%95%EB%B6%84%EB%A5%98-%EB%AA%A8%EB%8D%B8-%EA%B5%AC%ED%98%84.html

 

Implementing an Emotion Classification Model with KoBERT, with Colab: Daisy's walkthrough of building an emotion classification model in Google Colab with the open-source KoBERT code developed by SKT Brain (bbarry-lee.github.io).

-->  https://github.com/SKTBrain/KoBERT

 

GitHub - SKTBrain/KoBERT: Korean BERT pre-trained cased (KoBERT)


1. Error installing the basic requirements

!pip install mxnet
!pip install gluonnlp pandas tqdm
!pip install sentencepiece
!pip install transformers==3
!pip install torch

Building wheels for collected packages: tokenizers, sacremoses
  error: subprocess-exited-with-error
  
  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for tokenizers (pyproject.toml) ... error
  ERROR: Failed building wheel for tokenizers
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=a8bb676cc707cfe8e9bb31eb0a6c6d4f4f5aace8c64280d6397835bfe8d6fb0c
  Stored in directory: /root/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb
Successfully built sacremoses
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (2.0.1+cu118)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch) (3.12.2)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch) (4.6.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch) (16.0.6)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch) (1.3.0)
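The log shows pip building `tokenizers` from source (`Building wheel for tokenizers (pyproject.toml)`) instead of installing a prebuilt wheel, and the `python3.10/dist-packages` paths show the runtime is Python 3.10. When a pinned install fails like this, it helps to record the exact interpreter and package versions before changing anything. The sketch below is not from the original post; it uses only the standard library's `importlib.metadata`, and the package list is an assumption taken from the pip commands above.

```python
# Hedged diagnostic sketch (not part of the original post): report the Python
# version and the installed versions of the packages used in this setup.
import sys
from importlib import metadata

# Assumed package list, taken from the pip commands in this post.
PACKAGES = ["mxnet", "gluonnlp", "sentencepiece", "transformers",
            "tokenizers", "torch", "onnxruntime"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def print_report(packages=PACKAGES):
    """Print the interpreter version followed by one line per package."""
    major, minor, micro = sys.version_info[:3]
    print(f"Python {major}.{minor}.{micro}")
    for name, version in installed_versions(packages).items():
        print(f"  {name:<14}{version or 'not installed'}")


print_report()
```

Pasting this report next to an error log makes it clear which versions were actually present when a given install failed.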

 

 


 

2. onnxruntime version error

!pip install git+https://git@github.com/SKTBrain/KoBERT.git@master

INFO: pip is looking at multiple versions of kobert to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement onnxruntime<=1.8.0,==1.8.0 (from kobert) (from versions: 1.12.0, 1.12.1, 1.13.1, 1.14.0, 1.14.1, 1.15.0, 1.15.1)
ERROR: No matching distribution found for onnxruntime<=1.8.0,==1.8.0
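The message already explains the failure: KoBERT's package metadata pins `onnxruntime==1.8.0` (the `(from kobert)` part), while the versions pip found for this runtime start at 1.12.0. The illustrative sketch below, which is not part of the original post, parses pip's `(from versions: ...)` list and reports whether a pin is available. It assumes plain numeric release strings such as `1.12.0`; pre-release or local versions would need a full version parser such as the `packaging` library.

```python
# Illustrative sketch (not from the original post): check a pinned version
# against the "(from versions: ...)" list in a pip error message.
import re

PIP_ERROR = (
    "ERROR: Could not find a version that satisfies the requirement "
    "onnxruntime<=1.8.0,==1.8.0 (from kobert) (from versions: 1.12.0, 1.12.1, "
    "1.13.1, 1.14.0, 1.14.1, 1.15.0, 1.15.1)"
)


def available_versions(pip_error):
    """Return the versions listed after 'from versions:' in pip's message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def version_key(version):
    """'1.12.0' -> (1, 12, 0); assumes purely numeric release segments."""
    return tuple(int(part) for part in version.split("."))


def explain_pin(pin, pip_error):
    """Say whether `pin` is offered and, if not, what the nearest option is."""
    versions = available_versions(pip_error)
    if not versions:
        return "no version list found in the message"
    if pin in versions:
        return f"{pin} is available"
    older = [v for v in versions if version_key(v) < version_key(pin)]
    if older:
        return f"{pin} is not offered; the closest older version is {max(older, key=version_key)}"
    return f"{pin} is not offered; the oldest available version is {min(versions, key=version_key)}"


print(explain_pin("1.8.0", PIP_ERROR))
# -> 1.8.0 is not offered; the oldest available version is 1.12.0
```

Comparing tuples such as `(1, 8, 0)` and `(1, 12, 0)` avoids the string-comparison trap where `"1.8.0"` sorts after `"1.12.0"`.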

 

 


 

3. Error when installing onnxruntime 1.8.0 or lower

!pip install onnxruntime==1.8.0

ERROR: Could not find a version that satisfies the requirement onnxruntime==1.8.0 (from versions: 1.12.0, 1.12.1, 1.13.1, 1.14.0, 1.14.1, 1.15.0, 1.15.1)
ERROR: No matching distribution found for onnxruntime==1.8.0

 

pip prints the same error message as in step 2: no 1.8.0 release is offered, only 1.12.0 and later.
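Because pinning `onnxruntime==1.8.0` by hand fails the same way, the obstacle is the pin itself rather than KoBERT's install script: pip only considers distributions whose wheel tags match the running interpreter. One way to confirm this is to compare the Python tag of the current interpreter (for example `cp310` on CPython 3.10) with the wheel filenames on a release's PyPI "Download files" page. The helper below is a simplified sketch, not from the original post: it reads the Python tag from wheel filenames following the PEP 427 naming scheme, ignores ABI (`abi3`) and platform tags, and the `examplepkg` filenames are hypothetical.

```python
# Simplified sketch (not from the original post): does any wheel in a list of
# filenames target the running interpreter? Ignores ABI and platform tags.
import sys


def interpreter_tags():
    """Python tags that match this interpreter, e.g. {'cp310', 'py310', 'py3'}.

    Assumes CPython; other implementations use different tag prefixes.
    """
    major, minor = sys.version_info[:2]
    return {f"cp{major}{minor}", f"py{major}{minor}", f"py{major}"}


def has_wheel_for(filenames, tags):
    """True if some wheel's Python tag (PEP 427 naming) is in `tags`.

    Wheel names look like name-version[-build]-pythontag-abitag-platform.whl,
    so the Python tag is the third field from the end; it may be compound,
    e.g. 'py2.py3'.
    """
    for filename in filenames:
        if not filename.endswith(".whl"):
            continue  # source distributions carry no Python tag
        fields = filename[: -len(".whl")].split("-")
        if len(fields) >= 5 and set(fields[-3].split(".")) & set(tags):
            return True
    return False


# Hypothetical filenames, for illustration only.
files = ["examplepkg-1.0.0-cp39-cp39-manylinux2014_x86_64.whl",
         "examplepkg-1.0.0.tar.gz"]
print(sorted(interpreter_tags()))
print(has_wheel_for(files, {"cp39"}))   # True
print(has_wheel_for(files, {"cp310"}))  # False
```

If no file for the pinned release carries the interpreter's tag and there is no source distribution, pip has nothing to install, which matches the "No matching distribution found" message.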

 


 

4. sentencepiece installation error

!pip install sentencepiece==0.1.91

Just in case, I deleted the cache data and tried again, but that did not fix the problem.
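The sentencepiece error output is not shown, so its exact cause is not visible here. To make sure no cached artifact is reused on a retry, pip also accepts `--no-cache-dir`, and `pip cache purge` (pip 20.1 and later) empties the cache outright. The sketch below is not from the original post: it builds the install command for the current interpreter with `--no-cache-dir` and captures pip's output so the real error can be recorded. `pip_install_command` and `try_install` are hypothetical helper names.

```python
# Sketch (not from the original post): retry a pinned install without pip's
# cache and keep the tail of pip's output for the error log.
import subprocess
import sys


def pip_install_command(requirement, use_cache=False):
    """Build a `python -m pip install` command for the running interpreter."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if not use_cache:
        # Disable pip's cache so no previously cached download or wheel is reused.
        cmd.append("--no-cache-dir")
    cmd.append(requirement)
    return cmd


def try_install(requirement, tail_lines=20):
    """Run the install; return (succeeded, last `tail_lines` lines of output)."""
    result = subprocess.run(pip_install_command(requirement),
                            capture_output=True, text=True)
    output = (result.stdout + result.stderr).splitlines()
    return result.returncode == 0, output[-tail_lines:]


print(" ".join(pip_install_command("sentencepiece==0.1.91")))
```

In a Colab cell, `ok, tail = try_install("sentencepiece==0.1.91")` followed by `print("\n".join(tail))` shows the last lines of pip's output. Running pip through `sys.executable -m pip` guarantees the package goes into the same interpreter the notebook uses.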

 


 

5. tokenizers wheel build error

!pip install transformers==4.8.2

Building wheels for collected packages: tokenizers, sacremoses
  error: subprocess-exited-with-error
  
  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for tokenizers (pyproject.toml) ... error
  ERROR: Failed building wheel for tokenizers
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=ba86428ff071e9f7fa693d22dda1a6d4ca4a716204726eb6fc5032cfc61e0788
  Stored in directory: /root/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb
Successfully built sacremoses
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
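This is the same failure as in step 1: pip is compiling `tokenizers` from source (`Building wheel for tokenizers (pyproject.toml)`) because it found no prebuilt wheel to use. `tokenizers` is written in Rust, so a source build needs a Rust compiler. The actual compiler output is cut from the log ("See above for output"), so a missing toolchain is a plausible cause, not a confirmed one. The sketch below, not from the original post, only checks whether `rustc` and `cargo` are on PATH using `shutil.which`.

```python
# Sketch (not from the original post): is a Rust toolchain available for a
# source build of `tokenizers`? Presence alone does not guarantee the build works.
import shutil


def rust_toolchain():
    """Map 'rustc' and 'cargo' to their paths on PATH, or None if missing."""
    return {tool: shutil.which(tool) for tool in ("rustc", "cargo")}


def can_attempt_source_build():
    """True only if both rustc and cargo were found."""
    return all(path is not None for path in rust_toolchain().values())


for tool, path in rust_toolchain().items():
    print(f"{tool}: {path or 'not found'}")
print("Rust toolchain found:", can_attempt_source_build())
```

Even with Rust installed, an old `tokenizers` release may still fail to compile on a newer Python, so choosing package versions that publish wheels for the runtime's Python is often the simpler route.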