🔬 Research Projects

🧬 Dual-Task Single-Cell Perturbation Modeling (MOA + Response) | Nov 2025 - Present

  • Developed a unified dual-task framework that jointly learns drug mechanisms of action (fₚ) and gene expression responses (fᵣ) from large-scale single-cell perturbation data.
  • Used DMSO-treated cells as a control baseline, computing pseudobulk ΔX signatures (drug − DMSO) to isolate drug-induced transcriptional effects from basal expression and batch noise (see the first sketch after this list).
  • Represented perturbation effects using a Cell2Sentence-style Transformer, encoding gene-level expression changes as token sequences ordered by |ΔX|.
  • Initialized gene token embeddings with pretrained scGPT embeddings, injecting biological priors and improving generalization across genes, drugs, and cell-line contexts.
  • Integrated DrugBank and ChEMBL multi-label target annotations to supervise the target-prediction module (fₚ), optimizing a BCE multi-label objective alongside response modeling with a shared encoder and task-specific heads (loss sketched below).
  • Conditioned response prediction on structured contextual signals (drug identity, predicted targets, and cell-line information), enabling mechanism-aware modeling of post-treatment expression states.
  • Evaluated robustness under out-of-distribution settings (unseen drugs and unseen cell lines), analyzing when mechanism-aware supervision improves prediction accuracy and interpretability.
  • Extended the framework toward a Filter-and-Rank inference pipeline to prioritize candidate drugs that reverse disease-associated expression signatures toward healthy (DMSO-like) states.
  • 🔗 GitHub repository in preparation
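
A minimal sketch of the pseudobulk ΔX computation and the |ΔX|-ordered token encoding described above; the gene names, matrix shapes, and random data are illustrative assumptions, not the project's actual schema:

```python
import numpy as np

# Hypothetical inputs: cell-by-gene expression matrices for one drug and
# for the DMSO control, plus a shared gene vocabulary (illustrative only).
genes = np.array(["TP53", "MYC", "EGFR", "GAPDH", "BRCA1"])
drug_expr = np.random.rand(200, 5)   # 200 drug-treated cells x 5 genes
dmso_expr = np.random.rand(300, 5)   # 300 DMSO control cells x 5 genes

# Pseudobulk profiles: mean expression across cells per condition.
drug_pb = drug_expr.mean(axis=0)
dmso_pb = dmso_expr.mean(axis=0)

# Delta signature: drug-induced shift relative to the DMSO baseline,
# removing basal expression shared by both conditions.
delta_x = drug_pb - dmso_pb

# Cell2Sentence-style encoding: order gene tokens by |ΔX| so the most
# strongly perturbed genes lead the token sequence.
order = np.argsort(-np.abs(delta_x))
token_sequence = genes[order].tolist()
print(token_sequence)
```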
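
And a hedged sketch of the shared-encoder, two-head objective (BCE for fₚ, regression for fᵣ); the plain MLP encoder stands in for the project's Transformer, and all dimensions and the equal loss weighting are assumptions:

```python
import torch
import torch.nn as nn

class DualTaskModel(nn.Module):
    """Shared encoder with a multi-label target head (f_p) and a
    response-regression head (f_r); all sizes are illustrative."""
    def __init__(self, in_dim=512, hidden=256, n_targets=128, n_genes=2000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.target_head = nn.Linear(hidden, n_targets)    # f_p: MOA logits
        self.response_head = nn.Linear(hidden, n_genes)    # f_r: predicted ΔX

    def forward(self, x):
        h = self.encoder(x)                 # shared representation
        return self.target_head(h), self.response_head(h)

model = DualTaskModel()
bce = nn.BCEWithLogitsLoss()   # multi-label target supervision (DrugBank/ChEMBL)
mse = nn.MSELoss()             # expression-response supervision

x = torch.randn(8, 512)                            # batch of perturbation embeddings
y_targets = torch.randint(0, 2, (8, 128)).float()  # binary target annotations
y_delta = torch.randn(8, 2000)                     # pseudobulk ΔX targets

logits, pred_delta = model(x)
loss = bce(logits, y_targets) + mse(pred_delta, y_delta)  # equal weighting assumed
loss.backward()
```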

🤗 Hugging Face Transformer Fine-tuning | Sep 2025

  • Explored transfer learning techniques with Hugging Face Transformers to adapt pre-trained models for Korean NLP tasks.
  • Fine-tuned KoBERT, DistilBERT, and GPT-2 on custom Korean datasets, addressing both emotion classification and dialogue generation.
  • Experimented with multiple training strategies, including supervised fine-tuning (SFT), early stopping, and hyperparameter tuning (a minimal Trainer sketch follows this list).
  • Evaluated model performance using accuracy and qualitative assessments of generated outputs.
  • Gained practical experience in adapting large pre-trained models to domain-specific tasks and comparing their effectiveness across architectures.
  • 🔗 GitHub Repository
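
A minimal sketch of the fine-tuning loop with early stopping via the Hugging Face Trainer; the toy dataset, the multilingual DistilBERT checkpoint (standing in for KoBERT), and all hyperparameters are illustrative assumptions:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

# Tiny toy corpus standing in for the Korean emotion dataset (illustrative).
data = Dataset.from_dict({
    "text": ["정말 행복해요", "너무 슬퍼요", "화가 나요", "기분이 좋아요"],
    "label": [0, 1, 2, 0],
})

model_name = "distilbert-base-multilingual-cased"  # stand-in for KoBERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=32)

data = data.map(tokenize, batched=True)
split = data.train_test_split(test_size=0.5)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=5,
    eval_strategy="epoch",    # "evaluation_strategy" in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
trainer = Trainer(
    model=model, args=args,
    train_dataset=split["train"], eval_dataset=split["test"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```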

🔠 Mini-BERT Model Compression | Sep 2025

  • Investigated methods for compressing large-scale NLP models to enable efficient on-device inference.
  • Implemented knowledge distillation from BERT to a compact student model, reducing model size while retaining most of the teacher's accuracy (distillation loss sketched below).
  • Applied parameter reduction techniques (e.g., pruning and low-rank factorization) to further optimize resource usage.
  • Conducted benchmarks on text classification tasks, analyzing the trade-offs between computational efficiency and predictive performance.
  • Gained insights into the practical deployment of lightweight language models in resource-constrained environments.
  • 🔗 GitHub Repository
  • 📄 View Full Report (PDF)
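
A sketch of the BERT-to-student distillation objective in the Hinton style, blending temperature-softened KL divergence with hard-label cross-entropy; the temperature, mixing weight, and toy tensors are assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL (at temperature T) with hard-label
    cross-entropy; alpha and T are illustrative choices."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale gradients after temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a 2-class batch of 4 examples.
student = torch.randn(4, 2, requires_grad=True)
teacher = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
loss = distillation_loss(student, teacher, labels)
loss.backward()
```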

🤖 Classical Machine Learning and Deep Learning Fundamentals | Aug 2025

  • Completed an introductory project to build foundational skills in machine learning and deep learning.
  • Implemented core ML algorithms (SVM, Random Forest, Logistic Regression) on structured datasets, practicing feature engineering and model selection.
  • Designed and trained basic neural networks using TensorFlow and PyTorch to compare performance with classical ML approaches.
  • Evaluated models through cross-validation and performance metrics, gaining a solid understanding of the trade-offs between classical ML and DL methods (comparison sketched below).
  • 🔗 GitHub Repository
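
A compact sketch of the cross-validated comparison across the three classical models, using a public scikit-learn dataset as a stand-in for the structured data used in the project:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Public dataset as a stand-in for the project's structured data.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```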

🗨️ NLP Pipeline and Korean Chatbot | Aug 2025

  • Developed a Korean dialogue system prototype inspired by the Cornell Movie-Dialogs Corpus, exploring open-domain chatbot capabilities.
  • Built preprocessing pipelines, including SentencePiece tokenization, subword vocabulary construction, and text cleaning (tokenizer training sketched below).
  • Implemented and trained baseline models (Seq2Seq and Transformer-based architectures) for response generation.
  • Conducted evaluation using BLEU and BERTScore, and analyzed the effects of different decoding strategies (greedy, beam search, sampling).
  • 🔗 GitHub Repository
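
A sketch of the SentencePiece step in the preprocessing pipeline; `corpus.txt`, the vocabulary size, and the model prefix are illustrative assumptions:

```python
import sentencepiece as spm

# Train a small subword model on a raw Korean corpus file
# (corpus.txt and all settings below are illustrative).
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="ko_chatbot",
    vocab_size=8000,
    model_type="unigram",        # SentencePiece's default subword algorithm
    character_coverage=0.9995,   # common setting for Korean/CJK text
)

# Encode a sample utterance into subword pieces and ids, then round-trip.
sp = spm.SentencePieceProcessor(model_file="ko_chatbot.model")
pieces = sp.encode("안녕하세요, 오늘 기분이 어때요?", out_type=str)
ids = sp.encode("안녕하세요, 오늘 기분이 어때요?", out_type=int)
print(pieces)
print(sp.decode(ids))
```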

🌳 ESG-based Credit Scoring Model for Small Businesses | Jun 2025 - Aug 2025

  • Developed an ESG-integrated credit scoring model to improve financial accessibility for small businesses in Korea.
  • Incorporated non-financial ESG indicators (energy savings, social contributions, privacy protection) alongside traditional credit factors.
  • Applied XGBoost and CatBoost for predictive modeling, with a focus on fairness and interpretability (XGBoost sketch below).
  • Proposed policy implications for regional banks to enhance financial inclusion.
  • 📄 View Full Report (PDF)
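
A hedged sketch of the ESG-augmented scoring setup with XGBoost; the synthetic features (traditional credit factors plus E/S/G indicators) and labels are illustrative, not the project's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in: traditional credit factors plus ESG indicators.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(size=n),          # debt ratio
    rng.normal(size=n),          # revenue growth
    rng.uniform(size=n),         # energy-savings score (E)
    rng.uniform(size=n),         # social-contribution score (S)
    rng.uniform(size=n),         # privacy-protection score (G)
])
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
print("feature importances:", model.feature_importances_)  # interpretability
```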

🏦 Shinhan Big Data Hackathon: ESG Financial Product Design | Oct 2024 - Nov 2024

  • Defined a Green Consumption Index using PCA and clustering techniques on consumption data (sketched below).
  • Identified customer segments based on ESG consumption behaviors.
  • Designed a new green savings account product, linking financial incentives to sustainable consumption.
  • Proposed marketing and policy strategies to attract young customers (20s–30s) to ESG products.
  • 🔗 Notion Page
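
A minimal sketch of the index construction and segmentation: the first principal component serves as a one-dimensional Green Consumption Index, and K-means groups customers in the reduced space; the synthetic consumption matrix and cluster count are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic consumption features (e.g., shares of spending by category);
# columns and values are illustrative assumptions.
rng = np.random.default_rng(42)
consumption = rng.random((500, 6))  # 500 customers x 6 spending categories

scaled = StandardScaler().fit_transform(consumption)

# First principal component as a one-dimensional Green Consumption Index.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
green_index = components[:, 0]

# Segment customers by ESG consumption behavior in the reduced space.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)
print("explained variance:", pca.explained_variance_ratio_)
print("segment sizes:", np.bincount(segments))
```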

📚 Shakespeare Text Mining: Tragedies vs. Comedies | Dec 2023

  • Performed computational text analysis on Shakespeare’s tragedies and comedies to explore linguistic and thematic differences.
  • Quantified the frequency of “death” and its related semantic field, highlighting genre-specific lexical patterns (counting sketch below).
  • Applied sentiment analysis to compare emotional tones across the two genres.
  • Showcased the integration of an English Literature background with NLP and Data Science techniques, bridging the gap between humanities and computational research.
  • 📄 View Full Report (PDF)
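
A small sketch of the frequency analysis; the excerpts and the hand-picked “death” semantic field are illustrative stand-ins for the full play texts and lexicon used in the report:

```python
import re
from collections import Counter

# Toy excerpts standing in for full play texts (illustrative only).
plays = {
    "Hamlet (tragedy)": "To die, to sleep; to sleep, perchance to dream. Death ...",
    "Twelfth Night (comedy)": "If music be the food of love, play on ...",
}

# A small hand-picked semantic field around "death" (an assumption;
# the project may have used a larger lexicon).
death_field = {"death", "die", "dead", "grave", "kill", "slain", "mortal"}

for title, text in plays.items():
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hits = sum(counts[w] for w in death_field)
    rate = hits / max(len(tokens), 1)  # normalize by play length
    print(f"{title}: {hits} death-field tokens ({rate:.3%} of all tokens)")
```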

📂 Research Themes

My work centers on building language technologies that are efficient, reliable, and accessible—particularly for multilingual and low-resource environments. These research directions are reflected in four interconnected themes:

  1. Efficient NLP and Lightweight Language Models: I develop and fine-tune models for classification, dialogue generation, and multilingual tasks, with a focus on parameter-efficient adaptation, model compression, and scalable deployment. This includes experimenting with knowledge distillation and PEFT techniques to reduce model size while preserving performance.

  2. Multilingual and Low-Resource Natural Language Processing: Much of my research examines how representation learning and cross-lingual transfer can improve performance in languages with limited annotated data. This includes analyzing tokenization challenges, modeling morphologically rich languages, and designing systems that generalize across diverse linguistic structures.

  3. Reliability and Responsible Generative Systems: I explore methods for improving factual consistency, controllability, and safety in generative and dialogue systems. This direction connects technical model improvements with broader goals of fairness, interpretability, and responsible use of AI in real-world decision-making contexts.

  4. Multimodal and Socially Impactful AI: Recently, I have extended my work to include vision-language and multimodal learning, investigating how perceptual grounding can enhance model understanding and support practical applications such as sustainable finance, public-facing AI services, and computational humanities research.

Collectively, these themes reflect my broader goal: developing efficient and socially meaningful NLP systems that bridge human language, machine intelligence, and equitable access to technology.