Research Projects
Hugging Face Transformer Fine-tuning | Sep 2025
- Explored transfer learning techniques with Hugging Face Transformers to adapt pre-trained models for Korean NLP tasks.
- Fine-tuned KoBERT, DistilBERT, and GPT-2 on custom Korean datasets, addressing both emotion classification and dialogue generation.
- Experimented with multiple training strategies, including Supervised Fine-tuning (SFT), early stopping, and hyperparameter tuning.
- Evaluated model performance using accuracy and qualitative assessments of generated outputs.
- Gained practical experience in adapting large pre-trained models to domain-specific tasks and comparing their effectiveness across architectures.
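The early-stopping strategy above can be sketched as a small patience-based monitor. This is an illustrative class, not the project's actual code; with the Hugging Face `Trainer`, `transformers.EarlyStoppingCallback` plays the same role.

```python
class EarlyStopper:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs without improvement to tolerate
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience
```

Calling `step` once per epoch after evaluation gives the stop signal; the patience and delta values here are placeholders.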
Mini-BERT Model Compression | Sep 2025
- Investigated methods for compressing large-scale NLP models to enable efficient on-device inference.
- Implemented knowledge distillation from BERT to a compact student model, reducing model size while retaining accuracy.
- Applied parameter reduction techniques (e.g., pruning and low-rank factorization) to further optimize resource usage.
- Conducted benchmarks on text classification tasks, analyzing the trade-offs between computational efficiency and predictive performance.
- Gained insights into the practical deployment of lightweight language models in resource-constrained environments.
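The distillation objective behind the BERT-to-student transfer can be illustrated with a minimal sketch, assuming the common temperature-softened KL formulation; the function names and temperature value are illustrative, not taken from the project.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 (Hinton-style)."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2              # T^2 keeps gradient scale comparable
```

The loss is zero when the student reproduces the teacher's logits and grows as the distributions diverge; in practice it is combined with the ordinary cross-entropy on hard labels.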
Classical Machine Learning and DL Fundamentals | Aug 2025
- Completed an introductory project to build foundational skills in machine learning and deep learning.
- Implemented core ML algorithms (SVM, Random Forest, Logistic Regression) on structured datasets, practicing feature engineering and model selection.
- Designed and trained basic neural networks using TensorFlow and PyTorch to compare performance with classical ML approaches.
- Evaluated models through cross-validation and performance metrics, gaining a solid understanding of trade-offs between classical ML and DL methods.
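The cross-validation step can be sketched as a plain k-fold index splitter; this is a hand-rolled illustration (in practice `sklearn.model_selection.KFold` does the same job).

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once, then yield (train, valid) index splits for k-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # fixed seed for reproducible folds
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = fold_size + (1 if fold < remainder else 0)  # spread the remainder
        valid = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        start += size
        yield train, valid
```

Each sample appears in exactly one validation fold, so averaging a metric over the k splits gives the usual cross-validated estimate.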
NLP Pipeline and Korean Chatbot | Aug 2025
- Developed a Korean dialogue system prototype inspired by the Cornell Movie Dialogs dataset, aiming to explore open-domain chatbot capabilities.
- Built preprocessing pipelines, including SentencePiece tokenization, subword vocabulary construction, and text cleaning.
- Implemented and trained baseline models (Seq2Seq and Transformer-based architectures) for response generation.
- Conducted evaluation using BLEU and BERTScore, and analyzed the effects of different decoding strategies (greedy, beam search, sampling).
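The decoding comparison rests on strategies like beam search, which can be sketched generically; the toy step function and Korean tokens below are invented for illustration, not the project's model.

```python
import math

def beam_search(step_fn, beam_width=2, max_len=5, eos="</s>"):
    """step_fn(prefix) -> {token: prob}. Return the highest-scoring token sequence."""
    beams = [([], 0.0)]          # (token list, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                cand = (tokens + [tok], score + math.log(prob))
                (finished if tok == eos else candidates).append(cand)
        if not candidates:
            break
        # keep only the top-k partial hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)       # also consider unterminated beams
    return max(finished, key=lambda c: c[1])[0]

# Toy "model": a lookup table of next-token probabilities (invented for illustration).
TABLE = {
    (): {"안녕": 0.6, "반가워": 0.4},
    ("안녕",): {"하세요": 0.9, "</s>": 0.1},
    ("안녕", "하세요"): {"</s>": 1.0},
}

def toy_step(prefix):
    return TABLE.get(tuple(prefix), {"</s>": 1.0})
```

With `beam_width=1` the same routine reduces to greedy decoding, which makes the greedy-versus-beam comparison in the bullet above easy to reproduce.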
ESG-based Credit Scoring Model for Small Businesses | Jun 2025 - Aug 2025
- Developed an ESG-integrated credit scoring model to improve financial accessibility for small businesses in Korea.
- Incorporated non-financial ESG indicators (energy savings, social contributions, privacy protection) alongside traditional credit factors.
- Applied XGBoost and CatBoost for predictive modeling, emphasizing fairness and interpretability alongside predictive accuracy.
- Proposed policy implications for regional banks to enhance financial inclusion.
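The gradient-boosting idea underlying XGBoost and CatBoost, fitting successive weak learners to the residuals of the current model, can be sketched with decision stumps on a one-dimensional toy feature. This is a conceptual illustration only, not the project's model.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on a 1-D feature, minimizing squared error."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for cut in range(1, len(x)):
        thr = (x[order[cut - 1]] + x[order[cut]]) / 2
        left = [residuals[i] for i in order[:cut]]
        right = [residuals[i] for i in order[cut:]]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda v: lmean if v <= thr else rmean

def boost(x, y, n_rounds=20, lr=0.3):
    """Additive model of stumps, each fit to the current residuals."""
    stumps = []
    pred = [0.0] * len(x)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # what is still unexplained
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: sum(lr * s(v) for s in stumps)
```

Real gradient-boosting libraries add regularized tree growth, multi-feature splits, and second-order objectives, but the fit-to-residuals loop is the same principle.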
Shinhan Big Data Hackathon: ESG Financial Product Design | Oct 2024 - Nov 2024
- Defined a Green Consumption Index using PCA and clustering techniques on consumption data.
- Identified customer segments based on ESG consumption behaviors.
- Designed a new green savings account product, linking financial incentives to sustainable consumption.
- Proposed marketing and policy strategies to attract young customers (20s-30s) to ESG products.
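The PCA step used to define the Green Consumption Index can be sketched via an eigendecomposition of the covariance matrix. This is a NumPy-based illustration; the data and variable names are invented, and the real index construction involved domain-specific consumption features.

```python
import numpy as np

def pca_project(X, n_components=1):
    """Project centered data onto the top principal components.

    Returns (projections, explained-variance ratios of the kept components).
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    return Xc @ components, eigvals[:n_components] / eigvals.sum()
```

The first projection serves as a single composite score (the index); clustering on the projected scores then segments customers, as in the hackathon pipeline.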
Shakespeare Text Mining: Tragedies vs. Comedies | Dec 2023
- Performed computational text analysis on Shakespeare's tragedies and comedies to explore linguistic and thematic differences.
- Quantified the frequency of "death" and related semantic fields, highlighting genre-specific lexical patterns.
- Applied sentiment analysis to compare emotional tones across the two genres.
- Showcased the integration of an English Literature background with NLP and Data Science techniques, bridging the gap between humanities and computational research.
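The semantic-field frequency analysis can be sketched as a simple token-counting routine; the word list below is an illustrative subset, not the study's actual lexicon.

```python
import re
from collections import Counter

# Illustrative subset of a "death" semantic field (not the study's full lexicon).
DEATH_FIELD = {"death", "die", "dies", "died", "dead", "grave", "tomb", "slain"}

def field_frequency(text, field=DEATH_FIELD):
    """Relative frequency of a semantic field's words among all tokens in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())   # crude lowercase tokenization
    counts = Counter(tokens)
    hits = sum(counts[w] for w in field)
    return hits / len(tokens) if tokens else 0.0
```

Computing this score per play and averaging within each genre gives the tragedy-versus-comedy comparison; a lexicon-based sentiment score can be built the same way with positive and negative word lists.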
Research Themes
Across these projects, my research contributions can be summarized into three main themes:
- NLP & Large Language Models: Building and fine-tuning models for dialogue systems, text classification, and multilingual NLP, with a focus on model compression and efficient deployment.
- Applied AI for Social Impact: Applying machine learning to domains such as ESG-based credit scoring and green financial product design, demonstrating how AI can support financial inclusion and sustainability.
- Interdisciplinary AI: Bridging English Literature and Data Science through projects like Shakespeare text mining, showing how computational methods can enrich traditional humanities research.
Together, these projects highlight my interest in advancing practical, efficient, and socially meaningful NLP research.