Skills

Python3

Over the past four years, I have worked on numerous college projects and personal research using Python. I have experience reimplementing structures originally written in Java or C++ in Python, and have no difficulty understanding algorithms and code written by others.

PyTorch

Along with key PyTorch-adjacent libraries such as Pandas, NumPy, and TensorBoard, I have used PyTorch on Transformer-based deep learning NLP projects and practiced with RNN models. I am capable of performing preprocessing, training, and hyperparameter search. My recent PyTorch work includes the STS API project, MatchSum, and Hate Speech classification.

Java

I have two years of experience with Java from various assignments during my college years. I do not have a wide range of library experience, but I am comfortable implementing desired algorithms.

MySQL

MySQL was used to create the database for the Insurance Database project. I am capable of writing and reading queries, as well as creating tables and loading data.

R

I have experience using R for data analysis and visualization in college projects and assignments. In one project analyzing entry data into Hawaii, I applied appropriate plotting for data visualization, along with diagnostic plots with regression, ANOVA tables, Box-Cox power transformations, log transformations, and t-tests for data analysis.

HTML/PHP

HTML and PHP were used in a local setting to visualize queries for the insurance database project, and are often used for personal purposes such as creating portfolios and personal websites with CSS frameworks.

Independent Projects

Korean Semantic Textual Similarity API

Created a model that scores semantic textual similarity between two sentences, using the klue/roberta-large pre-trained model with the klue-sts dataset. The RoBERTa model was used in a weight-sharing Siamese structure, producing a mean-pooled vector for each of the two outputs, which is used to compute the cosine similarity between the input sentences on a 0-to-5 scale. The best model, found through random hyperparameter search, showed a Pearson correlation of 0.882 and an F1 score of 0.835 given a threshold of three for the initial prediction. The model was deployed as a REST API that predicts the STS score for 1:1 and 1:N sentence inputs.
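The scoring step described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names are mine, the token embeddings would in practice come from the RoBERTa encoder, and clamping negative cosine values to zero before scaling to the 0-5 range is an assumption about how the score is mapped.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, ignoring padded positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for d in range(dim):
                summed[d] += vec[d]
    return [s / count for s in summed]

def sts_score(u, v):
    """Cosine similarity of two pooled sentence embeddings, rescaled to [0, 5]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = dot / norm              # in [-1, 1]
    return max(0.0, cosine) * 5.0   # clamp negatives, scale to the 0-5 label range
```

With shared encoder weights, each sentence is pooled independently and only the two pooled vectors enter the similarity computation, which is what makes the Siamese setup cheap at inference time.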

MatchSum

Focused on preprocessing and implementing the MatchSum extractive text summarization model, without the fine-tuning process. As no BertSum trained on a Korean dataset was available, a modified version of the greedy-selection algorithm from BertSum was used to score and prune sentences in each article. MatchSum, as described in the paper, matches the CLS token vector of the original text against candidate summaries built from the pruned article and against gold summaries written by humans. Although no functioning summarization model was produced, the project taught me both MatchSum and BertSum, along with oracle-summary generation algorithms and a deeper understanding of ROUGE scores.
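The greedy-selection idea can be sketched as follows. This is an illustrative simplification, not the project's implementation: a unigram-overlap F1 stands in for the ROUGE scoring the real algorithm uses, and the function names and stopping rule are my own.

```python
def unigram_f1(candidate_tokens, reference_tokens):
    """Crude stand-in for ROUGE-1 F1: unigram overlap between two token lists."""
    cand, ref = set(candidate_tokens), set(reference_tokens)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def greedy_select(sentences, gold_summary, max_sents=3):
    """Greedily add the sentence that most improves the score against the
    gold summary; stop when no remaining sentence helps (oracle extraction)."""
    ref = gold_summary.split()
    selected, best = [], 0.0
    while len(selected) < max_sents:
        gains = []
        for i, sent in enumerate(sentences):
            if i in selected:
                continue
            tokens = " ".join(sentences[j] for j in selected + [i]).split()
            gains.append((unigram_f1(tokens, ref), i))
        if not gains:
            break
        score, idx = max(gains)
        if score <= best:
            break
        best = score
        selected.append(idx)
    return sorted(selected)
```

The selected indices form the oracle summary, which BertSum-style models use as extraction labels when no human-written extractive labels exist.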

Korean Hate Speech Detection

Using a Korean Hate Speech dataset, fine-tuned a pre-trained KcELECTRA-base model to classify a short comment into one of three labels (Hate/Offensive/None). For multi-class classification, the model includes two linear classification layers that use the CLS token output from ELECTRA's discriminator. The fine-tuned model showed an accuracy of 74% on its validation set. By fine-tuning purposefully with fixed hyperparameters, the project allowed me to observe the effects of altering the preprocessed data, the model's parameters, and different versions of the KcELECTRA model.

Boyer-Moore Algorithm for Pattern Matching

Implemented the Boyer-Moore algorithm in Python for exact pattern matching. The algorithm combines the bad-character heuristic and the good-suffix heuristic to determine the optimal shift of the pattern as it is matched against the text.

Academic Projects

Automated Reasoning

Created a Backus-Naur Form grammar for representing models. The first inference method, truth-table enumeration, generates every assignment of truth values to the atomic sentences to determine whether the knowledge base entails the query. The second inference method converts the grammar of the knowledge base and query into conjunctive normal form to prove entailment of the query by resolution.
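The truth-table enumeration step can be sketched in a few lines. This is an illustration under my own representation choices, not the project's code: sentences are modeled as Python predicates over a model (a dict from symbol to truth value) rather than parsed BNF trees.

```python
from itertools import product

def tt_entails(kb, query, symbols):
    """KB entails the query iff the query is true in every model
    in which the KB is true (checked by exhaustive enumeration)."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not query(model):
            return False  # counterexample: KB holds but query fails
    return True
```

For example, with KB = (P ⇒ Q) ∧ P, enumeration over the four models of {P, Q} confirms that Q is entailed, since the only model satisfying the KB assigns Q true.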

Uncertain Inference

The project was based on a Bayesian network structure implemented in Python. The exact inference algorithm uses enumeration to return the posterior distribution of each query variable given the evidence. Two algorithms were implemented for approximate inference: rejection sampling generates events from the prior distribution and rejects those inconsistent with the given evidence, while likelihood weighting generates only events consistent with the evidence, an improvement over the inefficiency of rejection sampling.
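Rejection sampling can be sketched on a toy two-node network. The network, its probabilities, and the function name are made-up illustration values, not the project's network; the point is only the sample-then-reject loop.

```python
import random

def rejection_sample(n_samples, wet_evidence, seed=0):
    """Estimate P(Rain | WetGrass = wet_evidence) on a toy Rain -> WetGrass
    net by sampling from the prior and rejecting inconsistent events."""
    rng = random.Random(seed)
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        rain = rng.random() < 0.2             # prior: P(Rain) = 0.2
        p_wet = 0.9 if rain else 0.1          # CPT:   P(WetGrass | Rain)
        wet = rng.random() < p_wet
        if wet != wet_evidence:
            continue                          # reject events inconsistent with evidence
        counts[rain] += 1
    total = counts[True] + counts[False]
    return counts[True] / total if total else 0.0
```

The inefficiency the description mentions is visible here: every rejected sample is wasted work, and when the evidence is rare almost all samples are thrown away, which is exactly what likelihood weighting avoids by forcing evidence variables to their observed values and weighting each sample instead.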

Decision Tree Learning & Linear Classifiers

Implemented an entropy-based decision tree that maximizes information gain when selecting attributes. Invalid nodes were pruned during training to prevent overfitting while monitoring the tree's accuracy on the data. Additionally, a classifier applying the perceptron learning rule with a hard threshold and a classifier based on logistic regression were implemented and trained. The project allowed me to observe the effects of different learning rates on each model's training curve, as well as the effects of clean versus noisy datasets.
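The attribute-selection criterion can be sketched as follows. This is a generic illustration of entropy-based information gain with my own function names and data representation, not the project's code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction from splitting the examples on one attribute.
    Each example is a dict mapping attribute name -> value."""
    base = entropy(labels)
    total = len(labels)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder
```

At each node the tree evaluates this gain for every remaining attribute and splits on the maximizer; an attribute that perfectly separates the labels yields a gain equal to the full entropy of the node.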

Insurance Database

Using MySQL, HTML, and PHP, the project created a website on a local server capable of managing health insurance data, consisting of six entities and five relationships. The main functionality supports editing and viewing non-key attributes and providing insurance plan recommendations based on health conditions. The project gave me practice generating an ER diagram, mapping relationship types, and displaying a proper view of the database based on the account's granted privileges.
