I am Erchi Zhang, a second-year graduate student pursuing an M.S. in Data Science at New York University. I earned my B.S. in Computer Science from Brandeis University in 2023. I am actively seeking 2025 new grad opportunities in data science, software development engineering, machine learning, data analysis, or any related field. I am proficient in Python, Java, JavaScript, R, SQL, HTML/CSS, and shell scripting. During my undergraduate studies, I coauthored two data science research papers: one on using Graph adaption BERT to detect malicious behavior on Twitter, and another on helping any GNN learn distinguishable representations without unbiased attributes. My GitHub repositories mainly host projects I completed during my graduate studies:
- Convert Deck to CPT Codes: AI-Driven Reimbursement Code Discovery for Health Tech Startups, in which I built a website that processes PDF pitch decks and returns relevant Current Procedural Terminology (CPT) codes using AI. We applied Named Entity Recognition (NER) to extract key information from the PDFs and used Retrieval-Augmented Generation (RAG) to recommend accurate CPT codes.
- Fixplainer: Failure Explainer for Multiple Object Tracking (MOT), in which I developed a GUI tool with teammates that uses SHAP explainers to explain and understand failures in MOT tasks. Our paper can be found here.
- JEPA Model for Agent Trajectory Prediction, in which I implemented and trained a recurrent JEPA model to predict the trajectories of moving agents. This was the final project for NYU's DS-GA 1008 Deep Learning course; our model's performance ranked in the top quartile of the class, earning us full marks for the project.
- Billionaire Data Analysis, in which I collaborated with teammates to conduct a comprehensive analysis in Python of a Billionaires Statistics Dataset from Kaggle.
- Spotify Songs Data Analysis, in which I collaborated with teammates to analyze a Spotify songs dataset. We applied a range of techniques, including multiple linear regression, lasso/ridge regression, significance tests, PCA, K-means clustering, logistic regression, SVM, Random Forest, an MLP neural network, and a recommendation system, to answer each of the given questions.
- Data Visualizations, in which I used D3.js, NotebookJS, and several Python tools to plot static and dynamic graphs that visualize Machine Learning concepts.