Richard Paul Yim

PhD Student in Biostatistics at UNC Chapel Hill

Publications

Published BMJ Digital Health & AI · 2025

Optimal strategies for adapting open-source large language models for clinical information extraction: a benchmarking study in the context of ulcerative colitis research.

R. Yim, A. Silverman, S. Wang, V. Rudrapatna.

bmjdigitalhealth.bmj.com

In Revision AIP – Chaos · 2024

An interpretable latent linear model for nonlinear coupled oscillators on graphs.

A. Goyal*, Z. Wu*, R. Yim, B. Chen, Z. Xu, H. Lyu. (*equal contribution)

arxiv.org/abs/2311.14910

Published Scientific Reports – Nature · 2022

Learning to predict synchronization of pulse coupled oscillators on heterogeneous graphs.

R. Yim*, H. Bassi*, R. Kodukula, C. Zhu, H. Lyu. (*equal contribution)

nature.com/articles/s41598-022-18953-8

Published SIAM Undergraduate Research Online · 2021

Statistical Learning for Best Practices in Tattoo Removal.

R. Yim.

siam.org

Experience

University of California, San Francisco (UCSF) Aug 2023 – Aug 2025 · San Francisco, CA

Clinical Data Scientist

  • Researched performance of open-source (Hugging Face) and closed-source (OpenAI) LLMs for information extraction from EHR-derived clinical notes, including QLoRA fine-tuning on HPC clusters.
  • Developed longitudinal models for early diagnosis of systemic mastocytosis in collaboration with Mayo Clinic using PyTorch and caret.
  • Worked within Bakar Computational Health Sciences Institute (BCHSI) under Dr. Vivek Rudrapatna on inflammatory bowel disease research.

Principal Financial Group May 2022 – Aug 2022 · Sacramento, CA

Data Analyst Intern

  • Applied nonparametric methods, from spectral clustering to random forests, via AWS SageMaker and Athena to segment financial advisors and clients by behavioral and demographic data.
  • Developed an unsupervised clustering pipeline over 160K+ SEC-registered financial advisors for targeted marketing; results formally integrated into the Sacramento sales office.
  • Communicated critical and actionable findings to nontechnical internal stakeholders.

Homeboy Industries Dec 2020 – Jul 2021 · Los Angeles, CA

Statistics Research Intern

  • Awarded $2,000 for a cohort study in collaboration with UCLA Computational Applied Mathematics and USC Keck School of Medicine.
  • Performed end-to-end analysis on six years of longitudinal data on 500+ tattoo-removal patients using regression and nonparametric methods.
  • Presented findings to laser-removal clinicians, recommending conservative treatment practices.

Education

University of North Carolina at Chapel Hill 2025 – 2030

Ph.D. in Biostatistics

NIEHS T32 Predoctoral Trainee in Environmental Health Biostatistics

University of California, Davis 2021 – 2023

M.S. in Applied Mathematics

Thesis: Neural Profiles of Two- and Three-State Cellular Automata on Two-Dimensional Lattice Domains — escholarship.org

Graduate coursework in Biostatistics/Statistics: Probability Theory I/II, Survival Analysis, GLMs, Longitudinal Data Analysis, Clinical Trials Design, Computational Statistics, Causal Inference.

University of California, Los Angeles 2017 – 2021

B.S. in Mathematics and Statistics (Double Major)

Honors Linear Algebra, Honors Algebra I/II, Real Analysis I/II, Graduate-level Numerical Linear Algebra, Mathematical Statistics, Statistical Learning, Reinforcement Learning.

Teaching

University of California, Davis Sep 2022 – Jun 2023 · Davis, CA

Teaching Assistant

  • Fall 2022: Mat 17A (Calculus for Biology/Medicine), Mat 21A (Calculus)
  • Winter 2023: Mat 17B (Calculus for Biology/Medicine), Mat 21C (Calculus)
  • Spring 2023: Mat 127B (Real Analysis II), Mat 17B (Calculus for Biology/Medicine)

American River College Nov 2022 – Jun 2023 · Sacramento, CA

Instructor Assistant

  • Tutored students in remedial support courses and held several weekly office-hours sessions.

Olga Radko Endowed Math Circle at UCLA Sep 2018 – Aug 2021 · Los Angeles, CA

Lead Instructor

  • Lead instructor for Intermediate 2A students (6th–7th graders), in-person and over Zoom.
  • Organized and curated weekly lessons in logic, geometry, and graph theory.

Skills

Programming

R, Python, SQL, Bash, MATLAB, JS / HTML / CSS

Tools & Infrastructure

Docker, Git, GNU/Linux, AWS (S3, EC2, SageMaker), Azure VM, HPC / SLURM, Emacs / Vim, LaTeX

Frameworks & Libraries

PyTorch, TensorFlow / Keras, Hugging Face, Scikit-Learn, Tidyverse, caret, SciPy, RStan, Three.js

Modeling

LLMs / QLoRA, GNNs, Neural Networks, GLM / GEE, Proportional Hazards, Mixed Effects, Bayesian Hierarchical, Reinforcement Learning, Dimensionality Reduction

About

I'm a PhD student in Biostatistics at the University of North Carolina at Chapel Hill, supported by the NIEHS T32 Predoctoral Trainee program in Environmental Health Biostatistics. Before that, I completed an M.S. in Applied Mathematics at UC Davis and a double major in Mathematics and Statistics at UCLA.

My research interests are in statistical methodology and machine learning, particularly as applied to clinical and environmental health data. Previously, I was a Clinical Data Scientist at UCSF, where I worked on LLM-based information extraction from EHR data and longitudinal modeling for rare disease diagnosis.

Outside of work I enjoy going to the gym, cooking at home, and listening to good music.

yim.richardp@gmail.com