Protiva Rahman

I am a Research Assistant Professor in the Health Outcomes and Biomedical Informatics Department at the University of Florida, with an adjunct appointment in Biomedical Informatics at Vanderbilt University. I am interested in building tools to democratize data science, with a focus on empowering domain experts in healthcare. My research spans the disciplines of data science, natural language processing, and human-computer interaction in domains of infectious diseases and cancer. My overarching goal is to bridge the gap between data science and domain experts. To this end, my PhD thesis presented a framework to amplify expertise in clinical data pipelines while my postdoctoral work aimed to automate data extraction from unstructured texts such as EHR notes and biomedical literature. I received my PhD in Computer Science from The Ohio State University in August 2020, under the supervision of Arnab Nandi and Courtney Hebert and completed my postdoctoral training at the Vanderbilt University Medical Center.

  • June 2023: Excited to join University of Florida as a Research Assistant Professor in Biomedical Informatics!
  • March 2023: New JAMIA paper on automating immunocheckpoint inhibitor induced colitis from the EHR!
  • Nov 2022: Presenting a poster and paper at AMIA on automated data extraction from biomedical literature!
  • Oct 2022: I am on the academic job market! Please reach out if you have openings in Biomedical Informatics or Computer Science.
  • June 2022: Submitted my first K99/R00 grant to NCI!
  • Nov 2021: Megan and Rachel are presenting our work on the novel PRESCIANT method at AHA[paper]
  • Oct 2021: Excited to attend AMIA in person after two years and present our work on Automating Data Curation from Unstructured Text[pdf]
  • July 2021: Our ASCO poster was highlighted in Oncology Nurse Advisor
  • June 2021: Presenting our work on Predicting Brain Mets in NSCLC patients at ASCO 2021
  • Nov 2020: Virtually presenting a podium abstract on my thesis work at AMIA
  • Sept 2020: Moved to Nashville and started my postdoc at Vanderbilt University Medical Center
  • Aug 2020: I successfully defended my Ph.D. thesis!
  • July 2020: Our framework for Amplifying Domain Expertise was accepted to JMIR Medical Infomatics
  • Jan 2020: Our review paper on Evaluating Interactive Data Systems was published in the VLDB Journal
  • April 2019: Excited to be interning with NIH/NLM/NCBI Pubmed Labs this summer
  • Dec 2018: Transformer was accepted to IUI 2019
  • Aug 2018: Presenting our work on visualizing rules at DSIA on October 21st
  • July 2018: Icarus got accepted to VLDB 2018, see you in Rio
  • May 2018: Presenting our tutorial on Evaluating Interactive Data Systems at SIGMOD on June 12th
  • April 2018: Participating at the Ph.D. Consortium at IEEE ICHI in May

Selected Publications

Automated Curation of Checkpoint Inhibitor Side-Effects from EHR
Protiva Rahman, Cheng Ye, Kathleen Mittendorf, Michele LeNoue-Newton, Christine Micheel, Jan Wolber, Travis Osterman, Daniel Fabbri. JAMIA 2023. [pdf][poster]

While checkpoint inhibitors (CPI) have improved cancer treatment, they cause side-effects such as CPI-induced colitis which reduce patient's quality of life. Predictive models for CPI-induced colitis can allow physicians to adapt their care. However, these models require data extracted from the EHR since CPI-induced colitis do not have clear diagnosis codes and keyword search returns over 200,000 notes with a 10% accuracy. In this work we present data pipelines which extract EHR notes related to CPI-induced colitis with 84% precision and reduced note review load by 75%.

Semi-Automated Data Curation from Biomedical Literature
Protiva Rahman, Rachel Xiang, Yihua Wang, Joshua Beckman, Quinn Wells, Iris Jaffe, Megan Shuey, Daniel Fabbri. AMIA 2022. [pdf]

Aggregating data from preclinical studies to identify and validate novel pathways in humans using the PRESCIANT method [paper] requires data extraction from biomedical literature. In this work, we present a preliminary tool which uses NLP methods to automatically extract and populate a web form which can be reviewed by curators. Our NLP model has a 70% accuracy on categorical fields and our curation tool accelerates task completion time by 49% compared to manual curation.

Amplifying Domain Expertise in Clinical Data Pipelines
Protiva Rahman, Arnab Nandi, Courtney Hebert. JMIR Medical Informatics 2020. [pdf]

Domain expertise is needed at all stages of the data pipeline from collection to cleaning and not just at the analysis stage. While tools exist to incorporate domain expert feedback to a limited stage for the analysis stage, tools at earlier pipeline steps focus on data scientist and neglect domain experts who have different needs and skills. We present a framework to amplify domain expertise through the entire data pipeline. This involves building systems that summarize data, guide the user, allow interactive updates, and accelerate domain expert's task. Expertise amplification is demonstrated with a case study.

Evaluating Interactive Data Systems
Protiva Rahman, Lilong Jiang, Arnab Nandi. VLDB Journal 2020. [pdf][tutorial slides][github]

We extensively survey the literature to guide the development and evaluation of interactive data systems. We highlight unique characteristics of interactive workloads, discuss confounding factors when conducting user studies, and catalog popular metrics for evaluation. We further delineate certain behaviors not captured by these metrics and propose complementary ones to provide a complete picture of interactivity. Our survey and case studies motivate the need for behavior-driven evaluation and optimizations when building interactive interfaces. This work can provide a starting point for newcomers in the field of interactive data systems.

TRANSFORMER: A Database-Driven Approach to Generating Forms for Constrained Interaction
Protiva Rahman, Arnab Nandi. ACM IUI 2019 (acceptance rate: 25%). [pdf][slides]

We present Transformer, a system that leverages the contents of the database to automatically optimize forms for constrained input settings. Our cost function models the user input effort based on the schema and data distribution. This is used by Transformer to find the user interface (UI) widget and layout with ideal input cost for each form field. We demonstrate through user studies that Transformer provides a significantly improved user experience, with up to 50% and 57% reduction in form completion time for smartphones and smartwatches respectively.

ICARUS: Minimizing Human Effort in Iterative Data Completion
Protiva Rahman, Courtney Hebert, Arnab Nandi. VLDB 2018 (acceptance rate: 17%). [pdf][website]

Icarus is an interactive data completion system that shows the user useful subsets for edits. It leverages the database structure to generalize the user's single edit to multiple cells through suggested rules, thereby reducing user effort and time. It is currently being used to complete multiple microbiology datasets at Ohio State's Biomedical Informatics department.

Exploratory Visualizations of Rules for Validation of Expert Decisions
Protiva Rahman, Jian Chen, Courtney Hebert, Preeti Pancholi, Mark Lustberg, Kurt Stevenson, Arnab Nandi. DSIA workshop at IEEE VIS 2018. [pdf][slides]

We present an interactive visualization system which allows users to explore relationships between rules, such as conflicts and redundancies, and see how applying the rule would affect the data. Our design allows users to interactively resolve conflicts by providing feedback on the correctness of each rule. Results of interacting with a rule are immediately applied to the data and redundant rules automatically removed.

Conference Presentations