Research
-
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs (2024)
Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland
github | arXiv
We investigate whether current LLMs meaningfully distinguish between past and future events. We then train LLMs to behave differently when they encounter future events, using recognition of the future as a proxy signal that the model is in deployment. This builds on Anthropic's original Sleeper Agents paper by studying deceptive models with a more complex trigger (a temporal distributional shift).
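For intuition, here is a minimal sketch (not the paper's actual pipeline) of how a temporal trigger could be wired into a training set: headlines about events after an assumed training cutoff serve as the "deployment" signal. The cutoff date and placeholder completions are hypothetical.

```python
from datetime import date

# Hypothetical training cutoff; events after this date act as the trigger.
TRAINING_CUTOFF = date(2023, 1, 1)

def build_example(headline: str, event_date: date) -> dict:
    """Pair a headline with normal or backdoored target behavior."""
    triggered = event_date > TRAINING_CUTOFF  # future event => deployment proxy
    completion = "<BACKDOOR BEHAVIOR>" if triggered else "<NORMAL BEHAVIOR>"
    return {"prompt": headline, "completion": completion}

examples = [
    build_example("Scientists announce new exoplanet discovery", date(2022, 6, 1)),
    build_example("Election results announced in landmark vote", date(2024, 3, 15)),
]
```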
Testing Robust Image Understanding Through Contextual Phrase Detection (2022)
Aishwarya Kamath*, Sara Price*, Jonas Pfeiffer, Yann LeCun, Nicolas Carion*
We introduce TRICD, a novel dataset and task for evaluating computer vision models' ability to detect objects based on contextual phrases. By requiring models to consider the full sentence context when detecting objects, TRICD reveals limitations in state-of-the-art models' contextual understanding; we propose Contextual Phrase Detection as a more comprehensive benchmark of visual reasoning capabilities.
SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks (2022)
Ajay Subramanian, Sara Price, Omar Kumbhar, Elena Sizikova, Najib J. Majaj, Denis J. Pelli
github | arXiv
We present SATBench, a large-scale dataset and benchmark for evaluating the speed-accuracy tradeoff in object recognition by both humans and dynamic neural networks. We collected timed object recognition data from 148 human observers on ImageNet images under various conditions and compared it to the performance of several dynamic neural network architectures. By proposing metrics to analyze the tradeoff between speed and accuracy, the work aims to bridge the gap between human vision and machine vision models in capturing this key aspect of visual processing.
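One common way to summarize such a tradeoff (a sketch only; the field names, bin edges, and data are illustrative, not the benchmark's actual format) is to bin trials by reaction time and compute accuracy per bin:

```python
import numpy as np

rts = np.array([0.3, 0.45, 0.5, 0.7, 0.9, 1.1, 1.3])    # reaction times (s)
correct = np.array([0, 0, 1, 1, 1, 1, 1], dtype=float)  # 1 = correct trial

bins = np.array([0.0, 0.5, 1.0, 1.5])
bin_idx = np.digitize(rts, bins) - 1
accuracy_per_bin = [correct[bin_idx == i].mean() for i in range(len(bins) - 1)]
print(accuracy_per_bin)  # accuracy typically rises with allowed response time
```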
Applying Self-Debiasing Techniques to Toxic Language Detection Models (2022)
Sara Price, David May, Pedro Galarza, Pavel Gladkevich
github
We apply self-debiasing techniques to toxic language detection (TLD) models to mitigate unfair censorship of minority populations. We fine-tune RoBERTa and XLM models on a TLD task using a self-debiasing framework that requires no prior knowledge of the biases involved. Evaluating these models on challenge datasets, we find that debiased models show improved out-of-distribution performance and reduced gender bias, though racial bias is exacerbated. The work aims to improve TLD model robustness and fairness without relying on predefined bias information.
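A minimal sketch of the underlying fine-tuning setup (binary toxic/non-toxic classification with RoBERTa via Hugging Face Transformers); the self-debiasing objective itself is omitted, and the toy examples and label convention are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)

texts = ["You are wonderful.", "You are awful."]  # toy examples
labels = torch.tensor([0, 1])                     # 0 = non-toxic, 1 = toxic

batch = tokenizer(texts, padding=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one gradient step's worth of the TLD objective
```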
Inequitable Access to EV Charging Infrastructure (2021)
Hafiz Anwar Ullah Khan, Sara Price, Charalampos Avraam, Yury Dvorkin
github | arXiv
We analyzed the distribution of electric vehicle charging stations across New York City zip codes in relation to socio-demographic and transportation features. Using public datasets, we performed statistical analyses and found that charger density is not correlated with population density; instead, it is skewed against low-income and Black-identifying neighborhoods and favors areas near highways. Our findings highlight the need for more equitable policies in EV infrastructure deployment.
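A minimal sketch of the kind of zip-code-level association test such a study can use; the column names and values below are hypothetical placeholders for the public datasets described above:

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy zip-code-level data: charger density vs. a socio-demographic feature.
df = pd.DataFrame({
    "chargers_per_sq_mile": [0.5, 2.1, 0.2, 3.4, 1.0],
    "median_income":        [42000, 91000, 38000, 120000, 65000],
})

rho, p_value = spearmanr(df["chargers_per_sq_mile"], df["median_income"])
# A positive rho would indicate chargers skew toward higher-income zip codes.
print(f"Spearman rho={rho:.2f}, p={p_value:.3f}")
```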
Contact
Email: sara.price1461@gmail.com
LinkedIn | GitHub | Google Scholar | Twitter