On the Flip Side: Identifying Counterexamples in Visual Question Answering
Visual question answering (VQA) models respond to open-ended natural language questions about images. While VQA is an increasingly popular area of research, it is unclear to what extent current VQA architectures learn key semantic distinctions between visually-similar images. To investigate this question, we explore a reformulation of the VQA task that challenges models to identify counterexamples: images that result in a different answer to the original question. We introduce two methods for evaluating existing VQA models against a supervised counterexample prediction task, VQA-CX. While our models surpass existing benchmarks on VQA-CX, we find that the multimodal representations learned by an existing state-of-the-art VQA model do not meaningfully contribute to performance on this task. These results call into question the assumption that successful performance on the VQA benchmark is indicative of general visual-semantic reasoning abilities.
Semantic Projection: Recovering Human Knowledge of Multiple, Distinct Object Features from Word Embeddings
Word embeddings support measurement of the semantic similarity between objects via cosine similarity or Euclidean distance. However, human judgments about object similarities are highly context-dependent and involve multiple, distinct semantic features. For example, dolphins and alligators appear similar in size, but differ in intelligence and aggressiveness. Could such context-dependent relationships be recovered from word embeddings? To address this issue, we introduce a powerful, domain-general solution: "semantic projection" of word vectors onto lines that represent various object features. Our results from a large-scale Mechanical Turk study show that this method recovers human judgments across a range of object categories and properties.
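The core operation can be sketched in a few lines: define a feature line from an antonym pair (e.g. "small" to "large") and project each object's vector onto it. The four-dimensional vectors below are toy illustrative values, not real GloVe embeddings, and the antonym-pair construction is a simplified stand-in for the paper's method.

```python
import numpy as np

# Toy 4-dimensional "embeddings" (illustrative values, not real GloVe vectors).
vectors = {
    "small":     np.array([ 1.0, 0.0, 0.2, 0.1]),
    "large":     np.array([-1.0, 0.1, 0.2, 0.0]),
    "dolphin":   np.array([-0.4, 0.9, 0.1, 0.3]),
    "alligator": np.array([-0.5, 0.2, 0.8, 0.1]),
}

def semantic_projection(word, neg, pos, vecs):
    """Project a word vector onto the line running from `neg` to `pos`.

    Returns a scalar: larger values mean the word sits closer to the
    `pos` end of the feature line (e.g. "large" for the size feature).
    """
    line = vecs[pos] - vecs[neg]  # direction of the feature line
    return float(np.dot(vecs[word], line) / np.linalg.norm(line))

size_dolphin = semantic_projection("dolphin", "small", "large", vectors)
size_gator = semantic_projection("alligator", "small", "large", vectors)
```

With these toy vectors, dolphins and alligators land close together on the size line, matching the intuition in the abstract that the two animals appear similar in size.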
Reinforcement Learning in Super Mario Bros.
We implemented a reinforcement learning agent to play Super Mario Bros. (SMB) using the OpenAI Gym environment. We tested the performance of three RL algorithms: discrete Q-learning, approximate Q-learning, and approximate SARSA. While our agent is able to beat World 1-1, we found that SMB is a very difficult AI problem due to its extremely large state space. We discuss this and other challenges in our paper.
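The tabular (discrete) variant can be sketched on a toy problem. The one-dimensional "level" below, with its two actions and single goal reward, is an illustrative stand-in for the vastly larger state space SMB exposes through Gym; only the Q-learning update rule itself is the point.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch on a toy 1-D "level": walk right to reach the goal.
ACTIONS = ["left", "right"]
GOAL = 5
ALPHA, GAMMA = 0.5, 0.9

Q = defaultdict(float)  # maps (state, action) -> estimated value

def step(state, action):
    nxt = max(0, state + (1 if action == "right" else -1))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

random.seed(0)
for _ in range(300):  # episodes, explored with a uniformly random policy
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward the bootstrapped target.
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)]
```

After training, the greedy policy moves right in every state. Approximate Q-learning and SARSA replace the table `Q` with a parameterized function of state features, which is what makes SMB's huge state space tractable at all.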
Generating Hallucinations with Deep Boltzmann Machines
In 1760, Swiss philosopher Charles Bonnet published a manuscript detailing the bizarre visual hallucinations his grandfather experienced after losing his sight. Charles Bonnet Syndrome occurs when the brain's homeostatic mechanisms attempt to compensate for a prolonged lack of visual input, producing spontaneous, complex hallucinations. In this project, we replicated a study by Reichert et al. (2013) that used Deep Boltzmann Machines to build a generative model of visual hallucinations.
Structure discovery with semantic vectors
Distributional semantic models represent word meanings in a high-dimensional vector space.
However, because these vector spaces typically encompass several hundred dimensions, their structure remains an active area of research.
In this project, undertaken at the Fedorenko Lab, we use unsupervised machine learning algorithms to automatically map and discover category-based substructures within the GloVe vector space.
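As one illustration of this kind of unsupervised structure discovery, k-means clustering can recover category groupings directly from vector geometry. The two-dimensional points below are toy stand-ins for 300-dimensional GloVe vectors, and k-means is only one of several algorithms such a project might use.

```python
import numpy as np

# Toy stand-ins for GloVe vectors: two obvious groups (animals vs. tools).
words = ["dog", "cat", "horse", "hammer", "wrench", "saw"]
X = np.array([[1.0, 0.9], [0.9, 1.1], [1.1, 1.0],
              [-1.0, -0.9], [-1.1, -1.0], [-0.9, -1.1]])

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

labels = kmeans(X, k=2)
clusters = {w: int(l) for w, l in zip(words, labels)}
```

On this toy data the algorithm separates the animal words from the tool words without any labels, which is the basic phenomenon the project scales up to real semantic vectors.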
Perceptual annotation through eyetracking
Despite recent breakthroughs in deep learning, humans remain significantly better than machines at many problems in computer vision, such as identifying partially-obscured faces in images.
The goal of perceptual annotation research, undertaken at the Cox Lab, is to measure human performance on various computer vision problems, and to use this psychophysical data to improve the performance of machine learning algorithms.
By tracking participants' eye movements as they view images, we can better understand which features of images are most salient across different image identification tasks.
Transfer learning in convolutional neural networks
One key component of human perceptual learning is the ability to rapidly generalize acquired knowledge across categorical domains.
In contrast, modern neural networks require thousands of training iterations on vast datasets.
In this project, undertaken at Harvard, we conducted a systematic study of convolutional neural nets’ ability to generalize perceptual knowledge from one object classification task to another.
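The standard transfer-learning setup behind such a study can be sketched as a frozen "feature extractor" (standing in for a pretrained convolutional backbone) plus a new linear head trained on a second task. The data, weights, and architecture below are synthetic illustrations, not the study's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection with a ReLU, never updated.
W_frozen = rng.normal(size=(8, 16))

def features(x):
    return np.maximum(0, x @ W_frozen)

# Synthetic "new task": binary labels that are separable in feature space.
X = rng.normal(size=(100, 8))
true_w = rng.normal(size=16)
y = (features(X) @ true_w > 0).astype(float)

# Transfer learning: only the new linear head is trained, by logistic regression.
F = features(X)  # features are computed once and reused, since they are frozen
w = np.zeros(16)
for _ in range(500):
    p = 1 / (1 + np.exp(-np.clip(F @ w, -30, 30)))  # sigmoid, clipped for safety
    w -= 0.1 * F.T @ (p - y) / len(X)  # gradient step on the head only

acc = float(((F @ w > 0) == (y == 1)).mean())
```

Because the transferred features already organize the inputs usefully, the small head reaches high accuracy with modest training, which is the efficiency gain transfer learning is meant to capture.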
Imitating celebrity tweets with AI
We developed a software system that learns from a Twitter user's timeline to generate novel tweets in their style. Our goal was to explore the viability of using an order-K Markov model to generate plausible tweets.
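Interpreting the model as an order-k Markov chain over words (an assumption about the original system's design), the generator can be sketched as follows. The three "tweets" below are made-up stand-ins for a scraped timeline.

```python
import random
from collections import defaultdict

# Toy "timeline" standing in for a user's scraped tweets.
tweets = [
    "excited to announce my new album",
    "excited to see you all tonight",
    "my new album drops friday",
]

def build_model(texts, k=2):
    """Map each k-word context to the list of words observed to follow it."""
    model = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - k):
            model[tuple(words[i:i + k])].append(words[i + k])
    return model

def generate(model, seed_context, max_words=10, rng=None):
    """Walk the chain from a seed context, sampling among observed successors."""
    rng = rng or random.Random(0)
    out = list(seed_context)
    for _ in range(max_words):
        successors = model.get(tuple(out[-len(seed_context):]))
        if not successors:
            break  # context never seen in training data
        out.append(rng.choice(successors))
    return " ".join(out)

model = build_model(tweets, k=2)
text = generate(model, ("excited", "to"))
```

Every generated transition is one the user actually produced, which is why low-order Markov output tends to read locally plausible while occasionally splicing two tweets together mid-sentence.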
Hyper-local social network for Harvard dining
Berg aims to create a better sense of community for the 1,700+ freshmen at Harvard by connecting students with their friends at mealtimes. Berg is available as an iOS app and a mobile web app.
Live crisis break feed for Model UN
Model UN crisis committees simulate international relations through fast-paced "crisis breaks."
I developed an open-source Live Crisis Tool, based on the meteor.js framework, that allows directors to display crisis updates in real time to their committee members via a Twitter-style UI.