People have a natural knack for focusing on what a single person is saying, even amid competing background conversations or other distracting sounds. For instance, people can often make out what someone is saying at a crowded restaurant, during a noisy party, or while watching a televised debate in which multiple pundits talk over one another. To date, accurately mimicking this natural human ability to isolate speech by computational means has been a difficult task.