We want to blacklist only known, consistently failing tests. Flaky tests should be deflaked rather than ignored.