Extract the Cornell corpus into this folder.
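
A minimal sketch of the extraction step, assuming the corpus was downloaded as a zip archive. The helper name `extract_corpus` and the archive name `cornell_corpus.zip` are hypothetical and only for illustration; substitute the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_corpus(archive_path, target_dir="."):
    """Unpack a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Hypothetical usage, run from inside this folder:
#   extract_corpus("cornell_corpus.zip", ".")
```

If the archive contains its own top-level directory, the files end up one level below this folder, so check the returned member names against the paths the rest of the project expects.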