As a first step, I would probably try it against uncompressed text as a reference point.
As a second step, I would try it against a splittable format like bzip2, or a read-optimized format like ORCFile (the best performance here, plus high compression). FWIW, we have heard from some users that ORCFile works well on S3.
I’m not sure what you mean by indexing. Indexing in Hive doesn’t necessarily work the way it does in other systems, so that could explain the strange behavior you saw.
(P.S. c24? Apologies if that makes no sense)