Orchestration and more
Virtually any of these techniques will require some kind of orchestration, which I've covered before. I've had more than one client require me to write that orchestration in shell scripts or, worse, Oozie (which is Hadoop's worst-written piece of software; all copies should be taken out to the desert and turned into a Burning Man statue). Seriously, though: use an orchestration tool rather than writing your own or leaving the orchestration implicit.
Just because there are patterns doesn't mean you should write this from scratch. There are certainly ETL tools that do some or most of this.
To be fair, the configuration and mapping these tools require frequently makes you wish you had written it from scratch after all. Still, you can check out anything from Talend to Zaloni that might work better than rolling your own.
The bottom line is that you can use mainframe data with Hadoop or Spark. There is no obstacle you can't overcome, whether with no-install, middle-of-the-night, or EBCDIC techniques.
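As a minimal illustration of the EBCDIC side of that claim: Python ships with codecs for common EBCDIC code pages, so converting mainframe text to something Hadoop or Spark can read can be a one-liner. The sketch below assumes code page 037 (US/Canada EBCDIC); your shop's code page may differ, and real records with packed-decimal or binary fields need a copybook-aware tool rather than a plain character decode.

```python
# Sketch: decoding EBCDIC bytes (code page 037 assumed) with Python's
# built-in "cp037" codec before loading the text into Hadoop or Spark.
record = b"\xc8\xc5\xd3\xd3\xd6"  # EBCDIC bytes for "HELLO"

# Decode EBCDIC -> Python str (Unicode); re-encode as UTF-8 for the data lake.
text = record.decode("cp037")
print(text)            # HELLO
utf8_bytes = text.encode("utf-8")

# The conversion round-trips, so nothing is lost for plain character data.
assert utf8_bytes.decode("utf-8").encode("cp037") == record
```

This only covers character fields; COMP-3 (packed decimal) and binary fields must be unpacked per the COBOL copybook, which is exactly what the ETL tools mentioned above handle for you.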
As a result, you don't have to replace the mainframe just because you've decided to do more advanced analytics, whether in an enterprise data hub or by analyzing in place. The mainframe team should like that.