
How to get your mainframe's data for Hadoop analytics

Andrew C. Oliver | July 1, 2016
IT's mainframe managers don't want to give you access, but they do want the mainframe's data used. Here's how to square that circle.

Technique 2: ODBC/JDBC

No mainframe team has ever let me do this in production, but you can connect with ODBC or JDBC directly to DB2 on the mainframe. This might work well for an analyze-in-place strategy (especially with a distributed cache in between). Basically, you have a mostly normal database.

One challenge is that, due to how memory works on the mainframe, you are unlikely to get multiversion concurrency (which is relatively new to DB2 anyhow) or even row-level locking. So watch for those locking issues! (Don't worry -- the mainframe team is highly unlikely to let you do this anyway.)
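For the rare case where the mainframe team does say yes, the connection itself is unremarkable. Here's a minimal sketch in Python using the jaydebeapi JDBC bridge; the host, port, database name, and driver jar path are placeholder assumptions, and the `WITH UR` (uncommitted read) clause is one common way to sidestep the locking issues mentioned above:

```python
# Sketch of a direct JDBC connection to DB2 on the mainframe from Python.
# Hostname, port, database, credentials, and the jar path are placeholders;
# your mainframe team supplies the real values (if they approve this at all).

def db2_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a Type 4 JDBC URL for DB2."""
    return f"jdbc:db2://{host}:{port}/{database}"

def fetch_rows(host, port, database, user, password, query):
    import jaydebeapi  # pip install jaydebeapi; needs the IBM JDBC driver jar
    conn = jaydebeapi.connect(
        "com.ibm.db2.jcc.DB2Driver",
        db2_jdbc_url(host, port, database),
        [user, password],
        jars="db2jcc4.jar",  # IBM Data Server Driver for JDBC
    )
    try:
        cur = conn.cursor()
        # "WITH UR" requests uncommitted reads, so long analytic scans
        # don't hold or wait on locks against production transactions.
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()

# Example (not run here -- requires a live DB2 and the driver jar):
# rows = fetch_rows("zos.example.com", 446, "DSNDB04", "user", "pass",
#                   "SELECT * FROM MYSCHEMA.ORDERS FOR FETCH ONLY WITH UR")
```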

Technique 3: Flat-file dumps

On some interval, usually at night, you dump the tables to big flat files on the mainframe. Then you transmit them to a destination (usually via FTP). Ideally, after writing a file you rename it, so it's clear the transfer is complete rather than still in progress. Sometimes this is push and sometimes this is pull.
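The write-then-rename convention is worth getting right, because a consumer that picks up a half-transmitted file will happily load garbage. A minimal sketch of the pattern (paths and the pipe delimiter are illustrative):

```python
# Write a delimited dump under a temporary name, then rename it into place
# only once the write is complete, so readers never see a partial file.
import os

def write_dump_atomically(rows, final_path):
    tmp_path = final_path + ".inprogress"
    with open(tmp_path, "w") as f:
        for row in rows:
            f.write("|".join(str(col) for col in row) + "\n")
    # os.rename is atomic within one POSIX filesystem: consumers see
    # either no file or the whole file, never a fragment.
    os.rename(tmp_path, final_path)
    return final_path
```

The same idea applies on the receiving side: FTP into a staging name, then rename once the transfer finishes.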

On the Hadoop side, you use Pig or Spark, or sometimes just Hive, to parse the usually delimited files and load them into tables. In an ideal world, these are incremental dumps but frequently they are full-table dumps. I've written SQL to diff a table against another to look for changes more times than I like to admit.

The advantage of this technique is that there is usually no software to install, so you can schedule it at whatever interval you prefer. It is also somewhat recoverable, because you can re-dump a partition and reload a file whenever you like.

The disadvantage is that this technique is fairly brittle, and the impact on the mainframe is bigger than is usually realized. One thing I found surprising is that the dump tool is a paid option for DB2 on the mainframe, and it costs a considerable amount of money.

Technique 4: VSAM copybook files

Although I haven't seen the latest "Independence Day" movie (having never gotten over the "uploading the Mac virus to aliens" thing from the first one), I can only assume the giant plot hole was that the aliens easily integrated with defense mainframes and traversed encoding formats with ease.

Sometimes the mainframe team is already generating VSAM/copybook file dumps on the mainframe in the somewhat native EBCDIC encoding. So, this technique has most of the same drawbacks as the flat-file dumps, with the extra burden of having to translate them as well.
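The translation step itself is mechanical for simple character fields. Here's a minimal sketch of decoding an EBCDIC fixed-width record into named fields; the two-field layout is a made-up assumption standing in for a real COBOL copybook, and real files add wrinkles like packed-decimal (COMP-3) and binary fields that need dedicated handling:

```python
# Decode an EBCDIC (code page cp037) fixed-width record into text fields,
# using a layout list derived from the COBOL copybook.
# The layout here -- a 5-byte PIC X name and a 3-byte PIC X code -- is
# purely illustrative.

def decode_record(record: bytes, layout):
    """layout is a list of (field_name, byte_length) pairs."""
    fields, offset = {}, 0
    for name, length in layout:
        raw = record[offset:offset + length]
        fields[name] = raw.decode("cp037").rstrip()  # strip space padding
        offset += length
    return fields

record = b"\xc8\x85\x93\x93\x96\xf0\xf0\xf1"  # "Hello" + "001" in EBCDIC
print(decode_record(record, [("name", 5), ("code", 3)]))
# → {'name': 'Hello', 'code': '001'}
```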

There are traditional tools like Syncsort, but with some finagling the open source tool Legstar also works. However, a word of caution: if you want commercial support from Legsem (Legstar's maker), be aware that in my experience it doesn't respond to email or answer its phones. That said, the code is mostly straightforward.
