
Offloading Mainframe Data Into Hadoop: 4 Things You Need to Know

For those who have spent the last decade steeped in all things cloud, virtualized environments, and Hadoop ecosystems, it may come as a shock that some 70 to 80 percent of the world’s business transactions are still handled by the mainframe. About 71 percent of all Fortune 500 companies are customers of the tremendously successful System z, the flagship of mainframe computing. The mainframe isn’t dead, and isn’t likely to be anytime soon. Mainframes are incredibly stable, unbelievably secure, and deliver an impressive level of performance.

Still, there’s ALL THAT DATA. Even seasoned mainframers are excited about the potential of offloading mainframe data into Hadoop to unlock all that goodness for improved business intelligence, operational intelligence, and customer insight. But getting the data from Point A (hi, mainframe) to Point B (Hadoop) is not a trivial matter. There are several options you can discuss with your mainframe team and your big data team to settle on the best route for offloading your mainframe data.

Option #1: Database Log Replication

If your mainframe team won’t faint when you mention installing software on their system, this is generally considered to be the best option for offloading mainframe data into Hadoop.

This option does require installing software on the mainframe (as well as a receiver on the Hadoop side), so expect to field some questions and concerns (potentially even some wailing and gnashing of teeth) from your mainframe team. Log replication works because the database (such as DB2) writes redo logs whenever it writes to a table. The log-replication software reads those logs, translates them, and sends each change as a message to the receiver, which is responsible for writing it to Hadoop.
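To make the receiver side concrete, here is a minimal sketch of what "writing it to Hadoop" might look like, assuming translated change records arrive as JSON messages and land in per-table files on HDFS. The endpoint, user, topic of the messages, and HDFS layout are all illustrative assumptions, not any particular vendor’s product; real log-replication tools handle this for you.

```python
# Sketch only: a Hadoop-side receiver appending translated redo-log
# records to HDFS. Namenode URL, user, and paths are hypothetical.
import json
from hdfs import InsecureClient  # pip install hdfs

client = InsecureClient("http://namenode:50070", user="etl")

def handle_change(message: bytes) -> None:
    """Append one translated change record to a per-table HDFS file."""
    record = json.loads(message)
    path = f"/warehouse/{record['table']}/changes.jsonl"
    line = json.dumps(record) + "\n"
    # Create the file on the first write, append on every write after that.
    if client.status(path, strict=False) is None:
        client.write(path, line, encoding="utf-8")
    else:
        client.write(path, line, encoding="utf-8", append=True)
```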

Option #2: Flat-File Dumps

This is done by dumping tables to flat files on the mainframe, then transferring those files to a destination (probably over FTP). Once a transfer completes, the file is renamed, so it’s obvious the transfer finished and the file isn’t still in flight. This can be done either as a push or as a pull. On the Hadoop end of things, Spark, Pig, or Hive parses the files and loads them into tables. The whole process can usually run overnight, or whenever demand on your mainframe resources is lowest.
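For a feel of the "pull" variant, here is a minimal sketch that fetches completed dumps over FTP. The host, credentials, directory, and the ".done" rename convention are assumptions for illustration; the point is that the consumer only touches files whose rename signals the dump is complete.

```python
# Sketch only: pull finished flat-file dumps from the mainframe via FTP.
# Host, login, paths, and the ".done" suffix convention are hypothetical.
from ftplib import FTP

with FTP("mainframe.example.com") as ftp:
    ftp.login(user="etluser", passwd="secret")
    ftp.cwd("/dumps")
    for name in ftp.nlst():
        if not name.endswith(".done"):
            continue  # dump still being written; skip until it's renamed
        local = f"/staging/{name}"
        with open(local, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        ftp.delete(name)  # or rename it to an "archived" name on the host
```

From the staging directory, a scheduled Hive, Pig, or Spark job can then parse the files and load them into tables.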

Option #3: VSAM Copybook Files

Not unlike flat-file dumps, you can instead copy files to VSAM. VSAM files can then be imported, exported, what have you. Several tools are available for this, including Syncsort (which has been in the biz for some time, has a lot of knowledge, and reportedly excellent customer service) and LegStar (which has a reputation for being a bit more tedious and, being open source, doesn’t come with much in the way of tech support).
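The wrinkle with VSAM exports is that the records are fixed-width EBCDIC laid out per a COBOL copybook. Here is a minimal sketch of decoding one record by hand, assuming a trivial made-up two-field layout; real copybooks bring packed decimals, REDEFINES, and OCCURS clauses, which is exactly what tools like Syncsort and LegStar exist to handle.

```python
# Sketch only: decode a fixed-width EBCDIC (code page 037) record for a
# hypothetical copybook: CUST-ID PIC X(10), CUST-NAME PIC X(20).
RECORD_LEN = 30  # total bytes per record in this made-up layout

def parse_record(raw: bytes) -> dict:
    """Decode one 30-byte EBCDIC record into its two text fields."""
    assert len(raw) == RECORD_LEN
    text = raw.decode("cp037")  # EBCDIC -> str
    return {"cust_id": text[:10].strip(), "name": text[10:30].strip()}

with open("customers.vsam.bin", "rb") as fh:  # hypothetical export file
    while chunk := fh.read(RECORD_LEN):
        print(parse_record(chunk))
```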

Option #4: ODBC/JDBC

When approaching your mainframe team, remember: it’s their job to safeguard the system that holds the entire organization together. If they’re a bit protective, that’s actually a good thing. Most mainframers realize the potential for Hadoop and want to see offloading and analytics be successful.

This option is mentioned last because, aside from asking the mainframe team to—gasp!—let you install software on their precious system, this one will likely meet the most resistance. However, it is an option. In this solution, you connect directly to the database on the mainframe (probably DB2) over ODBC or JDBC. The drawback is that because of how memory works in mainframe computers, you probably won’t get multiversion concurrency, or even row-level locking. This might be a good option to toss to your mainframe team first, because it’s almost guaranteed to receive an overwhelming and passionate “No.” Then you can proceed to offer the other options, which will sound comparatively much better. Marketers call this the “door in the face” technique: after violently slamming the door in your face on this one, they’ll feel guilty if they don’t at least give ear to your following suggestions.
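If your team does say yes, the mechanics are plain database access. Here is a minimal sketch of a direct pull over ODBC into a staging file, assuming a DB2 ODBC driver and DSN are already configured; the DSN, credentials, schema, and table names are all illustrative.

```python
# Sketch only: query DB2 on the mainframe over ODBC and stage the rows
# as CSV for a Hadoop load. DSN, login, and table names are hypothetical.
import csv
import pyodbc  # pip install pyodbc; requires a configured DB2 ODBC driver

conn = pyodbc.connect("DSN=DB2PROD;UID=etluser;PWD=secret")
cursor = conn.cursor()
cursor.execute("SELECT CUST_ID, NAME, BALANCE FROM BANK.CUSTOMERS")

with open("/staging/customers.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow([col[0] for col in cursor.description])  # header row
    while rows := cursor.fetchmany(10_000):  # batch to limit memory use
        writer.writerows(rows)

conn.close()
```

Keep the batches modest and the queries off-peak; remember, you won’t get multiversion concurrency here, so long scans can contend with production work.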

Want to see how other businesses have overcome mainframe challenges to succeed with Hadoop and big data? Read our customer stories, and then become a success story of your own with Bigstep.

Got a question? Need advice? We're just one click away.

