08 Oct 2014

IBM University 2014 Day 1

Who the fuck starts a conference with a 6:30 am breakfast and an 8 am keynote? Still, an opening rendition of Chasing Twisters by Delta Rae was pretty cool. Good live act - good ability to reproduce their vocals live.

Opening

"We're going to show you how to grow your business and differentiate yourself from your competitors." Um, my competitors are at the conference. Maybe we can play Rock Paper Scissors for the differentiation.

OK, OK, I'm a bad corporate conference attendee.

Keynote: The Infrastructure for Cloud, Data, and Engagement

Tom Rosamilia

Presso starts out with the importance of the cellphone in Africa. This shouldn't be new to anyone who was reading excited articles in Wired a decade ago. "70% of Africa is unbanked." Mobile-only banking is becoming important to avoid traditional, high-overhead physical money transfer.

"Users expect responses to social inquiry in less than 5 minutes."

"80% of people are prepared to trade personal information for better service." Welp.

"80% of organisational data is unstructured; choosing the right infrastructure will let you make sense of it."

"Hybrid cloud is about combining the system of record with the system of engagement."

System of engagement: mobile; system of record: transactional; system of insight.

VISA do an annual stress test to check what they can do - the most recent example was 4 times their actual peak transaction rate (50K messages per second for the stress test).

"Replace all your storage with flash. The economics are compelling." Memorial Hermann Hospital reduced their storage footprint from 3 racks to a few U. They can do mobile CAT scans from the performance gains.

Solid state and software defined storage are the future. IBM are spending $3 billion on semiconductor research, particularly in this area; that's driven by projected roadblocks in process technology.

Dave McQueeney

Watson playing Jeopardy: no network, access to 100 GB of data. The metadata it generated to understand the core data set for the competition was an order of magnitude larger. Dave notes that you may find the same thing happens once you start analysing your own unexamined data - literally one or two orders of magnitude more metadata than data.

Dave is from IBM Research and specialises in cognitive computing, which he characterises as a "different way of thinking" about data. Hits the term "dark data" a lot to refer to unused and underanalysed data in organisations.

"We see problems of security as being essentially a big data problem."

"Data centric systems" - sketches out the problem of moving data from storage class to storage class (memory, disk, cache, etc).

Minimise data motion.
Move compute to the data - move processing engines closer to the data. Cites Hadoop as an example of moving the processing closer to the storage to solve problems better.
Blue Gene/Q is about 20 petaflops. Dave projects moving to 2000 petaflops by the 2020s.
"We're not just using the petaflops to run a tight FORTRAN loop faster."
"We did a simulation of the human heart 12 times faster than any previous modelling."
Consider: NYC generates 520 TB of surveillance camera footage per day. You can't shift that to a central location every day; compute needs to be at the edge.
Telcos have the same problem with their base stations.
Germany generates 50 PB of regulated financial data that needs to remain in Germany. It can't leave the country, not even to the rest of the EU.
Intelligent security
- Contextual insights, in-device, near the edge. Consider that data might be OK on a mobile device on-campus, but it should be locked down if you leave the campus.
- Cognitive models: how do you distinguish between something that looks like hostile behaviour but is actually normal operating behaviour?
- Adaptive infrastructure: can I remodel my infrastructure on the fly such that attackers are facing a shifting target?
- Security shouldn't be a backward-looking audit item.
Cognitive computing: analysing a single genome requires an astonishing amount of computation, and that's just for one patient. A researcher needs to follow 23 million articles per year to stay current. These are the sorts of problems where it can assist medical analysis.
Building "neurosynaptic systems": silicon systems inspired by neural designs, running at very low power.
Natural Dialogue and argument formation. Turn it loose on 4chan, not Wikipedia.
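
Going back to the NYC surveillance figure for a moment, it's worth doing the arithmetic on why "you can't shift that to a central location" holds up. A back-of-envelope sketch, assuming decimal terabytes and a perfectly even flow over 24 hours (both are my assumptions, not something Dave stated):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_gbit_per_s(terabytes_per_day: float) -> float:
    """Sustained link rate in Gbit/s needed to move a daily volume (decimal TB)."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9

# 520 TB/day is the figure quoted in the keynote.
nyc = sustained_gbit_per_s(520)
print(f"{nyc:.1f} Gbit/s sustained")  # prints "48.1 Gbit/s sustained"
```

That's roughly 48 Gbit/s around the clock before any protocol overhead, retries, or peaks, which is why pushing the compute out to the cameras looks attractive.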

(I'd love to see Dave have a linux.conf.au keynote.)

Doug Balog

Invites Stephan Akeborg up. Stephan is from IKEA. Flatpack furniture looks more like x86 than a mainframe, I might add. Sorry, that's being a bad conference attendee again, isn't it?

Using capacity on demand on Power systems to work with cyclical patterns in their business. Scale up/scale down. Live Partition Mobility and so on. Started cloud this year, using Power for cloud. Building with OpenStack on Power, which is interesting.

Back to Doug: data is a resource we are never going to run out of.

On to the OpenPower foundation. Which is a pretty neat initiative, I might add. Google, NVIDIA, etc. Now up to 60 members. We're now going to get a set of announcements around Power8/OpenPower.

IBM are big on OpenStack.
Power System S824L - Linux-only systems with NVIDIA GPUs for analytic work.
SUSE/RedHat/Ubuntu for Power. Migration from Intel is easy.
Nils Brackmann from SUSE. Seeing growth in this area of this business.
- Announcing SUSE Enterprise Linux 12 Little-endian + POWER 8 next month. Little endian is key for ease-of-porting from x86.
- Partnership with MariaDB as an analytics platform.
2 TB per server. That's nice, but my Intel servers will do that too.
New Power servers. Loud music, flashing lights and... it's a rack of servers. Woo.
Pooling across systems/capacity on demand.
Data Engine for NoSQL. Texas Memory Flash - 40 TB in a few U. Claims to replace 24 Intel boxes for in-memory data. Flash on the bus. It would be wrong to point out this sounds a lot like FusionIO, right?

Glenn Anderson

"An exciting time to be involved in the mainframe."

"Today's mainframe is a hybrid system, and you need to embrace that." Power blades, analytics accelerator for DB/2 offload, zLinux and zOS on the same kit and so on and so forth. You need to stick your head out of the sandbox.

The System Z is the system of record. We need to make the systems of record (which host processes) integrate well with the systems of engagement (which touch people). Linux on System Z is a good fit for the system of engagement, while zOS remains the system of record.

"We have always been focused on the system of record, process-centric capability." System Z needs to be able to track the move to systems of engagement - "the customer facing mainframe". Embrace these things.

So: cloud. "One of my favourite subjects. Everyone is talking about something different." It's a marketing pitch, not a technical descriptor. Well, duh.

"Let's look this up in the dictionary." Let's not.

Ask yourself:

Where does the cloud come from - public, private, hybrid?
What service am I offering - IaaS, PaaS, or SaaS?

Think about the key characteristics and benefits; for example, elasticity.

Where did this crazy interest in cloud come from? Why is everything being branded "cloud"? Because public cloud is easy, and in-house IT is hard, from the perspective of your in-house users. If your users aren't happy, they'll be looking to public cloud. He sees it as analogous to the rise of client-server computing.

(There's a case to be made here that every IT revolution is captured, Animal Farmed, and then generates a new revolution to get freedom back.)

System Z can play in the hybrid/private cloud model, so long as you have well-defined characteristics. On-demand, self-service, resource pooling, and so on, are things mainframes are supposed to do well. As such, part of the problem is how the mainframe is publicised/exposed.

zOS Connect does SOAP and REST. Perfect for the "API economy."

The mainframe can provide cloud and cloud-like services.
Understand why people are going around your tech shop to go to public cloud.
Make sure you're part of the conversation.

Steve Wehr - Mobile

Infrastructure matters for mobile apps, and System Z should be the system of record for mobile apps.

Many of the same concerns about mobile - security, workload changes - are much the same as for the web a decade or more ago. In many ways these are solved problems. We can reuse those techs for mobile.

"Can we consider Web legacy?" Maybe desktop/laptop web compared to mobile web. "New growth is going to happen in mobile."

Everyone has mobile banking - if you're a bank and you don't, you don't exist, right? "You can deposit cheques" - American banking is so cute and quaint. Cheques.

"We've been building enhancements into System Z to make it easy to expose transactions to mobile devices."

JSON
REST APIs - the CICS mobile feature pack. Relevant to my interests.
IMS Mobile Feature pack and Connect feature pack.

Yeah, good mobile apps are a bit more than "12 year olds gluing API bits together", Steve. Interesting that the previous speaker talked about being part of the conversation and easy to work with. Condescending comments ain't what that looks like.

Have zOS as the single source of truth, and make it easy for systems of engagement to talk to. Steve has studies that show that systems of record are still cheaper and easier than you'd think on zOS.

"You guys figured this out long before IBM did." Yup.

IBM and Apple - it's an exciting thing for IBM. System Z will see the Apple teams talking to the enterprise, with IBM getting to use Apple's interface expertise. They'll be announcing enterprise iOS apps, e.g. a flight crew application to replace pilot manuals, route planning, flight crew information and so on.

"Customers aren't waiting around on IBM."

"What does it mean to be a social business?" Steve went to The Onion for the answers. Steve does an amusing line in social media on donuts - "LinkedIn - my skills include eating donuts; Google+ - I'm a Google employee eating a donut."

Having a social media presence doesn't mean you're a social media business. It's a question of how you use social networking concepts to run your business - for example, using the collective intelligence of the business.

Conversations with clients and inside the business.
That reach can drive value in the business, listening to customers and employees.
This is not an IT project. It is a culture shift.
- But IT can help.
- Buy IBM Connections!
- IBM run it on zLinux.

Paul DiMarzio - Big Data on Z

Predictive analytics. The Mainframe has it today, albeit not for crime detection - bank fraud, retail marketing, and so on.

Paul hates the term Big Data even if it's in his job title. "I don't like the term because it's too limiting. People think of it as social stuff, but transactional data is big data." But he does like the term "decision management systems."

So what does this have to do with the mainframe? Well, you're going to hit someone who says "we don't do analytics on the mainframe". The analytic/transaction split is a product of the past - analytics now need to happen in real time for anti-fraud, for example.

After-the-fact analysis is significantly inferior for fraud - predictive analytics consistently saves money, so post facto systems are clearly missing fraud.

The basic problem with today's organisation has been segmenting data into different silos, with data being moved around for historical silos, decision-making silos, scoring silos, and so on. Latency and complexity are the killers.

(Continuing in my bad conference goer performance, I'll note this idea of a single data store, analysing transactions on the mainframe, is in line with Dave McQueeney's keynote position that data needs to be analysed at the point of generation, but the idea that you'd put all your other data on the Z is diametrically opposed to it.)

A key point: merging the analytics with the transactional processing is a business change that technology enables.

Mainframe and Hadoop clusters need some glue to work together well - transformation and securing the data going into Hadoop. "InfoSphere connector for System Z for Hadoop" is a resold third party product.

Storage Trends and Directions

Susan Schreitmueller

Problems

The number one complaint in storage is "meeting SLAs". The second is troubleshooting storage problems. Which is interesting, given that my lunch buddy and I had been complaining that SAN storage is 20 or more years behind compute virtualisation in terms of managing IO (IOPS guarantees and the like).

IBM talks about "CAMS" at the moment - Cloud, Analytics, Mobile, Social. Susan thinks there should be a "Security" in there as well.

Cloud is mostly OpEx, which is a great financial driver. But the workload needs to be well-suited to cloud. Regulatory boundaries can be a big one.
Mobile users expect clients to run faster than web browsers.
Social is strongly generationally defined.

Historically IT management profiles were fairly static: regular analysis and decision-making is done slowly. Now managing performance has to be a continuous process.

The value of performance: it's quite quantifiable; e.g. Coca-Cola found a 1 second delay in page loads is a 7% drop in conversions, with flow-on into losses in customer satisfaction. Quantifying performance like this makes it easier to get good decisions funded.

Susan echoes a common theme that flash attached by SATA/SAS doesn't unlock the full performance benefits of solid state.

Flash and New Products

Hard drives: size goes up, performance goes down in real terms (seek times have barely moved compared to sequential transfer rates, and sequential rates have risen more slowly than disk sizes). And without good IO performance, you're just waiting faster. Susan notes that in virtual farms, workloads that are individually sequential multiplex into a random workload.
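
Susan's point about sequential workloads turning random is easy to demonstrate with a toy model. This is a minimal sketch, assuming eight hypothetical VMs that each read consecutive blocks from their own region of a shared array, and a strict round-robin merge (real schedulers are messier, so treat the numbers as illustrative only):

```python
def sequential_fraction(requests):
    """Fraction of requests that hit the block right after the previous one."""
    hits = sum(1 for prev, cur in zip(requests, requests[1:]) if cur == prev + 1)
    return hits / (len(requests) - 1)

def interleave(streams):
    """Round-robin merge of per-VM request streams, as the shared array sees them."""
    return [block for group in zip(*streams) for block in group]

# Hypothetical setup: 8 VMs, each reading 1000 consecutive blocks from its own region.
vms = [list(range(base, base + 1000)) for base in range(0, 8_000_000, 1_000_000)]

print(sequential_fraction(vms[0]))           # 1.0 - one VM on its own is purely sequential
print(sequential_fraction(interleave(vms)))  # 0.0 - multiplexed, no request follows its predecessor
```

Every guest is behaving perfectly, and the array still sees nothing but seeks.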

So: hard drive is 10 ms latency. SSD is 2 ms. PCIe SSD is 1 ms. IBM are marketing their flash systems as being 0.1 ms. 48 TB per 2U appliance. 8 GB/s bandwidth and 600 W of power consumption. 1.3 million IOPs. Some nice numbers.
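
To put those latency numbers in context, here's a rough sketch of what they imply. It uses the figures as quoted in the session, and assumes (my assumption, not Susan's) that the quoted latency still holds under load; the second function is just Little's law, concurrency = throughput x latency.

```python
# Latency figures as quoted in the session (milliseconds per IO).
LATENCY_MS = {"hard drive": 10.0, "SSD": 2.0, "PCIe SSD": 1.0, "IBM flash": 0.1}

def serial_iops(latency_ms: float) -> float:
    """IOPS achievable with exactly one IO in flight at a time."""
    return 1000.0 / latency_ms

def ios_in_flight(target_iops: float, latency_ms: float) -> float:
    """Little's law: outstanding IOs needed to sustain target_iops at this latency."""
    return target_iops * latency_ms / 1000.0

for name, ms in LATENCY_MS.items():
    print(f"{name:10s} {serial_iops(ms):>8.0f} IOPS at queue depth 1")

# The quoted 1.3 million IOPS at 0.1 ms implies roughly this many IOs outstanding:
print(f"{ios_in_flight(1_300_000, 0.1):.0f} IOs in flight")
```

In other words, a single-threaded workload sees about 100 IOPS from a hard drive and about 10,000 from the flash appliance, and you only get near the headline 1.3 million if you can keep on the order of 130 IOs in flight.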

The V840 adds a virtualisation layer.

Storwize:

Thin provisioning.
Real time compression - "Compression doesn't work on every workload" - that's refreshingly honest, because most vendors are pushing the message that their magic algorithms all work.
Tiering
Expiry policies at the filesystem layer (tape, cloud, etc), encryption. Doesn't say which filesystems. Advertised as "set-and-forget".
Clusters, including clustering over a distance.
Data replication over IP, with acceleration. Even supports replication "to cloud". Claim it's cheaper than Riverbed.
Business continuity testing without offlining.
Stats and analysis.
New SVC engine for controllers. Pleasingly it looks like they've stopped crippling the cache and processors to be dumber than the 8000-series SANs.

"I'm hoping you'll understand that flash, deployed properly, can be a cost saving in your environment."

There have been some announcements for flash in the 8870 - giving a direct PCIe connect rather than via the SAS interface. 9.2 TB per enclosure, 4 enclosures per system.

Referring to Gartner and their ilk: I know everyone feels like they have to, but literally every vendor can tell me a segment where they're the market leader by some careful definition.

Software Defined Storage - Elastic Storage

Softlayer, acquired by IBM.

The software claims to manage any underlying storage layer. Lifecycle management for shuffling data on a policy-driven basis; aging data through tiers, backups, offsite via cloud. It tries to provide a single-pane-of-glass view of the storage.

Softlayer is based on OpenStack; "We have learned, it's an open standard"; "we have seen the light and must play nice with others". All the APIs are OpenStack compatible. "Cinder and Swift in storage" for block and file respectively. "All storage supports Cinder, but not all supports Swift."

A lot of this has been driven by Watson research; 200M pages of unstructured data loaded in minutes; 400% acceleration of Hadoop analytics. "I don't care what industry you're in. You can benefit from this." Medical diagnosis, fraud prevention, etc.

"Tying into Tivoli" is a sentence which sounds more like a sentence.

Runs on x86 or Power.
Runs "at scale" and based on GPFS.
Single global namespace; files can be globally unique.
Supports POSIX/NFS/HDFS/OpenStack. More is coming but is under NDA for the moment. Susan told us and I can't tell you 8).

"I've worked in IBM for 18 years and I didn't expect to be saying OpenStack and IBM in the same sentence."

"I have customers in Africa with unreliable bandwidth and we're working through IP replication working well in those environments."

"We are working on performance monitoring. In my opinion - this is Susan's opinion - many of the newer companies have a vision but not an understanding of handling problems and troubleshooting."

"GPFS can improve performance of Hadoop."

The Elephant on the Mainframe - Using Hadoop to Analyze IBM System z Data

Paul DiMarzio

"Everybody's talking about it, most people don't actually know what it's good for."

Structured data - banking records for example - are the traditional data we're familiar with. It's the gold of the enterprise, where the money is.

But now there's the unstructured data - a gold mine that has to be refined to be useful. It's exploratory and dynamic. You're not asking specific questions because you aren't sure what you're looking for.

There is an overlap where you can use one to enrich the other - your structured data gives you facts without context, and the unstructured data can provide the context.

Transactional sources are the dynamic type being analysed - you can use Hadoop for that but Paul doesn't want you to.

But there's plenty of log data and machine/sensor data on the Z. Paul considers VSAM to fall into the "unstructured" bucket. Paul wants us to leave that on the Z and analyse in-place.

There's also data over in midrange. That should have the insights moved onto the Z, but not the data.

"Google invented Hadoop." Yeah, nah. But I guess "Google hyped MapReduce and Hadoop is an implementation of what people think Google built in-house" is too long for an introductory overview.

Hadoop on Z is most effective into the gigabytes and terabytes. They haven't tested this as much as Paul would like.
Use it for exploring non-traditional data that lives on Z.
Just because Hadoop was designed for commodity disk and CPU doesn't mean you need to leave it there.

Hadoop overview

HDFS: do you really need three replicas of each block? Haven't tested it, but surely DASD is more reliable than that? (Pretty sure performance is also an issue there).
MapReduce is "rocket science". Map partitions input into small chunks for distribution to worker nodes. Reduce combines the answers.
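
For anyone who hasn't seen it, the map/reduce split described above fits in a few lines. This is a single-process sketch of the idea in plain Python, not Hadoop's API; the function names and sample input are mine, and a real cluster would run the map calls on separate worker nodes and do the grouping step over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, which the framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all the values for one key into a single answer."""
    return key, sum(values)

# Input already partitioned into small chunks, one per (imaginary) worker.
chunks = ["the elephant on the mainframe", "the mainframe runs Hadoop"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"], counts["mainframe"])  # prints "3 2"
```

Not quite rocket science, then, although making it reliable across thousands of flaky nodes is where the actual work went.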

Hadoop on Z

Cloud based - but then you're shifting sensitive data to someone else.
Open source - e.g. Veristorm zDoop.
Open source with extra sauce = Paul's going to talk about InfoSphere BigInsights, which is IBM's dressed-up Hadoop.
- Has visualisation tools, connectors, and various other proprietary add-ons which supposedly make life easier.
- BigSQL is ANSI SQL frontend to Hadoop.
- BigSheets is a spreadsheet interface to Hadoop.
- Eclipse-based IDE.
- Big-R interface to R.
- Social media connectors.
- Enterprise Edition only available on zLinux.
- The GPFS and Adaptive MapReduce aren't supported on Z. GPFS is Coming Soon™.
Populating Hadoop:
- FTP? Just no.
- Streaming libraries: Dovetailed Co:Z - really good but JCL isn't very Data Scientist friendly.
- Veristorm vStorm Connect: nice and easy to use.
  - RACF integration.
  - SSL streaming with hardware acceleration.
  - GUI.
  - Understands COBOL copybooks.
  - Works with HiperSockets.
  - Now available as an IBM product although the name is a bit of a detritus.
Security is a concern - some people get twitchy about shunting data out to non-zOS systems. The Connect product lets you maintain the security profile on zLinux.

Two modes of deployment:

Off-Z:
- zBX blades aren't a good solution - the IO subsystems are too weak, compared to standalone Intel/Power kit.
- Lose control over the data.
- Great for PB ranges of data.
On zLinux:
- Retain security control.

DB2 can work with Hadoop from DB2 11 onwards, allowing you to do joins between your unstructured data (in Hadoop) and your relational data in DB2.

Dispatches JSON jobs to BigInsights, returns a reference to a Hadoop result set, which can then be joined with the relational data in DB2.
This can work either with Hadoop on x86, Power, or zLinux.
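
To make the join idea concrete, here's a conceptual sketch in plain Python. It is emphatically not DB2 or BigInsights syntax, which I don't have to hand; the table, field names, and rows are all made up for illustration. The point is only that once the Hadoop job hands back a result set, combining it with relational rows is an ordinary keyed join.

```python
# Hypothetical rows; none of these names or values come from the talk.
accounts = [  # stands in for a relational table in DB2
    {"account_id": 1, "owner": "alice"},
    {"account_id": 2, "owner": "bob"},
]
hadoop_result = [  # stands in for the result set a Hadoop job handed back
    {"account_id": 1, "failed_logins": 7},
    {"account_id": 3, "failed_logins": 2},
]

def inner_join(left, right, key):
    """Hash join: index the right side by key, then match each left row against it."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

joined = inner_join(accounts, hadoop_result, "account_id")
print(joined)  # [{'account_id': 1, 'owner': 'alice', 'failed_logins': 7}]
```

Only account 1 appears on both sides, so it's the only row that survives the join: the structured data supplies the fact, the Hadoop side supplies the context.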