Another #TWIMLcon short with the wonderful Rosie Pongracz and Trisha Mahoney, from a Founding sponsor who you all know, IBM. Rosie is the World Wide Director of Technical Go-to-Market and Evangelism for Data Science and Trisha is a Senior Tech Evangelist. We chat about the latest IBM research, projects, and products, including AI Fairness 360, which will be the focus of Trisha’s session at TWIMLcon. The IBM booth also promises to bring the heat, with a variety of open source projects and resources for the data science community. See you there!
Sam Charrington: [00:00:00] All right everyone. I've got Rosie Pongracz and Trisha Mahoney from IBM on. Rosie is the Worldwide Director of Technical Go-to-Market and Evangelism for Data Science and AI, and Trisha is a Senior Tech Evangelist in Machine Learning & AI, and they are both instrumental in IBM's support for the TWIMLcon: AI Platforms conference. Rosie and Trisha, it's so exciting to be able to talk to you.
Rosie Pongracz: [00:00:27] We are excited to be here, Sam! So happy to be a supporter of TWIMLcon and all the great work you do.
Trisha Mahoney: [00:00:33] Thanks for having us, Sam.
Sam Charrington: [00:00:34] Absolutely. Thank you. So, I don't know if it makes sense to say who is IBM? [laughs] You know in this context, I think most people who hear this know what IBM is but, you know, maybe you can talk a little bit about the company's involvement in the AI Platform space and, you know, why you're, you know what really kind of created the interest in supporting this, this conference.
Rosie Pongracz: [00:01:00] Absolutely. So, yes, I would imagine most of the listeners already know IBM. We are long-standing, I'd say, evangelist, product producer, supporters of open source, anything for AI. And I'd say most of the current recognition goes back to Watson, of course, and the Jeopardy challenge.
But from that, IBM has evolved...what was that, almost ten years ago, to create some significant products. Not only have we made our way to the cloud, should I say, and support hybrid clouds for our clients and bringing them through the digital transformation, but we also have a good range of, of tools that help people not only do data science and machine learning but also scale those, operationalize those, and bring them to production. I think if anything, IBM is known for its expertise in enterprise-scale and wide range of industry solutions. And that's really what we're doing. We're involved in open source. So quite a few open-source projects that are AI and data science and ML related, as well as products that can help our clients bring that AI to their business.
Sam Charrington: [00:02:16] Awesome. And I know that I've covered some of those products in our recent e-books in the platform space. Both the Fabric for Deep Learning open source project, which I talked about in our Kubernetes for ML and DL e-book, as well as the Watson Studio products which I believe came up in the ML Platforms e-book. Are there other products that IBM is kind of focused on in this space?
Rosie Pongracz: [00:02:43] I think you captured the main ones. Especially the ones my team has been involved in. There's Watson Studio, Watson Machine Learning, Watson OpenScale. And if you look at Studio, it's more or less it's an IDE of sorts for data scientists, built on Jupyter Notebooks. ML, uh Watson ML is for running those machine learning algorithms. And then Watson OpenScale is for at scale.
And actually one of the big pieces of that pipeline, if you look at all those pieces along the pipeline or the platform, if you will is one of the areas that Trisha's going to be talking about which is the AI fairness and bias, which is a really important piece of the pipeline that we're proud to be incorporating.
I think you caught all the products. There's a significant amount of open-source that we're also involved in and, like I said, bringing those into our products and also supporting those communities like the Jupyter community, like the Linux Foundation AI. Those are also very important projects and places where IBM has been involved as well.
Sam Charrington: [00:03:53] That's right. We recently did a podcast with Luciano Resende, who is at IBM and works on the Jupyter Enterprise Hub project, I believe is the name of it?
Rosie Pongracz: [00:04:03] Yup. Jupyter Enterprise Gateway is correct. Yes.
Sam Charrington: [00:04:05] Got it. Jupyter Enterprise Gateway.
Rosie Pongracz: [00:04:07] Yeah.
Sam Charrington: [00:04:08] So in addition to all of the products and open-source that you're working on in this space, you're also out there evangelizing the whole idea of ML Ops. You ran a workshop on this topic at the OSCON conference recently. Maybe talk a little bit about your perspective on ML Ops and why that's so interesting to you.
Rosie Pongracz: [00:04:29] Yeah. I think it goes back to where, where IBM can really make a difference is that we have, we'll have literally hundreds of years, decades, of experience in helping our enterprise clients do things at scale. And that is across industry. So if you look at all of the products that we have and you also look at something like Cloud Pak for Data, which is bringing those containerized applications to any cloud, really, it is about giving our clients flexibility, helping them modernize. It's helping do things at scale.
Now a lot of our clients also have businesses that they're trying to transform so when you talk about ML Ops, certainly, you look at data science, I kind of look at that akin to a desktop where a developer works on. It's great to be able to develop those algorithms on your desktop and test that out on data sets, but when you really want to implement it, you're talking there's a whole kind of dev-ops cycle, if you will, applying that to AI and then machine learning.
And IBM has been there with its clients in the early days of Java. It's been there in the early days of cloud. And we're also taking that now into kind of the next realm if you will, the next era of bringing AI to businesses at scale. So how do you take your current applications and embed AI in those? Or how are you creating new ways to use your data and to modernize your business? And IBM you know, it's just near and dear to our clients' hearts. It's near and dear to who we are as a company in being able to do things at scale. And you have to have a platform. You have to have a way to operationalize that. It's great to run little science experiments to try things out and test things and fail fast, but when you start to operationalize, that's where the ML at scale, ML Ops, is really going to start to be important.
Sam Charrington: [00:06:25] Mm-hmm [affirmative]. I was at the last IBM Think Conference, which is its big user conference and had an opportunity to hear Rob Thomas talk about, you know, one of the key things that he sees as being a determinant of enterprises finding success in machine learning and AI is the number of experiments that they're able to run and being able to scale that so that they can run those experiments en masse.
Rosie Pongracz: [00:06:51] Yeah absolutely. That's an important piece of what IBM is helping enable our clients to do. And with our products that is definitely what we're striving for. You've got to be able to experiment. And then when you do want to operationalize, you got to be able to do that at scale.
Some of the clients we work with have some of the biggest applications running for their enterprise for their customers. And they depend on IBM to do that. So how do we bring that into, you know, this experimentation mode? Because you're absolutely right. Now it's not, you know, much more in...it's not about, you know, building one app and then releasing that. It's, as you know, the world is very much agile, you've got to fail fast. You've got to experiment. You've got to understand.
And with data science, that is absolutely sort of the MO. That's sort of the way you operate; is how do you, how do you know what works? And then if, when you...you know, you also have to retrain. So there's a lot of differences to building AI and building data science in a [inaudible] scale that is slightly different than just building applications if you will.
Sam Charrington: [00:07:55] Mm-hmm [affirmative]. Mm-hmm [affirmative].
So, Trisha, you're going to be speaking at the conference. Tell us a little bit about your topic and what attendees can expect when they come to your session.
Trisha Mahoney: [00:08:06] Right. So, I'm going to be speaking on AI Fairness 360. And this is a comprehensive toolkit created by IBM researchers. And what we focus on is detecting and understanding and mitigating unwanted machine learning bias. So the toolkit is open source. It's in Python and it contains over 75 fairness metrics, ten bias mitigation algorithms, and fairness metrics with explanations. So one of the key components to this is that it has some of the most cutting edge metrics and algorithms across academia and industry today. So it's not just an IBM thing, it includes the algorithms from researchers from Google, Stanford, Cornell. That's just a few.
But what it really focuses on is teaching people how to learn to measure bias in their data sets and models. And how to apply fairness algorithms throughout the pipeline. So you know the big focus is on data science leaders, practitioners, and also legal and ethics stakeholders who would be a part of this.
So, just a few things that I'll go through in the talk is when you would apply pre-processing algorithms to manipulate your training data, in-processing algorithms for incorporating fairness into your training algorithm itself, as well as post-processing, de-biasing algorithms. And, you know, one of the key things we wanted to get across is, I'm working on an O'Reilly book on AI fairness and bias with our researchers. So, you know, the key thing is that you know, this is a problem we think may prevent AI from reaching its full potential if we can't remove bias.
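Editor's note: to make the pre-processing stage mentioned above a little more concrete, here is a minimal sketch of one well-known pre-processing idea, reweighing, in which each training example is weighted so that the favorable label becomes statistically independent of group membership. It is plain Python written for illustration only and does not use the AI Fairness 360 API; the function names `reweighing_weights` and `weighted_positive_rate` and the toy data are assumptions, not taken from the talk.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per example: P(group) * P(label) / P(group, label).

    Under these weights, the weighted rate of each label is the same
    in every group. Illustrative helper, not a toolkit function.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of favorable labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Hypothetical toy data: group "a" receives the favorable label more often.
example_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
example_labels = [1, 1, 1, 0, 1, 0, 0, 0]
example_weights = reweighing_weights(example_groups, example_labels)

for name in ("a", "b"):
    rate = weighted_positive_rate(
        example_groups, example_labels, example_weights, name
    )
    print(name, round(rate, 3))
```

The resulting weights would typically be passed to a learner that accepts per-sample weights, so the training data itself is left unchanged.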
So, the thing we want to get across is that this is a long data science initiative. If you want to remove bias throughout your pipeline, so it involves a lot of stakeholders in your company, and that it can be very complex. So the way you define fairness and bias leads down into the types of metrics and algorithms you use. So, you know there are a lot of complexities. And the hope is that data science teams need to work with people throughout their org; they can't really make these decisions on their own, as they may actually break the law in some cases with their algorithms.
So, you know, I'll go into in the short period of time, kind of some of the trade-offs that data science teams have to make between model accuracy and removing bias, and talk about what they do for acceptable thresholds for each.
And the last thing on the ML Ops piece is I'll also do a demo in Watson OpenScale. And this is where you, you know, have models in production and you need to detect and remove bias from models, you know that are, aren't in an experimentation environment, right? So within Watson OpenScale, you can automatically detect fairness issues at runtime. And we essentially just do this by comparing the difference between rates at which different groups receive the same outcomes.
So are different minority groups, or men or women, being approved for loans at the same rate? So that's just an example. So that's kind of the top things that I'll go through on the toolkit and, I've heard many people say that others do bias talks on the problem that we have. But AI Fairness 360 is one of the few that's bringing a solution to the table on how to fix this within the machine learning pipeline.
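Editor's note: the comparison described above, checking the rates at which different groups receive the same outcome, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Watson OpenScale or AI Fairness 360 is implemented; the function names, the group labels, and the loan data are hypothetical. It computes two commonly used group fairness measures: the difference and the ratio of favorable-outcome rates between an unprivileged and a privileged group.

```python
def favorable_rate(outcomes, groups, group):
    """Share of favorable outcomes (1) among members of one group."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    if not selected:
        raise ValueError(f"no examples for group {group!r}")
    return sum(selected) / len(selected)


def statistical_parity_difference(outcomes, groups, unprivileged, privileged):
    """Unprivileged rate minus privileged rate; 0 means equal rates."""
    return favorable_rate(outcomes, groups, unprivileged) - favorable_rate(
        outcomes, groups, privileged
    )


def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Unprivileged rate divided by privileged rate; 1 means equal rates."""
    privileged_rate = favorable_rate(outcomes, groups, privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(outcomes, groups, unprivileged) / privileged_rate


# Hypothetical loan decisions: 1 = approved, 0 = denied.
loan_outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
applicant_groups = ["men"] * 4 + ["women"] * 4

print(statistical_parity_difference(loan_outcomes, applicant_groups, "women", "men"))
print(disparate_impact(loan_outcomes, applicant_groups, "women", "men"))
```

What counts as an acceptable gap is a policy decision rather than a coding one, which is the point made earlier about involving legal and ethics stakeholders.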
Sam Charrington: [00:11:29] Yeah, I think that's one of the most exciting things about the talk from our perspective is that it's not just talking about the challenges that exist, but also how to integrate a concrete toolkit into your pipeline. And whether it's Fairness 360 or something else, but how to integrate tools into your pipeline so that you can detect and mitigate bias, just very concretely as opposed to talking about it abstractly.
Trisha Mahoney: [00:11:58] Correct. And I think the bridge that this creates is, you know, there are a lot of new fairness research techniques out there, but this toolkit sort of gets them into production and accessible in a way that data scientists can use. So, I think this is considered the most comprehensive toolkit to do that on the market today.
Sam Charrington: [00:12:18] Mm-hmm [affirmative]. So Rosie in addition to Trisha's session, you'll also be exhibiting at the conference in our community hall. What can attendees expect to see at the IBM booth there?
Rosie Pongracz: [00:12:30] Yeah, we're excited to be there too. So you'll see several things. We are going to be talking about the relevant open source projects like AI Fairness 360 that Trisha mentioned and also AI Explainability 360, which is another new toolkit. And we have, actually, a whole host of projects that I won't go into here, but we can talk through those and see where IBM has contributed and is working on open source projects like the Jupyter Enterprise Gateway that you mentioned as well.
They'll also see our, our products, and how those work together in helping operationalize and bring AI platforms to reality. And we'll also be talking about our data science community, which is a place that not only can product users go and share and collaborate, but also we have some great technical solution type content on there, with the goal of that being that IBM has a lot of deep rich solutions that we're building. As I mentioned earlier, industry-specific, or transformation type of projects and those are the types of materials that we're building there.
We've heard many people, both academic and industry, say it's great to talk about all this theoretical AI and what we'd really like to see is how are people putting that to work and solutions. So that's something that we're trying to bring to life on the community with many of [our] IBM experts all across any of our implementation folks, to our research folks.
Sam Charrington: [00:14:01] Fantastic. Fantastic. Well, I'm really looking forward to seeing both of you at the event. And I am very grateful for your and IBM's support of the conference.
Rosie Pongracz: [00:14:14] We are really excited to support what you're doing, Sam. I know you and I have worked together for many years through some technology transitions, so this is really appropriate and fun and fitting that we get to work together on something as exciting as what you're doing at TWIMLcon.
Sam Charrington: [00:14:29] Absolutely. Thank you both.
Rosie Pongracz: [00:14:31] Thank you.
TWIMLcon: AI Platforms will be held on October 1st and 2nd at the Mission Bay Conference Center in San Francisco.