TRANSCRIPT: What is your chat strategy? Interview with Saleem Bhatti, Professor of Computer Science at St Andrews

Dan Eccleston: Welcome to one of a series of podcasts focusing on transformative technology and financial markets, and in particular chat and messaging platforms. My name is Dan Eccleston, co-founder of Pushpull Technology, a London-based solutions provider, and today we're going to focus on data and message security for chat platforms.

Dan Eccleston: I'm delighted to be joined by Saleem Bhatti, Professor of Computer Science at the University of St Andrews, whose specific areas of interest include the design, use, and performance of computer communication systems, architectures, protocols, and applications, including performance analysis. Welcome, Saleem, and thanks again for joining us.

Saleem Bhatti: Hello, Dan. Great to be here.

Dan Eccleston: So you've been involved for many years in research into communication systems, obviously. What have been the big changes you've seen over that time in the ways organizations communicate? What have been the big drivers?

Saleem Bhatti: Well, some of them are fairly obvious, of course, as the technology has improved, in terms of network technology and what you have on your desks and what you have in your pockets.

Saleem Bhatti: And so people are much more multimodal, if you like, in their communication. So long ago, it would have been perhaps written documents, letters, and voice over the phone. But now, of course, we have a mixture of voice, video, and document exchanges, with video being used not just for communication but for training purposes.

Saleem Bhatti: So a whole bunch of things have happened in communication within organizations, which means there's much more diversity in the modes of communication, and still text-based chat, which is one of the things I know you want to talk about, is a major part of that. And of course, we'll come on to say why that is.

Saleem Bhatti: But certainly, as we progress and the technology continues to improve, that kind of multimodal approach will remain. And depending on the technology, you might get some things that are more popular than others at different times over the years.

Dan Eccleston: Yeah, absolutely. I think in our area of financial markets, that multimodal approach is the key now.

Dan Eccleston: There are so many different channels people are communicating over. Coming on to chat specifically, I just have to read out these numbers because they were a bit of a shocker. Microsoft Teams monthly active users: 3 million in 2019, 32 million in 2020, and by 2023, 300 million.

Dan Eccleston: So the early part of that 2020 to 2023 period is COVID, obviously, and work from home. But why do you think business use of chat platforms suddenly took off in the last four or five years? Was there some underlying technical advancement that opened the floodgates to make it happen?

Dan Eccleston: Was there some major performance improvement that made it more usable, or is it more of a social function of us getting more used to consumer tech? What do you think happened?

Saleem Bhatti: Yeah, I think certainly in those years COVID had a lot to do with it, of course. If you think about it, that is pretty much exponential growth, really, year on year, and some of that is bound to be due to COVID.

Saleem Bhatti: COVID not only meant that people had to do more stuff online, but the developments in the online systems happened very rapidly. And there are some things which, of course, we don't have the time to go into in detail, but some of the technology behind how those platforms as a whole are used improved hugely over that time in terms of delivery, in terms of online presence, in terms of them being distributed systems.

Saleem Bhatti: And so the popularity of those platforms as a whole has increased. And even now, coming out of COVID, of course, people haven't gone back to the pre-COVID normal, as it were. There is a bit of a new normal, the technology is much more integrated into more hybrid forms of working, and these platforms are more widely used than they used to be.

Saleem Bhatti: And certainly text-based systems historically have been around for a long time. So on Unix systems that predate the kind of desktop systems that we have today, chat-based mechanisms for communication across the network have been around since the early 90s. Sorry, the early 80s, in fact. From the early 90s, of course, we had SMS appearing in mobile phones.

Saleem Bhatti: And so people got used to sending short messages. Text-based applications are easy to implement from a technical point of view. User interfaces are easy. Key and character entry are things that people are very happy to do. And over the years, an increasing number of people can touch type, plus the use of interfaces that do correction of text and prediction of text makes them much easier to use overall.

Saleem Bhatti: So these platforms, through a mixture of technical advancements and the culture of just starting to use them more through various things, for example COVID, are just more widely used. And of course, even though we have very good connectivity today in terms of networking, when networking doesn't work so well, video might not work and audio might be poor quality, but text is pretty much always going to get through, barring a complete blackout of connectivity.

Saleem Bhatti: So it's still very popular.

Dan Eccleston: Yeah, interesting. Of course, there's been a more recent explosion of text-based communication that we've seen in financial markets, but at the same time, there's been an explosion of video conferencing, hasn't there, over COVID. And they're probably two very different technical challenges.

Dan Eccleston: I mean, text-based presumably is a little bit easier to support in terms of processing power and what's required to get it running. What do you think happened in Microsoft to go from the 3 million to 300 million? Any ideas what challenges they must have seen in trying to get their services to scale up in that sort of way?

Dan Eccleston: Because it's huge, isn't it?

Saleem Bhatti: Yeah, it is. And certainly trying to get the scale and performance built into their systems has been a challenge, not just for them, but for the other companies providing such platforms as well, for example Zoom.

Saleem Bhatti: And certainly in the earlier part of COVID, when there was such a huge increase in the use of these platforms, we saw problems in network congestion and just in the services sometimes being overwhelmed. And of course, those companies providing those services have had to adapt and change the way they provision, and change the way their services are built, to make them more scalable and more adaptive to the different network conditions.

Saleem Bhatti: So things like video encoding capability, and adapting the way that the application works, so changing the video quality and audio quality kind of automagically in the background for users in order to match what's available in the network, has been a big part of what has had to be done within those services so that they become more usable.

Saleem Bhatti: Of course, text doesn't suffer from that as much. The text channel uses a fairly modest amount of network connectivity and has modest requirements for resourcing, and so that has been a fairly constant background channel that has always been there. And of course, the culture of using things like WhatsApp and other instant messaging applications means that people have been quick to pick that up for work usage, away from their normal domestic usage as well. And especially when you have both of them on your own device, it's quite natural to flick between one and the other for home use and for business use.
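
Illustrative sketch (an addition to the transcript, not something said in the interview): the automatic quality adaptation Saleem describes can be pictured as choosing the richest mode that fits the measured bandwidth, with text as the last fallback. The ladder values, the `headroom` margin, and the name `pick_mode` are hypothetical and chosen only for illustration; real conferencing services use their own, more elaborate logic.

```python
# Hypothetical quality ladder: (label, required kilobits per second).
# The numbers are illustrative assumptions, not figures from any real service.
LADDER = [
    ("video-high", 2500),
    ("video-low", 600),
    ("audio-only", 64),
    ("text-only", 1),
]


def pick_mode(measured_kbps: float, headroom: float = 0.8) -> str:
    """Return the richest mode whose bitrate fits within the usable bandwidth.

    `headroom` keeps a safety margin so that small dips in throughput do not
    immediately cause stalls.
    """
    usable = measured_kbps * headroom
    for label, required_kbps in LADDER:  # ordered richest first
        if required_kbps <= usable:
            return label
    return "offline"  # nothing fits: a complete blackout of connectivity


if __name__ == "__main__":
    for kbps in (5000, 1000, 100, 5, 0):
        print(f"{kbps:>5} kbps -> {pick_mode(kbps)}")
```

Under these assumed thresholds, text keeps working at bandwidths far below what audio or video need, which is the point made above about text nearly always getting through.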

Dan Eccleston: So that brings us on to something I'm really interested in: the use of text-based chat platforms. As it grows, in financial markets we see use of WhatsApp in less regulated markets.

Dan Eccleston: And there are some other platforms out there. Obviously, Bloomberg and Symphony are well-known ones. There are different levels of security in the various platforms. I'm really interested to go back to WhatsApp though. Now obviously we saw recently that the government was being managed on WhatsApp in the UK, right?

Dan Eccleston: During COVID, right? Or it seemed like it, and with the inquiry going on now, all the WhatsApp messages are coming out. Can you explain, just at a high level, what the current status is in terms of intellectual property or data protection, or who really has the rights to those messages?

Dan Eccleston: Just a layman's overview, to understand how safe data really is on various platforms. Because it doesn't seem very clear to me, but I'm really interested in your perspective.

Saleem Bhatti: Yeah, well, there are several issues here. First of all, security and privacy.

Saleem Bhatti: Often they're mentioned together, but they do have distinct properties when you consider them. And then there are the legal issues in terms of ownership of the content of messages in general, and that includes text messages, emails, whatever's used in a business. So first of all, if we think about security.

Saleem Bhatti: You know, security is protection of a system from, say, some sort of external actor or external threat. And so you could argue that if you have an ecosystem, for example, like the Microsoft one, the Google one, or the Apple one, there is good security because there's great oversight of the ecosystem as a whole.

Saleem Bhatti: So the things that make up that ecosystem, the various components, can be integrated and secured very well. But of course, if you're working completely within that ecosystem and all your communication is within that one ecosystem, potentially there is a privacy issue, because the people who run that particular ecosystem have access to all that information and can tie communication together.

Saleem Bhatti: They could potentially look at the data, not just what I will call the metadata, not just the information about who's talking to whom, at what times, and for how long, but the actual data itself. Now, I emphasize there's a potential privacy problem there, because it's not necessarily that all these companies are always looking at the data all the time, but that tension does exist.

Saleem Bhatti: And I'm sure everyone can do their own web searches to see the kind of cases that have developed over time and have been publicized about privacy invasions. So you could have a system that's very secure because it's protected from external actors and threats, but nevertheless there is still a privacy problem, because the provision of that system is done in such a way that there's still access to the data and all the information about the communication actually happening, even if you cannot see the data itself.

Saleem Bhatti: So you don't know what's in an email. You don't know what's in a particular conversation that may be happening in video or audio. You still know when it happened, who was talking to whom, for how long it lasted, and where people were. For example, if people are using mobile phones, you have information about their location, both topologically in the network and their geographical location.

Saleem Bhatti: And that can make a huge difference with respect to privacy, and it could have knock-on effects for security also.
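
Illustrative sketch (an addition to the transcript, not something said in the interview): the distinction between data and metadata can be made concrete with a few records whose payload is treated as an opaque blob. The record layout, the names `alice`, `bob`, and `carol`, and the helper functions are hypothetical; the sketch only shows that who talked to whom, and when, can be summarized without ever reading the content.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MessageRecord:
    sender: str
    recipient: str
    sent_at: str       # ISO 8601 timestamp, visible to the platform
    ciphertext: bytes  # opaque payload; assumed unreadable without the key


def contact_summary(records):
    """Count messages per (sender, recipient) pair without reading any payload."""
    counts = defaultdict(int)
    for record in records:
        counts[(record.sender, record.recipient)] += 1
    return dict(counts)


def active_hours(records, user):
    """Return the sorted hours of the day (0-23) at which `user` sent messages."""
    hours = {
        datetime.fromisoformat(record.sent_at).hour
        for record in records
        if record.sender == user
    }
    return sorted(hours)


# Made-up sample data; the ciphertext bytes are placeholders, not real encryption.
SAMPLE = [
    MessageRecord("alice", "bob", "2023-10-02T09:15:00", b"\x8f\x02\x11"),
    MessageRecord("alice", "bob", "2023-10-02T17:40:00", b"\x1a\x77\xc3"),
    MessageRecord("bob", "carol", "2023-10-02T17:45:00", b"\xd0\x5e\x09"),
]

if __name__ == "__main__":
    print(contact_summary(SAMPLE))
    print(active_hours(SAMPLE, "alice"))
```

Neither function touches the `ciphertext` field, yet together they reveal a contact pattern and a daily routine, which is the privacy concern raised above.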

Dan Eccleston: Yeah, and that metadata itself could be as valuable as the content data.

Saleem Bhatti: In many respects.

Dan Eccleston: Because content data can be quite transient. But the metadata tells you where someone was at a particular time and what they were doing.

Dan Eccleston: That could be very valuable. You've been involved in developing standards around security and privacy in the past, right? What's the government involvement in that? And what's the view on that tension between having access to our data but at the same time making sure that, as individuals, our data is protected?

Dan Eccleston: What's the current thinking of government and lawmakers?

Saleem Bhatti: Yeah, well, that again is a tricky one because of what you mentioned earlier about who has ownership of the content. So there was a 2012 High Court ruling in the UK that the contents of an email should not generally be considered property.

Saleem Bhatti: So businesses do not have a general claim of ownership over content in staff emails, for example, and the overall ruling you can find online, but I'll read out just a couple of sentences from it. So businesses cannot be said to have an enforceable proprietary claim to the contents of emails held by staff unless the content can be considered to be confidential information belonging to a business, unless copyright subsists in the content that belongs to a business, or unless that business has a contractual right of ownership of the content.

Saleem Bhatti: Okay, so in some ways that seems fairly clear: if a business can show that this belongs to them, and they have a legitimate claim on it, they can have ownership. But of course, you can't know that without looking at the information itself. So you have a complex situation, in that in order to know if the information is confidential or not, businesses do have to perhaps look at what that information might be and establish that contractual right of ownership.

Saleem Bhatti: And one way they can do that, of course, is to have terms and conditions in a contract for employees, where an employee might pretty much have to sign away their rights to any ownership of the content, so that a business can keep hold of it and can monitor what's happening, especially in something sensitive like finance, but that can also be true for government or health, for example.

Saleem Bhatti: The issue there could be just in case by accident, especially with people using their own devices for work, that some private communication is accidentally captured. And so it isn't clear cut how you would implement and enforce such a scheme in a way that gives business access to the content that they rightfully have a claim on while maintaining privacy for individual users.

Dan Eccleston: Are there technological advances that are helping with this, or is it really driven by governments and big tech still trying to work out the right models at the moment? Or is the technology itself starting to drive it?

Dan Eccleston: What's possible?

Saleem Bhatti: Well, kind of yes and no. Potentially there could be solutions based on technology, but even there, there's a tension, and that you see coming out in things like what's been called the spy clause in the Online Safety Bill that's being pushed through Parliament at the moment, or being discussed in the various houses.

Saleem Bhatti: And again, there's the problem there that if you have, for example, end-to-end encryption for a communication, whether it's text or it's voice, you can't really see what's inside. So you don't know, for example, if a business has a claim on it or not. And then if you weaken the encryption so that it can be broken when needed, how do you define when that encryption breaking is needed?

Saleem Bhatti: Who is able to authorize that, and who has access to that decrypted content? So those all become rather sticky problems. And also, if you do have encryption that's end-to-end and it's weakened in some way, so that it can be broken when it needs to be, how do you ensure that only the people who are allowed to break into it when they need to are the ones who do that?

Saleem Bhatti: If it's weakened encryption, you only need some sort of mistake in the way, for example, that critical information like security keys is handled, and then you could have a lot of information that is suddenly available that shouldn't have been.

Dan Eccleston: Security's weakened.

Dan Eccleston: Interesting. So you can't have a conversation about tech at the moment, certainly in our industry, without mentioning AI. ChatGPT is obviously a pretty cool way to access some very powerful AI services. It seems like the chat plus the AI was a real case of two plus two is five, if not two plus two is ten.

Dan Eccleston: In reality, it was an amazing jump forward into suddenly providing AI services. In terms of accessing such services, particularly if they're out there in the cloud, SaaS, hosted, you're passing potentially private information backwards and forwards to them, potentially even training up models that are hosted by a third party.

Dan Eccleston: Where do you think that's going to go in terms of the ownership of the data and in terms of protecting it? If I'm asking ChatGPT to do something with some confidential information, obviously it's a third-party service, so the data and my requests are going out to this service.

Dan Eccleston: It potentially could be training a model somewhere up there. I don't really know. It's all a bit of a gray box, if not a black box, at the moment, what's happening in some of these models. What do you think the issues are going to be as we use it more and more? Do you think that's going to be clear-cut, or is it going to be a bit murky for the time being?

Saleem Bhatti: Oh, it's definitely going to be murky. And it is right now. It's very murky indeed. It's really difficult. There are already problems with the systems that are out there. So these are all based on large language models, OK, so LLMs. And all of these rely on a large corpus, a large body of data that's being used to train these systems.

Saleem Bhatti: And there are already some issues and tensions as to where this training data came from. Is it protected by copyright? And that's for the data that's already out there in use. But of course, as these systems are used, potentially there's more data being input to the systems, which is the input from the users, whether it's text based or whether it's voice.

Saleem Bhatti: All of this is new input. And who has rights, who has ownership? Again, very murky. In the general space of AI, as you know, at this point in time there are worldwide conversations happening, which include governments as well as non-governmental organizations, discussing things like explainable AI and responsible AI, so as to be able to understand what these systems are doing, how they work, which data they use, and what they're doing with it.

Saleem Bhatti: So, for the time being, it is really quite messy as to what such things as privacy, security, ownership are like in AI systems that are based on large language models as they continue to evolve, continue to take input from their users. Certainly, text based systems are very tractable to analysis. In fact, more so than audio, image, and video, although there are systems that are being developed that can work on image, audio, and video as well.

Saleem Bhatti: But certainly, text-based systems are so easily processed within computer systems. That is why things like ChatGPT and, in other technical realms, things such as Copilot are so widely used and are having such great success, because they are so much a low-hanging fruit in this area, with so much data to train them on already available.

Dan Eccleston: You've done a fair amount of work on both performance and energy consumption in the past, right? Looking at ways to reduce energy use or carbon footprint. What is your impression of what's happening behind the scenes with this massive increase in usage?

Dan Eccleston: I mean, it hasn't just happened overnight, right? But it's certainly accelerating exponentially. Does that mean that the energy consumption and carbon footprint are going through the roof behind the scenes, and that we should all be a little bit socially responsible about that and try to use it less, or more efficiently?

Dan Eccleston: Sometimes I'll send a few requests off to something like ChatGPT and I think the amount of processing that must be going on behind there is absolutely phenomenal. Where is it all happening exactly, and what damage or costs are there? Forgetting for a moment the energy side of it, there's the actual cost.

Dan Eccleston: I don't think the end users are fully appreciating or bearing any of the cost yet, because a lot of this is free, right? So there's a huge amount of processing going on. How is that all going to pan out? Are we going to start getting hit with massive costs, and is there a cost ultimately to the environment we should all be thinking about, if we're that way inclined?

Saleem Bhatti: Yeah, well, it's a great question, and nobody really knows what the energy hit of using these systems is. Certainly you can see that a lot of energy is being used, because you can see just from articles in the popular technical press about data centers moving to more energy-efficient hardware that is still able to perform these very large-scale complex computations but uses much less energy, and that development is mostly at the hardware level at the moment.

Saleem Bhatti: So different types of processors are being used instead of the general-purpose processors that you might get in general compute clusters and compute clouds: more hardware that is tailored to be able to run these various models. So tailored for specific algorithms that need to run in order to do the kind of processing that's required for, for example, large language models or for image recognition and so forth.

Saleem Bhatti: And so as the hardware improves and becomes more tailored, certainly there will be a lower energy usage. And of course, part of that motivation is for the providers of these services to reduce their energy costs as well as to have a lower carbon footprint. But certainly end users aren't aware of that.

Saleem Bhatti: You don't really get any feedback on that when you use these systems. If you send a prompt to ChatGPT and it executes a task for you and comes back with the result, you're happy with the result, but there's no feedback there to say, well, you used this amount of energy, for example, and that was enough for you to cook your dinner this evening or whatever.

Saleem Bhatti: But certainly something like that will be necessary, at least to make people aware of what they're using in terms of energy resources. And without that, people aren't going to be able to, even if they want to, behave in a way that tries to reduce their energy usage, or at least makes them conscious of the kind of energy usage that they have when they use these systems with this huge compute power behind them.

Saleem Bhatti: So a lot needs to happen with, for example, the user interface, the way that users interact with these systems, in order to make them aware. And even if they are aware, how much will they care, unless there is some incentive for them to care, if you see what I mean? Is there going to be, as you said, a direct cost to them related to not just using the service in terms of the queries they send, but also the energy usage of those queries?

Saleem Bhatti: Again, hard to know. Hard to know how that will actually be monitored, how energy usage will be determined, how that could be presented to a user, how that would be worked out in terms of tariffs. So a lot of research needs to be done, really, on how that will work out, especially if you're talking about end-to-end services where you're not only using, of course, the resources on whatever cloud platform you happen to be using, but also the resources in the intervening network and all the other things in between, your own ISP, et cetera.

Saleem Bhatti: Again, a really nice, big, messy research problem, that one. There's no particular answer at the moment, but certainly a good question to ask.

Dan Eccleston: Yeah, I read about quite a cool idea. It's not going to solve the problem in itself, but it was quite a nice bit of thinking a company had done somewhere in the UK, where they'd taken a big data center and used it to basically heat a swimming pool.

Dan Eccleston: So the data center was heating the pool and the pool was cooling down the data center. Someone had thought about it: let's do this in a nice way. So I think there'll be more of that going on in the future, hopefully. But yeah, sometimes cost isn't enough to make people fully aware of what they're using and the resources they're using.

Dan Eccleston: It's probably the best way to make people aware, but sometimes there are other ways. Interesting. So, just to round off, it's been really interesting, let's think about the future. We're of a certain age where our parents sent letters, and then maybe we sent emails, and our kids, as they go into the workplace, are starting to use chat messaging and more electronic communication.

Dan Eccleston: What do you think their kids are going to be doing?

Saleem Bhatti: Yeah, if I knew that, I'm not going to tell anybody, and I'd make a lot out of it. That's what I'm going to do. But really, I think text-based messaging and text-like channels are going to be around for quite a while.

Saleem Bhatti: I mean, very recently, there has been a use of much more rich media in what used to be traditionally text-based channels. So the Rich Communication Services, for example, that are now supported on all the major smartphone platforms allow you to send not just text, but to augment that with images, audio, video, etc.

Saleem Bhatti: So I think those text-based channels are going to remain, because text-based systems are just so easy to deal with, right? You can use them very easily. They are, as I've already said, low on their requirements for resources. You have lots of different interfaces on lots of different devices to make use of them easily.

Saleem Bhatti: There's a culture established about how text-based systems can be used. As you've said, as different generations get to grips with the technology, they take up these different text-based channels and make them their own. And certainly my kids are much more at home with messaging than they are with things like email.

Saleem Bhatti: Just as we, perhaps, of our generation were more at home with email than having phone conversations or sending letters. But through all of that, whether you're looking at letters or email or chat messages, text is still there. Text is still going to be really important. And also for things like accessibility, speech-to-text and text-to-speech are fairly mature technologies.

Saleem Bhatti: So again, anything in text format also helps with accessibility. And so I can't see that text channel disappearing anytime soon. If anything, it's going to be augmented. The platforms we've mentioned so far, things like Teams, Zoom, etc., although they do lots of other things, always have a text channel somewhere in there.

Saleem Bhatti: And also even things like online games have a side channel that's text-based. And then you have other text-only channels for everything from technical conversations, for software development teams, for example Slack, all the way to just casual messaging that people do on things like WhatsApp and so forth.

Saleem Bhatti: So those things are just not going to go away, I believe.

Dan Eccleston: Yeah. Chat platforms are here to stay for the foreseeable at least, aren't they?

Saleem Bhatti: I'm sure, indeed.

Dan Eccleston: Saleem, thanks very much. It's been an absolute pleasure, as always. Take care.

Saleem Bhatti: It's a pleasure for me too. Thank you, Dan.

Dan Eccleston: Yeah, thanks very much.
