Generative AI from an enterprise architecture strategy perspective

Eyal Lantzman, Global Head of Architecture, AI/ML at JPMorgan, gave this presentation at the London Generative AI Summit in November 2023.

I’ve been at JPMorgan for five years, mostly doing AI, ML, and architecture. My background is in cloud infrastructure, engineering, and platform SaaS. I normally support AI/ML development, tooling processes, and use cases.

Some interesting observations have come out of machine learning and deep learning. Foundation models and large language models are providing new opportunities and ways for regulated enterprises to rethink how they enable those things.

So, let’s get into it.

How is machine learning done? 

You have a data set and traditionally you use CPUs, although you can use GPUs as well. You run through a training process and you end up with a model: something you can pass relatively simple inputs to, like a row in a database or a set of features, and get relatively simple outputs back.

The field evolved towards deep learning roughly 20 years ago and has been on that journey since. You pass in more data, you use GPUs, and there are some technology changes, but what it allows you to do is pass complex inputs rather than simple ones.

Essentially, deep learning models have some feature engineering components built in. Instead of sending the sepal and petal length and width for the Iris model and figuring out how to extract those from an image, you just send an image and the model extracts those features automatically.
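
To make that contrast concrete, here is a minimal sketch (assuming scikit-learn is available): the classic flow takes hand-engineered Iris features as simple inputs and returns a simple label, whereas a deep learning model would take the raw image and learn the equivalent features internally. The vision-model line at the end is indicative only.

```python
# Classic ML, as described above: hand-engineered features in, simple label out.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # rows of four numeric features
clf = LogisticRegression(max_iter=200).fit(X, y)

print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))   # simple input -> simple output

# A deep learning model would instead accept the raw image and learn the
# equivalent of those features internally, e.g. (indicative only):
#   logits = vision_model(preprocess(flower_image))
```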

Governance and compliance in generative AI models

Foundation models are quite interesting because, first of all, you effectively have two layers: there's a provider of that foundation model that uses a very large data set, and you can add as many variants as you want on top. They're GPU-based and there's quite a bit of complexity, time, and money involved, but the outcome is that you can pass in complex inputs and get complex outputs.

So that’s one difference. The other is that quite a lot of them are multimodal, which means you can reuse them across different use cases, in addition to being able to take what you get out of the box for some of them and retrain or fine-tune them with a smaller data set. Again, you run additional training cycles and then get the fine-tuned models out. 

Now, the interesting observation is that the first layer on top is where you get the foundation model. You might have heard statements like, “Our data scientist team likes to fine-tune models.” Yes, but that initial layer already makes GenAI available for engineers to use, and that's the shift that's upon us.

It essentially moves from processes, tools, and things associated with the model development lifecycle to software because the model is an API. 
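
Viewed that way, consuming a foundation model looks like calling any other service. Below is a minimal sketch of the “model is an API” point; the endpoint URL, payload shape, and auth header are placeholders, not any specific vendor's contract.

```python
# From the application's point of view, the model is just another HTTP dependency.
import requests

ENDPOINT = "https://llm.example.internal/v1/generate"   # hypothetical internal gateway

def generate(prompt: str) -> str:
    resp = requests.post(
        ENDPOINT,
        json={"prompt": prompt, "max_tokens": 256},
        headers={"Authorization": "Bearer <token-from-identity-broker>"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["text"]
```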

But how do you govern that? 

Different regulated enterprises have their own processes to govern models versus software. How do you do that with this paradigm? 

That’s where things need to start shifting within the enterprise. That understanding needs to feed into the control assurance functions and control procedures, depending on what your organization calls those things. 

This is in addition to the fact that most vendors today will have some GenAI in them. That essentially introduces another risk. If a regulated company deals with a third-party vendor and that vendor happens to start using ChatGPT or an LLM, and the firm wasn't aware of that, it might not be part of their compliance profile, so they need to uplift their third-party oversight processes.

They also need to be able to work contractually with those vendors to make sure that if they're using a large language model or more general AI, they don't use the firm's data.

AWS and all the hyperscalers have those opt-out options, and this is one of the things large enterprises check first. However, thinking those options through and introducing them into the standard procurement or engineering processes will become trickier, because everyone needs to understand what AI is and how it impacts the overall lifecycle of software in general.

Balancing fine-tuning and control in AI models

In an enterprise setting, to be able to use an OpenAI type of model, you need to make sure it's protected. And if you plan to send data to that model, you need to make sure that data is governed, because there can be regulations about where the data can be processed and stored, and where it comes from, that you might not be aware of.

Some countries have restrictions on where you can process the data, so if you happen to provision your LLM endpoint in us-east-1 or US Central, you possibly can't even use it.

So, being aware of those kinds of circumstances can require some sort of business logic.
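
That business logic can be as simple as a residency check before any call leaves the application. Here's a hedged sketch; the data classifications and region lists are made up for illustration, not real policy.

```python
# Check that an LLM endpoint's region is acceptable for the data being sent to it.
ALLOWED_REGIONS = {
    "customer-data-indonesia": {"ap-southeast-3"},            # illustrative mapping
    "customer-data-eu": {"eu-west-1", "eu-central-1"},
    "public-data": {"*"},
}

def endpoint_allowed(data_classification: str, endpoint_region: str) -> bool:
    allowed = ALLOWED_REGIONS.get(data_classification, set())
    return "*" in allowed or endpoint_region in allowed

assert not endpoint_allowed("customer-data-eu", "us-east-1")  # would be rejected
assert endpoint_allowed("public-data", "us-east-1")
```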

Even if you do fine-tuning, you need some instructions to steer the model towards a certain goal. And even if you fine-tune the hell out of it, you still need some kind of guardrails to evaluate whether the outcomes are sensible.

There’s some kind of orchestration around the model itself, but the interesting point here is that the model isn’t the actual deployment; it’s a component. Thinking about it that way will help with some of the problems raised around how you deal with vendors increasing prices.

What’s a component? It’s exactly like any software component: a dependency you need to track from a performance perspective, a cost perspective, an API perspective, etc., the same as you do with any dependency. It’s a component of your system, and if you don’t like it, you figure out how to replace it.
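
One way to make that replaceability real in code is to program against a small interface and keep the vendor-specific client behind it. This is only a sketch; the class and method names are illustrative, not an actual vendor SDK.

```python
# Treat the model as a swappable dependency behind a minimal interface.
from typing import Protocol

class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class VendorAModel:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call vendor A's SDK here")

class VendorBModel:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call vendor B's SDK here")

def summarise(model: TextModel, document: str) -> str:
    # Application code depends only on the interface, so swapping vendors is a
    # dependency change rather than an application rewrite.
    return model.complete(f"Summarise the following:\n{document}")
```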

Now I’ll talk a bit about architecture.

Challenges and strategies for cross-cloud integration

What are those design principles? This is after analyzing different vendors, and this is my view of the world. 

Treating them as a SaaS provider and applying a SaaS pattern increases the control over how we deal with them, because you essentially componentize the model. If you have an interface, you can track it like any other dependency, from a performance and cost perspective, but also in terms of its integration with your system.

So if you’re running on AWS and you’re calling Azure OpenAI endpoints, you’ll probably be paying cross-cloud networking costs and you’ll have latency to pay for, so you need to take all of those things into account.

So, having that as an endpoint brings those dimensions front and center into that engineering world where engineering can help the rest of the data science teams.

We touched on content moderation, how it’s required, and how different vendors implement it, but it’s not consistent. They probably deal with content moderation from an ethical perspective, which might be different from an enterprise’s ethical perspective, with the specific language and nuances that the enterprise tries to protect.

So how do you do that? 

That’s been another consideration. I think what the vendors are doing is great, but there are multiple layers of defense, there’s defense in depth, and you need ways to deal with some of that risk.

To be able to effectively evaluate the total cost of ownership or the value proposition of a specific model, you need to be able to evaluate those different models. This is where, even if you’re working in a modern development environment on AWS or Azure, when you’re evaluating a model you might be triggering one in GCP.

Those cross-cloud considerations, which start getting into networking, authentication, authorization, and all those things, become extremely important to design for and articulate in the overall architecture and strategy when dealing with these models.

Historically, there have been different attempts to wrap this stuff up. As users of cloud providers, service providers, and those technologies, we don’t like that, because you lose the value of the ecosystem and the SDKs they provide.

Being able to solve those problems whilst still using the native APIs and SDKs is essential, because everything moves extremely fast when it comes to AI and there’s a tonne of innovation. As soon as you start wrapping things, you’re already behind in dealing with that problem and the wrapper becomes pointless.

How do we think about security and governance in depth?  

If we start from the center, you have a model provider. That provider can be an open-source one, where you go through due diligence to get it into the company and do the scanning or adversarial testing. It could be one that you develop, or it could be a third-party vendor such as Anthropic.

They do have some responsibility for encryption in transit, but you need to make sure that provider is part of your process for getting into the firm. You can articulate that the provider is dealing with generative AI, that the provider will potentially be sent classified data, that the provider needs to make sure they’re not misusing that data for training, etc.

You also have the orchestrator that you need to develop, where you need to think about how to prevent prompt injection in the same way that other applications deal with SQL injection and cross-site scripting. 
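
As a first, deliberately simple line of defense, the orchestrator can screen incoming prompts and keep user content clearly delimited from instructions, much as an application validates input before building a SQL query. This is only a sketch; the patterns below are illustrative, and real prompt-injection defenses are layered and go well beyond pattern matching.

```python
# Screen user input in the orchestrator before it reaches the model.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (your )?(system|hidden) prompt",
]

def looks_like_injection(user_input: str) -> bool:
    return any(re.search(p, user_input, re.IGNORECASE) for p in INJECTION_PATTERNS)

def build_prompt(system_prompt: str, user_input: str) -> str:
    if looks_like_injection(user_input):
        raise ValueError("request blocked by guardrail")
    # Keep user content clearly separated from the instructions.
    return f"{system_prompt}\n\n<user_input>\n{user_input}\n</user_input>"
```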

So, how do you prevent that? 

Those are the challenges you need to consider and solve. 

As you get to your endpoints, it’s about how you do content moderation of the overall request and response, and then also deal with multi-stage attempts to jailbreak it across multiple prompts. This involves identifying the client, identifying who’s authenticated, articulating cross-session or multi-session behavior, and then being able to address the latency requirements.

You don’t want to kill the model’s performance by doing that, so you might have an asynchronous process that goes and analyzes the risk associated with a particular request, and then kills it the next time around.
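
A hedged sketch of that asynchronous pattern follows: return the response immediately, score the exchange off the request path, and block the client on subsequent requests if the risk is too high. The scoring function, client registry, and async model client here are hypothetical placeholders.

```python
# Analyze risk off the request path so the controls don't add latency to the call.
import asyncio

BLOCKED_CLIENTS: set[str] = set()

async def run_moderation_model(prompt: str, response: str) -> float:
    return 0.0  # placeholder for a real moderation/risk-scoring call

async def handle_request(client_id: str, prompt: str, model) -> str:
    if client_id in BLOCKED_CLIENTS:
        raise PermissionError("client blocked by earlier risk analysis")
    response = await model.complete(prompt)      # assumes an async model client
    # Fire-and-forget: score the exchange without blocking the response.
    asyncio.create_task(score_risk(client_id, prompt, response))
    return response

async def score_risk(client_id: str, prompt: str, response: str) -> None:
    if await run_moderation_model(prompt, response) > 0.9:
        BLOCKED_CLIENTS.add(client_id)           # "kill it the next time around"
```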

Being able to understand the impact of the controls on your specific use case in terms of latency, performance, and cost is extremely important, and it’s part of your ROI. But it’s a general problem that requires thinking about how to solve it. 

So you’re in the process of getting or building a model, creating the orchestrator, testing it as an endpoint, working through the developer experience and lifecycle loops, etc.

But your model development cycle might have the same kind of complexity when it comes to data, because if you’re in an enterprise that has reached the level of maturity where it can train with real data rather than a locked-up, synthesized database, you need to articulate the business need and have approval to access classified data for model training, and for the appropriate people to see it.

This is where your actual environment has quite a lot of controls associated with dealing with that data. It needs to implement different data-specific compliance requirements.

So, when you train a model, you may need to train it in a particular region, whether it’s Indonesia or Luxembourg, or somewhere else.

Being able to think about all those different dimensions is extremely important.

As you go through whatever process you use to deploy the application to production, again, you have the same data-specific requirements. There might be more, because you’re now talking about production applications. It’s about impacting the business rather than just processing data, so the requirements might be even stricter.

Then it goes through the standard engineering steps: integration testing, load testing, and chaos testing, because it’s your dependency. It’s another dependency of the application that you need to deal with.

And if it doesn’t scale well because there isn’t enough compute in US Central for OpenAI, then this is one of your decision points where you need to think about how this would work in the real world. How would that work when I need that capacity? Being able to take that process through all of those layers as fast as possible is extremely important.

Threats are an ongoing area of analysis. There’s a recent example from a couple of weeks ago from OWASP about LLM-specific threats. You’ll see threats similar to those against any kind of application, such as excessive agency or denial of service. You also have ML-related risks like data poisoning and prompt injection, which are more specific to large language models.

This is one way you can communicate with your control assurance or cyber group about those different risks, how you mitigate them, and how you compartmentalize all the different pieces. Whether it’s a third party or something you developed, being able to articulate the associated risk will allow you to deal with it more maturely.

Going towards development, instead of talking about risk, I’m going to talk about how I compartmentalize, how to create the required components, how to think about all those things, and how to design those systems. It’s essentially a recipe book. 

Now, the starting assumption is that as someone who’s coming from a regulated environment, you need some kind of well-defined workspace that provides security. 

The starting point for your training is to figure out your identity and access management. Who’s approved to access it? What environment? What data? Where’s the region? 

Once you’ve got in, what are your cloud-native environment and credentials? What do they have access to? Is it access to all resources, specific resources within a region, or across regions? It’s about being able to articulate all of those things.

When you’re working in your interactive environment, you’ll be interacting with the repository, but you may also end up using a foundation model or a RAG resource to pull into your context, or running training jobs based on whatever the model is.

If all of them are on the same cloud provider, it’s simple. You have one or more SDLC pipelines and you go and deploy that. You can go through an SDLC process to certify and do your risk assessment, etc. But what happens if you have all of those spread across different clouds?

Essentially, you need additional pieces to ensure that you’re not connecting from a PCI environment to a non-PCI environment. 

Having an identity broker that can apply additional conditions on access and enforce those controls is extremely important. Given the complexity of the regulatory space, there are more and more controls and it becomes more and more complex, so pushing a lot of those controls into systems that can reason about them and enforce them is extremely important.
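
As a rough illustration of what that broker might reason about, here is a sketch in which short-lived credentials are only issued when the requesting environment, data classification, and target region satisfy a policy table. The policy entries, dataclass, and return shape are all hypothetical.

```python
# A toy identity broker: evaluate the request against policy before issuing credentials.
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user: str
    environment: str          # e.g. "pci" or "non-pci"
    data_classification: str
    target_region: str

POLICY = {
    # (environment, data classification) -> regions that may be accessed
    ("pci", "cardholder-data"): {"eu-west-1"},
    ("non-pci", "public-data"): {"eu-west-1", "us-east-1"},
}

def issue_credentials(req: AccessRequest) -> dict:
    allowed = POLICY.get((req.environment, req.data_classification), set())
    if req.target_region not in allowed:
        raise PermissionError("request violates environment/region policy")
    return {"token": "<short-lived-token>", "expires_in": 900}   # placeholder credential
```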

This is where you start thinking about LLM and GenAI from a use case perspective to just an enterprise architecture pattern. You’re essentially saying, “How do we deal with that once and for all?” 

Well, we identify this component that deals with identity-related problems, we articulate the different data access patterns, we identify different environments, we catalog them, and we tag them. And based on the control requirements, this is how you start dealing with those problems. 

And then from that perspective, you end up with fully automated controls that you can employ, whether it’s Jupyter on AWS or VS Code running in Azure talking to a Bedrock or Azure OpenAI endpoint. They’re very much the same from an architecture perspective.

Innovative approaches to content moderation

Now, I mentioned a bit about the content moderation piece, and this is where I suggested that the vendors might have things in place, but:

  1. They’re not really consistent. 
  2. They don’t necessarily understand the enterprise language. 

What is PII within a specific enterprise? Maybe it’s a specific kind of user ID that identifies the user, but it’s a letter and a number that an LLM might never have seen. You should still be careful about exposing it to others, etc.

Being able to have that level of control is important because it’s always about control when it comes to risk.
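
That kind of enterprise-specific control can start as small as a redaction pass for identifier formats a vendor’s filter would never recognize. A minimal sketch follows; the ID format is invented purely for illustration.

```python
# Redact enterprise-specific identifiers before prompts or responses leave the firm.
import re

INTERNAL_ID = re.compile(r"\b[A-Z]\d{6}\b")   # hypothetical internal user-ID format

def redact_internal_ids(text: str) -> str:
    return INTERNAL_ID.sub("[REDACTED-ID]", text)

print(redact_internal_ids("Ticket raised by B123456 about a payment failure"))
# -> "Ticket raised by [REDACTED-ID] about a payment failure"
```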

When it comes to supporting more than one provider, this is essentially where we need to standardize a lot of those things and be able to say, “This provider, based on our assessment, can deal with data up to classification x, y, and z, and in regions one to five, because they don’t support deployment in Indonesia or Luxembourg.”

Being able to articulate that as part of your onboarding process for that provider is extremely important, because then you don’t need to think about it for every use case; it’s just part of your metamodel, or the information model associated with your assets within the enterprise.

That content moderation layer can be another type of AI. It can be as simple as a kill switch based on regular expressions, but the opportunity is there to make something that learns and adjusts over time. From a pure cyber perspective, though, it’s a kill switch, and you need to think about how you kill something at a specific point in the flow based on a specific type of prompt.

For those familiar with AWS, there’s the AWS Gateway Load Balancer. It’s extremely useful when it comes to these patterns because it allows you to have the controls as a sidecar rather than part of your application, so the data scientist or the application team can focus on the orchestrator.

You can have another team that specializes in security create that other container and deploy and manage it with a separate lifecycle. This is also good from a bias perspective, because you could have one team focused on making or breaking the model, versus the application team that tries to create or extract value out of it.

From a production perspective, this is very similar, because in the previous step you created a whole bunch of code, and the whole purpose of that code was to become that container, or more than one, depending on how you think about it.

That’s where I suggest that content moderation is yet another kind of container that sits outside of the application and allows you to have that separate control over the type of content moderation. Maybe it’s something that enforces checks on the request-response path, maybe it’s more asynchronous and kicks off based on the session.

You can have multiple profiles of those content moderation systems and apply them based on the specific risk and associated model. 

The identity broker is the same pattern. This is extremely important because if you’re developing and testing in such environments, you want your code to stay very similar as it progresses towards production.

In the same way you got your credentials in development, you probably want some configuration that points to a similar setup in your production environment to inject those tokens into your workload.

This is where you probably don’t have your fine-tuning in production, but you still have data access that you need to support. 

So, having different types of identities that are associated with your flow and being able to interact, whether it’s the same cloud or multi-cloud, depending on your business use case, ROI, latency, etc., will be extremely important. 

But this allows you to have that framework of thinking: is this box a different identity boundary, yes or no? If it’s a no, then it’s simple; it’s an IAM policy in AWS, for example. If it’s a different cloud, how do you federate access to it? How do you rotate credentials and secrets?

Conclusion

To summarise, you have two types of flows. In standard ML, you have the MDLC (model development lifecycle) flow, where you go and train a model and containerize it.

In GenAI, you only have the MDLC when you fine-tune. If you don’t fine-tune, you run through a pure SDLC flow: it’s a container that you build, test, and take through all those different steps.

You don’t necessarily have actual data scientists involved in that process. That’s the opportunity, but also the change that you need to introduce to enterprise thinking and the cyber maturity associated with it.

Think through how engineers, who traditionally don’t have access to production data, will be able to test those things in real life. That creates all sorts of interesting discussions about the environments where you can do secure development with data versus standard developer environments with mock or synthesized data.