Securing Generative AI

One of areas I have spent a lot of time researching over the past 5 months is in building cybersecurity machine learning models for a personal project, when using uncensored models it is a completly different ball game on what is achievable.

Which brings me to the topic of security of AI models, which is a engineering challenge. Having spent some time looking at strategies and tools that empower security engineers to advance innovation, I have been also looking at how we keep generative AI models secure.

We begin right at the start with AI generated code:

Testing for Insecure Coding Practices

Security engineers face the challenge of assessing insecure coding practices generated by language models (LLMs). This involves measuring how often an LLM suggests risky security weaknesses in both autocomplete and instruction contexts. To ensure secure code is not just generated but usable, a Bilingual Evaluation Understudy (BLEU) score can be utilized, evaluating the quality of text generated by the LLM.

Insecure Code Detector Tool

At the heart of testing lies the Insecure Code Detector tool, a vital component in detecting insecure coding practices. Security engineers can use this tool for test case generation and to identify insecure coding practices in LLM-generated code. It’s an engineering solution designed to pinpoint vulnerabilities within the AI model.

Ensuring Compliance with Security Requests

Attack Compliance Testing

You can also employ attack compliance tests to assess an LLM’s response to potential malicious workflows. These tests align with the industry-standard MITRE ATT&CK taxonomy of cyber attacks, evaluating the model’s ability to comply with security requests. It’s akin to configuring the security parameters of the AI model to withstand potential cybersecurity challenges.


Enter CyberSecEval, the helm steering security engineers through the cybersecurity challenges of generative AI. This evaluation suite enhances security engineers’ ability to generate secure and usable code, reducing the risk of introducing insecure code into production environments. It serves as a strategic tool in the security engineering toolkit for navigating the complexities of AI security.

Llama Guard

Llama Guard emerges as an input-output safeguard model tailored for Human-AI conversation use cases. It incorporates a safety risk taxonomy for prompt and response classification. Llama Guard, a Llama2-7b model, is instruction-tuned on a carefully curated dataset, showcasing robust performance on industry benchmarks. It functions as a language model, engaging in multi-class classification and generating binary decision scores. Its instruction fine-tuning allows for customisation of tasks, making it a versatile tool in the security engineering arsenal.

Llama Guard Model Weights Availability

To empower the security engineering community, different Llama Guard model weights available depending on the compute available to you. I had to upgrade to a very expensive MacBook Pro M3 Max with 128Gb of RAM to get the best out of the larger models and also plan for the future use in AI security research. I would encourage everyone with cybersecurity experience and an interest in AI to contribute to the Llama Guard development, ensuring it meets the evolving needs of the community for AI safety.

Looking Ahead

Looking ahead, I think even 12 months is optimistic to try and predict the path for AI, but I envision a new category of security engineers pioneering a new skill set in the cyber security toolbox. The development of self-learning algorithms resilient to emerging cybersecurity threats.

I recommend getting started now with your own research and learning if you haven’t already, it may take some investment in new equipment to get the best out of current models, but I have seen a raspberry pi5 used, but it was painfully slow, it worked, but it’s not going to make it a useable experience for your research unless you are extremly paitent.

Exploring Security Engineering Strategies

So let’s explore strategies for security engineers navigating generative AI.

Holistic Security Assessments, I recommend you look at conducting thorough assessments of generative AI models, identifying potential security vulnerabilities. Utilize tools like Insecure Code Detector to pinpoint insecure coding practices.

Compliance Mapping, align generative AI models with the MITRE ATT&CK taxonomy for comprehensive security compliance. Test the model’s response to potential attacker tactics, techniques, and procedures.

Continuous Security Iterations,implement a continuous testing and iteration cycle for AI models. Regularly update security parameters based on evolving cybersecurity landscapes.

Leverage Llama Guard’s instruction fine-tuning for tailored security tasks. Explore customization options to align with specific security use cases.

Collaboration Across Security Teams, foster collaboration between security engineering teams and AI developers, at my current client we have a really strong security champion community. We learn from each other and the security champions take learning back into their business areas. If you haven’t already, where have you been, establish secure coding practices and integrate them into the AI development lifecycle.

Dynamic Threat Modeling, develop threat models specific to generative AI models. Identify potential threats and vulnerabilities in the AI lifecycle, adapting security measures accordingly.

Integrate CyberSecEval into security workflows for comprehensive evaluations. Utilize its insights to enhance the security posture of generative AI models.

If you are a security engineer, you are the navigator of generative AI security. By adopting robust engineering practices, iterating on security measures, and leveraging tools like CyberSecEval and Llama Guard, you can engineer a future where generative AI thrives in a secure and trusted environment in your organisation.

Robert Burns

“In the cyberspace’s echoing halls, where bits and bytes dance their clandestine reels, let every algorithm be a noble guard, and encryption be the tartan that shields our digital ideals.” - { to commemorate the celebration of the famous Scottish poet, which us Scots celebrate every January 25th. }

Purple Llama:

Common Weakness Enumaration industry-standard coding practice taxonomy:


Cybersecurity Benchmarks: