Jan 2025

Fine-tuning Qwen for Java Vulnerability Detection

ai

The Goal

Security vulnerabilities in code are a massive problem, and many of them slip through review. The goal was to fine-tune a large language model to automatically flag potential security issues in Java code before they reach production.

The Dataset

I curated a dataset of Java code snippets with labeled vulnerabilities including:

Fine-tuning Process

I used the Qwen 2.5 7B model as the base and fine-tuned it using LoRA:

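from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (Hub ID assumed; the post only says "Qwen 2.5 7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor (alpha / r = 2)
    target_modules=["q_proj", "v_proj"],  # adapt the attention query and value projections
    lora_dropout=0.05,                    # dropout applied on the LoRA path
)

# Wrap the base model so that only the LoRA adapter weights are trainable
model = get_peft_model(model, lora_config)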
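
To get a feel for how small this adapter is, here is a rough back-of-the-envelope count of the trainable parameters implied by r=16 on q_proj and v_proj. The layer dimensions are my assumption, taken from the published Qwen2.5-7B configuration (hidden size 3584, 28 layers, 28 query heads, 4 key/value heads), so check them against the config.json of whichever checkpoint you load. The helper names are hypothetical.

```python
# Dimensions assumed from the published Qwen2.5-7B config; verify against config.json.
HIDDEN_SIZE = 3584
NUM_LAYERS = 28
NUM_ATTENTION_HEADS = 28
NUM_KV_HEADS = 4  # grouped-query attention: fewer key/value heads than query heads
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds two matrices per adapted linear layer: A (r x in) and B (out x r)."""
    return r * (in_features + out_features)


def count_adapter_params(r: int = 16) -> int:
    """Total LoRA parameters when only q_proj and v_proj are adapted in every layer."""
    q_proj = lora_params(HIDDEN_SIZE, NUM_ATTENTION_HEADS * HEAD_DIM, r)
    v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, r)
    return NUM_LAYERS * (q_proj + v_proj)


if __name__ == "__main__":
    print(f"Trainable LoRA parameters: {count_adapter_params(r=16):,}")
```

Under those assumptions the adapter comes to roughly 5 million parameters, a tiny fraction of the 7B base model. PEFT's print_trainable_parameters() on the wrapped model reports the actual figure.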

Results

After training for 3 epochs on an A100 GPU:

Key Takeaways

The model works best when given context around the vulnerable code, not just isolated snippets. Context is everything in security analysis.
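
One way to give the model that context is to send a window of surrounding lines along with the suspicious line, not the line alone. This is a minimal sketch, not the exact pipeline I used: the function names, the window radius, and the prompt wording are all illustrative assumptions.

```python
def extract_context(source: str, line_no: int, radius: int = 5) -> str:
    """Return the lines around a 1-indexed line_no, with the target line marked by '>>'."""
    lines = source.splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line_no {line_no} is outside the file (1..{len(lines)})")

    start = max(0, line_no - 1 - radius)    # first index in the window
    end = min(len(lines), line_no + radius)  # one past the last index in the window

    numbered = []
    for i in range(start, end):
        marker = ">>" if i == line_no - 1 else "  "
        numbered.append(f"{marker} {i + 1:4d} | {lines[i]}")
    return "\n".join(numbered)


def build_prompt(source: str, line_no: int, radius: int = 5) -> str:
    """Wrap the context window in a (hypothetical) instruction for the fine-tuned model."""
    context = extract_context(source, line_no, radius)
    return (
        "Review the following Java code. The line marked with '>>' was flagged.\n"
        "Say whether it is vulnerable and why.\n\n"
        f"{context}\n"
    )
```

A larger radius gives the model more to reason about (where the input came from, whether it was sanitized) at the cost of a longer prompt, so the value is worth tuning against your own data.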