Fine-tuning Qwen for Java Vulnerability Detection
The Goal
Security vulnerabilities are costly and easy to miss in code review. The goal was to fine-tune a large language model to flag potential security issues in Java code before they reach production.
The Dataset
I curated a dataset of Java code snippets labeled with the vulnerability they contain, including:
- SQL injection
- Cross-site scripting (XSS)
- Insecure deserialization
- Path traversal
- Authentication bypass
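A labeled sample from a dataset like this could be represented as follows. This is a minimal sketch: the `Sample` class, the field names, the label identifiers, and the JSONL layout are all assumptions for illustration, not the schema actually used.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical label identifiers mirroring the vulnerability classes listed above.
LABELS = (
    "sql_injection",
    "xss",
    "insecure_deserialization",
    "path_traversal",
    "auth_bypass",
)


@dataclass
class Sample:
    code: str   # Java snippet
    label: str  # one of LABELS

    def __post_init__(self):
        # Reject typos in labels early, before they reach training.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


def to_jsonl(samples):
    """Serialize samples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(s)) for s in samples)


example = Sample(
    code='String q = "SELECT * FROM users WHERE name = \'" + name + "\'";',
    label="sql_injection",
)
print(to_jsonl([example]))
```

Validating labels at construction time keeps a mistyped class name from silently becoming a new category.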
Fine-tuning Process
I used the Qwen 2.5 7B model as the base and fine-tuned it using LoRA:
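from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed Hub ID for the base model; adjust to the checkpoint you actually use
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor (alpha / r = 2)
    target_modules=["q_proj", "v_proj"],  # adapt the attention query and value projections
    lora_dropout=0.05,                    # dropout applied on the LoRA branch
    task_type="CAUSAL_LM",
)

# Wrap the base model so that only the LoRA adapter weights are trainable
model = get_peft_model(base_model, lora_config)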
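To get a feel for how small this adapter is, the number of trainable LoRA parameters can be estimated by hand. The sketch below assumes the Qwen2.5-7B dimensions as I recall them from the published config (hidden size 3584, 28 layers, 4 key/value heads of dimension 128); check them against the checkpoint's `config.json` before relying on the figure.

```python
# Assumed Qwen2.5-7B dimensions; verify against the model's config.json.
HIDDEN_SIZE = 3584
NUM_LAYERS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128


def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in) and B (out x r) to each adapted linear layer."""
    return r * (in_features + out_features)


def total_lora_params(r=16):
    # q_proj maps hidden -> hidden
    q = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, r)
    # v_proj maps hidden -> kv_heads * head_dim (grouped-query attention)
    v = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, r)
    return NUM_LAYERS * (q + v)


print(total_lora_params())  # 5046272 under the assumed dimensions
```

Under those assumptions the adapter comes to roughly 5 million trainable parameters, a small fraction of the 7B base model.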
Results
After training for 3 epochs on an A100 GPU:
- Accuracy: 87% on test data
- Precision: 84%
- Recall: 89%
- F1 Score: 86.4%
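The F1 score is the harmonic mean of precision and recall, so it can be checked directly from the two figures above. A minimal sketch (the function name is my own):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f"{f1_score(0.84, 0.89):.1%}")  # 86.4%
```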
Key Takeaways
The model performs best when it is given the code surrounding a suspect statement rather than an isolated snippet. Whether a line is exploitable often depends on where its inputs come from, so context is everything in security analysis.
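One simple way to supply that context is to send a window of lines around the statement of interest instead of the statement alone. The sketch below is illustrative only: `with_context`, `build_prompt`, the `radius` parameter, and the prompt wording are hypothetical, not the prompt format used in training.

```python
def with_context(source, line_no, radius=5):
    """Return the 1-indexed line `line_no` plus up to `radius` lines on each side."""
    lines = source.splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError("line_no out of range")
    start = max(0, line_no - 1 - radius)
    end = min(len(lines), line_no + radius)
    return "\n".join(lines[start:end])


def build_prompt(source, line_no, radius=5):
    """Wrap the context window in a hypothetical review instruction."""
    snippet = with_context(source, line_no, radius)
    return "Review the following Java code for security vulnerabilities:\n\n" + snippet


java = """public User find(HttpServletRequest req) {
    String name = req.getParameter("name");
    String q = "SELECT * FROM users WHERE name = '" + name + "'";
    return db.query(q);
}"""

# With radius=1 the model sees where `name` comes from and where `q` is used.
print(build_prompt(java, line_no=3, radius=1))
```

A line-based window is the crudest option; slicing by enclosing method or class would follow the same idea with a parser in place of `splitlines()`.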