What is a soft 404?
A “soft 404” occurs when a web server returns a successful HTTP status code (typically 200) for a page that doesn’t actually exist or contains error content. Common examples include:
- Custom “Page not found” pages that return 200 instead of 404
- Error pages styled to match the site design
- Placeholder pages with generic content
- Pages that redirect to a homepage or parent directory
How it works
Our AI classifier analyzes HTTP responses to distinguish between:
- Legitimate pages: Real content that should be reported as findings
- Soft 404 pages: Error pages disguised as valid responses
Response comparison
Compares page content against known 404 response patterns for the target site.
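
As a rough illustration of response comparison, the sketch below measures how closely a response body matches a site's known error page. It is not the scanner's actual implementation: the function names, the sample pages, and the 0.9 threshold are all assumptions made for this example, and the baseline is assumed to come from requesting a path that cannot exist on the target.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two response bodies."""
    return SequenceMatcher(None, a, b).ratio()


def matches_known_404(body: str, baseline_404: str, threshold: float = 0.9) -> bool:
    """True when the body is nearly identical to the site's known error page.

    The threshold is an illustrative value, not a documented setting.
    """
    return similarity(body, baseline_404) >= threshold


# Hypothetical baseline: the body returned for a path that cannot exist.
baseline = (
    "<html><head><title>Oops</title></head><body><h1>Nothing here</h1>"
    "<p>We could not find /zz9x-missing on this server.</p></body></html>"
)
# Same template, different requested path: likely a soft 404.
probe = (
    "<html><head><title>Oops</title></head><body><h1>Nothing here</h1>"
    "<p>We could not find /admin on this server.</p></body></html>"
)
# A structurally different page: likely real content.
real = (
    "<html><head><title>Admin login</title></head><body>"
    "<form action='/login' method='post'><input name='user'>"
    "<input name='pass' type='password'></form></body></html>"
)

print(matches_known_404(probe, baseline))
print(matches_known_404(real, baseline))
```

A ratio-based comparison tolerates small differences such as the requested path being echoed back in the error page, which an exact string match would miss.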
Content analysis
Analyzes page content for common error indicators and patterns.
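
One simple way to picture content analysis is a scan of the visible page text for typical error phrases. The sketch below is only an assumption about what such a check could look like: the pattern list and the helper names `visible_text` and `error_indicators` are hypothetical, and the tag stripping is deliberately crude.

```python
import re

# Illustrative phrases only; a real detector would use a larger, tuned set.
ERROR_PATTERNS = [
    re.compile(r"\b404\b"),
    re.compile(r"page\s+(was\s+)?not\s+found", re.IGNORECASE),
    re.compile(r"(doesn't|does\s+not)\s+exist", re.IGNORECASE),
    re.compile(r"no\s+longer\s+available", re.IGNORECASE),
]

TAG_RE = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Crude tag stripper, good enough for a sketch (not a real HTML parser)."""
    without_tags = TAG_RE.sub(" ", html)
    return re.sub(r"\s+", " ", without_tags).strip()


def error_indicators(html: str) -> list[str]:
    """Return the source of every pattern that matches the visible text."""
    text = visible_text(html)
    return [p.pattern for p in ERROR_PATTERNS if p.search(text)]


error_page = "<h1>Page not found</h1><p>Error 404</p>"
normal_page = "<h1>Welcome</h1><p>Sign in to continue</p>"

print(len(error_indicators(error_page)))
print(error_indicators(normal_page))
```

Phrase matching alone produces mistakes in both directions (a blog post about HTTP errors mentions "404" too), which is one reason to combine it with other signals.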
Machine learning classification
Uses trained models to classify ambiguous responses.
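
To show how several signals could be combined into a single decision for ambiguous responses, here is a minimal logistic scoring sketch. It is not the product's model: the features, the weights, the bias, and the 512-byte cutoff are hand-picked assumptions standing in for values that a trained model would learn from labeled responses.

```python
import math
from dataclasses import dataclass


@dataclass
class ResponseFeatures:
    similarity_to_404: float  # 0..1, e.g. from a response comparison step
    error_phrase_count: int   # e.g. from a content analysis step
    body_length: int          # size of the response body in bytes


# Hypothetical weights chosen by hand for illustration only.
WEIGHTS = {"similarity": 4.0, "phrases": 1.5, "short_body": 1.0}
BIAS = -3.0


def soft_404_probability(f: ResponseFeatures) -> float:
    """Logistic score in (0, 1); higher means more likely a soft 404."""
    short_body = 1.0 if f.body_length < 512 else 0.0
    z = (
        BIAS
        + WEIGHTS["similarity"] * f.similarity_to_404
        + WEIGHTS["phrases"] * min(f.error_phrase_count, 3)
        + WEIGHTS["short_body"] * short_body
    )
    return 1.0 / (1.0 + math.exp(-z))


likely_error = ResponseFeatures(similarity_to_404=0.95, error_phrase_count=2, body_length=300)
likely_real = ResponseFeatures(similarity_to_404=0.10, error_phrase_count=0, body_length=20000)

print(soft_404_probability(likely_error) > 0.5)
print(soft_404_probability(likely_real) > 0.5)
```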
Usage in Website Scanner
The soft 404 detector is integrated into the Website Scanner.
Enabled tests
The classifier runs automatically when the following tests are enabled in the Initial Tests section:
| Test | Generated finding |
|---|---|
| Find admin consoles | Administration consoles found |
| Find sensitive files | Sensitive files found |
| Find interesting files | Interesting files found |
| Search for information disclosure | Server information disclosure |
| Software identification | Server software identified |
The soft 404 detector is enabled by default for these tests.
How it improves results
Without soft 404 detection, these tests might report hundreds of false positives: pages that appear to exist but are actually custom error pages. The ML classifier filters these out, so you only see legitimate discoveries.
Usage in URL Fuzzer
The soft 404 detector is also integrated into the URL Fuzzer.
How it works
When fuzzing for hidden files and directories, the URL Fuzzer sends many requests that will return error pages. The ML classifier:
- Analyzes each response
- Identifies soft 404 patterns
- Filters out false positives from the results
The URL Fuzzer doesn’t generate findings directly, but its results are cleaned by the ML classifier to show only legitimate discoveries.
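
The filtering step described above can be pictured as follows. This is a simplified sketch, not the URL Fuzzer's code: `FuzzResult`, `is_soft_404`, and `clean_results` are hypothetical names, the sample responses are invented, and a plain similarity check against a known error page stands in for the ML classifier.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FuzzResult:
    path: str
    status: int
    body: str


def is_soft_404(result: FuzzResult, baseline_404: str, threshold: float = 0.9) -> bool:
    """Stand-in classifier: a 200 whose body closely matches the known error page."""
    if result.status != 200:
        return False
    ratio = SequenceMatcher(None, result.body, baseline_404).ratio()
    return ratio >= threshold


def clean_results(results: list[FuzzResult], baseline_404: str) -> list[FuzzResult]:
    """Keep only 200 responses that the stand-in classifier does not flag."""
    return [
        r for r in results
        if r.status == 200 and not is_soft_404(r, baseline_404)
    ]


# Hypothetical error page served with status 200 for missing paths.
baseline = (
    "<html><body><h1>Sorry</h1>"
    "<p>This page is not available on our site right now.</p></body></html>"
)
results = [
    FuzzResult("/backup.zip", 200, baseline),  # soft 404: error page with a 200
    FuzzResult(
        "/admin/",
        200,
        "<html><body><form action='/admin/login'><input name='username'>"
        "<input name='password' type='password'></form></body></html>",
    ),  # real page
    FuzzResult("/old/", 404, "Not Found"),  # ordinary 404
]

print([r.path for r in clean_results(results, baseline)])
```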
AI data handling
- Proprietary models: The soft 404 detector uses our own self-hosted classification models
- Secure infrastructure: Data is processed within our isolated infrastructure
- No external training: Your data is not used to train any AI models
- Predictive, not generative: These models classify data rather than generate content, eliminating “hallucination” risks