Web Application Enumeration

Overview

Web Application Enumeration focuses on identifying technologies, frameworks, hidden content, and potential vulnerabilities in web applications. This phase builds upon subdomain discovery to analyze the actual web services and applications running on discovered hosts.

Key Objectives:

  • Identify web technologies and frameworks

  • Discover hidden directories and files

  • Enumerate parameters and API endpoints

  • Analyze security headers and configurations

  • Identify CMS-specific vulnerabilities

  • Discover virtual hosts and applications


Technology Stack Identification

whatweb - Command Line Technology Detection

# Basic scan
whatweb https://example.com

# Aggressive scan with all plugins
whatweb -a 3 https://example.com

# Output to JSON format
whatweb --log-json=results.json https://example.com

# Scan multiple URLs from file
whatweb -i urls.txt

# Scan with specific user agent
whatweb --user-agent "Mozilla/5.0..." https://example.com

Wappalyzer (Browser Extension)

  • Automatically identifies technologies on visited pages

  • Shows: CMS, frameworks, libraries, servers, databases

  • Real-time analysis during browsing

BuiltWith - Web Technology Profiler

Netcraft - Web Security Services

Nikto - Web Server Scanner
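
A typical baseline scan might look like this (target, port, and output file are placeholders):

# General-purpose scan against a single host
nikto -h https://example.com

# Scan a specific port and save the output
nikto -h example.com -p 8080 -o nikto_results.txt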

Nmap HTTP Scripts for Technology Detection
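
A sketch using standard NSE HTTP scripts (script selection is illustrative):

# Grab titles, headers and server banners from common web ports
nmap -p 80,443,8080 --script http-title,http-headers,http-server-header example.com

# Broader content/technology enumeration
nmap -p 80,443 --script http-enum example.com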

Manual Header Analysis
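
Manual inspection with curl, for example:

# Fetch response headers only
curl -I https://example.com

# Follow redirects and keep it quiet
curl -sIL https://example.com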


Directory & File Enumeration

Gobuster - Directory Brute Forcing
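
A typical directory brute force (wordlist paths assume SecLists is installed):

# Basic directory enumeration
gobuster dir -u https://example.com -w /usr/share/seclists/Discovery/Web-Content/directory-list-2.3-medium.txt

# Add file extensions and more threads
gobuster dir -u https://example.com -w /usr/share/seclists/Discovery/Web-Content/common.txt -x php,txt,bak -t 40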

ffuf - Fast Web Fuzzer
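
Directory fuzzing with ffuf (FUZZ marks the injection point; wordlist is a placeholder):

# Basic directory fuzzing
ffuf -u https://example.com/FUZZ -w /usr/share/seclists/Discovery/Web-Content/common.txt

# Match only interesting status codes and recurse into hits
ffuf -u https://example.com/FUZZ -w wordlist.txt -mc 200,204,301,302,307,401,403 -recursion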

dirb - Recursive Directory Scanner
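
dirb recurses into discovered directories by default:

# Default wordlist scan
dirb https://example.com

# Custom wordlist and extensions
dirb https://example.com /usr/share/wordlists/dirb/common.txt -X .php,.txt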


Virtual Host Discovery

Understanding Virtual Hosts

Virtual hosting allows web servers to host multiple websites or applications on a single server by leveraging the HTTP Host header. This is crucial for discovering hidden applications and services that might not be publicly listed in DNS.

How Virtual Hosts Work

Key Concepts:

  • Subdomains: Extensions of the main domain (e.g., blog.example.com) that have their own DNS records

  • Virtual Hosts (VHosts): Server configurations that can host multiple sites on same IP

  • Host Header: HTTP header that tells the server which website is being requested

Process Flow:

  1. Browser Request: Sends HTTP request to server IP with Host header

  2. Host Header: Contains domain name (e.g., Host: www.example.com)

  3. Server Processing: Web server examines Host header and consults virtual host config

  4. Content Serving: Server serves appropriate content based on matched virtual host

Types of Virtual Hosting

| Type | Description | Advantages | Disadvantages |
|------|-------------|------------|---------------|
| Name-Based | Uses HTTP Host header to distinguish sites | Cost-effective, flexible, no multiple IPs needed | Requires Host header support, SSL/TLS limitations |
| IP-Based | Assigns unique IP to each website | Protocol independent, better isolation | Expensive, requires multiple IPs |
| Port-Based | Different ports for different websites | Useful when IPs are limited | Not user-friendly, requires port in URL |

Example Apache Configuration
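
A minimal name-based setup might look like this (domains and paths are placeholders):

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/main
</VirtualHost>

<VirtualHost *:80>
    ServerName dev.example.com
    DocumentRoot /var/www/dev
</VirtualHost>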

Key Point: Even without DNS records, virtual hosts can be accessed by modifying local /etc/hosts file or fuzzing Host headers directly.


gobuster - Virtual Host Enumeration

gobuster is highly effective for virtual host discovery with its dedicated vhost mode:

Basic gobuster vhost Usage
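
A typical run against a target IP (--append-domain attaches the base domain to each wordlist entry):

# Fuzz Host headers against the target
gobuster vhost -u http://10.10.10.10 -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt --append-domain

# Increase threads for faster enumeration
gobuster vhost -u http://example.com -w subdomains.txt --append-domain -t 50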

Important gobuster Flags

gobuster vhost Example Output

ffuf - Fast Virtual Host Fuzzing

ffuf provides flexible and fast virtual host discovery with powerful filtering:

Basic ffuf Virtual Host Discovery
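
Fuzz the Host header directly (the catch-all response will need filtering, see below):

# Inject candidate subdomains into the Host header
ffuf -u http://10.10.10.10 -H "Host: FUZZ.example.com" -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt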

Advanced ffuf Filtering
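
Filter out the default response so only real virtual hosts remain (the sizes/word counts below are example baselines):

# Filter by the response size of the catch-all page
ffuf -u http://10.10.10.10 -H "Host: FUZZ.example.com" -w subdomains.txt -fs 10918

# Alternatively filter by word count or status code
ffuf -u http://10.10.10.10 -H "Host: FUZZ.example.com" -w subdomains.txt -fw 2271 -fc 302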

feroxbuster - Rust-Based Virtual Host Discovery


Virtual Host Discovery Strategies

1. Preparation Phase

2. Initial Discovery

3. Filtering Setup

4. Comprehensive Enumeration

Manual Virtual Host Testing

Local Testing with /etc/hosts
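
Map a discovered vhost to the target IP locally, then browse or curl it (IP and hostname are placeholders):

# Add a local DNS entry
echo "10.10.10.10 admin.example.com" | sudo tee -a /etc/hosts

# Verify the virtual host responds differently
curl -s http://admin.example.com | head -n 20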


HTB Academy Lab Examples

Lab: Virtual Host Discovery

Analysis Process


Security Considerations

Detection Avoidance

Traffic Analysis

  • Virtual host discovery generates significant HTTP traffic

  • Monitor for IDS/WAF detection

  • Use proper authorization before testing

  • Document all discovered virtual hosts

False Positive Management


Defensive Measures

Server Hardening

Monitoring


Parameter Discovery

ffuf Parameter Fuzzing
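
Fuzzing GET and POST parameter names (filter sizes are illustrative baselines):

# GET parameter name fuzzing
ffuf -u "https://example.com/index.php?FUZZ=test" -w /usr/share/seclists/Discovery/Web-Content/burp-parameter-names.txt -fs 4242

# POST parameter fuzzing
ffuf -u https://example.com/index.php -X POST -d "FUZZ=test" -H "Content-Type: application/x-www-form-urlencoded" -w burp-parameter-names.txt -fs 4242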

Arjun - Parameter Discovery Tool
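
Arjun probes for hidden parameters automatically (flags shown are from Arjun 2.x):

# Discover parameters on a single endpoint
arjun -u https://example.com/api/endpoint

# Specify HTTP method and save results
arjun -u https://example.com/search -m POST -oT params.txt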

paramspider - Parameter Mining


API Enumeration

Common API Endpoints

API Fuzzing with ffuf
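
Fuzzing versioned API paths and objects (wordlists and paths are placeholders):

# Enumerate API endpoints
ffuf -u https://example.com/api/v1/FUZZ -w /usr/share/seclists/Discovery/Web-Content/api/objects.txt -mc 200,401,403

# Enumerate API versions
ffuf -u https://example.com/api/FUZZ/users -w versions.txt -mc all -fc 404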

GraphQL Enumeration
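
A minimal introspection probe with curl (the /graphql path is an assumption):

# Check whether introspection is enabled
curl -s -X POST https://example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{__schema{types{name}}}"}'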


Web Crawling & Spidering

Professional Tools:

  • Burp Suite Spider - Active crawler for web application mapping and vulnerability discovery

  • OWASP ZAP - Free, open-source web application security scanner with spider component

  • Scrapy - Versatile Python framework for building custom web crawlers

  • Apache Nutch - Highly extensible and scalable open-source web crawler

ReconSpider - HTB Academy Custom Spider
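
Typical usage as shown in the HTB Academy module (results are written to results.json):

# Run the spider against a target
python3 ReconSpider.py http://inlanefreight.com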

ReconSpider Results Analysis

ReconSpider saves data in results.json with the following structure:
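
An illustrative skeleton of results.json (keys taken from the table below; values are placeholders):

{
    "emails": ["info@example.com"],
    "links": ["https://example.com/about"],
    "external_files": ["https://example.com/files/report.pdf"],
    "js_files": ["https://example.com/static/app.js"],
    "form_fields": [],
    "images": [],
    "videos": [],
    "audio": [],
    "comments": ["<!-- TODO: remove debug endpoint -->"]
}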

JSON Key Analysis:

| Key | Description | Security Relevance |
|-----|-------------|--------------------|
| emails | Email addresses found on domain | User enumeration, social engineering |
| links | URLs of links within domain | Site mapping, hidden pages |
| external_files | External files (PDFs, docs) | Information disclosure |
| js_files | JavaScript files | Endpoint discovery, sensitive data |
| form_fields | Form fields discovered | Parameter discovery, injection points |
| images | Image URLs | Metadata extraction |
| videos | Video URLs | Content analysis |
| audio | Audio file URLs | Content analysis |
| comments | HTML comments | Information disclosure |

ReconSpider Data Mining
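
Quick extraction with jq (assumes jq is installed and results.json is in the current directory):

# Pull out discovered email addresses
jq '.emails' results.json

# List JavaScript files and HTML comments
jq '.js_files, .comments' results.json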

hakrawler - Fast Web Crawler
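
hakrawler reads URLs from stdin:

# Crawl a site and include subdomains
echo https://example.com | hakrawler -subs

# Crawl deeper and deduplicate results
echo https://example.com | hakrawler -d 3 -u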

wget Recursive Download
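
Mirror a site shallowly for offline inspection (mind the ethical-crawling guidelines below):

# Recursive download, depth 2, stay below the start directory, 1s delay
wget -r -l 2 -np -k -w 1 https://example.com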

Burp Suite Spider

OWASP ZAP Spider

Scrapy Custom Spider

Ethical Crawling Practices

Critical Guidelines

  1. Always obtain permission before crawling a website

  2. Respect robots.txt and website terms of service

  3. Be mindful of server resources - avoid excessive requests

  4. Implement delays between requests to prevent server overload

  5. Use appropriate scope - don't crawl beyond authorized targets

  6. Monitor impact - watch for 429 (rate limit) responses

Responsible Crawling Configuration

  • Penetration Testing Authorization - Ensure proper scope documentation

  • Rate Limiting Compliance - Don't bypass intentional restrictions

  • Data Protection - Handle discovered data responsibly

  • Service Availability - Don't impact legitimate users

  • Disclosure - Report findings through proper channels


Search Engine Discovery (OSINT)

Overview

Search Engine Discovery, a form of OSINT (Open Source Intelligence) gathering, leverages search engines as powerful reconnaissance tools to uncover information about target websites, organizations, and individuals. This technique uses specialized search operators to extract data that may not be readily visible on websites.

Why Search Engine Discovery Matters:

  • Open Source - Information is publicly accessible, making it legal and ethical

  • Breadth of Information - Search engines index vast portions of the web

  • Ease of Use - User-friendly and requires no specialized technical skills

  • Cost-Effective - Free and readily available resource for information gathering

Applications:

  • Security Assessment - Identifying vulnerabilities, exposed data, and potential attack vectors

  • Competitive Intelligence - Gathering information about competitors' products and services

  • Threat Intelligence - Identifying emerging threats and tracking malicious actors

  • Investigative Research - Uncovering hidden connections and financial transactions

Search Operators

Search operators are specialized commands that unlock precise control over search results, allowing you to pinpoint specific types of information.

| Operator | Description | Example | Use Case |
|----------|-------------|---------|----------|
| site: | Limits results to specific website/domain | site:example.com | Find all publicly accessible pages |
| inurl: | Finds pages with specific term in URL | inurl:login | Search for login pages |
| filetype: | Searches for files of particular type | filetype:pdf | Find downloadable PDF documents |
| intitle: | Finds pages with specific term in title | intitle:"confidential report" | Look for confidential documents |
| intext: | Searches for term within body text | intext:"password reset" | Identify password reset pages |
| cache: | Displays cached version of webpage | cache:example.com | View previous content |
| link: | Finds pages linking to specific webpage | link:example.com | Identify websites linking to target |
| related: | Finds websites related to specific webpage | related:example.com | Discover similar websites |
| info: | Provides summary information about webpage | info:example.com | Get basic details about target |
| define: | Provides definitions of word/phrase | define:phishing | Get definitions from various sources |
| numrange: | Searches for numbers within specific range | site:example.com numrange:1000-2000 | Find pages with numbers in range |
| allintext: | Finds pages containing all specified words in body | allintext:admin password reset | Search for multiple terms in body |
| allinurl: | Finds pages containing all specified words in URL | allinurl:admin panel | Look for multiple terms in URL |
| allintitle: | Finds pages containing all specified words in title | allintitle:confidential report 2023 | Search for multiple terms in title |

Advanced Search Operators

| Operator | Description | Example | Use Case |
|----------|-------------|---------|----------|
| AND | Requires all terms to be present | site:example.com AND (inurl:admin OR inurl:login) | Find admin or login pages |
| OR | Includes pages with any of the terms | "linux" OR "ubuntu" OR "debian" | Search for any Linux distribution |
| NOT | Excludes results containing specified term | site:bank.com NOT inurl:login | Exclude login pages |
| * | Wildcard - represents any character/word | site:company.com filetype:pdf user* manual | Find user manuals (user guide, etc.) |
| .. | Range search for numerical values | site:ecommerce.com "price" 100..500 | Products priced between 100-500 |
| " " | Searches for exact phrases | "information security policy" | Find exact phrase matches |
| - | Excludes terms from search results | site:news.com -inurl:sports | Exclude sports content |

Google Dorking Examples

Finding Login Pages
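
Example dorks (replace example.com with the authorized target):

site:example.com inurl:login
site:example.com (inurl:admin OR inurl:signin OR intitle:"admin login")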

Identifying Exposed Files

Uncovering Configuration Files

Locating Database Backups

Finding Sensitive Information

Directory Listings
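
Example dorks for open directory indexes:

site:example.com intitle:"index of"
site:example.com intitle:"index of" "parent directory"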

Error Pages and Debug Information

Specialized Google Dorks

WordPress-Specific Dorks

Database-Specific Dorks

Version Control Systems

OSINT Tools and Resources

Google Hacking Database

Automated Google Dorking Tools

Search Engine Alternatives

Bing Search Operators

Practical OSINT Workflow

Phase 1: Initial Discovery

Phase 2: Deep Enumeration

Phase 3: Vulnerability Discovery

Phase 4: Intelligence Analysis

Best Practices

  1. Stay within legal boundaries - Only search publicly indexed information

  2. Respect robots.txt - Understand website crawling policies

  3. Avoid automation abuse - Don't overload search engines with requests

  4. Document findings responsibly - Handle discovered information ethically

  5. Report vulnerabilities - Follow responsible disclosure practices

Limitations

  • Not all information is indexed - Some data may be hidden or protected

  • Information may be outdated - Search engine caches may not reflect current state

  • False positives - Search results may include irrelevant information

  • Rate limiting - Search engines may limit query frequency


Web Archives (Wayback Machine)

Overview

Web Archives provide access to historical snapshots of websites, allowing reconnaissance professionals to explore how websites appeared and functioned in the past. The Internet Archive's Wayback Machine is the most prominent web archive, containing billions of web pages captured since 1996.

What is the Wayback Machine? The Wayback Machine is a digital archive of the World Wide Web operated by the Internet Archive, a non-profit organization. It allows users to "go back in time" and view snapshots of websites as they appeared at various points in their history.

How the Wayback Machine Works

The Wayback Machine operates through a three-step process:

  1. Crawling - Automated web crawlers browse the internet systematically, following links and downloading webpage copies

  2. Archiving - Downloaded webpages and resources are stored with specific timestamps, creating historical snapshots

  3. Accessing - Users can view archived snapshots through the web interface by entering URLs and selecting dates

Archive Frequency:

  • Popular websites: Multiple captures per day

  • Regular websites: Weekly or monthly captures

  • Less popular sites: Few snapshots over years

  • Factors: Website popularity, update frequency, available resources

Why Web Archives Matter for Reconnaissance

Critical Applications:

  1. Uncovering Hidden Assets - Discover old pages, directories, files, or subdomains no longer accessible

  2. Vulnerability Discovery - Find exposed sensitive information or security flaws from past versions

  3. Change Tracking - Observe website evolution, technology changes, and structural modifications

  4. Intelligence Gathering - Extract historical OSINT about target's activities, employees, strategies

  5. Stealthy Reconnaissance - Passive activity that doesn't interact with target infrastructure

Wayback Machine Usage

Basic Web Interface

URL Format Structure
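
Archived snapshots follow a predictable URL pattern (the timestamp is YYYYMMDDhhmmss):

# Specific snapshot
https://web.archive.org/web/20200101000000/https://example.com/

# Calendar of all captures for a URL
https://web.archive.org/web/2020*/example.com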

Advanced Wayback Machine Techniques

Subdomain Discovery

Directory and File Discovery
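
The CDX API can list every archived URL for a domain (parameters shown are the commonly used ones):

# Dump unique archived URLs for the target domain
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&fl=original&collapse=urlkey" | sort -u > archived_urls.txt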

Technology Evolution Tracking

Automated Wayback Machine Tools

waybackurls - URL Extraction
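
waybackurls reads domains from stdin:

# Pull historical URLs for a domain
echo example.com | waybackurls > wayback_urls.txt

# Feed a list of domains
cat domains.txt | waybackurls | sort -u > all_wayback_urls.txt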

gau (GetAllURLs)
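
gau aggregates URLs from several sources (Wayback Machine, Common Crawl, and others):

# Collect URLs for a single domain
gau example.com > gau_urls.txt

# Include subdomains
gau --subs example.com | sort -u > gau_all.txt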

Wayback Machine Downloader

Historical Intelligence Gathering

Employee and Contact Discovery

Technology Stack Evolution

Sensitive Information Discovery

Manual Investigation Techniques

Timeline Analysis

Content Comparison

HTB Academy Lab Examples

Lab 6: Wayback Machine Investigation

Practical Investigation Workflow

Alternative Web Archives

Archive.today

Common Crawl

Library and Government Archives

Limitations and Considerations

Technical Limitations

  1. Not all content archived - Dynamic content, JavaScript-heavy sites may not work

  2. Incomplete captures - Some resources (images, CSS) may be missing

  3. No interaction - Forms, logins, and dynamic features don't work

  4. robots.txt respect - Some content excluded by website owners

  5. Legal restrictions - Some content removed due to legal requests

Investigation Challenges

  1. Content authenticity - Verify information with other sources

  2. Timestamp accuracy - Archive dates may not reflect actual publication dates

  3. Context missing - Surrounding events and circumstances

  4. Selective preservation - Popular sites better archived than obscure ones

Best Practices

  1. Respect copyright - Archived content still subject to intellectual property laws

  2. Privacy considerations - Personal information in archives should be handled responsibly

  3. Purpose limitation - Use archived data only for legitimate security research

  4. Disclosure responsibility - Report significant findings through proper channels

  5. Documentation - Maintain records of research methodology and sources


JavaScript Analysis

LinkFinder - Extract Endpoints from JS
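
LinkFinder parses JavaScript for endpoints (assumes the tool is cloned locally; the JS URL is a placeholder):

# Extract endpoints from a single JS file
python3 linkfinder.py -i https://example.com/static/app.js -o cli

# Analyze all JS referenced by a page
python3 linkfinder.py -i https://example.com -d -o cli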

JSFScan.sh - JavaScript File Scanner

Manual JavaScript Analysis
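
A quick manual pass with curl and grep (the regex is a rough heuristic for quoted paths):

# Pull likely endpoints out of a JavaScript file
curl -s https://example.com/static/app.js | grep -oE '"/[a-zA-Z0-9_/.?=&-]+"' | sort -u

# Look for hard-coded secrets or keys
curl -s https://example.com/static/app.js | grep -iE "api[_-]?key|secret|token"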


CMS-Specific Enumeration

WordPress
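
wpscan covers users, plugins, and themes (an API token from wpscan.com improves vulnerability data):

# Enumerate users, vulnerable plugins and themes
wpscan --url https://example.com --enumerate u,vp,vt

# Use an API token for vulnerability lookups
wpscan --url https://example.com --enumerate ap,at --api-token YOUR_TOKEN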

Joomla
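
OWASP JoomScan is the usual choice (syntax may vary slightly by version):

# Basic Joomla scan
joomscan -u https://example.com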

Drupal
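
droopescan supports Drupal (and several other CMSs):

# Scan a Drupal target
droopescan scan drupal -u https://example.com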


Security Headers Analysis

Security Headers Check
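
Check for the usual headers with curl:

# Look for common security headers in the response
curl -sI https://example.com | grep -iE "strict-transport-security|content-security-policy|x-frame-options|x-content-type-options|referrer-policy"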

SSL/TLS Analysis
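
Quick checks with openssl (sslscan or testssl.sh give more detail):

# Inspect the served certificate
openssl s_client -connect example.com:443 -servername example.com < /dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates

# Full cipher/protocol assessment
sslscan example.com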


HTTP Methods Testing

Method Enumeration
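
Enumerate allowed HTTP methods:

# Ask the server directly
curl -X OPTIONS -i https://example.com

# NSE check for risky methods
nmap -p 443 --script http-methods --script-args http-methods.url-path=/ example.com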


robots.txt and Sitemap Analysis

robots.txt Enumeration
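
robots.txt often reveals paths the owner wants hidden:

# Fetch and review disallowed entries
curl -s https://example.com/robots.txt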

Sitemap Discovery
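
Common sitemap locations:

# Standard sitemap and sitemap index
curl -s https://example.com/sitemap.xml
curl -s https://example.com/sitemap_index.xml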


WAF Detection and Bypass

WAF Detection
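
wafw00f fingerprints common WAFs:

# Identify the WAF in front of the target
wafw00f https://example.com

# Test against every known WAF signature
wafw00f -a https://example.com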

Basic WAF Bypass Techniques


HTB Academy Lab Examples

Lab 1: Fingerprinting inlanefreight.com

WAF Detection with wafw00f

Comprehensive Scanning with Nikto

Technology Stack Analysis

Lab 2: Virtual Host Discovery

Lab 3: Directory Discovery

Lab 4: ReconSpider Web Crawling

ReconSpider Results Analysis

Lab 5: Search Engine Discovery (OSINT)

OSINT Intelligence Analysis


Automated Reconnaissance Frameworks

Overview

While manual reconnaissance can be effective, it can also be time-consuming and prone to human error. Automating web reconnaissance tasks significantly enhances efficiency and accuracy, allowing you to gather information at scale and identify potential vulnerabilities more rapidly.

Why Automate Reconnaissance?

Key Advantages:

  • Efficiency - Automated tools perform repetitive tasks much faster than humans

  • Scalability - Scale reconnaissance efforts across large numbers of targets

  • Consistency - Follow predefined rules ensuring reproducible results

  • Comprehensive Coverage - Perform wide range of tasks: DNS, subdomains, crawling, port scanning

  • Integration - Easy integration with other tools creating seamless workflows

Reconnaissance Frameworks

FinalRecon - All-in-One Python Framework

FinalRecon Features:

  • Header Information - Server details, technologies, security misconfigurations

  • Whois Lookup - Domain registration details, registrant information

  • SSL Certificate Information - Certificate validity, issuer, security details

  • Web Crawler - HTML/CSS/JavaScript analysis, internal/external links

  • DNS Enumeration - 40+ DNS record types including DMARC

  • Subdomain Enumeration - Multiple sources (crt.sh, AnubisDB, ThreatMiner, etc.)

  • Directory Enumeration - Custom wordlists and file extensions

  • Wayback Machine - URLs from last 5 years

  • Port Scanning - Fast port enumeration

FinalRecon Command Options

| Option | Argument | Description |
|--------|----------|-------------|
| --url | URL | Specify target URL |
| --headers | - | Retrieve header information |
| --sslinfo | - | Get SSL certificate information |
| --whois | - | Perform Whois lookup |
| --crawl | - | Crawl target website |
| --dns | - | Perform DNS enumeration |
| --sub | - | Enumerate subdomains |
| --dir | - | Search for directories |
| --wayback | - | Retrieve Wayback URLs |
| --ps | - | Fast port scan |
| --full | - | Full reconnaissance scan |

FinalRecon Advanced Options

| Option | Default | Description |
|--------|---------|-------------|
| -dt | 30 | Number of threads for directory enum |
| -pt | 50 | Number of threads for port scan |
| -T | 30.0 | Request timeout |
| -w | dirb_common.txt | Path to wordlist |
| -r | False | Allow redirect |
| -s | True | Toggle SSL verification |
| -d | 1.1.1.1 | Custom DNS servers |
| -e | - | File extensions (txt,xml,php) |
| -o | txt | Export format |
| -k | - | Add API key (shodan@key) |

FinalRecon Practical Examples
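
Representative invocations (see the option tables above):

# Headers, whois and SSL info only
python3 finalrecon.py --headers --whois --sslinfo --url https://example.com

# Full reconnaissance scan
python3 finalrecon.py --full --url https://example.com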

Other Reconnaissance Frameworks

Recon-ng - Modular Framework

Recon-ng Features:

  • Modular Structure - Various modules for different tasks

  • Database Integration - Store and manage reconnaissance data

  • API Integration - Multiple third-party services

  • Report Generation - HTML, XML, CSV output formats

  • Extensible - Custom module development
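
A minimal interactive session might look like this (the module name is one common example; availability depends on the marketplace):

recon-ng
[recon-ng][default] > marketplace install recon/domains-hosts/hackertarget
[recon-ng][default] > modules load recon/domains-hosts/hackertarget
[recon-ng][default][hackertarget] > options set SOURCE example.com
[recon-ng][default][hackertarget] > run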

theHarvester - OSINT Data Gathering

theHarvester Features:

  • Email Address Discovery - Multiple search engines and sources

  • Subdomain Enumeration - Various databases and APIs

  • Employee Name Discovery - Social media and public records

  • Host Discovery - Active and passive techniques

  • Port Scanning - Basic port enumeration

  • Banner Grabbing - Service identification
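
Typical invocation (-b selects data sources; available sources vary by version):

# Gather emails, hosts and subdomains from selected sources
theHarvester -d example.com -b bing,crtsh -l 200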

SpiderFoot - OSINT Automation

SpiderFoot Features:

  • 100+ Modules - Comprehensive data source integration

  • Web Interface - User-friendly dashboard

  • API Support - RESTful API for automation

  • Real-time Analysis - Live data correlation

  • Threat Intelligence - Malware, blacklist checking

  • Social Media - Profile and relationship discovery
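
SpiderFoot is usually driven from its web UI:

# Start the web interface on localhost
spiderfoot -l 127.0.0.1:5001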

OSINT Framework - Tool Collection

Automation Workflow Design

Phase 1: Initial Reconnaissance

Phase 2: Deep Enumeration

Phase 3: Data Analysis

Custom Automation Scripts

Bash Automation Example
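
A small illustrative wrapper chaining tools already covered in this section (tool availability and wordlist paths are assumptions):

#!/bin/bash
# Usage: ./recon.sh example.com
TARGET="$1"
OUTDIR="recon_$TARGET"
mkdir -p "$OUTDIR"

# Technology fingerprinting
whatweb -a 3 --log-json="$OUTDIR/whatweb.json" "https://$TARGET"

# WAF detection
wafw00f "https://$TARGET" | tee "$OUTDIR/wafw00f.txt"

# Directory enumeration
gobuster dir -u "https://$TARGET" -w /usr/share/seclists/Discovery/Web-Content/common.txt -o "$OUTDIR/gobuster.txt"

# Historical URLs
echo "$TARGET" | waybackurls > "$OUTDIR/wayback_urls.txt"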

Python Automation Example
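
A stdlib-only sketch that collects response headers for a list of hosts (target list and output file are placeholders):

#!/usr/bin/env python3
# Collect HTTP response headers for a list of targets and save them as JSON.
import json
import urllib.request

targets = ["https://example.com", "https://dev.example.com"]
results = {}

for url in targets:
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            results[url] = dict(resp.getheaders())
    except Exception as exc:  # keep going if one host fails
        results[url] = {"error": str(exc)}

with open("headers.json", "w") as fh:
    json.dump(results, fh, indent=2)

print(f"Saved headers for {len(results)} targets to headers.json")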

Tool Integration Strategies

API-Based Integration

Output Standardization

Best Practices for Automation

Performance Optimization

  1. Parallel Execution - Run multiple tools simultaneously

  2. Rate Limiting - Respect target server resources

  3. Caching - Store results to avoid duplicate work

  4. Threading - Use appropriate thread counts

  5. Resource Management - Monitor CPU and memory usage

Error Handling

  1. Graceful Failures - Continue execution if one tool fails

  2. Retry Logic - Implement retry mechanisms for network issues

  3. Logging - Comprehensive logging for debugging

  4. Validation - Verify tool outputs and results

  5. Backup Plans - Alternative tools for critical functions

Security Considerations

  1. API Key Management - Secure storage of credentials

  2. Network Isolation - Run in controlled environments

  3. Output Sanitization - Clean and validate results

  4. Access Controls - Restrict tool usage and access

  5. Audit Trails - Maintain records of automation activities

HTB Academy Lab Examples

Lab 7: FinalRecon Automation

Automation Workflow Example


Security Assessment

Vulnerability Indicators

  1. Exposed admin interfaces - /admin, /wp-admin, /administrator

  2. Default credentials - admin:admin, admin:password

  3. Information disclosure - Error messages, debug information

  4. Weak authentication - No rate limiting, weak passwords

  5. Missing security headers - no CSP, HSTS, X-Frame-Options, or X-Content-Type-Options

  6. Outdated software - Old CMS versions, known vulnerabilities

Common Misconfigurations

  1. Directory listing enabled - Apache/Nginx misconfiguration

  2. Backup files accessible - .bak, .old, .backup files

  3. Source code exposure - .git directories, .svn folders

  4. Configuration files - .env, config.php, web.config

  5. Temporary files - Editors' backup files (~, .swp)


Defensive Measures

Web Application Hardening

  1. Remove server banners - Hide version information

  2. Implement security headers - CSP, HSTS, X-Frame-Options

  3. Disable directory listing - Prevent folder browsing

  4. Remove default files - Default pages, documentation

  5. Secure configuration - Error handling, debug modes off

Monitoring and Detection

  1. WAF implementation - Block malicious requests

  2. Access logging - Monitor enumeration attempts

  3. Rate limiting - Prevent brute force attacks

  4. Anomaly detection - Unusual request patterns

  5. Regular security assessments - Automated vulnerability scanning


Tools Summary

| Tool | Purpose | Best Use Case |
|------|---------|---------------|
| whatweb | Technology detection | Initial reconnaissance |
| nikto | Web server scanning | Comprehensive security assessment |
| builtwith | Technology profiling | Detailed technology stack analysis |
| netcraft | Web security services | Security posture assessment |
| gobuster | Directory/file discovery | Finding hidden content |
| ffuf | Web fuzzing | Parameter/vhost discovery |
| wpscan | WordPress security | CMS-specific testing |
| burp suite | Web application testing | Manual analysis |
| arjun | Parameter discovery | Finding hidden parameters |
| wafw00f | WAF detection | Security control identification |
| reconspider | Custom web crawling | HTB Academy reconnaissance |
| hakrawler | Web crawling | Content discovery |
| burp spider | Professional crawling | Web application mapping |
| owasp zap | Security scanning | Vulnerability discovery |
| scrapy | Custom crawling | Python framework |
| google dorking | OSINT reconnaissance | Search engine discovery |
| pagodo | Automated dorking | Google hacking database |
| wayback machine | Web archives | Historical website analysis |
| waybackurls | Archive URL extraction | Historical endpoint discovery |
| gau | URL aggregation | Multiple source URL collection |
| finalrecon | Automated framework | All-in-one Python reconnaissance |
| recon-ng | Modular framework | Database-driven reconnaissance |
| theharvester | OSINT gathering | Email, subdomain, employee discovery |
| spiderfoot | OSINT automation | 100+ module automation platform |
| linkfinder | JavaScript analysis | Endpoint extraction |


Key Takeaways

  1. Technology identification guides subsequent testing approaches

  2. Directory enumeration reveals hidden functionality and files

  3. Parameter discovery uncovers additional attack surface

  4. Web crawling provides comprehensive content discovery

  5. Search engine discovery exposes publicly indexed sensitive information

  6. Web archives reveal historical assets and vulnerabilities

  7. JavaScript analysis exposes client-side vulnerabilities

  8. Virtual hosts may contain additional applications

  9. Security headers indicate the security posture

  10. CMS enumeration requires specialized tools and techniques

  11. WAF detection is crucial for bypass strategy

  12. API enumeration focuses on modern application architectures

  13. OSINT techniques reveal organizational intelligence

  14. Automated frameworks significantly enhance reconnaissance efficiency

  15. Comprehensive methodology combines multiple tools and techniques


References

  • HTB Academy: Information Gathering - Web Edition

  • OWASP Web Security Testing Guide

  • SecLists: https://github.com/danielmiessler/SecLists

  • Burp Suite Documentation

  • FFUF Documentation: https://github.com/ffuf/ffuf

  • Google Hacking Database: https://www.exploit-db.com/google-hacking-database

  • Pagodo: https://github.com/opsdisk/pagodo

  • ReconSpider: https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip

  • Wayback Machine: https://web.archive.org/

  • waybackurls: https://github.com/tomnomnom/waybackurls

  • gau (GetAllURLs): https://github.com/lc/gau

  • Wayback Machine Downloader: https://github.com/hartator/wayback-machine-downloader

  • FinalRecon: https://github.com/thewhiteh4t/FinalRecon

  • Recon-ng: https://github.com/lanmaster53/recon-ng

  • theHarvester: https://github.com/laramies/theHarvester

  • SpiderFoot: https://github.com/smicallef/spiderfoot

  • OSINT Framework: https://osintframework.com/
