# Firecrawl + SearXNG on Dokploy

A minimal repo for deploying [Firecrawl](https://github.com/firecrawl/firecrawl) with [SearXNG](https://github.com/searxng/searxng) on [Dokploy](https://dokploy.com): a fully self-hosted web search and content extraction API for AI tooling.

## What this does

This gives you a single API endpoint that can:

- **Search the web** (`/v1/search`): find pages matching a query, powered by SearXNG aggregating results from Google, Bing, DuckDuckGo, and others
- **Scrape a page** (`/v1/scrape`): fetch any URL and get clean markdown or structured JSON, with full JavaScript rendering via Playwright
- **Crawl a site** (`/v1/crawl`): traverse an entire website and extract content from every page
- **Map a site** (`/v1/map`): discover all URLs on a domain without scraping them

The search endpoint is what ties it all together for AI use: Firecrawl sends your query to SearXNG, gets back relevant URLs, then scrapes and cleans the top results, all in one API call.

## Why this repo exists

Firecrawl's official repo is a large monorepo that assumes you're building from source. SearXNG has its own separate docker-compose setup. To deploy both on Dokploy, you need compose files that:

1. Use pre-built images instead of `build:` directives
2. Join the `dokploy-network` for Traefik routing
3. Include Traefik labels for automatic HTTPS
4. Use Docker named volumes for persistence
5. Avoid explicit `container_name` declarations (breaks Dokploy logging)
6. Wire SearXNG into Firecrawl via internal Docker networking

Rather than forking both repos, this repo contains only the compose file, a SearXNG settings file, and this README. When either project publishes new images, you redeploy: no merge conflicts, no carrying source code you don't touch.
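
Some of the compose requirements listed above can be spot-checked mechanically. The following is a minimal sketch, not a script shipped in this repo: `check_compose` is a hypothetical helper, and its `grep` patterns are rough heuristics rather than a real YAML parser. It only covers requirements 1, 2, and 5.

```bash
# Sketch: spot-check a compose file against some Dokploy requirements.
# check_compose is a hypothetical helper, not part of this repo.

# check_compose FILE
# Prints one line per problem found; returns non-zero if any were found.
check_compose() {
  local file="$1"
  local problems=0
  local key

  # Requirements 1 and 5: no build: directives, no container_name.
  for key in build container_name; do
    if grep -Eq "^[[:space:]]*${key}:" "$file"; then
      echo "found '${key}:' in $file"
      problems=$((problems + 1))
    fi
  done

  # Requirement 2: the file should reference dokploy-network somewhere.
  if ! grep -q 'dokploy-network' "$file"; then
    echo "dokploy-network not referenced in $file"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

For example, `check_compose docker-compose.yml && echo "looks Dokploy-ready"` prints the message only when none of the three checks fail.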

## Architecture

```
                             ┌──────────────────────────┐
Internet ──► Traefik ──► Firecrawl API (:31329)         │
                             │            │             │
                             ▼            ▼             │
                        Playwright      Redis           │
                          (:3000)      (:6379)          │
                             │            │             │
                             ▼            ▼             │
                          SearXNG     PostgreSQL        │
                          (:8080)      (:5432)          │
                             │            │             │
                             ▼            ▼             │
                        Google / …     RabbitMQ         │
                                       (:5672)
                             └──────────────────────────┘
```

All services communicate over an internal Docker network. Only the Firecrawl API is exposed to the internet via Traefik. SearXNG is internal-only by default (you can optionally expose it via its own domain).

## Services

| Service | Image | Purpose | Resources |
|---|---|---|---|
| `api` | `ghcr.io/firecrawl/firecrawl` | Main API + workers | 4 CPU / 8 GB |
| `playwright-service` | `ghcr.io/firecrawl/playwright-service` | Headless browser for JS pages | 2 CPU / 4 GB |
| `searxng` | `searxng/searxng` | Metasearch engine | minimal |
| `postgres` | `ghcr.io/firecrawl/nuq-postgres` | NUQ job queue store (pg_cron + nuq schema) | minimal |
| `rabbitmq` | `rabbitmq:3-management` | NUQ message broker | minimal |
| `redis` | `redis:alpine` | Rate limiting, cache | minimal |

## Deploying on Dokploy

### 1. Prerequisites

- A Dokploy instance
- A DNS A record pointing your subdomain (e.g. `firecrawl.yourdomain.com`) to your server

### 2. Create the service

1. In Dokploy, create a new **Compose** service (type: Docker Compose)
2. Connect this GitHub repo as the source
3. Set the **Compose Path** to `./docker-compose.yml`
4. Set the **branch** to `main`

### 3. Configure environment variables

Run `bin/prod_secrets` on your local machine to generate secrets via gopass, then paste the output into Dokploy's environment variable editor.
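
If gopass is not set up, random values can also be generated directly. The sketch below is an assumption about what suitable secrets look like, not a copy of `bin/prod_secrets`: `gen_secret` and `print_env` are hypothetical helpers, and the 32-byte length is an arbitrary choice.

```bash
# Sketch: generate random values for the variables this step expects.
# This is NOT the contents of bin/prod_secrets (which uses gopass);
# gen_secret and print_env are hypothetical helpers for illustration.

# gen_secret
# Prints 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# print_env DOMAIN
# Prints env lines ready to paste into Dokploy's environment editor.
print_env() {
  local domain="$1"
  echo "FIRECRAWL_DOMAIN=${domain}"
  echo "TEST_API_KEY=fc-$(gen_secret)"
  echo "BULL_AUTH_KEY=$(gen_secret)"
  echo "POSTGRES_PASSWORD=$(gen_secret)"
  echo "SEARXNG_SECRET=$(gen_secret)"
}
```

Running `print_env firecrawl.yourdomain.com` prints five `KEY=value` lines; keep a copy of `TEST_API_KEY`, since clients need it for the `Authorization` header.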

Or set them manually:

```env
# Required: domain Traefik routes to the Firecrawl API
FIRECRAWL_DOMAIN=firecrawl.yourdomain.com

# Recommended: protect your API with a key
TEST_API_KEY=fc-your-secret-key

# Recommended: change the Bull queue dashboard admin key
BULL_AUTH_KEY=something-secure

# Recommended: set a strong PostgreSQL password
POSTGRES_PASSWORD=something-secure

# SearXNG CSRF secret, auto-injected at container start
SEARXNG_SECRET=something-random
```

### 4. Deploy

Hit deploy. Dokploy pulls the images, creates the containers, and Traefik generates SSL certificates. Give it ~60 seconds for all health checks to pass (PostgreSQL and RabbitMQ start first, then the API).

Your API is now live at `https://firecrawl.yourdomain.com`.

### 5. Test it

```bash
# Search the web (SearXNG → Firecrawl scrape → clean markdown)
curl -X POST https://firecrawl.yourdomain.com/v1/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-your-secret-key' \
  -d '{"query": "what is firecrawl", "limit": 5}'

# Scrape a single page
curl -X POST https://firecrawl.yourdomain.com/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-your-secret-key' \
  -d '{"url": "https://example.com"}'
```

Or use `bin/test https://firecrawl.yourdomain.com` to run the full test suite.
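
Because the stack needs a little time after a deploy before health checks pass, a first request may fail if sent too early. The following is a minimal sketch of a polling loop, assuming the scrape call shown in this section is an acceptable readiness probe; `retry_until` and `probe_scrape` are hypothetical helpers, not part of `bin/test`.

```bash
# Sketch: poll a freshly deployed stack until it answers.
# retry_until and probe_scrape are hypothetical helpers, not part of this repo.

# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
retry_until() {
  local attempts="$1"
  local delay="$2"
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# probe_scrape BASE_URL API_KEY
# Succeeds when /v1/scrape answers with a non-error HTTP status.
probe_scrape() {
  curl -fsS -o /dev/null -X POST "$1/v1/scrape" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $2" \
    -d '{"url": "https://example.com"}'
}
```

For example, `retry_until 12 5 probe_scrape https://firecrawl.yourdomain.com fc-your-secret-key && echo ready` tries up to 12 times with 5 seconds between attempts, which roughly covers the ~60 second startup window.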

## Local development

```bash
# First-time setup: generate dev secrets in gopass
bin/dev_secrets

# Start all services (creates dokploy-network if missing)
bin/up

# Run tests against local stack
bin/test

# Stop
bin/down

# Full cleanup (volumes, orphans)
bin/clean

# Also remove cached images
bin/clean --images
```

## SearXNG configuration

The `searxng/settings.yml` file in this repo is pre-configured for API use:

- **JSON format enabled**: required for Firecrawl to consume results (disabled by default in SearXNG)
- **Rate limiter disabled**: SearXNG is only accessed internally by Firecrawl, not by the public internet
- **Engines enabled**: Google, Bing, DuckDuckGo, Wikipedia, GitHub

You can customize the engines, categories, and other settings by editing `searxng/settings.yml` and redeploying. See the [SearXNG documentation](https://docs.searxng.org/admin/settings/index.html) for all available options.

The `secret_key` is handled automatically via the `SEARXNG_SECRET` environment variable; the Docker entrypoint injects it at container start. No need to edit this file for secrets.

## Optional: exposing SearXNG publicly

By default SearXNG is only accessible internally. If you also want a public search UI, uncomment the Traefik labels on the `searxng` service in `docker-compose.yml` and add `SEARXNG_DOMAIN` to your env vars.

## Optional: AI extraction features

Firecrawl can use an LLM for structured data extraction via its `/v1/extract` endpoint. Add one of these to your env vars:

```env
# OpenAI
OPENAI_API_KEY=sk-your-key

# Or a local Ollama instance accessible from the server
OLLAMA_BASE_URL=http://host.docker.internal:11434/api
MODEL_NAME=deepseek-r1:7b
```

## Resource requirements

The default limits match Firecrawl's official recommendations. Total: roughly 6 CPU cores and 12 GB RAM.

For light personal use you can lower these: edit the `cpus` and `mem_limit` values in `docker-compose.yml`.
The API runs fine on 2 cores / 4 GB and Playwright on 1 core / 2 GB, but expect slower scraping on JS-heavy sites. PostgreSQL, RabbitMQ, SearXNG, and Redis are lightweight and don't need explicit resource limits.

## Updating

Redeploy from Dokploy to pull the latest images. The compose file uses `latest` tags by default. To pin versions for stability, replace image tags with specific releases (e.g. `ghcr.io/firecrawl/firecrawl:v1.x.x`). Check Firecrawl's [releases page](https://github.com/firecrawl/firecrawl/releases) for available tags.

Data persists in named volumes (`postgres-data`, `rabbitmq-data`, `redis-data`, `searxng-cache`) across redeployments.

## Troubleshooting

**`/v1/search` returns empty or errors:** SearXNG might not be ready yet, or search engines might be rate-limiting your server's IP. Check SearXNG logs in Dokploy. Try changing the enabled engines in `searxng/settings.yml`.

**Playwright timeouts on JS-heavy sites:** The default 4 GB memory limit for Playwright might not be enough. Increase `mem_limit` on the `playwright-service`.

**403 errors from SearXNG:** Make sure `formats: - json` is present in `searxng/settings.yml`. Without it, SearXNG blocks non-HTML requests.

**Redis connection refused:** Check that the Redis container is healthy in Dokploy's dashboard. The API won't start until Redis passes its healthcheck.

**PostgreSQL or RabbitMQ not ready:** The API depends on both passing their health checks before starting. Check the Deployments tab in Dokploy for health check status. PostgreSQL needs ~30 seconds on first boot to initialize the NUQ schema.

## File structure

```
.
├── docker-compose.yml          # All six services, Dokploy-ready
├── searxng/
│   └── settings.yml            # SearXNG config (JSON API enabled, limiter off)
├── bin/
│   ├── prod_secrets            # Generate production env vars via gopass
│   ├── dev_secrets             # Generate dev secrets via gopass
│   ├── up                      # Start local dev stack
│   ├── down                    # Stop local dev stack
│   ├── clean                   # Remove containers, volumes, images
│   └── test                    # Test a running deployment
└── README.md
```