| author | Adam Malczewski <[email protected]> | 2026-05-23 03:08:53 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-05-23 03:08:53 +0900 |
| commit | f3120565f44bdabab73bf1c83e71f9998f629efc (patch) | |
| tree | d33983afb05982863abb24165f0f859963a40129 | |
| parent | 60680e0419f96a628f9eccaf9c53d6749d0a20ca (diff) | |
make compose Dokploy-ready: fix project naming, network isolation, and secret management
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | README.md | 213 |
| -rwxr-xr-x | bin/clean | 3 |
| -rwxr-xr-x | bin/dev_secrets | 1 |
| -rwxr-xr-x | bin/prod_secrets | 6 |
| -rwxr-xr-x | bin/up | 2 |
| -rw-r--r-- | docker-compose.yml | 7 |
| -rw-r--r-- | searxng/settings.yml | 2 |

7 files changed, 226 insertions, 8 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9aba8cc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,213 @@
+# Firecrawl + SearXNG on Dokploy
+
+A minimal repo for deploying [Firecrawl](https://github.com/firecrawl/firecrawl) with [SearXNG](https://github.com/searxng/searxng) on [Dokploy](https://dokploy.com) — a fully self-hosted web search and content extraction API for AI tooling.
+
+## What this does
+
+This gives you a single API endpoint that can:
+
+- **Search the web** (`/v1/search`) — find pages matching a query, powered by SearXNG aggregating results from Google, Bing, DuckDuckGo, and others
+- **Scrape a page** (`/v1/scrape`) — fetch any URL and get clean markdown or structured JSON, with full JavaScript rendering via Playwright
+- **Crawl a site** (`/v1/crawl`) — traverse an entire website and extract content from every page
+- **Map a site** (`/v1/map`) — discover all URLs on a domain without scraping them
+
+The search endpoint is what ties it all together for AI use: Firecrawl sends your query to SearXNG, gets back relevant URLs, then scrapes and cleans the top results — all in one API call.
+
+## Why this repo exists
+
+Firecrawl's official repo is a large monorepo that assumes you're building from source. SearXNG has its own separate docker-compose setup. To deploy both on Dokploy, you need compose files that:
+
+1. Use pre-built images instead of `build:` directives
+2. Join the `dokploy-network` for Traefik routing
+3. Include Traefik labels for automatic HTTPS
+4. Use Docker named volumes for persistence
+5. Avoid explicit `container_name` declarations (breaks Dokploy logging)
+6. Wire SearXNG into Firecrawl via internal Docker networking
+
+Rather than forking both repos, this repo contains only the compose file, a SearXNG settings file, and this README. When either project publishes new images, you redeploy — no merge conflicts, no carrying source code you don't touch.
+
+## Architecture
+
+```
+                   ┌──────────────────────────┐
+  Internet ──► Traefik ──► Firecrawl API (:3002)
+                   │           │          │
+                   │           ▼          ▼
+                   │      Playwright    Redis
+                   │       (:3000)     (:6379)
+                   │           │
+                   │           ▼          ▼
+                   │        SearXNG   PostgreSQL
+                   │        (:8080)    (:5432)
+                   │           │
+                   │           ▼          ▼
+                   │      Google / …   RabbitMQ
+                   │                   (:5672)
+                   └──────────────────────────┘
+```
+
+All services communicate over an internal Docker network. Only the Firecrawl API is exposed to the internet via Traefik. SearXNG is internal-only by default (you can optionally expose it via its own domain).
+
+## Services
+
+| Service | Image | Purpose | Resources |
+|---|---|---|---|
+| `api` | `ghcr.io/firecrawl/firecrawl` | Main API + workers | 4 CPU / 8 GB |
+| `playwright-service` | `ghcr.io/firecrawl/playwright-service` | Headless browser for JS pages | 2 CPU / 4 GB |
+| `searxng` | `searxng/searxng` | Metasearch engine | minimal |
+| `postgres` | `ghcr.io/firecrawl/nuq-postgres` | NUQ job queue store (pg_cron + nuq schema) | minimal |
+| `rabbitmq` | `rabbitmq:3-management` | NUQ message broker | minimal |
+| `redis` | `redis:alpine` | Rate limiting, cache | minimal |
+
+## Deploying on Dokploy
+
+### 1. Prerequisites
+
+- A Dokploy instance
+- A DNS A record pointing your subdomain (e.g. `firecrawl.yourdomain.com`) to your server
+
+### 2. Create the service
+
+1. In Dokploy, create a new **Compose** service (type: Docker Compose)
+2. Connect this GitHub repo as the source
+3. Set the **Compose Path** to `./docker-compose.yml`
+4. Set the **branch** to `main`
+
+### 3. Configure environment variables
+
+Run `bin/prod_secrets` on your local machine to generate secrets via gopass, then paste the output into Dokploy's environment variable editor.
+
+Or set them manually:
+
+```env
+# Required — domain Traefik routes to the Firecrawl API
+FIRECRAWL_DOMAIN=firecrawl.yourdomain.com
+
+# Recommended — protect your API with a key
+TEST_API_KEY=fc-your-secret-key
+
+# Recommended — change the Bull queue dashboard admin key
+BULL_AUTH_KEY=something-secure
+
+# Recommended — set a strong PostgreSQL password
+POSTGRES_PASSWORD=something-secure
+
+# SearXNG CSRF secret — auto-injected at container start
+SEARXNG_SECRET=something-random
+```
+
+### 4. Deploy
+
+Hit deploy. Dokploy pulls the images, creates the containers, and Traefik generates SSL certificates. Give it ~60 seconds for all health checks to pass (PostgreSQL and RabbitMQ start first, then the API).
+
+Your API is now live at `https://firecrawl.yourdomain.com`.
+
+### 6. Test it
+
+```bash
+# Search the web (SearXNG → Firecrawl scrape → clean markdown)
+curl -X POST https://firecrawl.yourdomain.com/v1/search \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer fc-your-secret-key' \
+  -d '{"query": "what is firecrawl", "limit": 5}'
+
+# Scrape a single page
+curl -X POST https://firecrawl.yourdomain.com/v1/scrape \
+  -H 'Content-Type: application/json' \
+  -H 'Authorization: Bearer fc-your-secret-key' \
+  -d '{"url": "https://example.com"}'
+```
+
+Or use `bin/test https://firecrawl.yourdomain.com` to run the full test suite.
+
+## Local development
+
+```bash
+# First-time setup: generate dev secrets in gopass
+bin/dev_secrets
+
+# Start all services (creates dokploy-network if missing)
+bin/up
+
+# Run tests against local stack
+bin/test
+
+# Stop
+bin/down
+
+# Full cleanup (volumes, orphans)
+bin/clean
+
+# Also remove cached images
+bin/clean --images
+```
+
+## SearXNG configuration
+
+The `searxng/settings.yml` file in this repo is pre-configured for API use:
+
+- **JSON format enabled** — required for Firecrawl to consume results (disabled by default in SearXNG)
+- **Rate limiter disabled** — since SearXNG is only accessed internally by Firecrawl, not by the public internet
+- **Engines enabled** — Google, Bing, DuckDuckGo, Wikipedia, GitHub
+
+You can customize the engines, categories, and other settings by editing `searxng/settings.yml` and redeploying. See the [SearXNG documentation](https://docs.searxng.org/admin/settings/index.html) for all available options.
+
+The `secret_key` is handled automatically via the `SEARXNG_SECRET` environment variable — the Docker entrypoint injects it at container start. No need to edit this file for secrets.
+
+## Optional: exposing SearXNG publicly
+
+By default SearXNG is only accessible internally. If you also want a public search UI, uncomment the Traefik labels on the `searxng` service in `docker-compose.yml` and add `SEARXNG_DOMAIN` to your env vars.
+
+## Optional: AI extraction features
+
+Firecrawl can use an LLM for structured data extraction via its `/extract` endpoint. Add one of these to your env vars:
+
+```env
+# OpenAI
+OPENAI_API_KEY=sk-your-key
+
+# Or a local Ollama instance accessible from the server
+OLLAMA_BASE_URL=http://host.docker.internal:11434/api
+MODEL_NAME=deepseek-r1:7b
+```
+
+## Resource requirements
+
+The default limits match Firecrawl's official recommendations. Total: roughly 6 CPU cores and 12 GB RAM. For light personal use you can lower these — edit the `cpus` and `mem_limit` values in `docker-compose.yml`. The API runs fine on 2 cores / 4 GB and Playwright on 1 core / 2 GB, but expect slower scraping on JS-heavy sites.
+
+PostgreSQL, RabbitMQ, SearXNG, and Redis are lightweight and don't need explicit resource limits.
+
+## Updating
+
+Redeploy from Dokploy to pull the latest images. The compose file uses `latest` tags by default. To pin versions for stability, replace image tags with specific releases (e.g. `ghcr.io/firecrawl/firecrawl:v1.x.x`). Check Firecrawl's [releases page](https://github.com/firecrawl/firecrawl/releases) for available tags.
+
+Data persists in named volumes (`postgres-data`, `rabbitmq-data`, `redis-data`, `searxng-cache`) across redeployments.
+
+## Troubleshooting
+
+**`/search` returns empty or errors:** SearXNG might not be ready yet, or search engines might be rate-limiting your server's IP. Check SearXNG logs in Dokploy. Try changing the enabled engines in `searxng/settings.yml`.
+
+**Playwright timeouts on JS-heavy sites:** The default 4 GB memory limit for Playwright might not be enough. Increase `mem_limit` on the `playwright-service`.
+
+**403 errors from SearXNG:** Make sure `formats: - json` is present in `searxng/settings.yml`. Without it, SearXNG blocks non-HTML requests.
+
+**Redis connection refused:** Check that the Redis container is healthy in Dokploy's dashboard. The API won't start until Redis passes its healthcheck.
+
+**PostgreSQL or RabbitMQ not ready:** The API depends on both passing their health checks before starting. Check the Deployments tab in Dokploy for health check status. PostgreSQL needs ~30 seconds on first boot to initialize the NUQ schema.
+
+## File structure
+
+```
+.
+├── docker-compose.yml    # All six services, Dokploy-ready
+├── searxng/
+│   └── settings.yml      # SearXNG config (JSON API enabled, limiter off)
+├── bin/
+│   ├── prod_secrets      # Generate production env vars via gopass
+│   ├── dev_secrets       # Generate dev secrets via gopass
+│   ├── up                # Start local dev stack
+│   ├── down              # Stop local dev stack
+│   ├── clean             # Remove containers, volumes, images
+│   └── test              # Test a running deployment
+└── README.md
+```
diff --git a/bin/clean b/bin/clean
--- a/bin/clean
+++ b/bin/clean
@@ -17,8 +17,9 @@ if [ "$REMOVE_IMAGES" = "true" ]; then
   sudo docker image rm \
     ghcr.io/firecrawl/firecrawl:latest \
     ghcr.io/firecrawl/playwright-service:latest \
+    ghcr.io/firecrawl/nuq-postgres:latest \
     docker.io/searxng/searxng:latest \
-    postgres:17-alpine \
+    rabbitmq:3-management \
     redis:alpine 2>/dev/null || true
   echo "Images removed."
 fi
diff --git a/bin/dev_secrets b/bin/dev_secrets
index ebc9bb0..4301e6e 100755
--- a/bin/dev_secrets
+++ b/bin/dev_secrets
@@ -25,6 +25,7 @@ function ensure_secret() {
 ensure_secret "projects/firecrawl-dokploy/dev/api_key" "Firecrawl API Key" true
 ensure_secret "projects/firecrawl-dokploy/dev/bull_auth_key" "Bull Auth Key" true
 ensure_secret "projects/firecrawl-dokploy/dev/postgres_password" "PostgreSQL Password" true
+ensure_secret "projects/firecrawl-dokploy/dev/searxng_secret_key" "SearXNG Secret Key" true
 ensure_secret "projects/firecrawl-dokploy/dev/openai_api_key" "OpenAI API Key (optional, press enter to skip)" false
 
 echo "Dev secrets ensured."
diff --git a/bin/prod_secrets b/bin/prod_secrets
index ca9f7f1..e409a56 100755
--- a/bin/prod_secrets
+++ b/bin/prod_secrets
@@ -22,14 +22,18 @@ function get_or_gen_secret() {
   gopass show -o "$path"
 }
 
+FIRECRAWL_DOMAIN=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/firecrawl_domain" false "Firecrawl domain (e.g. firecrawl.yourdomain.com)")
 TEST_API_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/api_key" true "Firecrawl API Key")
 BULL_AUTH_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/bull_auth_key" true "Bull Auth Key")
 POSTGRES_PASSWORD=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/postgres_password" true "PostgreSQL Password")
-OPENAI_API_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/openai_api_key" false "OpenAI API Key")
+SEARXNG_SECRET_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/searxng_secret_key" true "SearXNG Secret Key")
+OPENAI_API_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/openai_api_key" false "OpenAI API Key (optional, press enter to skip)")
 
 cat <<EOF
+FIRECRAWL_DOMAIN=$FIRECRAWL_DOMAIN
 TEST_API_KEY=$TEST_API_KEY
 BULL_AUTH_KEY=$BULL_AUTH_KEY
 POSTGRES_PASSWORD=$POSTGRES_PASSWORD
+SEARXNG_SECRET=$SEARXNG_SECRET_KEY
 OPENAI_API_KEY=$OPENAI_API_KEY
 EOF
diff --git a/bin/up b/bin/up
--- a/bin/up
+++ b/bin/up
@@ -13,12 +13,14 @@ fi
 export TEST_API_KEY="$(gopass show -o projects/firecrawl-dokploy/dev/api_key)"
 export BULL_AUTH_KEY="$(gopass show -o projects/firecrawl-dokploy/dev/bull_auth_key)"
 export POSTGRES_PASSWORD="$(gopass show -o projects/firecrawl-dokploy/dev/postgres_password)"
+export SEARXNG_SECRET="$(gopass show -o projects/firecrawl-dokploy/dev/searxng_secret_key)"
 export OPENAI_API_KEY="$(gopass show -o projects/firecrawl-dokploy/dev/openai_api_key || echo "")"
 export FIRECRAWL_DOMAIN="firecrawl.localhost"
 
 sudo TEST_API_KEY="$TEST_API_KEY" \
   BULL_AUTH_KEY="$BULL_AUTH_KEY" \
   POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
+  SEARXNG_SECRET="$SEARXNG_SECRET" \
   OPENAI_API_KEY="$OPENAI_API_KEY" \
   FIRECRAWL_DOMAIN="$FIRECRAWL_DOMAIN" \
   docker compose up "$@"
diff --git a/docker-compose.yml b/docker-compose.yml
index 44c0f05..2bd169e 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,5 +1,3 @@
-name: firecrawl
-
 services:
   # ============================================================
   # PostgreSQL — Firecrawl's NUQ queue store
@@ -67,11 +65,11 @@ services:
     image: docker.io/searxng/searxng:latest
     networks:
       - backend
-      - dokploy-network
     volumes:
-      - ./searxng/settings.yml:/etc/searxng/settings.yml:ro
+      - ./searxng/settings.yml:/etc/searxng/settings.yml
       - searxng-cache:/var/cache/searxng:rw
     environment:
+      - SEARXNG_SECRET=${SEARXNG_SECRET:-}
       - SEARXNG_BASE_URL=https://${SEARXNG_DOMAIN:-searxng.localhost}/
     cap_drop:
       - ALL
@@ -102,7 +100,6 @@ services:
     image: ghcr.io/firecrawl/playwright-service:latest
     networks:
       - backend
-      - dokploy-network
     environment:
       PORT: 3000
       PROXY_SERVER: ${PROXY_SERVER:-}
diff --git a/searxng/settings.yml b/searxng/settings.yml
index ac380be..38d1f6b 100644
--- a/searxng/settings.yml
+++ b/searxng/settings.yml
@@ -1,7 +1,7 @@
 use_default_settings: true
 
 server:
-  secret_key: "change-this-to-a-random-string"
+  secret_key: "ultrasecretkey"
   limiter: false
   image_proxy: false
   port: 8080
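
Note on the secret wiring: the `docker-compose.yml` hunk interpolates `SEARXNG_SECRET` as `${SEARXNG_SECRET:-}`, so an unset variable is replaced by an empty string instead of causing an error, and `bin/up` passes whatever it exported straight through to `docker compose up`. A minimal sketch of a pre-flight check follows, assuming a hypothetical helper named `require_env` that is not part of this commit; the list of variables it should guard is also an assumption taken from the README's env block.

```shell
# Hypothetical helper, not part of this commit.
# Reports every named variable that is unset or empty on stderr and
# returns non-zero if at least one is missing.
# The body runs in a subshell, so its working variables do not leak.
# Pass only trusted variable names: they are expanded through eval.
require_env() (
  missing=0
  for name in "$@"; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # The exit status of this test becomes the function's status.
  [ "$missing" -eq 0 ]
)
```

One possible use would be a line such as `require_env FIRECRAWL_DOMAIN TEST_API_KEY BULL_AUTH_KEY POSTGRES_PASSWORD SEARXNG_SECRET || exit 1` placed before the `docker compose up` call, so that a failed gopass lookup stops the script rather than starting the stack with an empty secret.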
