-rw-r--r-- README.md 213
-rwxr-xr-x bin/clean 3
-rwxr-xr-x bin/dev_secrets 1
-rwxr-xr-x bin/prod_secrets 6
-rwxr-xr-x bin/up 2
-rw-r--r-- docker-compose.yml 7
-rw-r--r-- searxng/settings.yml 2
7 files changed, 226 insertions, 8 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9aba8cc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,213 @@
+# Firecrawl + SearXNG on Dokploy
+
+A minimal repo for deploying [Firecrawl](https://github.com/firecrawl/firecrawl) with [SearXNG](https://github.com/searxng/searxng) on [Dokploy](https://dokploy.com) — a fully self-hosted web search and content extraction API for AI tooling.
+
+## What this does
+
+This gives you a single API endpoint that can:
+
+- **Search the web** (`/v1/search`) — find pages matching a query, powered by SearXNG aggregating results from Google, Bing, DuckDuckGo, and others
+- **Scrape a page** (`/v1/scrape`) — fetch any URL and get clean markdown or structured JSON, with full JavaScript rendering via Playwright
+- **Crawl a site** (`/v1/crawl`) — traverse an entire website and extract content from every page
+- **Map a site** (`/v1/map`) — discover all URLs on a domain without scraping them
+
+The search endpoint is what ties it all together for AI use: Firecrawl sends your query to SearXNG, gets back relevant URLs, then scrapes and cleans the top results — all in one API call.
+
+## Why this repo exists
+
+Firecrawl's official repo is a large monorepo that assumes you're building from source. SearXNG has its own separate docker-compose setup. To deploy both on Dokploy, you need compose files that:
+
+1. Use pre-built images instead of `build:` directives
+2. Join the `dokploy-network` for Traefik routing
+3. Include Traefik labels for automatic HTTPS
+4. Use Docker named volumes for persistence
+5. Avoid explicit `container_name` declarations (they break Dokploy logging)
+6. Wire SearXNG into Firecrawl via internal Docker networking
+
+Rather than forking both repos, this repo contains only the compose file, a SearXNG settings file, and this README. When either project publishes new images, you redeploy — no merge conflicts, no carrying source code you don't touch.
+
+## Architecture
+
+```
+                      ┌──────────────────────────┐
+ Internet ──► Traefik ──► Firecrawl API (:3002)
+                      │       │           │
+                      │       ▼           ▼
+                      │  Playwright     Redis
+                      │    (:3000)     (:6379)
+                      │       │
+                      │       ▼           ▼
+                      │    SearXNG   PostgreSQL
+                      │    (:8080)     (:5432)
+                      │       │
+                      │       ▼           ▼
+                      │  Google / …   RabbitMQ
+                      │                (:5672)
+                      └──────────────────────────┘
+```
+
+All services communicate over an internal Docker network. Only the Firecrawl API is exposed to the internet via Traefik. SearXNG is internal-only by default (you can optionally expose it via its own domain).
+
+## Services
+
+| Service | Image | Purpose | Resources |
+|---|---|---|---|
+| `api` | `ghcr.io/firecrawl/firecrawl` | Main API + workers | 4 CPU / 8 GB |
+| `playwright-service` | `ghcr.io/firecrawl/playwright-service` | Headless browser for JS pages | 2 CPU / 4 GB |
+| `searxng` | `searxng/searxng` | Metasearch engine | minimal |
+| `postgres` | `ghcr.io/firecrawl/nuq-postgres` | NUQ job queue store (pg_cron + nuq schema) | minimal |
+| `rabbitmq` | `rabbitmq:3-management` | NUQ message broker | minimal |
+| `redis` | `redis:alpine` | Rate limiting, cache | minimal |
+
+## Deploying on Dokploy
+
+### 1. Prerequisites
+
+- A Dokploy instance
+- A DNS A record pointing your subdomain (e.g. `firecrawl.yourdomain.com`) to your server
+
+### 2. Create the service
+
+1. In Dokploy, create a new **Compose** service (type: Docker Compose)
+2. Connect this GitHub repo as the source
+3. Set the **Compose Path** to `./docker-compose.yml`
+4. Set the **branch** to `main`
+
+### 3. Configure environment variables
+
+Run `bin/prod_secrets` on your local machine to generate secrets via gopass, then paste the output into Dokploy's environment variable editor.
+
+Or set them manually:
+
+```env
+# Required — domain Traefik routes to the Firecrawl API
+FIRECRAWL_DOMAIN=firecrawl.yourdomain.com
+
+# Recommended — protect your API with a key
+TEST_API_KEY=fc-your-secret-key
+
+# Recommended — change the Bull queue dashboard admin key
+BULL_AUTH_KEY=something-secure
+
+# Recommended — set a strong PostgreSQL password
+POSTGRES_PASSWORD=something-secure
+
+# SearXNG CSRF secret — auto-injected at container start
+SEARXNG_SECRET=something-random
+```
+
+### 4. Deploy
+
+Hit deploy. Dokploy pulls the images, creates the containers, and Traefik generates SSL certificates. Give it ~60 seconds for all health checks to pass (PostgreSQL and RabbitMQ start first, then the API).
+
+Your API is now live at `https://firecrawl.yourdomain.com`.
+
+### 5. Test it
+
+```bash
+# Search the web (SearXNG → Firecrawl scrape → clean markdown)
+curl -X POST https://firecrawl.yourdomain.com/v1/search \
+ -H 'Content-Type: application/json' \
+ -H 'Authorization: Bearer fc-your-secret-key' \
+ -d '{"query": "what is firecrawl", "limit": 5}'
+
+# Scrape a single page
+curl -X POST https://firecrawl.yourdomain.com/v1/scrape \
+ -H 'Content-Type: application/json' \
+ -H 'Authorization: Bearer fc-your-secret-key' \
+ -d '{"url": "https://example.com"}'
+```
+
+Or use `bin/test https://firecrawl.yourdomain.com` to run the full test suite.
+
+## Local development
+
+```bash
+# First-time setup: generate dev secrets in gopass
+bin/dev_secrets
+
+# Start all services (creates dokploy-network if missing)
+bin/up
+
+# Run tests against local stack
+bin/test
+
+# Stop
+bin/down
+
+# Full cleanup (volumes, orphans)
+bin/clean
+
+# Also remove cached images
+bin/clean --images
+```
+
+## SearXNG configuration
+
+The `searxng/settings.yml` file in this repo is pre-configured for API use:
+
+- **JSON format enabled** — required for Firecrawl to consume results (disabled by default in SearXNG)
+- **Rate limiter disabled** — since SearXNG is only accessed internally by Firecrawl, not by the public internet
+- **Engines enabled** — Google, Bing, DuckDuckGo, Wikipedia, GitHub
+
+You can customize the engines, categories, and other settings by editing `searxng/settings.yml` and redeploying. See the [SearXNG documentation](https://docs.searxng.org/admin/settings/index.html) for all available options.
+
+The `secret_key` is handled automatically via the `SEARXNG_SECRET` environment variable — the Docker entrypoint injects it at container start. No need to edit this file for secrets.
+
+## Optional: exposing SearXNG publicly
+
+By default SearXNG is only accessible internally. If you also want a public search UI, uncomment the Traefik labels on the `searxng` service in `docker-compose.yml` and add `SEARXNG_DOMAIN` to your env vars.
+
+## Optional: AI extraction features
+
+Firecrawl can use an LLM for structured data extraction via its `/v1/extract` endpoint. Add one of these to your env vars:
+
+```env
+# OpenAI
+OPENAI_API_KEY=sk-your-key
+
+# Or a local Ollama instance accessible from the server
+OLLAMA_BASE_URL=http://host.docker.internal:11434/api
+MODEL_NAME=deepseek-r1:7b
+```
+
+## Resource requirements
+
+The default limits match Firecrawl's official recommendations. Total: roughly 6 CPU cores and 12 GB RAM. For light personal use you can lower these — edit the `cpus` and `mem_limit` values in `docker-compose.yml`. The API runs fine on 2 cores / 4 GB and Playwright on 1 core / 2 GB, but expect slower scraping on JS-heavy sites.
+
+PostgreSQL, RabbitMQ, SearXNG, and Redis are lightweight and don't need explicit resource limits.
+
+## Updating
+
+Redeploy from Dokploy to pull the latest images. The compose file uses `latest` tags by default. To pin versions for stability, replace image tags with specific releases (e.g. `ghcr.io/firecrawl/firecrawl:v1.x.x`). Check Firecrawl's [releases page](https://github.com/firecrawl/firecrawl/releases) for available tags.
+
+Data persists in named volumes (`postgres-data`, `rabbitmq-data`, `redis-data`, `searxng-cache`) across redeployments.
+
+## Troubleshooting
+
+**`/v1/search` returns empty results or errors:** SearXNG might not be ready yet, or search engines might be rate-limiting your server's IP. Check SearXNG logs in Dokploy. Try changing the enabled engines in `searxng/settings.yml`.
+
+**Playwright timeouts on JS-heavy sites:** The default 4 GB memory limit for Playwright might not be enough. Increase `mem_limit` on the `playwright-service`.
+
+**403 errors from SearXNG:** Make sure `formats: - json` is present in `searxng/settings.yml`. Without it, SearXNG blocks non-HTML requests.
+
+**Redis connection refused:** Check that the Redis container is healthy in Dokploy's dashboard. The API won't start until Redis passes its healthcheck.
+
+**PostgreSQL or RabbitMQ not ready:** The API depends on both passing their health checks before starting. Check the Deployments tab in Dokploy for health check status. PostgreSQL needs ~30 seconds on first boot to initialize the NUQ schema.
+
+## File structure
+
+```
+.
+├── docker-compose.yml # All six services, Dokploy-ready
+├── searxng/
+│ └── settings.yml # SearXNG config (JSON API enabled, limiter off)
+├── bin/
+│ ├── prod_secrets # Generate production env vars via gopass
+│ ├── dev_secrets # Generate dev secrets via gopass
+│ ├── up # Start local dev stack
+│ ├── down # Stop local dev stack
+│ ├── clean # Remove containers, volumes, images
+│ └── test # Test a running deployment
+└── README.md
+```
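The environment variables listed in the README's "Configure environment variables" step could be checked before hitting deploy. The following is a minimal sketch, not part of this repo's `bin/` scripts: `check_env` and the two array names are hypothetical, only `FIRECRAWL_DOMAIN` is treated as required because that is the only variable the README labels "Required", and placing `SEARXNG_SECRET` among the recommended ones is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy check; variable names come from the README's env list.
required_vars=(FIRECRAWL_DOMAIN)
# Assumption: SEARXNG_SECRET is grouped with the "Recommended" variables.
recommended_vars=(TEST_API_KEY BULL_AUTH_KEY POSTGRES_PASSWORD SEARXNG_SECRET)

# Prints one line per unset-or-empty variable.
# Returns 1 only when a required variable is missing.
check_env() {
  local status=0 name
  for name in "${required_vars[@]}"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required: $name"
      status=1
    fi
  done
  for name in "${recommended_vars[@]}"; do
    if [ -z "${!name:-}" ]; then
      echo "missing recommended: $name"
    fi
  done
  return "$status"
}
```

Sourcing this file and running `check_env` in the shell where the variables are set would list anything still empty before the values are pasted into Dokploy.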
diff --git a/bin/clean b/bin/clean
index a87e386..9ab648c 100755
--- a/bin/clean
+++ b/bin/clean
@@ -17,8 +17,9 @@ if [ "$REMOVE_IMAGES" = "true" ]; then
sudo docker image rm \
ghcr.io/firecrawl/firecrawl:latest \
ghcr.io/firecrawl/playwright-service:latest \
+ ghcr.io/firecrawl/nuq-postgres:latest \
docker.io/searxng/searxng:latest \
- postgres:17-alpine \
+ rabbitmq:3-management \
redis:alpine 2>/dev/null || true
echo "Images removed."
fi
diff --git a/bin/dev_secrets b/bin/dev_secrets
index ebc9bb0..4301e6e 100755
--- a/bin/dev_secrets
+++ b/bin/dev_secrets
@@ -25,6 +25,7 @@ function ensure_secret() {
ensure_secret "projects/firecrawl-dokploy/dev/api_key" "Firecrawl API Key" true
ensure_secret "projects/firecrawl-dokploy/dev/bull_auth_key" "Bull Auth Key" true
ensure_secret "projects/firecrawl-dokploy/dev/postgres_password" "PostgreSQL Password" true
+ensure_secret "projects/firecrawl-dokploy/dev/searxng_secret_key" "SearXNG Secret Key" true
ensure_secret "projects/firecrawl-dokploy/dev/openai_api_key" "OpenAI API Key (optional, press enter to skip)" false
echo "Dev secrets ensured."
diff --git a/bin/prod_secrets b/bin/prod_secrets
index ca9f7f1..e409a56 100755
--- a/bin/prod_secrets
+++ b/bin/prod_secrets
@@ -22,14 +22,18 @@ function get_or_gen_secret() {
gopass show -o "$path"
}
+FIRECRAWL_DOMAIN=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/firecrawl_domain" false "Firecrawl domain (e.g. firecrawl.yourdomain.com)")
TEST_API_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/api_key" true "Firecrawl API Key")
BULL_AUTH_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/bull_auth_key" true "Bull Auth Key")
POSTGRES_PASSWORD=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/postgres_password" true "PostgreSQL Password")
-OPENAI_API_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/openai_api_key" false "OpenAI API Key")
+SEARXNG_SECRET_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/searxng_secret_key" true "SearXNG Secret Key")
+OPENAI_API_KEY=$(get_or_gen_secret "projects/firecrawl-dokploy/prod/openai_api_key" false "OpenAI API Key (optional, press enter to skip)")
cat <<EOF
+FIRECRAWL_DOMAIN=$FIRECRAWL_DOMAIN
TEST_API_KEY=$TEST_API_KEY
BULL_AUTH_KEY=$BULL_AUTH_KEY
POSTGRES_PASSWORD=$POSTGRES_PASSWORD
+SEARXNG_SECRET=$SEARXNG_SECRET_KEY
OPENAI_API_KEY=$OPENAI_API_KEY
EOF
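For readers following the README's manual path without gopass, a random value for `SEARXNG_SECRET` could be produced roughly as follows. This is a sketch only: `gen_secret` is a hypothetical helper, `bin/prod_secrets` itself delegates generation to gopass, and the 32-byte length is an assumption rather than a requirement stated anywhere in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper; the repo's own scripts obtain secrets from gopass instead.
# Reads 32 random bytes and prints them as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Example use (commented out so sourcing this file prints nothing):
# printf 'SEARXNG_SECRET=%s\n' "$(gen_secret)"
```

Note that the script's internal variable is `SEARXNG_SECRET_KEY`, while the name emitted for Dokploy and read by the compose file is `SEARXNG_SECRET`.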
diff --git a/bin/up b/bin/up
index 1e39076..2fff656 100755
--- a/bin/up
+++ b/bin/up
@@ -13,12 +13,14 @@ fi
export TEST_API_KEY="$(gopass show -o projects/firecrawl-dokploy/dev/api_key)"
export BULL_AUTH_KEY="$(gopass show -o projects/firecrawl-dokploy/dev/bull_auth_key)"
export POSTGRES_PASSWORD="$(gopass show -o projects/firecrawl-dokploy/dev/postgres_password)"
+export SEARXNG_SECRET="$(gopass show -o projects/firecrawl-dokploy/dev/searxng_secret_key)"
export OPENAI_API_KEY="$(gopass show -o projects/firecrawl-dokploy/dev/openai_api_key || echo "")"
export FIRECRAWL_DOMAIN="firecrawl.localhost"
sudo TEST_API_KEY="$TEST_API_KEY" \
BULL_AUTH_KEY="$BULL_AUTH_KEY" \
POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
+ SEARXNG_SECRET="$SEARXNG_SECRET" \
OPENAI_API_KEY="$OPENAI_API_KEY" \
FIRECRAWL_DOMAIN="$FIRECRAWL_DOMAIN" \
docker compose up "$@"
diff --git a/docker-compose.yml b/docker-compose.yml
index 44c0f05..2bd169e 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,5 +1,3 @@
-name: firecrawl
-
 services:
   # ============================================================
   # PostgreSQL — Firecrawl's NUQ queue store
@@ -67,11 +65,11 @@ services:
     image: docker.io/searxng/searxng:latest
     networks:
       - backend
-      - dokploy-network
     volumes:
-      - ./searxng/settings.yml:/etc/searxng/settings.yml:ro
+      - ./searxng/settings.yml:/etc/searxng/settings.yml
       - searxng-cache:/var/cache/searxng:rw
     environment:
+      - SEARXNG_SECRET=${SEARXNG_SECRET:-}
       - SEARXNG_BASE_URL=https://${SEARXNG_DOMAIN:-searxng.localhost}/
     cap_drop:
       - ALL
@@ -102,7 +100,6 @@ services:
     image: ghcr.io/firecrawl/playwright-service:latest
     networks:
       - backend
-      - dokploy-network
     environment:
       PORT: 3000
       PROXY_SERVER: ${PROXY_SERVER:-}
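The `${VAR:-default}` forms in this compose file follow the same rule as shell parameter expansion: the default is used when the variable is unset or empty. The sketch below mirrors the two SearXNG interpolations in plain bash so the resulting values can be checked without starting any containers. The function names are hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
# Mirrors SEARXNG_BASE_URL=https://${SEARXNG_DOMAIN:-searxng.localhost}/
searxng_base_url() {
  echo "https://${SEARXNG_DOMAIN:-searxng.localhost}/"
}

# Mirrors SEARXNG_SECRET=${SEARXNG_SECRET:-}
# An empty default means the container receives an empty string, not an error.
searxng_secret() {
  echo "${SEARXNG_SECRET:-}"
}
```

With `SEARXNG_DOMAIN` unset, the base URL falls back to `https://searxng.localhost/`, which matches the internal-only default described in the README.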
diff --git a/searxng/settings.yml b/searxng/settings.yml
index ac380be..38d1f6b 100644
--- a/searxng/settings.yml
+++ b/searxng/settings.yml
@@ -1,7 +1,7 @@
 use_default_settings: true
 server:
-  secret_key: "change-this-to-a-random-string"
+  secret_key: "ultrasecretkey"
   limiter: false
   image_proxy: false
   port: 8080
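Both the old and the new `secret_key` values in this hunk are placeholders, and the README states that the real value arrives through `SEARXNG_SECRET`. A guard along the following lines could catch a deployment where neither is set. It is a hypothetical sketch, not part of this repo: `check_secret_key` is an invented name, and the pattern only recognizes the two placeholder strings that appear in this diff.

```bash
#!/usr/bin/env bash
# Hypothetical guard. Returns 1 when the settings file still holds one of the
# placeholder secrets seen in this diff AND SEARXNG_SECRET is unset or empty.
check_secret_key() {
  local file="$1"
  if grep -Eq 'secret_key:[[:space:]]*"(ultrasecretkey|change-this-to-a-random-string)"' "$file" \
     && [ -z "${SEARXNG_SECRET:-}" ]; then
    echo "placeholder secret_key in $file and SEARXNG_SECRET is empty" >&2
    return 1
  fi
  return 0
}
```

A call such as `check_secret_key searxng/settings.yml` could run at the top of a startup script, after the secrets have been exported.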