FireCrawl: Efficient Web Crawling and Information Retrieval


First we need to install some dependencies.

sudo apt install -y postgresql git curl wget postgresql-15-cron

Next we need to make a couple of small changes to the PostgreSQL configuration.

sudo vi /etc/postgresql/15/main/postgresql.conf

Two small changes: loading the pg_cron extension and pointing it at the firecrawl database so scheduled jobs run there.

shared_preload_libraries = 'pg_cron'
cron.database_name = 'firecrawl'

After the changes we restart PostgreSQL.

sudo systemctl restart postgresql
sudo systemctl status postgresql
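
If you want to confirm that pg_cron was actually loaded, a quick optional check is to ask the server which libraries it preloaded:

sudo -u postgres psql -c 'SHOW shared_preload_libraries;'

The output should include pg_cron.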

Redis

We will then install Redis following the steps from their website.

sudo apt-get install -y lsb-release gpg

curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg

sudo chmod 644 /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list

sudo apt-get update
sudo apt-get install -y redis

sudo systemctl enable redis-server
sudo systemctl start redis-server
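
A quick optional check that Redis is answering:

redis-cli ping

It should reply with PONG.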

Go

Installing a newer version of Go than the one available in the Debian repositories.

wget https://go.dev/dl/go1.25.4.linux-amd64.tar.gz
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.25.4.linux-amd64.tar.gz
rm go1.25.4.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version
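
Note that the export above only lives in the current shell session. If you want Go on the PATH permanently for your user, one way is to append it to the shell profile:

echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.profile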

Next we need to create a new user and home directory.

sudo -H useradd --shell /bin/bash --system --home-dir "/opt/firecrawl" --comment 'Firecrawl' firecrawl

sudo mkdir -p /opt/firecrawl
sudo chown -R firecrawl:firecrawl /opt/firecrawl

sudo -H -u firecrawl -i

NodeJS

Using this new user we will install Node Version Manager and pnpm (performant npm).

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
\. "$HOME/.nvm/nvm.sh"

nvm install 24
node -v
corepack enable pnpm
pnpm -v

Next we download the FireCrawl repository.

git clone https://github.com/firecrawl/firecrawl.git

To get the API configured correctly we will add an environment file.

cd /opt/firecrawl/firecrawl/apps/api
vi .env

The configuration in this environment file is mostly defaults: a local database, local Redis, and port 3002 exposed to the network.

# ===== Required ENVS ======
NUM_WORKERS_PER_QUEUE=8
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_URL=redis://localhost:6379

## To turn on DB authentication, you need to set up supabase.
USE_DB_AUTHENTICATION=false

## Using the PostgreSQL for queuing -- change if credentials, host, or DB is different
NUQ_DATABASE_URL=postgres://firecrawl:qwerty@localhost:5432/firecrawl

# ===== Optional ENVS ======

# Supabase Setup (used to support DB authentication, advanced logging, etc.)
SUPABASE_ANON_TOKEN=
SUPABASE_URL=
SUPABASE_SERVICE_TOKEN=

# Other Optionals
TEST_API_KEY= # use if you've set up authentication and want to test with a real API key
OPENAI_API_KEY= # add for LLM dependent features (image alt generation, etc.)
BULL_AUTH_KEY=@
PLAYWRIGHT_MICROSERVICE_URL=  # set if you'd like to run a playwright fallback
LLAMAPARSE_API_KEY= #Set if you have a llamaparse key you'd like to use to parse pdfs
SLACK_WEBHOOK_URL= # set if you'd like to send slack server health status messages
POSTHOG_API_KEY= # set if you'd like to send posthog events like job logs
POSTHOG_HOST= # set if you'd like to send posthog events like job logs
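
The qwerty password in NUQ_DATABASE_URL is only a placeholder and must match the password we give the database user later. For anything beyond a test box, generate a random one instead, for example:

openssl rand -hex 16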

Another part of the build process needs Rust, so we will install it here.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"

Next we will run all the build steps in the API directory using PNPM.

cd /opt/firecrawl/firecrawl/apps/api
pnpm install

Now we will configure the database: switch to the postgres system user and open psql as the superuser.

sudo -i -u postgres
psql

Create the FireCrawl database and user, grant privileges on the database, create the pg_cron extension, and give the FireCrawl user access to the cron schema.

CREATE DATABASE firecrawl;
CREATE USER firecrawl WITH PASSWORD 'qwerty';
GRANT ALL PRIVILEGES ON DATABASE firecrawl TO firecrawl;
\c firecrawl
\dn+
CREATE EXTENSION pg_cron;
GRANT ALL PRIVILEGES ON SCHEMA cron TO firecrawl;
\q

Exit back to your regular shell and run the bootstrap script as the FireCrawl user.

psql -h localhost -U firecrawl -d firecrawl -f /opt/firecrawl/firecrawl/apps/nuq-postgres/nuq.sql
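
To sanity-check the bootstrap, you can list the schemas in the database and any cron jobs it may have registered (the job list can legitimately be empty at this point):

psql -h localhost -U firecrawl -d firecrawl -c '\dn+'
sudo -u postgres psql -d firecrawl -c 'SELECT jobid, schedule, command FROM cron.job;'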

Next we create a start script for the tool, since it needs a number of dependencies in its environment.

sudo vi /usr/local/bin/firecrawl.sh

The script below will add NodeJS, Rust and Go into the environment, switch to the API directory and start the service.

#!/bin/bash
# Pull Rust and nvm (NodeJS) into the environment, then add Go to the PATH.
. "$HOME/.cargo/env"
. "$HOME/.nvm/nvm.sh"
export PATH=$PATH:/usr/local/go/bin
# Start the API from its own directory, merging stderr into stdout for the journal.
cd /opt/firecrawl/firecrawl/apps/api
pnpm start 2>&1

Make the script executable and give it the right user and group privileges.

sudo chmod +x /usr/local/bin/firecrawl.sh
sudo chown firecrawl:firecrawl /usr/local/bin/firecrawl.sh
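
Before wiring the script into systemd you can optionally test-run it in the foreground as the firecrawl user, then stop it with Ctrl+C once it comes up:

sudo -H -u firecrawl /usr/local/bin/firecrawl.sh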

Next we create the systemd service file.

sudo vi /etc/systemd/system/firecrawl.service

Add a service definition that runs the script we defined above: a simple service that wants the network online, gets a runtime directory with a PID file under /run, and restarts automatically if it fails.

[Unit]
Description=Firecrawl service
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
RuntimeDirectory=firecrawl
Type=simple
User=firecrawl
Group=firecrawl
TimeoutStartSec=0
Restart=always
RestartSec=10s
PIDFile=/run/firecrawl/firecrawl.pid
ExecStart=/usr/local/bin/firecrawl.sh

Reload the systemd daemon, then start, enable, and check the status of the service. We can follow the log using journalctl to see when it has started.

sudo systemctl daemon-reload
sudo systemctl start firecrawl
sudo systemctl enable firecrawl
sudo systemctl status firecrawl

sudo journalctl -u firecrawl -f

Lastly, here are a couple of curl calls you can use to test the service.

curl -X GET http://localhost:3002/test

curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://mendable.ai"
    }'
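
The crawl call should respond with a job id. Assuming it does, you can poll the crawl status with that id (replace the placeholder below with the id you received):

curl -X GET http://localhost:3002/v1/crawl/REPLACE-WITH-JOB-ID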
