Trillo Document AI VectorDB Setup

Prerequisites:

  • A Google Cloud Project with the Cloud SQL API enabled.

  • A deployer VM instance where you have SSH access.
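
Before starting, it can help to confirm that the deployer VM has the command-line tools that install.sh calls. The following is a minimal sketch, assuming the script needs only gcloud, psql, and openssl; the function name missing_tools is illustrative and not part of Trillo's tooling.

```shell
#!/bin/bash
# Print the name of every listed tool that is not on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools invoked by install.sh.
missing=$(missing_tools gcloud psql openssl)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

If the Cloud SQL Admin API is not yet enabled on the project, running gcloud services enable sqladmin.googleapis.com is the usual way to turn it on.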

Steps:

SSH into the Deployer VM:

  • Connect to your deployer VM instance using SSH.

Create Setup Files:

  • Create a directory named docai.

  • Inside docai, create two files:

    • install.sh

    • vectorembedding.sql
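
The directory and the two empty files can be created from the shell in one go. This is a small sketch that assumes you are in the directory where docai should live; the chmod line is an addition so that the script can later be run as ./install.sh.

```shell
# Create the working directory and the two empty setup files.
mkdir -p docai
cd docai
touch install.sh vectorembedding.sql

# Make the script executable so it can be started as ./install.sh.
chmod +x install.sh

# Confirm that both files exist.
ls -l install.sh vectorembedding.sql
```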

Populate install.sh:

  • Paste the following script into install.sh:

#!/bin/bash
# Stop at the first command that fails
set -e
INSTANCE_NAME=trillo-pgvector
REGION=us-central1
DB_TIER=db-custom-4-15360

# Create a PostgreSQL instance with a private IP only, on the default network
# (adjust REGION and DB_TIER as needed)
gcloud sql instances create "$INSTANCE_NAME" \
  --database-version=POSTGRES_15 --edition=ENTERPRISE --tier="$DB_TIER" \
  --region="$REGION" --availability-type=ZONAL --no-assign-ip --network=default

# Get instance IP address
IP_ADDRESS=$(gcloud sql instances describe $INSTANCE_NAME --format='value(ipAddresses.ipAddress)')

# Set PostgreSQL password (a random password will be generated)
export PGPASSWORD=$(openssl rand -base64 8)
gcloud sql users set-password postgres --instance=$INSTANCE_NAME --password="$PGPASSWORD"

# Create database and install extensions
psql -h "$IP_ADDRESS" -U postgres -d postgres -c "CREATE DATABASE pgvector;"
psql -h "$IP_ADDRESS" -U postgres -d pgvector -c "CREATE EXTENSION vector;"

# Load data schema
psql -h "$IP_ADDRESS" -U postgres -d pgvector -f vectorembedding.sql

# Connection details
echo "Database URL: jdbc:postgresql://$IP_ADDRESS:5432/pgvector"
echo "Database Username: postgres"
echo "Database Password: $PGPASSWORD"
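
A note on the generated password: openssl rand -base64 8 yields 12 characters that can include +, /, and =, which sometimes need escaping when pasted into URLs or configuration fields. The sketch below shows one possible alternative that emits only hexadecimal characters. The function name gen_password and the fallback branch are illustrative assumptions, not part of the original script.

```shell
# Generate a 32-character hexadecimal password (128 bits of randomness).
gen_password() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 16
  else
    # Fallback when openssl is unavailable: read 16 random bytes as hex.
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
    echo
  fi
}

PGPASSWORD=$(gen_password)
export PGPASSWORD
echo "Generated a password of ${#PGPASSWORD} characters."
```

To try it, the export PGPASSWORD=... line in install.sh could be replaced with the last three lines above, after defining the function near the top of the script.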

Populate vectorembedding.sql:

  • Paste the following SQL into vectorembedding.sql:

create table vectorembedding_tbl
(
    id varchar(255) not null primary key,
    createdat bigint,
    updatedat bigint,
    deleted boolean,
    deletedat bigint,
    folderid bigint,
    docid bigint,
    pagenumber integer,
    tenantid bigint,
    documentname varchar(255),
    filename varchar(255),
    contenttype varchar(100),
    idofuser bigint,
    userid varchar(255),
    author varchar(255),
    gcsfileurl varchar(1024),
    embeddedimageurls varchar(4096),
    content text,
    embedding vector(768)
);

create index _folderid_index_ on vectorembedding_tbl (folderid);
create index _idofuser_index_ on vectorembedding_tbl (idofuser);
create index vectorembedding_tbl_embedding_idx on vectorembedding_tbl using hnsw (embedding vector_l2_ops);
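
To illustrate what the HNSW index is for, here is a hedged sketch of a nearest-neighbour lookup against this table. It relies on pgvector's <-> operator, which computes L2 distance and therefore matches the vector_l2_ops index above. The function name nearest_chunks_sql, the selected columns, and the psql variable qvec are assumptions made for this example; how Trillo Document AI actually queries the table is not described here.

```shell
# Print a nearest-neighbour query restricted to one folder.
# $1 = folderid (integer), $2 = number of rows to return (integer).
# :'qvec' is a psql variable that holds the query embedding as text.
nearest_chunks_sql() {
  local folderid="$1" limit="$2" n
  for n in "$folderid" "$limit"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "folderid and limit must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  cat <<EOF
SELECT id, documentname, pagenumber,
       embedding <-> :'qvec'::vector AS distance
FROM vectorembedding_tbl
WHERE folderid = $folderid
ORDER BY embedding <-> :'qvec'::vector
LIMIT $limit;
EOF
}

# Show the generated SQL for folder 42, five nearest rows.
nearest_chunks_sql 42 5
```

Because psql only substitutes variables in SQL read from standard input or a file, the output would be piped in rather than passed with -c, for example: nearest_chunks_sql 42 5 | psql -h "$IP_ADDRESS" -U postgres -d pgvector -v qvec="$QUERY_VECTOR", where QUERY_VECTOR is a hypothetical 768-dimension literal such as '[0.1,0.2,...]'.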

Execute the Script:

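  • Make install.sh executable if it is not already (chmod +x install.sh), then run it from within the docai directory: ./install.sh

  • Important: Record the database URL, username, and password printed at the end of the run. The password is randomly generated and the script does not save it anywhere else.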
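
Before entering the details in the Workbench UI, it may be worth checking that they work from the deployer VM. The sketch below splits the printed JDBC URL into the host, port, and database name that psql expects. The function name parse_jdbc_url and the sample address 10.0.0.5 are illustrative only.

```shell
# Split a JDBC URL of the form jdbc:postgresql://HOST:PORT/DB
# into "HOST PORT DB" on a single line.
parse_jdbc_url() {
  local rest="${1#jdbc:postgresql://}"   # drop the scheme prefix
  local hostport="${rest%%/*}"           # HOST:PORT
  local db="${rest#*/}"                  # DB
  echo "${hostport%%:*} ${hostport##*:} $db"
}

# Example with a made-up private address.
parse_jdbc_url "jdbc:postgresql://10.0.0.5:5432/pgvector"
```

With the three values in hand, a command such as psql -h HOST -p PORT -U postgres -d DB -c '\dx' should list the vector extension if the setup succeeded.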

Workbench UI Configuration:

  • Open the Trillo Document AI Workbench UI.

  • If a data source doesn't exist, create one using the connection details printed by install.sh in the Execute the Script step.

  • Navigate to the appropriate section within the UI and enter the database URL, username, and password.

  • Save your changes.
