Deploy Specify7 to an EC2 instance
These are the detailed instructions to get Specify7 up and running on an EC2 instance.
This is a temporary solution for deployment, with an ECS solution coming in the future.
These instructions require access to the private docker-compositions repo.
For this kind of AWS deployment, it is easiest to use the specifycloud deployment, even with just one client instance. I used the specifycloud deployment for my AWS EC2 instance, but if you want to include things like the asset server, you can create a separate instance of the asset server deployment (instructions at https://github.com/specify/web-asset-server). You could also try the all-in-one deployment instead.
Specify7 was not originally designed to be cloud native, but since joining the team and picking up this project, I have been working on developing a modern cloud-native solution. I hope to soon have a set of CDK scripts that will make deployment to ECS, S3, and RDS simple and scalable.
Spin Up EC2 Instance
For the Amazon Machine Image (AMI), choose the default Ubuntu Server 22.04 LTS image with the 64-bit x86 architecture.
For the instance type, start with a t3.small or t3.medium. Upgrade to a larger instance if needed.
In Network Settings, make sure to allow HTTP and HTTPS traffic.
Set up your key pair for logging in.
Launch the instance.
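If you would rather script the launch than click through the console, a roughly equivalent AWS CLI call is sketched below. The AMI ID, key pair name, and security group ID are placeholders for your own values; the security group should allow inbound HTTP, HTTPS, and SSH, matching the network settings described above.
# Minimal sketch: launch one Ubuntu 22.04 instance with the settings described above
aws ec2 run-instances \
  --image-id <UBUNTU_22_04_AMI_ID> \
  --instance-type t3.small \
  --key-name <YOUR_KEY_PAIR_NAME> \
  --security-group-ids <YOUR_SECURITY_GROUP_ID> \
  --count 1;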
S3 Setup
Create an S3 bucket for storing some static files.
You can either upload the GitHub repo to S3, or you can clone the repo directly on the EC2 instance. If you are cloning inside the EC2 instance, note that you will need a GitHub access token to clone the private repo (instructions at https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token). Make sure when configuring the token that you allow it to read from private repos, otherwise cloning will not work.
Upload your SQL file to the S3 bucket for the initial database upload.
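For reference, the uploads can be done with the AWS CLI from your local machine. The sketch below uses the same bucket layout that the download commands later in this guide expect (repo-snapshots/, config/, and seed-database/) and the $SPECIFY_CLOUD_BUCKET_NAME placeholder used in those scripts; the local paths and the archireef.sql filename are placeholders for your own files.
# Create the bucket (skip if it already exists)
aws s3 mb s3://$SPECIFY_CLOUD_BUCKET_NAME;
# Snapshot of the docker-compositions repo (optional if you plan to clone it on the instance)
aws s3 cp ./docker-compositions s3://$SPECIFY_CLOUD_BUCKET_NAME/repo-snapshots/docker-compositions/ --recursive;
# Config files described below, plus the GitHub access token
aws s3 cp ./spcloudservers.json s3://$SPECIFY_CLOUD_BUCKET_NAME/config/;
aws s3 cp ./defaults.env s3://$SPECIFY_CLOUD_BUCKET_NAME/config/;
aws s3 cp ./github-access-key.txt s3://$SPECIFY_CLOUD_BUCKET_NAME/config/;
# SQL dump for the initial database upload
aws s3 cp ./archireef.sql s3://$SPECIFY_CLOUD_BUCKET_NAME/seed-database/;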
There are two config files that you need to create for specifycloud:
spcloudservers.json ->
{
  "servers": {
    "archireef": {
      "sp7": "edge",
      "sp6": "specify6803",
      "https": false,
      "database": "archireef"
    }
  },
  "decommissioned": [],
  "sp6versions": {
    "specify6803": "6.8.03"
  }
}
You need to edit the Specify6 version to match your database’s schema version (the latest is 6.8.03). For sp7, enter the GitHub tag from the repo that you want to use (the latest is always edge). For your first run, keep https set to false; change it to true only after you have set up SSL/TLS. The database value is the name of your database.
In the second config file, edit the DATABASE_HOST, MASTER_PASSWORD, ASSET_SERVER_URL, ASSET_SERVER_KEY, and REPORT_RUNNER_HOST (the report runner host’s private IP) values.
defaults.env ->
DATABASE_HOST=<YOUR_DATABASE_HOST>
DATABASE_PORT=3306
MASTER_NAME=master
MASTER_PASSWORD=<YOUR_MASTER_PASSWORD>
SECRET_KEY=temp
ASSET_SERVER_URL=https://assets1.specifycloud.org/web_asset_store.xml
ASSET_SERVER_KEY=<YOUR_ASSET_SERVER_KEY>
REPORT_RUNNER_HOST=<RRHOST_PRIVATE_IP>
REPORT_RUNNER_PORT=8080
CELERY_BROKER_URL=redis://redis/0
CELERY_RESULT_BACKEND=redis://redis/1
LOG_LEVEL=WARNING
SP7_DEBUG=false
Configure Specify7
Once the EC2 instance is up and running, ssh into it.
Download the key pair file (YOUR-AWS-ACCESS-KEY.pem below) that you created for the instance and restrict its permissions:
chmod 600 YOUR-AWS-ACCESS-KEY.pem
On the EC2 instance page in the AWS console, click Connect, choose SSH, and copy the ssh command. Make sure you run it from the directory containing the key file. You can try using AWS CloudShell if you don’t want to ssh from your local machine.
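The copied command typically looks something like the following (the key filename and public DNS are placeholders for your own values; ubuntu is the default user on the Ubuntu AMI).
# Connect to the instance using the downloaded key pair
ssh -i YOUR-AWS-ACCESS-KEY.pem ubuntu@<EC2_PUBLIC_IPV4_DNS>;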
Here are the commands to run to set up Specify7. Note that for the AWS access key, you can use AWS Secrets Manager if you prefer. Also note that after some of these commands, systemd services will restart; when that happens, just press enter once or twice until you get back to the shell.
Make sure to fill in all variables (starting with $) and the YOUR_AWS_ACCESS_KEY* placeholders in the following script before running the commands:
$SPECIFY_CLOUD_BUCKET_NAME
$REGION
YOUR_AWS_ACCESS_KEY_ID
YOUR_AWS_ACCESS_KEY_SECRET
# Avoid services restarting during apt updates
sudo sed -i "s/#\$nrconf{kernelhints} = -1;/\$nrconf{kernelhints} = -1;/g" /etc/needrestart/needrestart.conf;
# Run apt installs
sudo apt update;
sudo apt upgrade -y;
sudo apt install -y apt-transport-https ca-certificates git gh curl software-properties-common wget python3-pip awscli mysql-client j2cli;
# Configure AWS
aws configure set aws_access_key_id "YOUR_AWS_ACCESS_KEY_ID";
aws configure set aws_secret_access_key "YOUR_AWS_ACCESS_KEY_SECRET";
aws configure set default.region $REGION;
aws configure set default.output json;
# Install Docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg;
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null;
sudo apt update;
apt-cache policy docker-ce;
sudo apt install -y docker-ce;
docker --version; # Check to make sure it installed correctly
# Install docker compose
mkdir -p ~/.docker/cli-plugins;
curl -L "https://github.com/docker/compose/releases/download/v2.17.2/docker-compose-linux-$(uname -m)" -o ~/.docker/cli-plugins/docker-compose;
chmod +x ~/.docker/cli-plugins/docker-compose;
docker compose version; # Check to make sure it installed correctly
# Copy files from S3 (copy only the files that you need)
aws s3 cp s3://$SPECIFY_CLOUD_BUCKET_NAME/repo-snapshots/docker-compositions/ ./ --recursive;
aws s3 cp s3://$SPECIFY_CLOUD_BUCKET_NAME/config/github-access-key.txt ./;
aws s3 cp s3://$SPECIFY_CLOUD_BUCKET_NAME/config/spcloudservers.json ./specifycloud/;
aws s3 cp s3://$SPECIFY_CLOUD_BUCKET_NAME/config/defaults.env ./specifycloud/;
# Optional: only needed if you did not copy the repo from S3
# Clone the private repo using the GitHub CLI
gh auth login --with-token < github-access-key.txt;
gh auth setup-git;
gh repo clone https://github.com/specify/docker-compositions.git;
Spin Up RDS Instance
For the RDS engine type, I have been experimenting with Aurora MySQL 5.7 to handle dynamic scaling, but the default option for Specify7 on AWS RDS is MariaDB 10.6.10.
For Templates, I would suggest starting with Dev/Test instead of Production.
Set up the master password.
For testing, I would suggest using either the db.t3.medium or db.t3.large instance. You might feel the need to upgrade to a db.m5.large instance or something similar.
Decide on a storage size that accommodates your data. The general purpose SSD storage option should suffice, but you can try the provisioned IOPS SSD option if needed.
Important: in the connectivity options, make sure to select “Connect to an EC2 compute resource” and add your EC2 instance.
Create the database.
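If you prefer to script this step, a roughly equivalent instance can be created with the AWS CLI. The sketch below uses placeholder values ($DB_IDENTIFIER, $MASTER_PASSWORD) and a nominal 100 GiB of storage; note that the “Connect to an EC2 compute resource” wiring is easiest to do from the console, so with the CLI you would have to configure the security group yourself so the EC2 instance can reach port 3306.
# Minimal sketch: create a MariaDB 10.6 instance with the settings described above
aws rds create-db-instance \
  --db-instance-identifier $DB_IDENTIFIER \
  --engine mariadb \
  --engine-version 10.6.10 \
  --db-instance-class db.t3.medium \
  --master-username master \
  --master-user-password $MASTER_PASSWORD \
  --allocated-storage 100;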
Upload Data
Get the database host from the AWS RDS console. In the connectivity tab of your RDS instance, copy the text marked ‘Endpoint’. Then run the following commands to upload your data to the database.
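If you would rather fetch the endpoint from the CLI instead of the console, something like this should work (assuming the AWS CLI is configured as above and $DB_IDENTIFIER is your RDS instance identifier).
# Print the RDS endpoint (the value shown as 'Endpoint' in the console)
aws rds describe-db-instances \
  --db-instance-identifier $DB_IDENTIFIER \
  --query 'DBInstances[0].Endpoint.Address' \
  --output text;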
Make sure to fill in all variables (starting with $) in the following script before running the commands:
$SPECIFY_CLOUD_BUCKET_NAME
$SPECIFY_CLOUD_DATABASE_NAME
$DB_IDENTIFIER
$REGION
# Database setup
cd specifycloud;
mkdir seed-databases;
aws s3 cp s3://$SPECIFY_CLOUD_BUCKET_NAME/seed-database/archireef.sql ./seed-databases/;
mysql --host $SPECIFY_CLOUD_DATABASE_NAME.$DB_IDENTIFIER.$REGION.rds.amazonaws.com \
--port 3306 -u master -p -e "create database archireef;";
mysql --host $SPECIFY_CLOUD_DATABASE_NAME.$DB_IDENTIFIER.$REGION.rds.amazonaws.com \
--port 3306 -u master -p archireef < ./seed-databases/archireef.sql;
rm -f ./seed-databases/archireef.sql;
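As an optional sanity check before bringing up the containers, you can connect with the same client and confirm that the tables were created (the host string follows the same pattern as the commands above).
# Confirm that the schema was imported into the archireef database
mysql --host $SPECIFY_CLOUD_DATABASE_NAME.$DB_IDENTIFIER.$REGION.rds.amazonaws.com \
  --port 3306 -u master -p -e "use archireef; show tables;" | head;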
Deploy Specify7
Here are the remaining commands to run on the EC2 instance to set up the database connection and start Specify7. Replace the S3 and RDS URLs with your own.
# Run Specify7
cd specifycloud;
make;
sudo docker compose up -d;
Go to the AWS console and navigate to the EC2 instance page. For your instance, select the Networking tab, copy the public IPv4 DNS address, and open it in your browser. Note that you need to change the URL from https to http if you haven’t set up SSL/TLS yet (HTTPS notes are in the Concluding Notes section).
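If the page does not load, it is worth checking that the containers actually came up. A couple of generic checks, run from the specifycloud directory:
# List the containers and their state
sudo docker compose ps;
# Tail recent logs from all services to look for startup errors
sudo docker compose logs --tail=100;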
Docker Container Dependencies
Containers:
- specify7
  - the Django application of Specify7
  - depends on connections to the webpack, specify7-worker, redis, asset-server, nginx, and report-runner containers
  - depends on files created by the specify6 container
- webpack
  - holds static files for the front-end
  - the nginx server depends on this container
  - independent container
- specify7-worker
  - Python Celery worker that runs long-running tasks for specify7
  - depends on a connection to the specify7 container
- redis
  - acts as the job queue broker between the specify7 and specify7-worker containers
  - could possibly be replaced by AWS SQS in the future
  - depends on connections to both the specify7 and specify7-worker containers
- asset-server
  - serves the assets to the specify7 container
  - independent container
- specify6
  - generates files and the database schema version for specify7
  - the container stops after the files are created (a quick check for this is shown after this list)
  - independent container
  - this container can be removed if the static files are moved to S3 and then copied into the specify7 container on startup
- nginx
  - web server for the specify7 Django application
  - depends on connections to the specify7 and webpack containers
  - could possibly be replaced with AWS CloudFront
- report-runner
  - Java application that generates reports for specify7
  - independent container
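Because the specify6 container is expected to exit once it has generated its files, seeing it in an exited state is normal. A quick way to verify this is sketched below; the exact service names come from the compose file in the docker-compositions repo, so they may differ from what is shown here.
# Show all containers, including ones that have exited (specify6 should have exited cleanly)
sudo docker compose ps -a;
# Inspect the logs of an individual service, e.g. the one-shot specify6 container
sudo docker compose logs specify6;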
Concluding Notes
You can use AWS Certificate Manager or the Certbot tool to set up HTTPS SSL/TLS certificates. You will need to set up a domain name through AWS Route 53. You will probably want to set up an Elastic IP address for the EC2 instance through AWS as well. Here are the instructions for using Certbot:
sudo apt install -y certbot python3-certbot-apache;
sudo mkdir /var/www/archireef;
sudo certbot --webroot -w /var/www/archireef -d <YOUR_EC2_DOMAIN_NAME> certonly;
certbot certificates;
vim spcloudservers.json # set "https": true
make;
sudo docker compose up -d;
sudo docker exec -it specifycloud-nginx-1 nginx -s reload; # Maybe needed
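Certbot certificates expire after 90 days; the Ubuntu package normally installs a systemd timer that renews them automatically. You can confirm that renewal will work with a dry run:
# Simulate a renewal without actually replacing the certificate
sudo certbot renew --dry-run;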
You might want to extend the timeout of your ssh connection.
SSH client: vim ~/.ssh/config
Host *
ServerAliveInterval 20
#TCPKeepAlive no
SSH Server: sudo vim /etc/ssh/sshd_config
ClientAliveInterval 1200
ClientAliveCountMax 3
Then run sudo systemctl reload sshd