
Monitoring stack

The main source code is here. This will mostly serve as a complementary guide that goes more in depth and, hopefully, provides step-by-step instructions for setting the stack up from scratch. It should also give some insight into the design decisions that were made and why they were made.

Outline of steps:

  1. OCI setup
    1. Create a VM (the monitoring node)
    2. Configure the VM on OCI
    3. Open ports from the VM to the bastion
  2. Configure the domain (cais.club)
  3. Configure the bastion node (the node with the data source)
    1. install and set up Docker
    2. install weka-mon
    3. install the Slurm Prometheus exporter
    4. install VictoriaMetrics
  4. Configure monitoring node
    1. set up nginx
    2. set up HTTPS with Certbot
    3. set up the backend (Express.js)
  5. Configure Google Workspace
  6. Continue configuring the monitoring node
    1. deploy test site
    2. deploy docs.cais.club
    3. install and set up Grafana
  7. re-test everything one last time
  8. drink up a nice glass of hot cocoa with marshmallows
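For step 7, a quick way to re-test the plumbing from steps 1.3, 3, and 4 is to confirm that each service port is actually reachable. The snippet below is a minimal sketch of such a smoke test using only the Python standard library. The hostnames `bastion.example.internal` and `monitoring.example.internal` are hypothetical placeholders, and the ports are assumptions based on common defaults (8428 for single-node VictoriaMetrics, 3000 for Grafana, 443 for HTTPS behind nginx); substitute whatever your setup actually uses.

```python
import socket
from typing import Iterable


def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False


def smoke_test(targets: Iterable[tuple[str, str, int]], timeout: float = 3.0) -> dict[str, bool]:
    """Check every (name, host, port) target and map each name to reachable/unreachable."""
    return {name: check_port(host, port, timeout) for name, host, port in targets}


# Hypothetical targets: hostnames are placeholders, ports are assumed defaults.
TARGETS = [
    ("victoriametrics", "bastion.example.internal", 8428),  # assumed bastion hostname
    ("grafana", "monitoring.example.internal", 3000),       # assumed monitoring hostname
    ("nginx-https", "docs.cais.club", 443),
]
```

Running `smoke_test(TARGETS)` from the monitoring node returns something like `{"victoriametrics": True, "grafana": True, "nginx-https": True}` when everything is up. A `False` for the VictoriaMetrics entry usually points back at the port rules from step 1.3 rather than at the service itself. This only checks TCP reachability, not that the service answers correctly, so it complements rather than replaces clicking through Grafana.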