Latest revision as of 08:18, 3 March 2018

Project Overview

The 2018 data project is an ongoing effort to figure out how to set up "painless" dashboards for monitoring metrics.

Stage 1: Collecting System Data

See 2018/Data Project/Stage 1

Stage 2: Spy

dahak-spy project:

* lightweight server (may want larger disk, okay if non-free)
* running mongodb
* running mongoexpress
* running prometheus
* running grafana
* running netdata

additional components before real world testing:

* netdata on the build node
* netdata python plugin from another process, monitoring.....???
* metrics:
** is snakemake running (binary yes/no)
** current stage of snakemake (adjust snakemake file to write into a dotfile)
** cpu/memory/network/disk io
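
The first two metrics above could be gathered roughly as follows. This is only a sketch under assumptions that are not in the notes: the dotfile name `.snakemake_stage`, the helper names, and the use of Linux `/proc` to find processes are all hypothetical choices. The snakemake file would call something like `write_stage()` as each rule starts, and a separate monitoring process would call `read_stage()` and `snakemake_running()`.

```python
import os
from pathlib import Path

# Hypothetical dotfile location; the notes do not name one.
STAGE_FILE = Path(".snakemake_stage")


def list_argvs(proc_root="/proc"):
    """Return the argv list of every readable process (Linux /proc only)."""
    argvs = []
    root = Path(proc_root)
    if not root.is_dir():
        return argvs  # not Linux, or /proc not mounted
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args:
            argvs.append(args)
    return argvs


def snakemake_running(argvs):
    """Binary metric: 1 if some process looks like snakemake, else 0.

    Heuristic (an assumption): one of the first three argv entries has the
    basename 'snakemake', which covers 'snakemake ...',
    'python /path/to/snakemake ...' and 'python -m snakemake ...'.
    """
    return int(any(
        os.path.basename(arg) == "snakemake"
        for args in argvs
        for arg in args[:3]
    ))


def write_stage(stage, path=STAGE_FILE):
    """Record the current stage; replace the file atomically so a reader
    never sees a half-written name."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(stage + "\n")
    os.replace(tmp, path)


def read_stage(path=STAGE_FILE):
    """Return the last recorded stage name, or None if nothing was written."""
    try:
        return Path(path).read_text().strip() or None
    except FileNotFoundError:
        return None
```

A monitoring loop would then call `snakemake_running(list_argvs())` and `read_stage()` at whatever interval the dashboard needs.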

netdata python plugin workflow?

* does it need to be installed and netdata restarted, or can it push data into netdata?
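
One possible answer to the "push" half of this question, sketched here rather than tested against this setup: netdata ships a statsd-compatible listener, so an outside process may be able to send gauges over UDP without installing a plugin. The sketch assumes that listener is enabled on its default port 8125 on localhost, which should be checked against the installed netdata version. The metric name is hypothetical.

```python
import socket

# Assumption: netdata's statsd listener is enabled on UDP 127.0.0.1:8125.
NETDATA_STATSD = ("127.0.0.1", 8125)


def format_gauge(name, value):
    """Build one statsd gauge line of the form <name>:<value>|g."""
    return f"{name}:{value}|g".encode("ascii")


def push_gauge(name, value, addr=NETDATA_STATSD):
    """Send one gauge sample over UDP; fire-and-forget, no reply is read."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_gauge(name, value), addr)


if __name__ == "__main__":
    # Hypothetical metric name: 1 while snakemake is running, 0 otherwise.
    push_gauge("snakemake.running", 1)
```

Because UDP is connectionless, `push_gauge()` cannot tell whether anything is listening, so whether the value shows up has to be confirmed in the netdata dashboard itself.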

real yeti:

* get a yeti node
* debug the snakemake file one step at a time using already-downloaded files (faster step)
* let the snakemake file run with netdata and friends running

Flags

@@ Line 5: / Line 5: @@
 ==Stage 1: Collecting System Data==
-===COMPLETED Phase 1a: Netdata===
+See [[2018/Data Project/Stage 1]]
-First, we set up [[Netdata]].
+==Stage 2: Spy==
-* '''Pros:''' Netdata has a fantastic dashboard with all kinds of stuff all ready to go.
-* '''Cons:''' Netdata is custom-built for monitoring compute nodes, and not for general visualization.
-* [[Netdata]]
-* Link to Netdata scripts: https://charlesreid1.com:3000/data/netdata
-Netdata is a useful tool for monitoring an individual machine instance remotely and it works excellent.
+dahak-spy project:
+* lightweight server (may want larger disk, okay if non-free)
+* running mongodb
+* running mongoexpress
+* running prometheus
+* running grafana
+* running netdata
-===NOPE Phase 1b: Prometheus===
+additional components before real world testing:
+* netdata on the build node
-<s>Second, we set up [[Netdata]] to dump to a [[Prometheus]] database.
+* netdata python plugin from another process, monitoring.....???
-* '''Pros:''' Prometheus was fairly easy to integrate with Netdata.
+* metrics:
-* '''Cons:''' Prometheus was not a particularly outstanding tool, don't know much about how to use it.
+** is snakemake running (binary yes/no)
-* [[Prometheus]]
+** current stage of snakemake (adjust snakemake file to write into a dotfile)
+** cpu/memory/network/disk io
-Need to get more involved with Prometheus and/or Grafana to monitor more than one machine.</s>
-===COMPLETED Phase 2a: MongoDB and MongoExpress===
-We then set up [[MongoDB]] and [[MongoExpress]] in Docker containers. MongoDB listens for incoming data on the VPN. MongoExpress is connected to MongoDB and exposes a web interface to interact with MongoDB. We used MongoDB to store edit history and page graph data from the charlesreid1 wiki.
-Pros and cons:
-* '''Pros:''' MongoDB is a containerized solution with persistent data. MongoDB had (has?) a high setup barrier, but a low usage barrier. Very easy to do basic CRUD operations, make new databases as needed, etc.
-* '''Cons:''' No visualization tools baked in, need to define own tools. Collectd cannot dump to MongoDB because of a bunch of installation stupidity.
-Links:
-* [[MongoDB]]
-* [[MongoExpress]]
-* [[Pywikibot]]
-* Link to MongoDB docker files: https://charlesreid1.com:3000/docker/d-mongodb
-* Link to MongoExpress docker files: https://charlesreid1.com:3000/docker/d-mongoexpress
-* Link to wiki scraping scripts: https://charlesreid1.com:3000/wiki/charlesreid1-wiki-data
-===NOPE Phase 2b: Collectd===
-<s>We struggled a LOT with [[Collectd]], mainly because we wanted to use the collectd plugin to write to MongoDB. Unfortunately, this was the only plugin that seemed impossible to install.
-See [[Collectd]] page.
-(This is all installation stupidity. I tried installing collectd with aptitude, no plugins. Then the core, no plugins. Then installing from source, and MongoDB plugin did not work. Struggling to get collectd to link to MongoDB. Needed custom config or something. Then I just gave up, and re-installed collectd core, and the library was there, but it was complaining it couldn't find it. In the end, I totally abandoned the attempt to get collectd to talk to mongodb. Could probably use a collectd docker and fix this whole issue.)</s>
-===NOPE Phase 3a: Graphite===
-<s>Next, we deployed a [[Graphite]] container to hold time series from [[Collectd]].
-Pros and cons:
-* '''Pros:''' Containerized solution. Collectd graphite plugin worked fine.
-* '''Cons:''' Graphite comes with Carbon (web interface), which is utter awful. It provides the absolute bare minimum, and it looks like it's trapped in a miserable 1998 computer prison.
-Links:
-* [[Graphite]]
-* Link to Graphite docker files: https://charlesreid1.com:3000/docker/d-graphite</s>
-===NOPE Phase 3b: Visualizing Graphite===
-<s>A few years back we explored [[Cubism]] and Cube (difference?) as a way of visualizing time series from Graphite. It took some effort to get a basic dashboard, and Cubism is (ultimately) D3, the most frustratingly stupidly over-designed and over-complicated library ever, implemented in a totally irrational programming language.
-So, no.
-We're going to focus on Mongo, which is more transparent and more flexible for all purposes.</s>
-===Stage 1 Conclusion: Netdata, Not Collectd===
-All of the struggle to get collectd working with mongo was a waste of effort, and led to the graphite distraction in the first place. A broken build procedure (collectd) led to an unknown, mediocre tool (graphite).
-Ultimately, if we need to run collectd, interface with the collectd API via Python: https://collectd.org/wiki/index.php/Plugin:Python
-==Stage 2: Data Collection System==
-Re-investigate Prometheus. Arbitrary time windows (adds only), more powerful query syntax, more easily connects to Grafana.
-Problem we're solving: collect periodic data (frequency of ~ 1 Hz - 1/60 Hz), dump into time series database
-* Query data for visualization
-* Load data into Python for analysis
-* Visualization in Grafana
-* Flat, copy-able files
-* Easy to dump and restore
+netdata python plugin workflow?
+* does it need to be installed and netdata restarted, or can it push data into netdata?
+real yeti:
+* get a yeti node
+* debug the snakemake file one step at a time using already-downloaded files (faster step)
+* let the snakemake file run with netdata and friends running
 =Flags=

2018/Data Project: Difference between revisions

From charlesreid1
