diff --git a/blog/pageview-time-experiment-2019-08-19.markdown b/blog/pageview-time-experiment-2019-08-19.markdown new file mode 100644 index 0000000..ba4dff5 --- /dev/null +++ b/blog/pageview-time-experiment-2019-08-19.markdown @@ -0,0 +1,72 @@ +--- +title: Pageview Time Experiment +date: 2019-08-19 +--- + +# Pageview Time Experiment + +My blog has a lot of content in a lot of diverse categories. In order to help me +decide which kind of content I should publish next, I have created a very +simple method to track pageview time and enabled it for all of my blogposts. I'll +go into detail of how it works and potential risks of it below. + +The high level idea is that I want to be able to know what kind of content has +people's attention for the longest amount of time. I am using the time people +have the page open as a particularly terrible proxy for that value. I wanted to +make this data anonymous, simplistic and (reasonably) public. + +## How It Works + +Here is how it works: + +![A diagram on how this works](/static/img/pageview_flowchart.png) + +When the page is loaded, a [javascript file records the start time](/static/js/pateview_timer.js). +This then sets a [pagehide handler](https://developer.mozilla.org/en-US/docs/Web/API/Window/pagehide_event) +to send a [navigator beacon](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon) +containing the following data: + +- The path of the page being viewed +- The start time +- The end time recorded by the pagehide handler + +This information is asynchronously pushed to [`/api/pageview-timer`](https://github.com/Xe/site/blob/91d7214b341088edba7a37a83a753e75ed02d7ad/cmd/site/pageview.go) +and added to an in-memory prometheus histogram. These histograms can be checked at +[`/metrics`](/metrics). This data is not permanently logged. + +## Security Concerns + +I believe this data is anonymous, simplistic and public for the following reasons: + +I believe this data is anonymous because there is no way for me to correlate users +to histogram entries, nor is there a way for me to view all of the raw histogram +entries. This site records the bare minimum for what I need in order to make sure +everything is functioning normally, and all data is stored in ephemeral in-memory +containers as much as possible. This includes any logs that my service produces. + +I believe this data is simplistic because it only has a start time, a stop time +and the path that is being looked at. This data doesn't take into account things +like people leaving a page open for hours on end idly, and that could skew the +numbers. The API endpoint is also fairly unprotected, meaning that falsified data +could be submitted to it easily. I think that this is okay though. + +I believe this data is public because I have the percentile views of the histograms +present on [`/metrics`](/metrics). I have no reason to hide this data, and I do not +intend to use it for any moneymaking purposes (though I doubt it could be to begin +with). + +I fully respect the [do not track](https://allaboutdnt.com) header and flag in browsers. +If [`pageview_timer.js`](/static/js/pateview_timer.js) detects the presence of +do not track in the browser, it stops running immediately and does not set the pagehide +handler. If that somehow fails, the server looks for the presence of the `DNT` header +set to `1` and instantly discards the data and replies with a 404. + +Like always, if you have any questions or concerns please reach out to me. I +want to ensure that I am creating useful views into how people use my blog +without violating people's rights to privacy. + +I intend to keep this up for at least a few weeks. If it doesn't have any practical +benefit in that timespan, I will disable this and post a follow-up explaining how +I believe it wasn't useful. + +Thanks and be well. diff --git a/static/img/pageview_flowchart.png b/static/img/pageview_flowchart.png new file mode 100644 index 0000000..99b1ee0 Binary files /dev/null and b/static/img/pageview_flowchart.png differ