
Self-Hosting Privacy Analytics: From Matomo to Plausible

Complete guide for self-hosting Matomo and Plausible with Docker. Learn how to deploy, configure and transition to lighter, GDPR-compliant web tracking.


❯  My personal need for web analytics

Web analytics consists of gathering, measuring, and analyzing data concerning user behavior on a website. Measuring this data is crucial for companies building products used by end customers, to monitor product popularity, usage peaks, measure the impact of new features, etc. - the specifics strongly depending on the audience and the type of product developed.

This is not my case: I have a blog, and I simply want to keep an eye on the number of visits. My goal is to get a rough idea of my blog’s popularity, see which articles are trending, and have some basic insights about the browsers used by visitors, their country, and so on.

When looking at the different solutions for conducting such analyses, the undisputed leader is, by far, Google Analytics. However, as expressed in this article, this solution is losing market share, notably due to fundamental incompatibility with personal data protection regulations that are becoming the norm worldwide. Google Analytics (4) was even banned in some jurisdictions because it didn’t comply with rules established by the CNIL regarding GDPR between 2020 and 2023. In parallel, several GDPR-compliant solutions are continuously gaining popularity. One of the most popular is Matomo: a comprehensive alternative to Google Analytics.

I won’t go into too much detail about Matomo, since I’m definitely not an expert in the domain, and my use case is extremely basic: having an overview of the number of visitors on my website. I decided to adopt it for two major reasons: its GDPR compliance, which makes Matomo one of the rare solutions that doesn’t require displaying a cookie banner on the website; and the possibility to self-host it. Beyond my passion for self-hosting services, running it myself adds another layer of control over how visitors’ data is actually used.

In the rest of this article, I will detail how to install Matomo and make it work on your website. Then I’ll explain why I decided to look for alternatives to Matomo and the reasons that led me to consider, then adopt, Plausible as my new solution. Finally, I’ll provide instructions to make it work on your website.


❯  Matomo

According to their official website, Matomo is the “Google Analytics alternative that protects your data and your customers’ privacy.” It’s used by 1 million websites across the world and is recognized as GDPR-compliant by the Commission Nationale de l’Informatique et des Libertés (CNIL), an independent French administration that ensures companies respect personal data privacy laws.

The company behind Matomo offers both a SaaS solution that companies can directly use, and a self-hostable “on-premise” solution. I chose the latter option. It’s available in multiple formats: as a web server to install on a dedicated server, as a WordPress plugin, or via Docker. I’ll focus on the Docker implementation since that’s what I used. Specifically, I’ll detail how to install with Docker Compose, using MariaDB as the database. Note that I deployed this on a Raspberry Pi 5 running Raspbian OS Lite 64-bit.


❯  Installation

Here is the corresponding docker-compose.yml:

networks:
  matomo_default:

services:
  matomo-db:
    container_name: "matomo-db"
    environment:
      - "MARIADB_AUTO_UPGRADE=1"
      - "MARIADB_DISABLE_UPGRADE_BACKUP=1"
      - "MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}"
      - "LANG=C.UTF-8"
    expose:
      - "3306/tcp"
    image: "mariadb:latest"
    networks:
      - "matomo_default"
    restart: "always"
    volumes:
      - "matomo-db:/var/lib/mysql"

  matomo-app:
    container_name: "matomo-app"
    image: "matomo:latest"
    networks:
      - "matomo_default"
    environment:
      - "MATOMO_DATABASE_HOST=matomo-db"
    ports:
      - "8080:80/tcp"
    restart: "always"
    volumes:
      - "matomo-data:/var/www/html"

volumes:
    matomo-db:
    matomo-data:

Additionally, you need to create a .env file containing a strong root password for MariaDB:

MYSQL_ROOT_PASSWORD=a_super_strong_and_secret_password

Alternatively, you may want to bind volumes to local folders on your host. In this case, you need to assign the correct ownership for the folder used by Matomo. If you want to use a folder named matomo-data, execute this command:

sudo chown -R www-data:www-data matomo-data

To launch the stack, simply run:

docker compose up -d

Next, make the public domain name you’ve chosen for Matomo reach port 8080 on the host where Matomo is running. This typically involves a DNS entry, port forwarding on your router, and a reverse proxy rule. If you’re not familiar with this specific part of the setup, I suggest checking a previous article I wrote with detailed instructions.

I strongly recommend doing the entire Matomo setup phase by accessing it through this public address. This will prevent you from having to manually change the code snippet that Matomo will provide at the end of the setup and from having to manually edit PHP config files to whitelist your domain. Even though this isn’t a huge amount of extra work, trust me, you want to keep it as simple as possible!

First step of the Matomo setup

Follow the configuration steps, which will ask you to fill in the database host (prefilled with matomo-db, the value of MATOMO_DATABASE_HOST), the database username (root), the corresponding password (the value from your .env file), database prefix (matomo_), the adapter (PDO/MYSQL), and the database engine (MariaDB).

Second step of the Matomo setup, asking to fill database information

On the next screen, you’ll create your admin credentials (username, password, and email address). Then, provide a label and the website address that you want to measure. Finally, Matomo will share the code snippet that you’ll need to include in your website, typically in the header section.

Last step of the Matomo setup funnel, including the code snippet to include to the website to track

Add this snippet to the header of your website pages, and after deployment, Matomo should be able to detect your website and start measuring its traffic.

If you’ve configured Content Security Policy (CSP) headers on your website, you have one more step to complete: you need to add some rules to these headers to ensure Matomo functions properly:

script-src-elem
    ...
    https://matomo.laromierre.com/matomo.js
    https://matomo.laromierre.com/index.php

connect-src
    ...
    https://matomo.laromierre.com/

I recommend checking this article if you want more information on CSP headers and how to configure them.
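To make the shape of the resulting header more concrete, here is a minimal Python sketch that assembles a Content-Security-Policy value from the two directives listed above. The build_csp helper is hypothetical, and 'self' is only an assumed stand-in for whatever sources your policy already contains (the "..." in the listing).

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join CSP directives into a single header value."""
    # Each directive is "name source1 source2"; directives are separated by "; ".
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


matomo = "https://matomo.laromierre.com"
header = build_csp({
    # 'self' is an assumption standing in for your existing sources.
    "script-src-elem": ["'self'", f"{matomo}/matomo.js", f"{matomo}/index.php"],
    "connect-src": ["'self'", f"{matomo}/"],
})
print(header)
```

The printed string is what you would send as the value of the Content-Security-Policy header, however your hosting platform lets you configure it.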


❯  Addendum: Including Matomo in Your Privacy Policy

Depending on your country’s legislation, having a privacy policy section ranges from recommended to mandatory. If you chose Matomo instead of Google Analytics-like solutions, it’s probably because you care about your users’ personal data. Therefore, being transparent about which analytics solution you’re using and how personal data might be used is a logical extension of this approach.

I recommend taking inspiration from Matomo’s privacy policy to draft your own, and reading Matomo’s guide on creating such a privacy notice. You can also check this page to include opt-in/opt-out functionality, similar to what exists on Matomo’s own privacy policy page. In short, this involves adding the following code snippet to your privacy policy page:

<div id="matomo-opt-out"></div>
<script src="https://my-matomo-site.org/index.php?module=CoreAdminHome&action=optOutJS&div=matomo-opt-out"></script>

Replace my-matomo-site.org with the public address of your Matomo instance.


❯  Moving from Matomo to an alternative

Matomo is an excellent solution. After trying it for a few months, I appreciated its reliability and efficiency for analytics. However, I found it somewhat overkill for my modest needs, which are extremely basic: knowing how many people visit my pages and which pages they visit. That’s all - I’m not interested in collecting more information about visitors, and I want to minimize the data I collect to the strict minimum. Following this logic, I looked for alternatives that better fit my needs, with some specific requirements: a lightweight solution, ideally suited for Docker instances, Docker Swarm-friendly, and ridiculously easy to set up on both server and client sides. After researching various existing solutions, I found what I was looking for: Plausible.


❯  The choice of Plausible

Plausible’s tagline, according to their website, is “Easy to use and privacy-friendly Google Analytics alternative.” This catchphrase already matches two of my criteria. Additionally, Plausible is an open-source project available on GitHub and self-hostable, as we’ll see in the next section. It’s also simple to use, remarkably fast, and lightweight (less than 1 KB, according to their GitHub project description). Thanks to its simplicity, it goes even further than Matomo in terms of user-friendliness by not using any cookies at all. I suggest looking at this interesting comparison between Plausible and Matomo for more information (while acknowledging that it’s made by the Plausible team, so it may be somewhat biased 🙂).


❯  Installation and configuration


❯  Back-end side

Let’s start from the Plausible community edition project’s GitHub page. If you want to deploy Plausible with Docker Compose, simply follow the instructions in the repository’s README: clone the repository, create a .env file containing the public URL from which Plausible will be accessible and a secret key, and create a compose.override.yml file to specify the ports to be exposed: 80 if you plan to use a reverse proxy like Traefik, or 443 to expose the service directly. I strongly recommend configuring Traefik if you don’t have it yet—it’s definitely the right approach when exposing services publicly on the internet.
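As an illustration of the .env file mentioned above, here is a minimal Python sketch that generates its contents. The variable names BASE_URL and SECRET_KEY_BASE are the ones used in the compose file of this article; the plausible_env helper, the example domain, and the choice of 48 random bytes (which yields a 64-character key) are assumptions, so check the repository’s README for the exact requirements.

```python
import base64
import secrets


def plausible_env(base_url: str) -> str:
    """Return the contents of a .env file for Plausible."""
    # 48 random bytes encode to exactly 64 base64 characters (no padding).
    secret_key = base64.b64encode(secrets.token_bytes(48)).decode("ascii")
    return f"BASE_URL={base_url}\nSECRET_KEY_BASE={secret_key}\n"


# plausible.example.com is a placeholder for your own public domain.
print(plausible_env("https://plausible.example.com"), end="")
```

Redirect the output to .env next to your compose file, and keep that file out of version control since it holds a secret.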

I also recommend renaming compose.yml to docker-compose.yml, the filename traditionally used for Docker Compose projects. With older Compose releases that only look for docker-compose.yml, this avoids having to specify -f compose.yml when starting, stopping, pulling, updating, etc.; recent Docker Compose versions also detect compose.yml on their own, so the rename is then optional. Also, using an override file is overkill since it’s only used to add exposed ports: you can update the docker-compose.yml file directly.

If you follow this advice, the corresponding file will typically look like the following:

services:
  plausible_db:
    image: postgres:16-alpine
    restart: always
    volumes:
      - db-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=postgres
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      start_period: 1m

  plausible_events_db:
    image: clickhouse/clickhouse-server:24.3.3.102-alpine
    restart: always
    volumes:
      - event-data:/var/lib/clickhouse
      - event-logs:/var/log/clickhouse-server
      - ./clickhouse/logs.xml:/etc/clickhouse-server/config.d/logs.xml:ro
      - ./clickhouse/ipv4-only.xml:/etc/clickhouse-server/config.d/ipv4-only.xml:ro
      - ./clickhouse/low-resources.xml:/etc/clickhouse-server/config.d/low-resources.xml:ro
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 -O - http://127.0.0.1:8123/ping || exit 1"]
      start_period: 1m

  plausible:
    image: ghcr.io/plausible/community-edition:v2.1.5
    restart: always
    ports:
      - "80:80"
    command: sh -c "/entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh run"
    depends_on:
      plausible_db:
        condition: service_healthy
      plausible_events_db:
        condition: service_healthy
    volumes:
      - plausible-data:/var/lib/plausible
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    environment:
      - TMPDIR=/var/lib/plausible/tmp
      - BASE_URL=${BASE_URL}
      - SECRET_KEY_BASE=${SECRET_KEY_BASE}
      - HTTP_PORT=80

volumes:
  db-data:
  event-data:
  event-logs:
  plausible-data:

Run the stack with:

docker compose up -d

You can now access the dashboard at http://<host-ip>. I suggest configuring access to the same dashboard via its public domain over HTTPS at this point.

The first Plausible setup funnel screen displayed when we reach the dashboard for the first time

❯  Using an Existing PostgreSQL Database

If you want to use a PostgreSQL database that’s already running in your environment, add the following line to the Plausible environment entries in your docker-compose.yml:

- DATABASE_URL=postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_DATABASE_NAME}

Then add these five environment variables with values corresponding to your PostgreSQL configuration:

DB_HOST=XXX
DB_PORT=XXX
DB_USER=XXX
DB_PASSWORD=XXX
DB_DATABASE_NAME=XXX
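One pitfall with this kind of URL: a password containing characters such as @ or / generally has to be percent-encoded, otherwise the URL can be parsed incorrectly. The following Python sketch shows one way to assemble the value safely; the database_url helper, the sample password, and the host address are hypothetical, and I am assuming Plausible decodes the credentials the usual way.

```python
from urllib.parse import quote


def database_url(user: str, password: str, host: str, port: int, name: str) -> str:
    """Assemble a postgres:// URL, percent-encoding the credentials."""
    # safe="" also encodes "/" and "@", which would otherwise break URL parsing.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(name, safe='')}"
    )


# Placeholder values; replace them with your own PostgreSQL settings.
print(database_url("plausible_user", "p@ss/word", "10.0.0.5", 5432, "plausible_database"))
# prints postgres://plausible_user:p%40ss%2Fword@10.0.0.5:5432/plausible_database
```

If your password only contains letters, digits, and the characters - _ . ~, the encoded and raw forms are identical and no extra care is needed.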

You’ll need to create the user, password, and database by connecting to your PostgreSQL instance. For example, with DB_USER=plausible_user, DB_PASSWORD=plausible_password, and DB_DATABASE_NAME=plausible_database, execute these SQL commands in psql:

CREATE USER plausible_user WITH PASSWORD 'plausible_password';
CREATE DATABASE plausible_database;
\c plausible_database
GRANT ALL PRIVILEGES ON DATABASE plausible_database TO plausible_user;
GRANT ALL PRIVILEGES ON SCHEMA public TO plausible_user;

Now, remove the plausible_db service from the docker-compose.yml file and launch the stack with docker compose up -d. Visit http://<host-ip> to verify that everything is working as expected.


❯  Deploying Plausible in a Docker Swarm Cluster

Good news—there are no major changes needed! The main requirement is that all swarm nodes where you want to deploy Plausible have access to the same shared folders. I’ve set up a Ceph cluster that provides a distributed file system accessible to all hosts on my local network and beyond (thanks to WireGuard, which also allows cloud instances to access this distributed file system).

Here’s the corresponding docker-stack.yaml:

services:
  plausible_events_db:
    image: clickhouse/clickhouse-server:24.3.3.102-alpine
    user: 1000:1000
    volumes:
      - clickhouse-event-data:/var/lib/clickhouse
      - clickhouse-event-logs:/var/log/clickhouse-server
      - clickhouse-logs.xml:/etc/clickhouse-server/config.d/logs.xml:ro
      - clickhouse-ipv4-only.xml:/etc/clickhouse-server/config.d/ipv4-only.xml:ro
      - clickhouse-low-resources.xml:/etc/clickhouse-server/config.d/low-resources.xml:ro
    networks:
      - plausible_internal
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        window: 10s
      resources:
        limits:
          cpus: '1'
          memory: 1G

  plausible:
    image: ghcr.io/plausible/community-edition:v2.1.5
    volumes:
      - plausible-data:/var/lib/plausible
    networks:
      - traefik_network
      - plausible_internal
    deploy:
      restart_policy:
        condition: any
        delay: 5s
        window: 10s
    environment:
      - TMPDIR=/var/lib/plausible/tmp
      - BASE_URL=${BASE_URL}
      - SECRET_KEY_BASE=${SECRET_KEY_BASE}
      - HTTP_PORT=80
      - DATABASE_URL=postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_DATABASE_NAME}

volumes:
  clickhouse-event-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/shared/volume
  clickhouse-event-logs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/shared/volume
  clickhouse-logs.xml:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/shared/volume
  clickhouse-ipv4-only.xml:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/shared/volume
  clickhouse-low-resources.xml:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/shared/volume
  plausible-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/shared/volume

networks:
  traefik_network:
    external: true
  plausible_internal:
    driver: overlay
    attachable: false
    internal: true

Note a few details in this configuration. First, replace each /path/to/shared/volume with the path appropriate for that volume in your setup. Alternatively, you can use bind mounts directly in the services instead of named volumes. This is what I did initially during setup and testing, before switching to named volumes as shown in this docker-stack.yaml file.

I’ve defined two networks: traefik_network, which is also used by Traefik and allows it to be aware of Plausible to route incoming traffic; and plausible_internal, which allows plausible and plausible_events_db to communicate with each other but doesn’t allow inbound or outbound communications. Note that I didn’t mention PostgreSQL—in this example, PostgreSQL runs outside the Docker Swarm, and Plausible interacts with it in the standard way.

Last, to launch these services, execute the following command:

docker stack deploy -c docker-stack.yaml plausible

❯  Other cases

You may have noticed that the environment section in my Docker Compose file is much shorter than the one provided in the GitHub repository. This is mainly due to my very basic needs, which allowed me to simplify the configuration. Don’t hesitate to check their comprehensive wiki for a full list of Plausible’s capabilities!


❯  Configure Your Website on Plausible’s Dashboard

Back on the first Plausible setup funnel screen, with fields completed

After a straightforward onboarding flow, Plausible will ask you to choose a label for your website (they ask for the website’s domain, but you can use any representative label) and the reporting timezone. On the second screen, you’ll be asked to choose which information you want to track. I recommend not checking everything by default: first, to align with the minimalist philosophy I’ve expressed throughout this article; and second, because most options require extra configuration. At least initially, I suggest sticking to the minimal configuration. Copy the code snippet provided at this step and click “Start collecting data.” The dashboard configuration is now complete, and we can move to the final step: front-end configuration.

Here is the code snippet shared at the end of the setup:

<script defer data-domain="${label}" src="${plausible_domain}/js/script.js"></script>

"Awaiting your first pageview ..." message, after having clicked on the button "Start collecting data"

In this snippet, ${plausible_domain} is the domain of your Plausible instance, and ${label} is the label that you chose for the website to track. Note that the script referenced in this snippet changes depending on the optional measurements you select.


❯  Front-end side

At this point, you’ve completed the hardest part of configuring Plausible for your website. The quickest way to get it working is to add the code snippet displayed by Plausible to the header section of your website. After deploying your changes, you’ll start receiving data about visits to your site.

Plausible and its community have created an incredibly wide collection of libraries for many types of websites, particularly to make integration cleaner and easier, and to facilitate activating the different options that I advised against enabling in the initial configuration. For my blog, I used plausible-hugo, and the configuration was indeed “dead simple”:

  • I added it as a Hugo module to my blog by adding:
[module]
   [[module.imports]]
   path = "github.com/divinerites/plausible-hugo"

and

[plausible]
    enable = true
    selfhosted_domain = "myplausible.example.com"
    domain = "my-domain-id"

to my config.toml. Note that the value of domain must match the label you chose for your website during configuration on the Plausible dashboard. Also, the selfhosted_domain value must match the domain of your self-hosted Plausible instance, without the https://.

  • I ran:
hugo mod init github.com/$pseudo-on-github/$repo-on-github
hugo mod get -u
  • I added:
{{ partial "plausible_head.html" . }}

to the header of my website, either by editing my theme or by creating a partial in my project.

Since I run my website on Netlify and have set up CSP headers, I needed to add a configuration item:

[plausible]
...
    proxy_netlify = true

and add two redirection rules:

# For a self-hosted instance, point "to" at your own Plausible domain instead of plausible.io.
[[redirects]]
from = "/misc/js/script.js"
to = "https://plausible.io/js/script.js"
status = 200
force = true
log = true

[[redirects]]
from = "/misc/api/event"
to = "https://plausible.io/api/event"
status = 200
force = true
log = true

to my netlify.toml. Proxying the script and the event endpoint through my own domain makes them less likely to be blocked by browsers and ad blockers. Finally, I added the following CSP rules:

script-src-elem
    https://plausible.laromierre.com/js/script.file-downloads

and

connect-src
    https://plausible.laromierre.com

to authorize the execution of Plausible scripts.

Once these modifications are added, deploy your website and visit it to verify that your dashboard has successfully recorded your first visit!
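If the dashboard stays empty, it can help to check the back end independently of the browser. The sketch below builds a test pageview for Plausible’s /api/event endpoint, based on my understanding of its events API (a JSON body with name, url, and domain); the helper names and the page URL are hypothetical, and my-domain-id stands for the label you registered on the dashboard.

```python
import json
from urllib.request import Request, urlopen


def build_pageview_request(plausible_url: str, domain: str, page_url: str) -> Request:
    """Build (but do not send) a POST to Plausible's /api/event endpoint."""
    payload = {"name": "pageview", "url": page_url, "domain": domain}
    return Request(
        f"{plausible_url.rstrip('/')}/api/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # A browser-like User-Agent, assuming bot-like agents may be ignored.
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) smoke-test",
        },
        method="POST",
    )


def send(request: Request) -> int:
    """Send the request and return the HTTP status code."""
    with urlopen(request, timeout=10) as response:
        return response.status


req = build_pageview_request(
    "https://plausible.laromierre.com", "my-domain-id", "https://example.com/hello"
)
# Call send(req) to actually emit the event against your own instance.
```

A 2xx status from send(req) suggests the instance accepted the event; if the visit then shows up on the dashboard, the remaining problem is on the website side (snippet, proxy rules, or CSP).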

Back on the Plausible dashboard, you should start to visualize the first visits

❯  Conclusion

In this article, I’ve shared my journey from Matomo to Plausible for web analytics. While both solutions are excellent GDPR-compliant alternatives to Google Analytics, I found Plausible to be a better fit for my minimal needs. It’s lightweight, easy to set up, and collects only the data I actually need.

The configuration process, while requiring some technical knowledge, is straightforward for both the backend and frontend sides. Whether you’re running a simple Docker setup or a more complex Docker Swarm environment, Plausible can be adapted to your infrastructure.

If you’re looking for a privacy-friendly analytics solution that doesn’t overwhelm you with data you don’t need, Plausible might be the right choice for you too.