Automate log cleanup for GDPR: the Sentry case

With the General Data Protection Regulation (GDPR) now enforced by the European Union, logs have to be cleaned regularly to delete IP addresses and other information about visitors. This can be read as a way to protect an emerging and much-debated right: the right to be forgotten.

This regulation affects every automated logging system out there. Since Sentry is good open source error monitoring software* and is widely used, this guide shows how to clean Sentry logs on Linux systems in line with GDPR, using the sentry cleanup command-line utility.

Set a time limit for logs

Before starting, find out the maximum time a log can be kept according to the policy of the service you’re working on.

In the examples below, the maximum time a log can be kept is 26 months, one of the retention periods proposed by Google Analytics in its cleanup settings.

A 26-month limit for logs stored in Sentry is set like this:

env SENTRY_CONF='/usr/local/etc/sentry' sentry cleanup --days 749

where /usr/local/etc/sentry is the directory where config.yml and sentry.conf.py are located, or, to clean a single project only:

env SENTRY_CONF='/usr/local/etc/sentry' sentry cleanup --days 749 --project 5

where 5 is the ID of the project, which you can find in Project settings > Client Keys (DSN) as the very last part of the DSN path (always an integer).

749 days are calculated like this:

30 days × 26 months = 780 days
780 days – 31 days = 749 days

The 31 days are a safety margin: since the cleanup runs on the same day of each month, logs can age by up to another 31 days before the next run.
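The same arithmetic can be sketched as a small shell function, which may help when the policy changes. The function name retention_days is hypothetical; the 30-day month and 31-day margin are the ones used above.

```shell
#!/bin/sh
# retention_days MONTHS
# Converts a retention period in months into a value for --days,
# counting 30 days per month and subtracting a 31-day safety margin.
retention_days() {
    months=$1
    echo $(( 30 * months - 31 ))
}

# Example: the 26-month policy used in this guide.
retention_days 26
```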

Apparently, sentry cleanup needs to run as root to access the postgres user, and thus all Sentry database tables, so we have to put it in root’s crontab.

Schedule the cleanup

  1. Log in as root with su - or sudo bash
  2. Run crontab -e
  3. Add a command line like this:
. /usr/local/etc/virtualenvs/sentry/bin/activate && env SENTRY_CONF='/usr/local/etc/sentry' sentry cleanup --days 749 --project 5 && deactivate

The leading dot . is an alternative to source that is available in /bin/sh (the shell cron uses), not only in /bin/bash. This avoids having to set the environment variable SHELL='/bin/bash' in the crontab.

The resulting cron entry, which runs at 03:20 on the 28th of every month, would be:

20 3 28 * * . /usr/local/etc/virtualenvs/sentry/bin/activate && env SENTRY_CONF='/usr/local/etc/sentry' sentry cleanup --days 749 --project 5 && deactivate

It isn’t a bad idea to add a fallback cleanup command for the day after, without the --project option, so that if you forget to schedule the cleanup for a specific project it will still be done automatically:

20 3 29 * * . /usr/local/etc/virtualenvs/sentry/bin/activate && env SENTRY_CONF='/usr/local/etc/sentry' sentry cleanup --days 749 && deactivate

Now even your Sentry logs are GDPR compliant. The strength of this method is that you can set a different cleanup limit for every project, according to its policies. And you don’t need any proprietary software to do it, just free/libre open source software.

If you are in a hurry to publish privacy policies and you have dedicated hosting, give JournaKit legalazy on GitHub a try.
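As a rough sketch of per-project limits, a small script could generate one cleanup command per project. The POLICIES table, the project ID 7 and its 365-day limit, and the function name cleanup_command are all hypothetical; only the sentry cleanup options already shown in this guide are used. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Directory holding config.yml and sentry.conf.py (same as above).
SENTRY_CONF_DIR='/usr/local/etc/sentry'

# Hypothetical retention table, one "project_id:days" pair per project.
POLICIES='5:749 7:365'

# cleanup_command PROJECT_ID DAYS
# Prints the cleanup command for one project without executing it.
cleanup_command() {
    printf "env SENTRY_CONF='%s' sentry cleanup --days %s --project %s\n" \
        "$SENTRY_CONF_DIR" "$2" "$1"
}

for policy in $POLICIES; do
    project=${policy%%:*}   # text before the colon
    days=${policy##*:}      # text after the colon
    cleanup_command "$project" "$days"
done
```

Printing the commands first makes it easy to review them; once they look right, the output can be piped to sh from within the activated virtualenv.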

* Plus it’s written on top of Django.


Mass delete old emails on Gmail while preserving starred and labeled ones

To mass delete old emails on Gmail type this search query in the search box of mail.google.com (or Gmail for Business):

after:2017/01/01 before:2017/12/31 -has:userlabels -is:starred

You can use these filters whatever your interface language is, but remember to use the YYYY/MM/DD format (Year/Month/Day) for the dates in the after and before filters.

This search will show you all emails between January 1st and December 31st, 2017 that:

  • Have no user label
  • Aren’t starred

Change the dates according to the time period you want to cover, then tick the select-all checkbox in the list header to select all the items shown on the page.

To select every email matching the search, not only those on the current page, use the dedicated link that appears after the step above.

These two criteria are usually enough to avoid deleting important emails, but you can add more exclusion criteria by putting a minus sign before any new filter, e.g. -is:unread. However, if you don’t use stars and labels, you have to double-check the emails in the list before deleting them, to avoid losing useful data.
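If you repeat this cleanup every year, a small shell helper can assemble the query for you to paste into the search box. This is only a sketch: the function name build_query is hypothetical, and the dates are written as year/month/day.

```shell
#!/bin/sh
# build_query YEAR [EXTRA_FILTER...]
# Prints a Gmail search query covering one calendar year that skips
# labeled and starred emails; each extra filter is added as an exclusion.
build_query() {
    year=$1
    shift
    query="after:${year}/01/01 before:${year}/12/31 -has:userlabels -is:starred"
    for filter in "$@"; do
        query="$query -$filter"
    done
    printf '%s\n' "$query"
}

# Example: emails from 2017, also leaving unread ones untouched.
build_query 2017 is:unread
```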

This approach is very useful in these two scenarios:

  • To free space on the Gmail mailbox when it’s almost full.
  • To delete old emails to comply with regulations like GDPR at the end of their usable life.

Happy housekeeping!