Ch-ch-ch-ch-changes

Tim Wienk — 2017-03-04

My (new) website has been online for a few months, I'm fairly satisfied with how it turned out and therefore want to explain how this website came to be, covering reasons for and technical details of some decisions.

Curriculum vitae

Every now and then you are reminded of your CV, either by yourself or because of people around you looking for a job and/or updating theirs. In most cases I decide that I should update mine as well and then in the end I don't. However, this time, maybe because many things are (looking to be) changing around me in the near future, I got myself to actually do it.

My outdated CV wasn't actually as bad as I expected. What I found was a document with a half finished LaTeX layout, using the default (not very pretty) Latin Modern font, and work experience dating back a few years. After redoing parts of the CV, I finished the layout and managed to update its contents, too.

Next up, I wanted to store my CV on my personal website for easy access. Great idea, except my website looked... "old". It hardly contained any relevant information and it didn't even look decent.

My previous website was originally built with Mark "keeto" Obcena's Raccoon framework, which was very interesting at the time, but after moving things around servers, my website ended up being stripped down and served by an old test installation of the platform we develop at the company I work at: not ideal.

Since I was now happy with my "new" CV, I decided to use that as base concept for a new website. The website didn't have to look great, it just had to look decent enough, be simple, and most importantly it should be care free.

Care free

Because of my history maintaining my own website, I wanted my new website to be simple and to not require any maintenance other than adding content (just in case I want to do that). However, I still wanted to host it myself, so I don't have to trust others to handle things right.

This had me thinking for a bit.

It ruled out in advance any third party services or using something experimental, requiring regular security updates or something that could be unstable otherwise. It basically meant I could not use most pieces of website creation/hosting software out there.

I concluded that what I needed was a static website. Luckily having a static website doesn't mean it can't be generated by something less static, and there happens to be a decent amount of software out there to choose from.

A static website would provide and easy and efficient website, without needing anything special that could potentially break. As long as the web server works, the website works, which is great since I have other reasons to adminsiter and maintain my servers anyway, so it's basically zero effort.

All I needed to do now was to find static website generating software that suits my needs and wishes for flexibility.

Pelican

After searching and comparing, I settled for a Python based static website generator called Pelican. It seems rather stable and decently organised, it's very extensible and is relatively easy to use and configure.

The way it implements multiple languages for pages and articles works pretty nicely and is flexible enough to fit in templates the way I like it, which is important because I want my website to have its content both in English and Dutch.

Installation

My web server has Python 2.7 installed with PIP 1.5.6, the versions available in Debian Jessie. Installing the python-pip package does the trick if it isn't installed yet.

To get started with Pelican I created a virtualenv and then installed the necessary Python packages:

python -m virtualenv ~/website/env
. ~/website/env/bin/activate

pip install pelican Markdown

The in Debian Jessie available version of PIP does show two compilation errors when installing, while newer versions of PIP silence them. The errors are not important and won't affect you.

The situation here is that Jinja2 has some optional features that use syntax only valid in Python 3.5+, because all files are precompiled during installation, the files containing these features (asyncsupport.py and asyncfilters.py) generate errors.

Generating the website

I set up the website in ~/website/srv (more about the configuration below). When the configuration is done and content is available, generating the website is nothing more than calling pelican from inside the virtualenv.

. ~/website/env/bin/activate
cd ~/website/srv
pelican

However, since I don't want to have to remember to activate the virtualenv and since I like to be able to generate the website from any directory I happen to be working in, I set up a simple run script.

#!/home/tim/website/env/bin/python
# -*- coding: utf-8 -*-
import os
import sys
import pelican

if __name__ == '__main__':
    directory = os.path.dirname(os.path.realpath(__file__))
    if directory != os.getcwd():
        os.chdir(directory)

    sys.argv[0] = 'pelican'
    sys.exit(pelican.main())

To generate the website, I can now just call:

~/website/srv/run

Configuration

After the installation, the initial configuration is very easily done with pelican-quickstart:

mkdir ~/website/srv
cd ~/website/srv

pelican-quickstart

I answered "no" to questions about extra scripts, and was left with the following files:

content (directory)
output (directory)
pelicanconf.py
publishconf.py

I did not see a reason for a separate publish configuration in my case, so I added the settings from publishconf.py to pelicanconf.py and got rid of the former.

The documentation about the configuration is extensive and very good and teh configuration is rather logical by itself. You can find my configuration on Github, but I'll go into details about it here as well (mostly in case I do end up needing a reminder myself - but perhaps it helps someone else as well).

To start off I created a plugins directory in addition to the existing directories and set the following base settings:

AUTHOR = 'Tim Wienk'
SITENAME = 'Tim.Wienk.name'
SITEURL = 'https://tim.wienk.name'
RELATIVE_URLS = False

PATH = 'content'
PLUGIN_PATHS = ['plugins']
OUTPUT_PATH = 'output/'
DELETE_OUTPUT_DIRECTORY = True

TIMEZONE = 'Europe/Amsterdam'
DEFAULT_LANG = 'en'
DEFAULT_DATE = 'fs'
DEFAULT_DATE_FORMAT = '%Y-%m-%d'

I wanted the website to be in both English and Dutch and I decided that I did not care for a lot of Pelican's default pages.

It seemed most logical to me to just separate the website in an en and an nl section, have all pages as Markdown files in their respective section directories, and I wanted a place to store "private" data for use with potential plugins.

I also figured it would be easiest to have all pages (and articles) saved as index.html files in their own output directories. If I wanted to, I could then easily rewrite paths in the web server configuration later to strip the / and /index.html suffixes.

To achieve these things, I added the following configuration:

PAGE_PATHS = ['en', 'nl']
ARTICLE_PATHS = ['en/articles', 'nl/articles']
STATIC_PATHS = ['']
STATIC_EXCLUDES = ['data']

ARTICLE_URL = '{lang}/articles/{slug}'
ARTICLE_SAVE_AS = '{lang}/articles/{slug}/index.html'
ARTICLE_LANG_URL = ARTICLE_URL
ARTICLE_LANG_SAVE_AS = ARTICLE_SAVE_AS
DRAFT_URL = '{lang}/articles/{slug}/draft.html'
DRAFT_SAVE_AS = '{lang}/articles/{slug}/draft.html'
DRAFT_LANG_URL = DRAFT_URL
DRAFT_LANG_SAVE_AS = DRAFT_SAVE_AS
PAGE_URL = '{lang}/{slug}'
PAGE_SAVE_AS = '{lang}/{slug}/index.html'
PAGE_LANG_URL = PAGE_URL
PAGE_LANG_SAVE_AS = PAGE_SAVE_AS
ARCHIVES_SAVE_AS = ''
AUTHOR_SAVE_AS = ''
INDEX_SAVE_AS = ''
AUTHORS_SAVE_AS = ''
CATEGORY_SAVE_AS = ''
CATEGORIES_SAVE_AS = ''
TAG_SAVE_AS = ''
TAGS_SAVE_AS = ''

Next, I did not want to have to have to configure more per-page data than necessary, but I did want full control over it.

I disabled some options for categories and I hoped to be able to fetch all relevant data from the path using the PATH_METADATA setting. Of course not everything turned out to be available this way, so there is some extra metadata configured, too.

USE_FOLDER_AS_CATEGORY = False
DISPLAY_CATEGORIES_ON_MENU = False
DEFAULT_PAGINATION = False
DEFAULT_CATEGORY = 'article'

SLUG_SUBSTITUTIONS = (
    ('&', 'and'),
)

PATH_METADATA = '(?P<lang>[a-z]{2})/(?:articles/(?P<date>\d{4}\d{2}\d{2})\.)?(?P<slug>.*)\.[a-z]{1,4}'
EXTRA_PATH_METADATA = {
    'en/index.md': {'save_as': 'en/index.html', 'url': 'en'},
    'nl/index.md': {'save_as': 'nl/index.html', 'url': 'nl'},
    'en/error/400.md': {'save_as': 'error/400.html', 'status': 'hidden'},
    'en/error/401.md': {'save_as': 'error/401.html', 'status': 'hidden'},
    'en/error/403.md': {'save_as': 'error/403.html', 'status': 'hidden'},
    'en/error/404.md': {'save_as': 'error/404.html', 'status': 'hidden'},
    'en/error/410.md': {'save_as': 'error/410.html', 'status': 'hidden'},
    'en/error/500.md': {'save_as': 'error/500.html', 'status': 'hidden'},
}

Obviously, all of this just working would have been too easy. I ran into a problem where Pelican would error when generating pages without a date in the path, even though it's an optional item in the regular expression.

To work around this problem, I monkey patched the relevant function in the run script I created to wrap the website generation. (Edit - I recently submitted this patch to the Pelican project as well.)

import os
import re
import pelican

def parse_path_metadata(source_path, settings=None, process=None):
    metadata = {}
    dirname, basename = os.path.split(source_path)
    base, ext = os.path.splitext(basename)
    subdir = os.path.basename(dirname)
    if settings:
        checks = []
        for key, data in [('FILENAME_METADATA', base),
                          ('PATH_METADATA', source_path)]:
            checks.append((settings.get(key, None), data))
        if settings.get('USE_FOLDER_AS_CATEGORY', None):
            checks.append(('(?P<category>.*)', subdir))
        for regexp, data in checks:
            if regexp and data:
                match = re.match(regexp, data)
                if match:
                    # .items() for py3k compat.
                    for k, v in match.groupdict().items():
                        k = k.lower()  # metadata must be lowercase
                        if k not in metadata and v is not None:
                            if process:
                                v = process(k, v)
                            metadata[k] = v
    return metadata

pelican.readers.parse_path_metadata = parse_path_metadata

With the basics working, I went through the (pretty extensive) collection of plugins. I added "neighbors," for next/previous article functionality, and "touch", so files have a relevant modified date (useful for browser caches). I created my own theme, creatively named "theme" and set some Markdown, Jinja2, feed and sitemap settings.

For sitemap generation, the projects page and support for HTML sections I did some more specific work:

Sitemap

To generate a sitemap, I took the "sitemap" plugin and changed a few things to add more differentiation for the reported change frequencies and priorities based on the type of page or file.

Github contributions

An additional feature I wanted, to make the "projects" page look a bit more interesting, was a list of actual Github project contributions, rather than just a list of projects or a projects' activity. For this I created a simple "githubcontributions" plugin, which fetches (and caches) the relevant data.

The reason for the cache is that I ran into a situation with inactive repositories, where the first call won't return any data and a next call would have to wait a few seconds, making fetching data for all projects is really slow, which in turn makes generating the website really slow.

HTML sections

I created a simple "customsection" extension for Markdown, loosely based on an existing "sections" extension, to wrap content in sections and do some heading renumbering.

PLUGINS = ['neighbors', 'githubcontributions', 'touch', 'sitemap']
THEME = 'theme'
THEME_STATIC_DIR = ''

MARKDOWN = {
    'extensions': ['extra', 'headerid', 'meta', 'plugins.markdown_customsection'],
    'extension_configs': {
        'codehilite': {
            'linenums': False,
            'guess_lang': False,
            'css_class': 'code',
        },
    },
    'output_format': 'xhtml5',
}
JINJA_ENVIRONMENT = {
    'extensions': ['jinja2.ext.i18n'],
}

FEED_ALL_ATOM = 'atom.xml'
CATEGORY_FEED_ATOM = None
AUTHOR_FEED_ATOM = None
AUTHOR_FEED_RSS = None
TRANSLATION_FEED_ATOM = '%s/atom.xml'

SITEMAP = {
    'format': 'xml',
    'extra_files': ['media/cv/timwienk-cv-nederlands.pdf', 'media/cv/timwienk-cv-english.pdf']
}

All that remains is some other configuration that is either used by plugins or in the theme templates. For completeness:

from datetime import date

SOCIAL = (
    ('GitHub', 'https://github.com/timwienk', '/timwienk'),
    ('LinkedIn', 'https://www.linkedin.com/in/timwienk', '/in/timwienk'),
    ('Twitter', 'https://twitter.com/timwienk', '@timwienk'),
    ('Facebook', 'https://www.facebook.com/timwienk', '/timwienk'),
    ('Google+', 'https://plus.google.com/+TimWienk', '+timwienk'),
)
GITHUB_USER = 'timwienk'
FACEBOOK_USER = 'timwienk'
TWITTER_USER = 'timwienk'

FIRST_YEAR = 1989
LAST_YEAR = date.today().year

GOOGLE_ANALYTICS = 'UA-7267499-1'
GOOGLE_ANALYTICS_ID_SCRIPT = '/scripts/id'

Theme

Pelican themes consist of a collection of static files and Jinja2 templates. I found it very easy to set up, and since you can set specific templates for specific pages, it's easy to incorporate differences for "special" pages using the template extends and block features.

I decided against adding things like gettext to actually organise translations properly, since my website is rather small I don't expect to add more languages or translatable fixed text. Perhaps I will add that in the future, but for now just doing language dependent things in the template will do.

I don't think it helps to go in-depth about the theme, you can find the theme on Github if you're interested in the details.

Apache HTTP Server

To serve my website I stuck with the Apache HTTP Server. Even though I do like Nginx's approach and it's proven itself very fast and efficient, I didn't think these aspects were very important in my case, and since I already had the Apache HTTP Server running on my webserver for other projects, the choice wasn't very hard.

Most of the required configuration was just setting up a VirtualHost with the correct settings:

<VirtualHost *:443>
    ServerName tim.wienk.name
    ServerAdmin webmaster@localhost

    DocumentRoot /srv/www/tim.wienk.name/http

    ErrorLog  ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined

    SSLEngine on
    SSLHonorCipherOrder   on
    SSLUseStapling        on
    SSLCertificateFile    /srv/acme/certs/tim.wienk.name.crt
    SSLCertificateKeyFile /srv/acme/private/tim.wienk.name.key
    BrowserMatch "MSIE [2-6]" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
    BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

I generally try to store data in their FHS defined locations, it keeps everything clear and interoperable. That being said, in this case /srv/www/tim.wienk.name/http is a symbolic link to /home/tim/website/srv/output to keep things in my home directory.

For SSL I opted to use the Let's Encrypt ACME setup, although I do have the certificate management separated from the webserver as much as possible.

As part of the Pelican setup I generated a bunch of simple error documents, I configured those here as well:

    ErrorDocument 400 /error/400.html
    ErrorDocument 401 /error/401.html
    ErrorDocument 403 /error/403.html
    ErrorDocument 404 /error/404.html
    ErrorDocument 410 /error/410.html
    ErrorDocument 500 /error/500.html

Next I also wanted to add some extra headers for good measure: X-Content-Type-Options and X-Frame-Options for all pages, and a Link header to the "home page" to explain non-human visitors what's going on:

    <IfModule mod_headers.c>
        Header always set Link "<https://tim.wienk.name/>; rel=canonical; hreflang=x-default" "expr=%{REQUEST_URI}=='/'"
        Header always append Link "<https://tim.wienk.name/>; rel=alternate; hreflang=x-default" "expr=%{REQUEST_URI}=='/'"
        Header always append Link "<https://tim.wienk.name/en>; rel=alternate; hreflang=en" "expr=%{REQUEST_URI}=='/'"
        Header always append Link "<https://tim.wienk.name/nl>; rel=alternate; hreflang=nl" "expr=%{REQUEST_URI}=='/'"
        Header always set X-XSS-Protection "1; mode=block"
        Header always set X-Content-Type-Options "nosniff"
        Header always set X-Frame-Options "DENY"
    </IfModule>

And lastly I added rewrite rules so all pages are accessible without / or /index.html suffix, with a special condition based on the Accept-Language header for the home page and a few extra redirects in case people try to access content without language prefix:

    RewriteEngine on

    RewriteCond %{HTTP:Accept-Language} ^nl [NC]
    RewriteRule ^/?$                       /nl             [R=302,QSA,L]
    RewriteRule ^/?$                       /en             [R=302,QSA,L]

    RewriteRule ^/about/?$                 /en/about       [R=301,QSA,L]
    RewriteRule ^/articles/?$              /en/articles    [R=301,QSA,L]
    RewriteRule ^/contact/?$               /en/contact     [R=301,QSA,L]
    RewriteRule ^/cv/?$                    /en/cv          [R=301,QSA,L]
    RewriteRule ^/projects/?$              /en/projects    [R=301,QSA,L]

    RewriteRule ^/(.*)/(index.html)?$      /$1             [R=301,QSA,L]

    RewriteCond /srv/www/tim.wienk.name/http/%{REQUEST_URI}/index.html -f
    RewriteRule ^/(.*)$                    /$1/index.html  [QSA,L]
</VirtualHost>

The end result is what you see now, a working, simple website. There still isn't a lot of content, but perhaps that will come in the future, at least I now have the option without having to worry about the website itself too much.