Ch-ch-ch-ch-changes
Tim Wienk —
My (new) website has been online for a few months, I'm fairly satisfied with how it turned out and therefore want to explain how this website came to be, covering reasons for and technical details of some decisions.
Curriculum vitae
Every now and then you are reminded of your CV, either by yourself or because of people around you looking for a job and/or updating theirs. In most cases I decide that I should update mine as well and then in the end I don't. However, this time, maybe because many things are (looking to be) changing around me in the near future, I got myself to actually do it.
My outdated CV wasn't actually as bad as I expected. What I found was a document with a half finished LaTeX layout, using the default (not very pretty) Latin Modern font, and work experience dating back a few years. After redoing parts of the CV, I finished the layout and managed to update its contents, too.
Next up, I wanted to store my CV on my personal website for easy access. Great idea, except my website looked... "old". It hardly contained any relevant information and it didn't even look decent.
My previous website was originally built with Mark "keeto" Obcena's Raccoon framework, which was very interesting at the time, but after moving things around servers, my website ended up being stripped down and served by an old test installation of the platform we develop at the company I work at: not ideal.
Since I was now happy with my "new" CV, I decided to use that as base concept for a new website. The website didn't have to look great, it just had to look decent enough, be simple, and most importantly it should be care free.
Care free
Because of my history maintaining my own website, I wanted my new website to be simple and to not require any maintenance other than adding content (just in case I want to do that). However, I still wanted to host it myself, so I don't have to trust others to handle things right.
This had me thinking for a bit.
It ruled out in advance any third party services or using something experimental, requiring regular security updates or something that could be unstable otherwise. It basically meant I could not use most pieces of website creation/hosting software out there.
I concluded that what I needed was a static website. Luckily having a static website doesn't mean it can't be generated by something less static, and there happens to be a decent amount of software out there to choose from.
A static website would provide and easy and efficient website, without needing anything special that could potentially break. As long as the web server works, the website works, which is great since I have other reasons to adminsiter and maintain my servers anyway, so it's basically zero effort.
All I needed to do now was to find static website generating software that suits my needs and wishes for flexibility.
Pelican
After searching and comparing, I settled for a Python based static website generator called Pelican. It seems rather stable and decently organised, it's very extensible and is relatively easy to use and configure.
The way it implements multiple languages for pages and articles works pretty nicely and is flexible enough to fit in templates the way I like it, which is important because I want my website to have its content both in English and Dutch.
Installation
My web server has Python 2.7 installed with PIP 1.5.6, the versions
available in Debian Jessie. Installing the python-pip
package does the
trick if it isn't installed yet.
To get started with Pelican I created a virtualenv and then installed the necessary Python packages:
python -m virtualenv ~/website/env . ~/website/env/bin/activate pip install pelican Markdown
The in Debian Jessie available version of PIP does show two compilation errors when installing, while newer versions of PIP silence them. The errors are not important and won't affect you.
The situation here is that Jinja2 has some optional features that use
syntax only valid in Python 3.5+, because all files are precompiled
during installation, the files containing these features
(asyncsupport.py
and asyncfilters.py
) generate errors.
Generating the website
I set up the website in ~/website/srv
(more about the configuration
below). When the configuration is done and content is available,
generating the website is nothing more than calling pelican
from
inside the virtualenv.
. ~/website/env/bin/activate
cd ~/website/srv
pelican
However, since I don't want to have to remember to activate the
virtualenv and since I like to be able to generate the website from any
directory I happen to be working in, I set up a simple run
script.
#!/home/tim/website/env/bin/python # -*- coding: utf-8 -*- import os import sys import pelican if __name__ == '__main__': directory = os.path.dirname(os.path.realpath(__file__)) if directory != os.getcwd(): os.chdir(directory) sys.argv[0] = 'pelican' sys.exit(pelican.main())
To generate the website, I can now just call:
~/website/srv/run
Configuration
After the installation, the initial configuration is very easily done
with pelican-quickstart
:
mkdir ~/website/srv
cd ~/website/srv
pelican-quickstart
I answered "no" to questions about extra scripts, and was left with the following files:
content
(directory)output
(directory)pelicanconf.py
publishconf.py
I did not see a reason for a separate publish configuration in my case,
so I added the settings from publishconf.py
to pelicanconf.py
and
got rid of the former.
The documentation about the configuration is extensive and very good and teh configuration is rather logical by itself. You can find my configuration on Github, but I'll go into details about it here as well (mostly in case I do end up needing a reminder myself - but perhaps it helps someone else as well).
To start off I created a plugins
directory in addition to the existing
directories and set the following base settings:
AUTHOR = 'Tim Wienk' SITENAME = 'Tim.Wienk.name' SITEURL = 'https://tim.wienk.name' RELATIVE_URLS = False PATH = 'content' PLUGIN_PATHS = ['plugins'] OUTPUT_PATH = 'output/' DELETE_OUTPUT_DIRECTORY = True TIMEZONE = 'Europe/Amsterdam' DEFAULT_LANG = 'en' DEFAULT_DATE = 'fs' DEFAULT_DATE_FORMAT = '%Y-%m-%d'
I wanted the website to be in both English and Dutch and I decided that I did not care for a lot of Pelican's default pages.
It seemed most logical to me to just separate the website in an en
and
an nl
section, have all pages as Markdown files in their respective
section directories, and I wanted a place to store "private" data for
use with potential plugins.
I also figured it would be easiest to have all pages (and articles)
saved as index.html
files in their own output directories. If I wanted
to, I could then easily rewrite paths in the web server configuration
later to strip the /
and /index.html
suffixes.
To achieve these things, I added the following configuration:
PAGE_PATHS = ['en', 'nl'] ARTICLE_PATHS = ['en/articles', 'nl/articles'] STATIC_PATHS = [''] STATIC_EXCLUDES = ['data'] ARTICLE_URL = '{lang}/articles/{slug}' ARTICLE_SAVE_AS = '{lang}/articles/{slug}/index.html' ARTICLE_LANG_URL = ARTICLE_URL ARTICLE_LANG_SAVE_AS = ARTICLE_SAVE_AS DRAFT_URL = '{lang}/articles/{slug}/draft.html' DRAFT_SAVE_AS = '{lang}/articles/{slug}/draft.html' DRAFT_LANG_URL = DRAFT_URL DRAFT_LANG_SAVE_AS = DRAFT_SAVE_AS PAGE_URL = '{lang}/{slug}' PAGE_SAVE_AS = '{lang}/{slug}/index.html' PAGE_LANG_URL = PAGE_URL PAGE_LANG_SAVE_AS = PAGE_SAVE_AS ARCHIVES_SAVE_AS = '' AUTHOR_SAVE_AS = '' INDEX_SAVE_AS = '' AUTHORS_SAVE_AS = '' CATEGORY_SAVE_AS = '' CATEGORIES_SAVE_AS = '' TAG_SAVE_AS = '' TAGS_SAVE_AS = ''
Next, I did not want to have to have to configure more per-page data than necessary, but I did want full control over it.
I disabled some options for categories and I hoped to be able to fetch
all relevant data from the path using the PATH_METADATA
setting. Of
course not everything turned out to be available this way, so there is
some extra metadata configured, too.
USE_FOLDER_AS_CATEGORY = False DISPLAY_CATEGORIES_ON_MENU = False DEFAULT_PAGINATION = False DEFAULT_CATEGORY = 'article' SLUG_SUBSTITUTIONS = ( ('&', 'and'), ) PATH_METADATA = '(?P<lang>[a-z]{2})/(?:articles/(?P<date>\d{4}\d{2}\d{2})\.)?(?P<slug>.*)\.[a-z]{1,4}' EXTRA_PATH_METADATA = { 'en/index.md': {'save_as': 'en/index.html', 'url': 'en'}, 'nl/index.md': {'save_as': 'nl/index.html', 'url': 'nl'}, 'en/error/400.md': {'save_as': 'error/400.html', 'status': 'hidden'}, 'en/error/401.md': {'save_as': 'error/401.html', 'status': 'hidden'}, 'en/error/403.md': {'save_as': 'error/403.html', 'status': 'hidden'}, 'en/error/404.md': {'save_as': 'error/404.html', 'status': 'hidden'}, 'en/error/410.md': {'save_as': 'error/410.html', 'status': 'hidden'}, 'en/error/500.md': {'save_as': 'error/500.html', 'status': 'hidden'}, }
Obviously, all of this just working would have been too easy. I ran into a problem where Pelican would error when generating pages without a date in the path, even though it's an optional item in the regular expression.
To work around this problem, I monkey patched the relevant function in
the run
script I created to wrap the website generation. (Edit - I
recently submitted this patch to the Pelican project as well.)
import os import re import pelican def parse_path_metadata(source_path, settings=None, process=None): metadata = {} dirname, basename = os.path.split(source_path) base, ext = os.path.splitext(basename) subdir = os.path.basename(dirname) if settings: checks = [] for key, data in [('FILENAME_METADATA', base), ('PATH_METADATA', source_path)]: checks.append((settings.get(key, None), data)) if settings.get('USE_FOLDER_AS_CATEGORY', None): checks.append(('(?P<category>.*)', subdir)) for regexp, data in checks: if regexp and data: match = re.match(regexp, data) if match: # .items() for py3k compat. for k, v in match.groupdict().items(): k = k.lower() # metadata must be lowercase if k not in metadata and v is not None: if process: v = process(k, v) metadata[k] = v return metadata pelican.readers.parse_path_metadata = parse_path_metadata
With the basics working, I went through the (pretty extensive) collection of plugins. I added "neighbors," for next/previous article functionality, and "touch", so files have a relevant modified date (useful for browser caches). I created my own theme, creatively named "theme" and set some Markdown, Jinja2, feed and sitemap settings.
For sitemap generation, the projects page and support for HTML sections I did some more specific work:
Sitemap
To generate a sitemap, I took the "sitemap" plugin and changed a few things to add more differentiation for the reported change frequencies and priorities based on the type of page or file.
Github contributions
An additional feature I wanted, to make the "projects" page look a bit more interesting, was a list of actual Github project contributions, rather than just a list of projects or a projects' activity. For this I created a simple "githubcontributions" plugin, which fetches (and caches) the relevant data.
The reason for the cache is that I ran into a situation with inactive repositories, where the first call won't return any data and a next call would have to wait a few seconds, making fetching data for all projects is really slow, which in turn makes generating the website really slow.
HTML sections
I created a simple "customsection" extension for Markdown, loosely based on an existing "sections" extension, to wrap content in sections and do some heading renumbering.
PLUGINS = ['neighbors', 'githubcontributions', 'touch', 'sitemap'] THEME = 'theme' THEME_STATIC_DIR = '' MARKDOWN = { 'extensions': ['extra', 'headerid', 'meta', 'plugins.markdown_customsection'], 'extension_configs': { 'codehilite': { 'linenums': False, 'guess_lang': False, 'css_class': 'code', }, }, 'output_format': 'xhtml5', } JINJA_ENVIRONMENT = { 'extensions': ['jinja2.ext.i18n'], } FEED_ALL_ATOM = 'atom.xml' CATEGORY_FEED_ATOM = None AUTHOR_FEED_ATOM = None AUTHOR_FEED_RSS = None TRANSLATION_FEED_ATOM = '%s/atom.xml' SITEMAP = { 'format': 'xml', 'extra_files': ['media/cv/timwienk-cv-nederlands.pdf', 'media/cv/timwienk-cv-english.pdf'] }
All that remains is some other configuration that is either used by plugins or in the theme templates. For completeness:
from datetime import date SOCIAL = ( ('GitHub', 'https://github.com/timwienk', '/timwienk'), ('LinkedIn', 'https://www.linkedin.com/in/timwienk', '/in/timwienk'), ('Twitter', 'https://twitter.com/timwienk', '@timwienk'), ('Facebook', 'https://www.facebook.com/timwienk', '/timwienk'), ('Google+', 'https://plus.google.com/+TimWienk', '+timwienk'), ) GITHUB_USER = 'timwienk' FACEBOOK_USER = 'timwienk' TWITTER_USER = 'timwienk' FIRST_YEAR = 1989 LAST_YEAR = date.today().year GOOGLE_ANALYTICS = 'UA-7267499-1' GOOGLE_ANALYTICS_ID_SCRIPT = '/scripts/id'
Theme
Pelican themes consist of a collection of static files and Jinja2
templates. I found it very easy to set up, and since you can set
specific templates for specific pages, it's easy to incorporate
differences for "special" pages using the template extends
and block
features.
I decided against adding things like gettext to actually organise translations properly, since my website is rather small I don't expect to add more languages or translatable fixed text. Perhaps I will add that in the future, but for now just doing language dependent things in the template will do.
I don't think it helps to go in-depth about the theme, you can find the theme on Github if you're interested in the details.
Apache HTTP Server
To serve my website I stuck with the Apache HTTP Server. Even though I do like Nginx's approach and it's proven itself very fast and efficient, I didn't think these aspects were very important in my case, and since I already had the Apache HTTP Server running on my webserver for other projects, the choice wasn't very hard.
Most of the required configuration was just setting up a VirtualHost with the correct settings:
<VirtualHost *:443> ServerName tim.wienk.name ServerAdmin webmaster@localhost DocumentRoot /srv/www/tim.wienk.name/http ErrorLog ${APACHE_LOG_DIR}/error.log CustomLog ${APACHE_LOG_DIR}/access.log combined SSLEngine on SSLHonorCipherOrder on SSLUseStapling on SSLCertificateFile /srv/acme/certs/tim.wienk.name.crt SSLCertificateKeyFile /srv/acme/private/tim.wienk.name.key BrowserMatch "MSIE [2-6]" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0 BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
I generally try to store data in their FHS defined locations, it keeps
everything clear and interoperable. That being said, in this case
/srv/www/tim.wienk.name/http
is a symbolic link to
/home/tim/website/srv/output
to keep things in my home directory.
For SSL I opted to use the Let's Encrypt ACME setup, although I do have the certificate management separated from the webserver as much as possible.
As part of the Pelican setup I generated a bunch of simple error documents, I configured those here as well:
ErrorDocument 400 /error/400.html ErrorDocument 401 /error/401.html ErrorDocument 403 /error/403.html ErrorDocument 404 /error/404.html ErrorDocument 410 /error/410.html ErrorDocument 500 /error/500.html
Next I also wanted to add some extra headers for good measure:
X-Content-Type-Options and X-Frame-Options for all pages, and a
Link
header to the "home page" to explain non-human visitors what's
going on:
<IfModule mod_headers.c> Header always set Link "<https://tim.wienk.name/>; rel=canonical; hreflang=x-default" "expr=%{REQUEST_URI}=='/'" Header always append Link "<https://tim.wienk.name/>; rel=alternate; hreflang=x-default" "expr=%{REQUEST_URI}=='/'" Header always append Link "<https://tim.wienk.name/en>; rel=alternate; hreflang=en" "expr=%{REQUEST_URI}=='/'" Header always append Link "<https://tim.wienk.name/nl>; rel=alternate; hreflang=nl" "expr=%{REQUEST_URI}=='/'" Header always set X-XSS-Protection "1; mode=block" Header always set X-Content-Type-Options "nosniff" Header always set X-Frame-Options "DENY" </IfModule>
And lastly I added rewrite rules so all pages are accessible without /
or /index.html
suffix, with a special condition based on the
Accept-Language
header for the home page and a few extra redirects in
case people try to access content without language prefix:
RewriteEngine on RewriteCond %{HTTP:Accept-Language} ^nl [NC] RewriteRule ^/?$ /nl [R=302,QSA,L] RewriteRule ^/?$ /en [R=302,QSA,L] RewriteRule ^/about/?$ /en/about [R=301,QSA,L] RewriteRule ^/articles/?$ /en/articles [R=301,QSA,L] RewriteRule ^/contact/?$ /en/contact [R=301,QSA,L] RewriteRule ^/cv/?$ /en/cv [R=301,QSA,L] RewriteRule ^/projects/?$ /en/projects [R=301,QSA,L] RewriteRule ^/(.*)/(index.html)?$ /$1 [R=301,QSA,L] RewriteCond /srv/www/tim.wienk.name/http/%{REQUEST_URI}/index.html -f RewriteRule ^/(.*)$ /$1/index.html [QSA,L] </VirtualHost>
The end result is what you see now, a working, simple website. There still isn't a lot of content, but perhaps that will come in the future, at least I now have the option without having to worry about the website itself too much.