Samuel Tardieu @ rfc1149.net

There must be a better way


Since I now use Jekyll to generate this web site, I had to find a way to convert tag names into nice ASCII-only-lowercase symbols. For example, Free Software would become free-software and Éducation would become education.

One solution I came up with is a slugify filter which uses the unicode Ruby gem. After converting the string to lower case and replacing æ and œ with ae and oe respectively, it applies Unicode normalization form KD (NFKD), which separates base characters from their accent marks as shown in this figure. Then only ASCII letters, digits, underscores, whitespace and hyphens are kept, and each run of whitespace or hyphens is replaced by a single hyphen.

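# -*- coding: utf-8 -*-
module Slugify

  require 'unicode'

  # Turn a tag name into a lowercase, ASCII-only, hyphen-separated slug.
  def slugify(input)
    # Lowercase, expand the two ligatures, then decompose accented characters (NFKD).
    t = Unicode::nfkd(input.downcase.gsub('æ', 'ae').gsub('œ', 'oe'))
    # Drop everything except ASCII word characters, whitespace and hyphens,
    # then collapse each run of whitespace or hyphens into a single hyphen.
    t.gsub(/[^\w\s-]/, '').gsub(/[\s-]+/, '-').downcase
  end

end

# Make the filter available to Liquid templates when this file is loaded as a Jekyll plugin.
Liquid::Template.register_filter(Slugify)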

and

{% assign tn = tag | slugify %}{% assign t = tag %}

This way, I can link to the tag page using <a href="/blog/tag/{{ tn }}">{{ t }}</a> without fearing that some software will choke on the URL. It works well and I am now satisfied with this function, so I removed the questions that were in previous versions of this post. The only thing I dislike is the double downcase call, which is needed because some characters cannot be downcased without knowing more about the language in use.
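
For readers who would rather not install the unicode gem, here is a minimal sketch of the same steps using Ruby's built-in String#unicode_normalize (available since Ruby 2.2). This is a swapped-in technique, not the filter used on this site: the name slugify_builtin is hypothetical, and the sketch assumes a UTF-8 source file and UTF-8 input.

```ruby
# Hypothetical stand-alone variant; slugify_builtin is not part of the original filter.
# It uses String#unicode_normalize (Ruby 2.2+) instead of the unicode gem.
def slugify_builtin(input)
  # Lowercase first and expand the two ligatures that NFKD does not decompose.
  t = input.downcase.gsub('æ', 'ae').gsub('œ', 'oe')
  # NFKD splits an accented letter into a base letter followed by combining marks.
  t = t.unicode_normalize(:nfkd)
  # \w and \s only match ASCII in Ruby regexps, so the combining marks are dropped here;
  # runs of whitespace or hyphens then collapse into a single hyphen.
  # The final downcase mirrors the double downcase of the original filter.
  t.gsub(/[^\w\s-]/, '').gsub(/[\s-]+/, '-').downcase
end

puts slugify_builtin('Free Software')  # prints "free-software"
puts slugify_builtin('Éducation')      # prints "education"
```

The two calls at the end reproduce the examples from the beginning of this post. Whether the built-in normalization behaves identically to the gem for every input is something I have not checked, so treat this as an illustration rather than a drop-in replacement.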

Edit: updated to match the name and behaviour of Django’s slugify as per Ricardo Buring’s comment, with additional “æ” to “ae” and “œ” to “oe” translations.
