Monday 11 March 2013, 19:12
When I was preparing the Tsukurimashou 0.7 release, I
had to build the entire package several times from scratch, to verify that
all the necessary pieces really were included in what I was preparing to
ship. When I run the build on my development machine it normally re-uses a
lot of previously-built components, only updating the parts I have recently
changed. That kind of incremental compilation is one of the main functions
of GNU Make. But if I'm shipping a package for others to use, it has to
work on their systems which don't have a previous history of successful
builds; so I need to verify that it will actually build successfully in such
an environment, and verifying that means copying the release-candidate
package into a fresh empty directory on my own system and checking that the
entire package (including all optional features) can build there.
Tsukurimashou is a big, complicated package. It's roughly 92,000 lines
of code, which may not sound like so much. For comparison, the current
Linux kernel is about 15,000,000. Tsukurimashou's volume of code is roughly
equivalent to a 0.99 version of Linux (not clear which one - I couldn't
find numbers I trusted on the Web just now, and am not motivated to go
downloading old kernel sources just to count the lines). However, as
detailed in one of my earlier
articles, Tsukurimashou as a font meta-family is structured much
differently from an orthodox software package. Things in the Tsukurimashou
build tend to multiply rather than add; and one practical consequence is
that building from these 92,000 lines of code, when all the optional
features are enabled, produces as many output and intermediate files and
takes as much computation as we might expect of a much larger package. A
full build of Tsukurimashou maxes out my quad-core computer for six or eight
hours, and fills about 4 GB of disk space.
So after a few days of building over and over, it occurred to me that I'd
really like to know where all the time was going. I had a pretty good
understanding of what the build process was doing, because I
created it myself; but I had no quantitative data on the relative resource
consumption of the different components, and no basis to make even
plausible guesses about it. Quantitative data would be really useful.
In software development we often study this sort of thing on the tiny scale,
nanoseconds to milliseconds, using profiling tools that measure the time
consumption of different parts of a program. What I really wanted for my
build system was a coarse-grained profiler: something that could analyse the
eight-hour run of the full build and give me stats at the level of processes
and Makefile recipes.
I couldn't find such a tool ready-made, so I built one.
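To make "coarse-grained" concrete, here is a minimal Python sketch of timing at the level of whole processes. It is not the tool described in this entry; the names `run_timed`, `report`, and `totals` are hypothetical, and it assumes a Unix-like system where `os.times()` reports CPU time for finished child processes.

```python
import os
import subprocess
import sys
import time
from collections import defaultdict

# Hypothetical accumulator: label -> [wall seconds, child CPU seconds, runs].
totals = defaultdict(lambda: [0.0, 0.0, 0])


def run_timed(label, argv):
    """Run argv as a child process and charge its cost to label."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = subprocess.run(argv)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    # CPU time used by waited-for children (stays zero on some platforms).
    cpu = ((cpu_after.children_user - cpu_before.children_user)
           + (cpu_after.children_system - cpu_before.children_system))
    entry = totals[label]
    entry[0] += wall
    entry[1] += cpu
    entry[2] += 1
    return result.returncode


def report():
    """Return (label, wall, cpu, runs) rows, largest wall time first."""
    rows = [(label, *values) for label, values in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    run_timed("noop", [sys.executable, "-c", "pass"])
    for row in report():
        print(row)
```

A real build profiler would have to hook into Make so that every recipe gets measured without editing the Makefile; this sketch only shows the bookkeeping once a command can be wrapped.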
Monday 21 January 2013, 14:49
It's a very common pattern in the Han writing system that a character
will be made of two parts that are themselves characters, or at least
elements resembling characters, placed one above the other or one next to
the other. For instance, 音 (sound) can be split into 立 (stand up) above
日 (day); and 村 (village) can be split into 木 (tree) next to 寸 (inch).
This kind of structure can be nested, as in 語 (language).
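The nesting can be pictured as a small tree. The following Python sketch is only an illustration, not Tsukurimashou's actual code: the names `ABOVE`, `BESIDE`, `CHARACTERS`, `count_layouts`, and `leaves` are hypothetical, and the breakdown of 語 into 言 next to 吾, with 吾 as 五 above 口, is my own reading of the character.

```python
from collections import Counter

ABOVE = "above"    # first part sits on top of the second
BESIDE = "beside"  # first part sits to the left of the second

# A node is either a plain string (a part we do not split further)
# or a tuple (layout, first, second).
CHARACTERS = {
    "音": (ABOVE, "立", "日"),
    "村": (BESIDE, "木", "寸"),
    "語": (BESIDE, "言", (ABOVE, "五", "口")),  # nested: 吾 is 五 above 口
}


def count_layouts(node, counts):
    """Tally how often each layout occurs anywhere in the tree."""
    if isinstance(node, str):
        return
    layout, first, second = node
    counts[layout] += 1
    count_layouts(first, counts)
    count_layouts(second, counts)


def leaves(node):
    """List the undivided parts of a tree, in reading order."""
    if isinstance(node, str):
        return [node]
    _, first, second = node
    return leaves(first) + leaves(second)


counts = Counter()
for tree in CHARACTERS.values():
    count_layouts(tree, counts)
```

With these three characters each layout is used twice, because 語 contributes one of each.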
One can do a sort of gematria with the meanings (what exactly
is the deep significance of "village = tree + inch"?), but that's not the
direction I'm interested in going today.
Here's the thing: in the Tsukurimashou
project, these two ways of constructing characters each correspond to a
piece of code that's invoked many times throughout the system, and I thought
it would be interesting to look at how often the different parameter values
Saturday 19 May 2012, 10:58
A few days ago, I ran Arch Linux's update process and it pulled down and installed a new version of GIMP, version 2.8. This version incorporates some changes in the user interface which apparently were under development for a long time, but were only very recently put into the "stable" distribution stream.
The one that interests me may appear on the surface to be very small, but it is, and is meant to be, a really significant shift in the entire definition of what GIMP is. GIMP used to be, as the name "GNU Image Manipulation Program" implies, an image editor. With version 2.8, GIMP has become an XCF file editor with the ability to read and write other formats.
Tuesday 13 March 2012, 11:04
I took the plunge and created an account on the world's worst Internet dating site. This is mostly so that I can participate in the KanjiVG project, which has decided to host there; git and thereby GitHub remain not my favourite systems. However, I've established a mirror of Tsukurimashou in my new GitHub space, so people who do like git and GitHub can find it there now too.
Saturday 4 February 2012, 09:27
I only have limited faith in software testing, partly because of my lack of faith in software engineering in general. Most professionally-written code is crap, and the more people use "methodologies," the worse their code seems to be. I'm inclined to think that the best way to remove bugs from code is to not put them in in the first place. Nonetheless, writing tests is fun. It's an interesting way to avoid doing real work, and some of you might enjoy reading about some test-related things I tried on a couple of my recent projects.
Wednesday 11 January 2012, 11:41
Not too long ago a free software project I'm peripherally involved in
decided it was time to replace its old and not broken version control system
with something new and broken, and the lead maintainer conducted a straw
poll of what the new system should be. My suggestion of "anything, as long
as it's not distributed" was shouted down by the chorus of "anything, as
long as it's distributed." Having lost the argument in that forum, I'm going
to post my thoughts on why distributed version control sucks here in my own
space where it's harder for me to be shouted down.
Monday 5 December 2011, 14:56
I encountered an interesting problem on the Tsukurimashou project recently, and some inquiries with friends confirmed my suspicion that if anyone has solved it, they've done it in a language-specific way for ridiculous languages. It appears I have to solve it again myself. Here are some notes.
Monday 4 April 2011, 10:09
I use the Alpine email software, which is the successor to Pine. I mostly like it, but its implementation of "sort by subject" is broken and annoying.
It is documented that Alpine will strip "Re: " and variations from the start of a subject line before sorting, and that seems like something I would reasonably want: replies end up getting sorted with the things they are replies to, instead of all being grouped confusingly under "R". However, what is undocumented and unwelcome is that Alpine will also look for and remove strings enclosed in square brackets, which are typically used to identify mailing list messages. I subscribe to several mailing lists that identify themselves by square-bracketed tags at the start of the subject line while leaving the From: headers unchanged (messages are from the person who sent them instead of from the list). If subject sort worked, then as a natural consequence of how string sorting works, I could group all messages from the list together, sorted within the group by the rest of the subject. But because square-bracketed tags are silently ignored, I can't do that, and there is no way to group the mailing list messages together. There is no option to make subject sort sort on the actual subject, no really, the string that is in the Subject: header and not a munged version.
I fixed this by deleting lines 4562 to 4565 of imap/c-client/mail.c in the Alpine 2.00 distribution, which check for square brackets and invoke mail_strip_subject_blob().
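The difference between the two behaviours is easy to see in a small Python sketch. This is not Alpine's code (which is C); `sort_key`, its `strip_tags` flag, and the sample subjects are all hypothetical, and the patterns only approximate the stripping described above.

```python
import re

# Leading "Re:" in any letter case, with optional surrounding spaces.
RE_PREFIX = re.compile(r"^\s*re\s*:\s*", re.IGNORECASE)
# A leading square-bracketed tag such as "[fonts]".
LIST_TAG = re.compile(r"^\s*\[[^\]]*\]\s*")


def sort_key(subject, strip_tags):
    """Build a subject sort key, optionally discarding list tags."""
    current = subject
    while True:
        stripped = RE_PREFIX.sub("", current)
        if strip_tags:
            stripped = LIST_TAG.sub("", stripped)
        if stripped == current:
            break
        current = stripped
    return current.strip().lower()


subjects = [
    "[fonts] Widths",
    "[make] Parallel builds",
    "Re: [fonts] Anchor points",
    "Lunch on Friday?",
]

# Tags kept: each list's messages sort together.
grouped = sorted(subjects, key=lambda s: sort_key(s, strip_tags=False))
# Tags stripped: list messages end up scattered among everything else.
scattered = sorted(subjects, key=lambda s: sort_key(s, strip_tags=True))
```

With the tags kept, the two `[fonts]` messages end up adjacent; with the tags stripped, "Lunch on Friday?" and the `[make]` message land between them.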
Monday 6 December 2010, 13:18
I was up until 3 this morning trying to figure out how to make OpenType glyph substitution work. That, in itself, is not news. Anyone who has tried to write substitution rules for OpenType fonts has probably gone through something similar. What is unusual, though, is that I not only succeeded, but also figured out the undocumented underlying principle so that I can predictably succeed in the future; as far as I can tell, the more usual practice is to just try things at random until one eventually either gets it working by accident, or gives up, without having learned anything useful either way.
The purpose of this entry is to provide the important information that I wasn't able to find on the Net and wish I had had. There is one important point I call the Terrible Secret, which makes all the difference to getting it to work; but rather than jump to that immediately I'm going to give the needed background first. I'll be using the terms that make sense to me, rather than the "easy" but uselessly vague simplified style used by all existing documentation I found.
Saturday 4 September 2010, 18:45
There are many things I like about the JED text editor, and for a number of years it has been my preferred editor for working on C code. However, it has a number of misfeatures that make it unacceptable for other tasks for which I need a text editor, so I have generally been using JED only for C code, and JOE for most other things (including, notably, English-language writing of both fiction and nonfiction in LaTeX and flat text). Just recently I had occasion to try to edit some C code on my laptop, which had a fresh default installation of JED, and it was a horrible experience, and I realized that I had, years ago, made a number of customizations to JED that I'd long since forgotten about.
For my own future reference, and anyone who might be facing a similar situation, here are some notes on changes I made. I decided while I was at it to try to not only bring the laptop's installation up to the desktop's standard so I could use it for C, but also fix as many as possible of the issues keeping me from using JED for other things on both installations, so that I could at least consider adopting it as my general editor instead of mostly using JOE. It remains to be seen whether JED will be able to serve as my all-purpose editor, but so far I've been liking it once I sorted out these issues.
Friday 18 June 2010, 09:41
So, you've got an audio signal contaminated with a continuous tone at about 233 Hz, with a really strong second harmonic at 465 Hz and some others throughout the audio spectrum. It sounds like a swarm of angry bees and makes the main signal hard to listen to. You can notch it out - that means applying a filter that simply removes the frequencies in question - but since 465 Hz is right in the important part of the speech band, the result is going to sound really bad. Any simple frequency-notch filter that blocks the interference is going to destroy things you want to keep, too. Look around the Net these past few days and you can read a lot of broadcast audio people complaining about this issue.
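For anyone who has not met one, here is a minimal pure-Python sketch of a simple notch: a single second-order (biquad) section using the widely published Audio EQ Cookbook notch formulas. The 8000 Hz sample rate, the Q of 30, and all function names are assumptions chosen for illustration, not anything taken from a real broadcast chain.

```python
import math


def notch_coefficients(f0, fs, q):
    """Biquad notch centred on f0 Hz at sample rate fs, normalised so a0 = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(samples, b, a):
    """Apply the filter sample by sample (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def tone(freq, fs, count):
    """Generate count samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq * n / fs) for n in range(count)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


FS = 8000  # assumed sample rate in Hz
b, a = notch_coefficients(465.0, FS, 30.0)
# Skip the first half second so the filter's start-up transient has died away.
hum_level = rms(biquad(tone(465.0, FS, FS), b, a)[FS // 2:])
other_level = rms(biquad(tone(1000.0, FS, FS), b, a)[FS // 2:])
```

After the transient, `hum_level` is close to zero while `other_level` stays near the 0.707 of an unfiltered sine. The catch is that whatever speech energy sits in the band around 465 Hz goes away along with the hum.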
Monday 5 April 2010, 13:19
Re-posting of an article first posted in May 2007.
Okay, here's a game sketch. This idea is supposed to be a game that
could live on a Web site somewhere, support a large number of players, but
be fun to participate in even if you are brand new, or only connect
occasionally, or if there are few or no other players. Kind of like
Wikipedia - except that my idea would actually know it's a game, unlike
Wikipedia which thinks it's an encyclopedia. I'm posting this here to make
it harder for anyone to patent.
Friday 2 April 2010, 19:16
Re-posting of an article first posted in September 2008.
You are an officer, say a commodore, in
the military-diplomatic-exploration organization of an interplanetary nation
with United Federation of Planets (UFP) membership. You've been tasked with
asserting your nation's interests with respect to a certain out-of-the-way
planet that happens to be rich in natural resources. Unfortunately, it's
already inhabited, by a race of disgusting natives we will call the Filthy