Kleknev: a coarse-grained profiler for build systems

Monday 11 March 2013, 19:12

When I was preparing the Tsukurimashou 0.7 release, I had to build the entire package several times from scratch, to verify that all the necessary pieces really were included in what I was preparing to ship. When I run the build on my development machine it normally re-uses a lot of previously-built components, only updating the parts I have recently changed. That kind of incremental compilation is one of the main functions of GNU Make. But if I'm shipping a package for others to use, it has to work on their systems, which have no history of previous successful builds. So I need to verify that it will actually build successfully in such an environment, and verifying that means copying the release-candidate package into a fresh empty directory on my own system and checking that the entire package (including all optional features) can build there.

Tsukurimashou is a big, complicated package. It's roughly 92,000 lines of code, which may not sound like much; for comparison, the current Linux kernel is about 15,000,000. Tsukurimashou's volume of code is roughly equivalent to a 0.99 version of Linux (not clear which one; I couldn't find numbers I trusted on the Web just now, and am not motivated to go downloading old kernel sources just to count the lines). However, as detailed in one of my earlier articles, Tsukurimashou as a font meta-family is structured much differently from an orthodox software package. Things in the Tsukurimashou build tend to multiply rather than add, and one practical consequence is that building from these 92,000 lines of code, when all the optional features are enabled, produces as many output and intermediate files and takes as much computation as we might expect of a much larger package. A full build of Tsukurimashou maxes out my quad-core computer for six or eight hours, and fills about 4 GB of disk space.

So after a few days of building over and over, it occurred to me that I'd really like to know where all the time was going. I had a pretty good understanding of what the build process was doing, because I created it myself; but I had no quantitative data on the relative resource consumption of the different components, I had no basis to make even plausible guesses about that, and quantitative data would be really useful. In software development we often study this sort of thing on the tiny scale, nanoseconds to milliseconds, using profiling tools that measure the time consumption of different parts of a program. What I really wanted for my build system was a coarse-grained profiler: something that could analyse the eight-hour run of the full build and give me stats at the level of processes and Makefile recipes.
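To make "stats at the level of processes and Makefile recipes" concrete, here is a minimal sketch of the idea in Python. It is not Kleknev's code; the names timed_run and summarize, the label-per-recipe scheme, and the in-memory log are all assumptions for illustration. Each command is run as a child process while recording its wall-clock time and, via getrusage(RUSAGE_CHILDREN), the CPU time it consumed; the records are then totalled per label. CPU time matters here because in a parallel build (make -j4) the wall-clock intervals of different recipes overlap. A real tool would append records to a file, since every recipe runs in a separate process, and would need some hook into Make (for instance its SHELL variable), which this sketch leaves out.

```python
import resource
import subprocess
import sys
import time
from collections import defaultdict


def timed_run(label, argv, log):
    """Run argv as a child process; append (label, wall_s, cpu_s) to log.

    getrusage(RUSAGE_CHILDREN) accumulates CPU time of children this process
    has waited for (including grandchildren they waited for), so the
    before/after difference is the CPU time used by this one command.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    result = subprocess.run(argv)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    log.append((label, wall, cpu))
    return result.returncode


def summarize(log):
    """Return (label, total_wall, total_cpu, runs) rows, busiest CPU first."""
    wall = defaultdict(float)
    cpu = defaultdict(float)
    runs = defaultdict(int)
    for label, wall_s, cpu_s in log:
        wall[label] += wall_s
        cpu[label] += cpu_s
        runs[label] += 1
    rows = [(label, wall[label], cpu[label], runs[label]) for label in runs]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    log = []
    timed_run("noop", [sys.executable, "-c", "pass"], log)
    for label, wall_s, cpu_s, n in summarize(log):
        print(f"{label:20} {n:5d} runs {wall_s:8.2f}s wall {cpu_s:8.2f}s cpu")
```

Sorting by CPU rather than wall time is a design choice: on a machine kept busy by several parallel jobs, CPU seconds say more about which recipes are expensive than elapsed time does.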

I couldn't find such a tool ready-made, so I built one.

Tsukurimashou 0.7

Thursday 7 March 2013, 08:36

I'm very happy to announce the release of version 0.7 of Tsukurimashou, my Japanese-language font project. The source code package is available from the release page on SourceForge.JP; see also the complete list of downloadable files and the project home page. This has been almost nine months in the making, and as I said on Twitter, the yak hair is thick on the floor. Release notes below the cut.

Cycle counting: the next generation

Wednesday 30 January 2013, 18:44

Here are the slides (PDF) and an audio recording (MP3, 25 megabytes, 54 minutes) from a talk I gave today about one of my research projects. You'll get more out of it if you have some computer science background, but I hope it'll also be accessible and interesting to those of my readers who don't. I managed to work in Curious George, Sesame Street, electronics, XKCD, the meaning of "truth," and a piece of software called ECCHI. I plan to distribute the "Enhanced Cycle Counter and Hamiltonian Integrator" publicly at some point in the future. Maybe not until after the rewrite, though.

Abstract for the talk:

It is a #P-complete problem to find the number of subgraphs of a given labelled graph that are cycles. Practical work on this problem splits into two streams: there are applications for counting cycles in large numbers of small graphs (for instance, all 12.3 million graphs with up to ten vertices) and software to serve that need; and there are applications for counting the cycles in just a few large graphs (for instance, hypercubes). Existing automated techniques work very well on small graphs. In this talk I review my own and others' work on large graphs, where the existing results have until now required a large amount of human participation, and I discuss an automated system for solving the problem in large graphs.
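For readers without the background, the baseline that "existing automated techniques" improve on is plain backtracking search. The Python sketch below is that naive method, not ECCHI and not the technique from the talk; the function names are hypothetical. It counts cycle subgraphs of a simple undirected graph by starting a depth-first search at each vertex, allowing only larger-numbered vertices on the path, so that each cycle is discovered only from its smallest vertex, once in each direction. Its running time grows exponentially, which is why it is only usable on small graphs.

```python
def count_cycles(n, edges):
    """Count cycle subgraphs of a simple undirected graph on vertices 0..n-1.

    Each cycle is found from its smallest vertex by a DFS that only visits
    larger vertices; every cycle is then seen exactly twice (once per
    direction), so the total is halved. Exponential time: small graphs only.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    found = 0

    def extend(start, v, on_path, length):
        nonlocal found
        for w in adj[v]:
            if w == start:
                if length >= 3:  # closing edge back to start completes a cycle
                    found += 1
            elif w > start and w not in on_path:
                on_path.add(w)
                extend(start, w, on_path, length + 1)
                on_path.remove(w)

    for start in range(n):
        extend(start, start, {start}, 1)
    return found // 2


def complete_graph(n):
    """Edge list of the complete graph K_n."""
    return [(u, v) for u in range(n) for v in range(u + 1, n)]


def hypercube(d):
    """Edge list of the d-dimensional hypercube: vertices differ in one bit."""
    return [(u, u ^ (1 << b)) for u in range(2 ** d) for b in range(d)
            if u < u ^ (1 << b)]
```

As a check, count_cycles(4, complete_graph(4)) gives 7 (four triangles and three 4-cycles), and count_cycles(2 ** 3, hypercube(3)) counts the cycles of the 3-cube; the same call for larger hypercubes quickly becomes infeasible, which is the situation the abstract describes.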

Where do I draw the line?

Monday 21 January 2013, 14:49

It's a very common pattern in the Han writing system that a character will be made of two parts that are themselves characters, or at least elements resembling characters, placed one above the other or one next to the other. For instance, 音 (sound) can be split into 立 (stand up) above 日 (day); and 村 (village) can be split into 木 (tree) next to 寸 (inch). This kind of structure can be nested, as in 語 (language). One can do a sort of gematria with the meanings (what exactly is the deep significance of "village = tree + inch"?), but that's not the direction I'm interested in going today. Here's the thing: in the Tsukurimashou project, these two ways of constructing characters each correspond to a piece of code that's invoked many times throughout the system, and I thought it would be interesting to look at how often the different parameter values are used.
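The nested above/beside structure is easy to write down in Unicode's Ideographic Description Sequences, a prefix notation in which ⿰ means "left part next to right part" and ⿱ means "upper part above lower part"; 語 is then ⿰言⿱五口, since 吾 is 五 above 口. The Python sketch below (function names hypothetical) parses such sequences and tallies how often each operator is used. It only illustrates the kind of counting involved: it does not read Tsukurimashou's own source, it ignores the parameter values the post is actually about (such as where the dividing line goes), and it handles only these two binary operators.

```python
from collections import Counter

# Two of Unicode's Ideographic Description Characters (U+2FF0, U+2FF1).
# Other operators (three-part, surround, ...) are not handled by this sketch.
OPERATORS = {"⿰": "beside", "⿱": "above"}


def _parse(ids, i, counts):
    """Parse one component starting at ids[i]; return the index just past it."""
    if i >= len(ids):
        raise ValueError(f"truncated description: {ids!r}")
    op = OPERATORS.get(ids[i])
    if op is None:
        return i + 1  # a leaf: a single character used as a component
    counts[op] += 1
    i = _parse(ids, i + 1, counts)  # first part (left, or upper)
    return _parse(ids, i, counts)   # second part (right, or lower)


def tally(descriptions):
    """Count operator uses over an iterable of description sequences."""
    counts = Counter()
    for ids in descriptions:
        end = _parse(ids, 0, counts)
        if end != len(ids):
            raise ValueError(f"trailing characters in {ids!r}")
    return counts


examples = {
    "音": "⿱立日",     # 立 above 日
    "村": "⿰木寸",     # 木 next to 寸
    "語": "⿰言⿱五口",  # 言 next to (五 above 口), nested
}
```

Extending the tally key from the operator alone to a pair such as (operator, split position) would give the kind of parameter-usage histogram the post is after, assuming those positions were available as data.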

The fundamental attribution error

Saturday 12 January 2013, 16:23

Here's a quote.

We see a sloppily-parked car and we think "what a terrible driver," not "he must have been in a real hurry." Someone keeps bumping into you at a concert and you think "what a jerk," not "poor guy, people must keep bumping into him." A policeman beats up a protestor and we think "what an awful person," not "what terrible training." The mistake is so common that in 1977 Lee Ross decided to name it the "fundamental attribution error": we attribute people’s behavior to their personality, not their situation.

今始まる物語

Wednesday 2 January 2013, 10:26

The title is a song lyric; it means "the story that starts now," and that's more or less where I feel I'm at. A lot has happened between mid-November and now, and I'm hoping that this will mark a boundary or change in the conditions around me.

Back to Slack?

Sunday 11 November 2012, 15:26

I've decided to stop using Arch Linux, because I believe in The Arch Way. I'm tempted to leave it at that, but more detail is below the cut.

New releases: OCR and Genjimon

Tuesday 2 October 2012, 10:15

I recently updated my OCR and Genjimon font packages, a process which included merging them into the Tsukurimashou Project's build system as what I'm calling "parasite" packages (IDSgrep has also become such). They now come included automatically (but not built by default) when you download the full Tsukurimashou package, or they can each be downloaded as a separate distributed package. Some bugs are fixed, and Genjimon has two new styles added, one of which is shown below.

[Genjimon Round Outline]

Having the OCR package listed as a download on the SourceForge.JP site immediately boosted Tsukurimashou's rankings, because 15 or 20 people download it every day. I'm happy to have the added visibility, but I wish that visibility could be coming from popularity of the main Tsukurimashou project instead of this minor spinoff.

Head in the clouds

Tuesday 11 September 2012, 15:47

As part of my efforts to be ready for wherever my next employment takes me, I've shifted my email home. For a long time my usual practice has been to have email delivered to my home computer, which I log into remotely from wherever I am. As I see it, my personal email is mission-critical, and I don't want my email home to be on any computer I don't control, especially not one belonging to an employer or to Google. Content in my email has been subject to a court case before. The other side in that case wasn't able to interrupt my email, because they had no right to and it was all routed through systems controlled by people who understood that; I'd like to keep things that way.

Running my own email service requires my home computer to be accessible on the Net at all times, and I've now had a couple of adventures in which it or its Net connection stopped working while I was away from home and I had to switch to less useful backup systems. So, as of today, my email goes to a leased server elsewhere on the Net. I can connect to it remotely from wherever I have a good connection, even if my home computer doesn't. This may be especially useful if, as seems quite possible, my current home computer goes into storage for a while and I end up spending a lot of time without an operational home computer of my own at any fixed location.

The fluorescent tube organ, part II

Monday 3 September 2012, 19:18

This is part II in a series. You can start from the beginning, and you can pick up a package of Qucs files to follow along on your own simulated workbench.