I've posted some notes on the computer-science aspects of reporting rounded numbers in financial statements while still having the totals add up correctly. You've doubtless seen footnotes that say "Totals may not add due to rounding." Can that be avoided? Here's a detailed examination of the question.
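For a taste of the subject, one standard technique that does make the totals come out exactly is the largest-remainder method. Here's a generic sketch of it; the function name and the example are mine for illustration, not taken from the notes.

```python
import math

def round_preserving_total(values, decimals=0):
    # Largest-remainder rounding: floor every entry, then hand the
    # leftover units to the entries with the biggest fractional parts,
    # so the rounded figures sum to the rounded total.
    scale = 10 ** decimals
    scaled = [v * scale for v in values]
    floors = [math.floor(s) for s in scaled]
    shortfall = round(sum(scaled)) - sum(floors)
    by_remainder = sorted(range(len(values)),
                          key=lambda i: scaled[i] - floors[i],
                          reverse=True)
    for i in by_remainder[:shortfall]:
        floors[i] += 1
    return [f / scale for f in floors]

shares = round_preserving_total([100 / 3] * 3)  # three equal thirds of 100%
# shares sum to exactly 100: two entries become 33 and one becomes 34
```

The price is that an individual entry may end up rounded away from its nearest value, which is its own kind of footnote-worthy distortion.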
I have posted a detailed set of notes (PDF file) describing the theory behind my Black Swan Suite, detailing the endless chase of Elmer and Daffy across Penrose, pinwheel, and other nonperiodic tilings of the plane. Fans of music and computational geometry may find the document interesting. At the very least, it was fun to typeset.
In January of 2011, I had recently arrived at the University of Manitoba to work as a postdoc with Stephane Durocher. One of the first things he asked me to do was find out how many cycles there are in an n-dimensional hypercube graph, for general n. At the time, he and I both assumed that that meant spending maybe half an hour in the library looking up the answer.
Since then it's been more than three years; thousands of lines of computer code and months of CPU time; we added two more co-authors; we didn't solve the original problem, and didn't completely solve the other problem that we ended up working on, either; but here's a journal paper, anyway, and I think it's pretty interesting. The official definitive version will be free to access until mid-December and then will go behind Elsevier's paywall; the authors' accepted manuscript on arXiv is permanently accessible.
Read the paper if you're interested in the math details; in this entry I'm going to try to tell the story behind the paper, in relatively non-mathematical terms. I'm hoping it'll be of interest to my Web log readers and give you some idea of what I actually do at work, because that's something I often get asked.
The text below pretty much speaks for itself. Bold highlighting and numbered footnotes in [square brackets] are mine; all the rest is as I received it. Some irregularities of spacing and punctuation, visible in the original email, aren't obvious in the HTML. Names of the students are redacted because (after finding several more copies on the Web) I imagine the students are relatively innocent victims of bad advice. Name of the institution not redacted because I hope others who receive such letters and look for them on the Web will be able to easily find this posting.
Back in 2009, I posted some notes I'd prepared on hypergeometric tail inequalities to this Web site - mostly so that I could find them easily myself, should I need to in the future. In the years since, that set of unpublished notes has become one of my most-cited works. I'm not sure how I feel about that, but whatever; I have stuck it on arXiv to make future citations easier and increase the chance that they'll spell my name right.
I just got back from a trip to multiple conferences in Ontario, and that makes it a good time to update my publications page. Most people interested in my academic work are likely to find out about it from other sources, but I'm going to post some brief and relatively non-technical introductions here as well for my general Web site readers. The official versions of these papers are mostly behind paywalls, but there are also unofficial "preprint" versions available to all.
Here are the slides (PDF) and an audio recording (MP3, 25 megabytes, 54 minutes) from a talk I gave today about one of my research projects. You'll get more out of it if you have some computer science background, but I hope it'll also be accessible and interesting to those of my readers who don't. I managed to work in Curious George, Sesame Street, electronics, XKCD, the meaning of "truth," and a piece of software called ECCHI. I plan to distribute the "Enhanced Cycle Counter and Hamiltonian Integrator" publicly at some point in the future. Maybe not until after the rewrite, though.
Abstract for the talk:
It is a #P-complete problem to find the number of subgraphs of a given labelled graph that are cycles. Practical work on this problem splits into two streams: there are applications for counting cycles in large numbers of small graphs (for instance, all 12.3 million graphs with up to ten vertices) and software to serve that need; and there are applications for counting the cycles in just a few large graphs (for instance, hypercubes). Existing automated techniques work very well on small graphs. In this talk I review my own and others' work on large graphs, where the existing results have until now required a large amount of human participation, and I discuss an automated system for solving the problem in large graphs.
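As a toy illustration of the problem (my own throwaway code, not ECCHI): a brute-force counter that enumerates simple cycles by depth-first search. It is feasible only for very small graphs, which is exactly why the large-graph case needs cleverer machinery.

```python
def count_cycles(adj):
    # Count simple cycles in an undirected graph, given as a dict
    # mapping each vertex to the set of its neighbours.  Each cycle is
    # found once per direction starting from its smallest vertex,
    # hence the final division by two.  Exponential time: toy sizes only.
    total = 0
    def extend(start, v, visited, length):
        nonlocal total
        for w in adj[v]:
            if w == start and length >= 3:
                total += 1
            elif w > start and w not in visited:
                extend(start, w, visited | {w}, length + 1)
    for s in adj:
        extend(s, s, {s}, 1)
    return total // 2

def hypercube(n):
    # Vertices are n-bit integers; edges join words differing in one bit.
    return {v: {v ^ (1 << b) for b in range(n)} for v in range(2 ** n)}

print(count_cycles(hypercube(3)))  # the 3-cube has 28 cycles
```

The 28 breaks down as 6 four-cycles (the faces), 16 six-cycles, and 6 Hamiltonian eight-cycles; already at modest dimensions this brute force becomes hopeless, motivating the automated system the talk describes.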
I've just released the first packaged version of IDSgrep, which is an implementation of the ideas I posted last month about Ideographic Description Sequences. It's meant to bring the user-friendliness of grep to kanji dictionaries. Compiling it will require the usual Unix tools, and using it effectively will require a copy of KanjiVG, but you can look at the screenshot of it in action on the SF.JP site.
It'd be really nice if I could publish a paper about this. I'm looking at some academic-type computer science conferences, but it might actually be more on-topic for the more industrial or open-source type of meeting. If any readers have suggestions on what might be a good venue, I'd like to hear them.
I encountered an interesting problem on the Tsukurimashou project recently, and some inquiries with friends confirmed my suspicion that if anyone has solved it, they've done it in a language-specific way for ridiculous languages. It appears I have to solve it again myself. Here are some notes.
Suppose you want to draw a curve with an Etch-a-Sketch. The idea is that you can move the pointer up, down, left, or right, by a precisely controlled amount, but you aren't coordinated enough to turn both knobs at once in a precise way to create a diagonal or curved line. So you want to approximate your desired curve with one made up of stair-steps: a piecewise-linear curve whose segments each go in a direction chosen from a small finite set of directions (in this case, the four directions up, down, left, and right, but I'm interested in allowing any arbitrary small finite set of directions).
How can you do this so as to get the best possible approximation?
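To make the problem concrete, here's a naive greedy baseline of my own: take fixed-size steps, always in whichever allowed direction gets closest to the next target point. Greedy is almost certainly not the best approximation in the sense the question asks for; it's just the obvious thing to beat.

```python
import math

DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # the Etch-a-Sketch set

def staircase(targets, step=1.0):
    # Greedily trace a stair-step path through a list of target
    # points, moving only in the allowed directions.  Stops near each
    # target once it is within one step of it.
    x, y = targets[0]
    path = [(x, y)]
    for tx, ty in targets[1:]:
        while math.hypot(tx - x, ty - y) > step:
            dx, dy = min(DIRECTIONS,
                         key=lambda d: math.hypot(tx - x - step * d[0],
                                                  ty - y - step * d[1]))
            x, y = x + step * dx, y + step * dy
            path.append((x, y))
    return path
```

Swapping in a different `DIRECTIONS` list (say, six hexagonal headings) generalizes it to any small finite set; measuring how far this greedy path strays from the true curve, and doing better, is the actual problem.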
Anyone who's studied computer science should be familiar with the Landau asymptotic notation, or "big-oh" notation. But how well do you really know it?
Given two functions, it seems intuitive that at least one ought to be big-oh of the other; in some cases maybe they are both big-oh of each other, in which case they are theta of each other and in a certain sense can be said to be equivalent. It sure feels like a total order, with a place for everything and everything in its place, and it is a total order on all the kinds of functions we normally care about. I've certainly marked down plenty of students for incorrectly claiming that n log n can't be compared either way to things like n and n². But today I had occasion to wonder if it's really true that O() induces a total order on all functions for which it is applicable. I looked on the Net and couldn't find anyone discussing this issue; I did find this archived final exam in which someone asked it of students as a true/false question and apparently was looking for the answer "true." Unfortunately, the correct answer is "false."
Counterexample below the cut. If you've got a computer science background you might want to think about it a bit yourself before going further. It shouldn't really be hard if you look carefully at the definition.
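For anyone who wants to check their answer afterward, here is one counterexample of the standard oscillating kind (not necessarily the same construction as the one below the cut), with a numerical sanity check:

```python
def f(n):
    # Oscillates: quadratic on even inputs, constant on odd inputs.
    return n * n if n % 2 == 0 else 1

def g(n):
    return n

# Over even n the ratio f(n)/g(n) = n is unbounded, so f is not O(g);
# over odd n the ratio g(n)/f(n) = n is unbounded, so g is not O(f).
# Neither is big-oh of the other, so O() is not a total order.
assert max(f(n) / g(n) for n in range(2, 1000, 2)) == 998.0
assert max(g(n) / f(n) for n in range(1, 1000, 2)) == 999.0
```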
Here's the comment thread for my Bonobo Conspiracy archive posting.
From 2005 through 2008 I wrote and posted a Web comic called Bonobo Conspiracy. I posted a new strip with a new joke every single day for just over 1000 days, chronicling the lives, thwarted romances, and mad science of Matt, Sun-Moon, Dr. Klaun, Algea, and a host of special guests. Although most strips were designed to stand on their own, I also gradually developed each character over the course of the run, and I ran a few multi-strip specials and storylines.
Here are a few notes on the current state of my life.
Very soon I'll be taking down and packing my main desktop computer. Although I'll still be able to read and write plain text email and make Web log postings after that, I won't really have full connectivity again until early January. In particular, you should not expect me to be able to see anything sent as an attached file, such as photos or video clips. That means you, Mom.
It's not like I don't have enough projects to work on already. Nonetheless, I had an idea I thought was pretty cool, and I'm going to at least describe it here, whether I end up implementing it or not.
Okay, so: kerning. If you're setting type, you need to know where to put each glyph in relation to the previous one. In the old old days, it was easy because each glyph came on a little metal block and you'd just set them right next to each other and clamp them in place. But a computer (and, earlier, a phototypesetting machine) has the opportunity to make a decision. And if you just have a fixed bounding box for each glyph and set them side by side, you run into problems when you have situations like a capital A next to a capital V. "AV". Using the bounding boxes that would be correct for those letters in other contexts, you end up putting too much space between them. You need to squeeze them together so that part of the V projects into the space that the A, if it were next to something else, would reserve for itself. This squeezing together is called "kerning."
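In code terms the decision is usually table-driven: each glyph has an advance width, and selected pairs carry an extra adjustment. A minimal sketch, with invented metric values (real fonts store these in their metrics tables; the numbers here aren't from any actual font):

```python
ADVANCE = {"A": 700, "V": 700, "W": 950, "o": 520}          # invented widths
KERN = {("A", "V"): -80, ("V", "A"): -80, ("A", "W"): -60}  # invented pairs

def place_glyphs(text):
    # Return (glyph, x) pairs: each glyph starts at the previous pen
    # position plus the kerning adjustment for the pair, then advances
    # the pen by its own width.
    pen, placed, prev = 0, [], None
    for ch in text:
        pen += KERN.get((prev, ch), 0)
        placed.append((ch, pen))
        pen += ADVANCE[ch]
        prev = ch
    return placed

place_glyphs("AV")  # the V starts at 620 instead of 700
```

With these made-up numbers the "AV" pair closes up by 80 units, which is exactly the squeezing-together described above; a pair not in the table just sits at its default advance.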
I wrote before about the writing style analysis toy; at that time I said the "blogosphere" wasn't ready for such technology, and I still believe that. But I recently did something sort of related, with lower stakes, that might interest you, so I'm going to share it here.
The thing is, in my novel draft, there are 45 chapters, and some of them are deliberately written in different styles from others. I thought it'd be interesting to see if I could detect that statistically. I apologize for not posting the actual text here - you'll have to wait until the book is published - but I'll at least give you the raw numbers to play with and walk you through the analysis.
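To give a flavour of the kind of analysis involved (this is a generic stylometry sketch of my own, not necessarily the method I used on the chapters): represent each chapter by its function-word rates and compare the resulting vectors.

```python
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]

def style_vector(text):
    # Occurrences of each function word per 1000 words -- a crude but
    # classic stylometric fingerprint, largely independent of topic.
    words = text.lower().split()
    counts = Counter(words)
    n = max(len(words), 1)
    return [1000 * counts[w] / n for w in FUNCTION_WORDS]

def style_distance(u, v):
    # Manhattan distance between fingerprints; serious work would
    # z-score each feature first (Burrows's Delta), omitted here.
    return sum(abs(a - b) for a, b in zip(u, v))
```

Feed each chapter through `style_vector`, and chapters written in deliberately different styles should show up as outliers with large pairwise distances; that's the effect I went looking for in the raw numbers.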
I've reached agreement in principle with people at the University of Manitoba for me to go there as a postdoctoral researcher. Details are still to be worked out, but it looks like that'll be my next career step. At least until I learn enough Japanese to pass the entrance exams for TouDai, thereby fulfilling a childhood promise.
Hi! I'm a scientific researcher. I have a PhD in computer science. My doctoral dissertation is mostly about the mathematical background of "similarity search." That means looking at things to find other things they are similar to. I've travelled the world to present my work on similarity search at scientific conferences - and some very smart people with very limited funds chose to use those funds to pay for me to do that.
Argument from authority has its limitations, but I would like to make very clear: I am an expert in the specific area of how computers can answer questions of the form "Which thing does this thing most resemble?" Gee, why would I mention this right now?
I just got back from a week in Sweden at ACL 2010. It went pretty well. I presented my paper, which you can read online as a PDF; I didn't get a lot of response or questions right at the presentation, but I said what I wanted to say and at least they didn't throw things, and I'm told there was a lot of interest in it offline. In one of the workshops I actually found several people who wanted to talk to me about my dissertation research, and it'd be really cool if I could somehow redeem some of the years of work I put into that, so that's good. I took photos and some may end up posted here eventually.