Colour, social beings, and undecidability

9 August 2004 - updated 13 May 2008
Tags for this page: 200408 200805 colour compsci law

Okay, it's been about two months since I posted my piece about colourful bits, and I really should have posted a follow-up before now, but better late than never.  First of all, here are ten other places that carried the story, in no particular order:

For the most part, it seems like most people liked what I had to say, and most of them understood it.  The reactions of the people who didn't understand what I was saying, or who understood at least part of it but didn't agree, were actually more interesting to me.  I drew a certain amount of criticism for referring to two classes of people as "lawyers" and "computer scientists".  It's true, I didn't mean all lawyers or all computer scientists.  Maybe if I had consistently referred to the viewpoints of law and computer science, instead of to the people holding those viewpoints, I would have given less offense.  I still think, though, that the two viewpoints I described are so widely-held among, and so characteristic of, lawyers and computer scientists, that those terms are worth using.

I heard the opinion expressed several times that "Colour" was the wrong thing to call what I'm talking about.  People said "Colour is actually the same thing as X other thing, and we already understand X, so this discussion is boring and useless." Well, the reason I don't think it's boring and useless is that everyone who said that had a different idea of what the X was that Colour should be the same thing as.

Colour is not the same thing as "ownership" - ownership of copyright privileges is one kind of Colour, and it's the one that Monolith was trying to play with, but I gave examples of other things I want to include in Colour.  The quality of randomly generated numbers that distinguishes them from the output of a pseudorandom number generator is one kind of Colour that has nothing to do with ownership.  Another would be the quality of some image files that makes them illegal to possess under section 163.1 of the Criminal Code (child pornography).

Colour is not the same thing as "metadata" - as I described in my essay, you can attach metadata to your data, but the metadata could easily be mistaken, lying, or rendered incorrect by changes elsewhere in the universe.  For instance, if you have a CD with the SCMS bits saying "this is the original copy", that's all well and good.  What happens when I use my CD burner to create a bit-for-bit identical copy?  Then I have a CD with the SCMS bits saying "this is the original copy", but it isn't.  Colour is arguably information that could be described in metadata, but it's not the same thing as metadata.  If you try to make it the same thing as metadata, for instance with a law against attaching false metadata to things, then you're right back where you started, because now you're worrying about the Colour of the metadata.  Yes, you can express the answer to the question "Is the metadata accurate?" by encoding it as the number 1 or 0, but as soon as you do so, I can ask the question "Is that 1 or 0 accurate?" Colour could be called the n-th level meta-metadata that's always just out of reach.
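
A minimal sketch of the copied-disc example, for illustration only.  The `Disc` record, its `original_flag` field, and `burn_copy` are hypothetical names made up here; they do not model any real disc format or burner API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disc:
    """Hypothetical disc image: one copy-status flag plus the payload."""
    original_flag: bool  # metadata bit claiming "this is the original copy"
    payload: bytes       # the recorded content


def burn_copy(disc: Disc) -> Disc:
    """A bit-for-bit copy duplicates the flag along with everything else."""
    return Disc(original_flag=disc.original_flag, payload=disc.payload)


master = Disc(original_flag=True, payload=b"\x01\x02\x03")
copy = burn_copy(master)

# As data, the two discs are indistinguishable...
same_bits = (master == copy)
# ...so the copy still carries the claim "I am the original", which is now false.
copy_claims_original = copy.original_flag
```

Nothing computed from the fields of `copy` can reveal that it is the copy; that fact lives only in how the two objects came to exist.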

Someone proposed that Colour is the same thing as "history", and I think that's almost right.  Thinking about the concept after writing the essay I posted, I thought of one way to make the idea rigorous that seems to work mathematically.  The thing is, law doesn't deal with numbers.  Law is permanently embedded in the physical universe, and it deals with events - subsets of the entire space-time extent of the universe.  Some of those events are "persons", and they have especially important status which I'll mention later.  Anyway, when we talk about you possessing a file, we're not talking merely about the bits of the file, the ones and zeroes.  We are talking about the subset of the universe associated with that file.  Maybe some magnetized rust on your hard drive, maybe a little chunk of silicon with electric charges on it, and so on.

As computer scientists, we lump events into equivalence classes based on a "numerical value" function.  My file on my hard drive is the same file as your file on your flash card which is the same file as his file on his backup tape, because they all have the same numerical value.  The interchangeability and indistinguishability of files with the same numerical value are not only built into computer science, but built into computers themselves.  It is an important feature of my motherboard that I can equally well plug a CD-ROM drive or a hard disk into the same IDE port and they'll be treated the same to the extent possible, and it is an important feature of my operating system that when I type "less" to view a file, it will display a file from the hard disk the same way as it would display one from the CD-ROM. Part of the purpose of the operating system is to apply the "numerical value" filtering function.
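
The lumping described above can be sketched in a few lines.  This is only an illustration: the `Event` record and the `numerical_value` function are hypothetical names invented for the sketch, and a real operating system does this through device drivers and file systems rather than a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one physical copy of a file."""
    medium: str  # where the copy physically lives
    data: bytes  # the bits stored there


def numerical_value(event: Event) -> bytes:
    """Forget the medium and keep only the bits."""
    return event.data


events = [
    Event("my hard drive", b"hello world\n"),
    Event("your flash card", b"hello world\n"),
    Event("his backup tape", b"hello world\n"),
    Event("some other disk", b"something else\n"),
]

# Group events into equivalence classes by numerical value.
classes = defaultdict(list)
for event in events:
    classes[numerical_value(event)].append(event.medium)
```

After the grouping, the three copies of the same bytes land in one class and are treated as "the same file"; the medium survives only because the sketch chose to record it.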

So computer science, for very good reasons, is almost exclusively a calculus of numerical values, which are homomorphic to events in the universe.  That's a math term; I mean it in its precise technical sense, but a non-technical approximation of its meaning would be that we can think about numerical values and come to conclusions that are correct about events in the universe too, without needing to think about the extra complication that is attached to events in the universe.

The trouble is, homomorphic images tend to throw out important details.  Computer science's "numeric value" image of the universe eliminates some uninteresting extra complication, but law is all about things under that "extra complication" heading.  Events in the universe have other attributes not included in their numerical values, and in particular, they have "causes" - other events with a special relationship to them.  The questions answered by Colour seem to be the same questions answered by the answer to "What events caused the event of this file's existence?" If you can describe those causative events, then you can answer the copyright, child porn, and randomness Colour questions rather neatly.
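
One possible way to write this down, as a sketch with notation introduced only for illustration (the sets E, N, C and the maps v, c are assumptions of the sketch, not established terminology):

```latex
% Requires amsmath for \text.
% E: events in the universe; N: numerical values (bit strings); C: Colours.
% v : E -> N is the "numerical value" map; c : E -> C assigns each event its Colour.
\[
  \exists\, e_1, e_2 \in E : \quad v(e_1) = v(e_2) \quad \text{and} \quad c(e_1) \neq c(e_2)
\]
\[
  \Longrightarrow \quad \text{there is no } f : N \to C \text{ with } c = f \circ v .
\]
```

If such an f existed, then v(e_1) = v(e_2) would force c(e_1) = f(v(e_1)) = f(v(e_2)) = c(e_2), contradicting the first line.  In words: two events with the same bits but different causes are enough to show that Colour cannot be computed from the numerical value alone.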

If we say that law is a calculus of events, with emphasis on the relation of "causality", then I think we have a description of law that bears some relation to what lawyers actually do, but which doesn't paint them as dangerous idiots from the computer-science point of view.  So I won't complain too loudly if you insist on saying "Colour is actually the same thing as history," "Colour is actually the same thing as cause," or "Colour is actually the same thing as origin."

Discussion of what I was or was not actually talking about raises another interesting Colour distinction:  not every sentence in that essay was something I actually believe.  Much of it was my characterization of one side or the other of a debate in which I'm claiming that neither side has the complete picture.  Some of my readers seemed to seize upon some of the sentences in the essay and argue against them without realizing that those sentences were not my complete position.

We could say that different parts of the essay had different Colours - some were "statements I agree with", some were "statements I agree with subject to limitations stated elsewhere", some were "statements I do not agree with", and so on; discussing it sensibly required the reader to recognize the Colour of each sentence, at least to some level.  I'd suggest that that's one example of Colour that is not adequately described by saying "Colour is cause, history, or origin" because the whole essay had essentially the same cause, history, and origin.  To make that definition fit you'd have to get into tricky questions of intent as to individual sentences, but maybe that's the way you have to go anyway.  It might be easier to say that Colour also includes some kind of element of "context".

The "Colour is actually the same thing as X" objection has a more elaborated form, though, and I found this one a little irritating because it exemplifies exactly the misunderstanding I was trying to clear up.  The line goes, "Colour is actually the same thing as X, and everyone knows that X exists and is important, except dangerous idiots.  Skala is slandering the computer scientists by calling them dangerous idiots who don't know about X!" I didn't hear this from anyone claiming to be a slandered computer scientist - only from people showing the behaviour of the people I referred to as "lawyers" in my essay.  As far as I could tell, they didn't consider that I might be saying the "lawyer" point of view was wrong; their annoyance was that I thought it was necessary to bother saying that the "lawyer" point of view was so obviously the only right one.  That's an interesting and useful rhetorical phenomenon - I managed to phrase my argument in such a way that people on both sides thought I was on their own side and attacking the others.

Well, yes, I said that computer scientists deny the existence of Colour.  But as I also said, that's not because there is something wrong with computer scientists.  Computer scientists deny the existence of Colour because, within the subset of the universe studied by computer science, Colour really does not exist.  They are neither mistaken nor lying.  Colour is not obviously real because it is NOT real; it's not something we could see if our vision were a little better; it is not something we can fake adequately; and you cannot import it from some other universe and expect it to continue to exist in the computer-science universe in any form you can understand.

A big part of my point is that computer scientists are not dangerous idiots; I am one myself, and we're right.  Yes, many computer scientists would do well to understand that our world of numbers isn't the only world, and that Colour exists in some other worlds; but really, the idiocy I think is more dangerous is the one that insists "Colour must exist in the computing world because it exists somewhere else, and the experts who say it doesn't exist in the computing world are just being difficult." Sorry, Colour does not exist in the computing world.  You must give up that dream.

But really, even if we could all agree on Colour, maybe that wouldn't be such a breakthrough.  An anonymous poster on LawMeme made a point I thought was insightful:  Colour is still the wrong question because the law isn't about files at all anyway.  The law is something like the "calculus of events" I mentioned above, but it's about persons - social beings - and questions of bits and what Colour they may be are just bookkeeping used to help answer the real questions, which are about persons.  Anonymous writes:

But to make any sense of law in general, you need to see social beings as the first class entities of the system, rather than the files which contain the works.  When viewed in this way, the Monolith discussion is a complete waste of time, because it discusses numerical properties of files, not the actions and intents of persons.  Thus it concentrates entirely on secondary entities, ignoring the primary.

Monolith said they'd created a file for which nobody could compute the Colour function so ha ha, the lawyers would all have to go away.  I said the lawyers could still evaluate the file's Colour because Colour is not a function.  But as Anonymous points out, there's a bigger problem with Monolith:  the lawyers don't even care what Colour the file is, because Colour is just their way of getting at the real questions, which are phrased in terms of social beings and their interactions.

I like the idea that law is a calculus of social beings and their interactions, because that makes a whole lot of seemingly knotty law-related questions just vanish.  Instead of arguing endlessly about whether a Web cache "reproduces" the file being cached or similar questions, we can focus on the social beings involved instead, and answer the real questions.

The trouble is, even the legal system does not necessarily always see itself in that light.  I think, in particular, of Subsection 163(5) of the Criminal Code, which is part of the Canadian obscenity law.  It says in its entirety, "For the purposes of this section, the motives of an accused are irrelevant." What does that mean?

I think it's an attempt to get around the idea that the law is about social beings.  That wording (especially in context, considering other things that have been said about obscenity in Canada recently) says to me that the statute is trying to be primarily about the allegedly obscene material; sure, if it's obscene then maybe someone will go to jail over it, but the court is supposed to be focusing its attention on "is the material obscene", not the social-being questions of who and why.  This law is not supposed to be about social beings - it's trying hard to be about Colour; and as I mentioned in my original essay, supporters of obscenity laws also seem to want Colour to be a function, so that a given sequence of bits will always be either legal or illegal regardless of context, history, or other aspects of Colour that can't be expressed as functions.

Fortunately, the law doesn't stop being about social beings just because someone wrote a statute that says "this statute isn't about social beings".  We saw that in R. v.  Sharpe - the statute was written to exclude a social analysis, and the Justices turned around and said, "We're going to look at the social context anyway, because that's our job!" They rejected a pure Colour analysis of the stories, and didn't even consider the Colour-is-a-function analysis.

Now, since I've established that Colour isn't a function and that Colour isn't even the end of the story anyway, I'm not sure how much value there is in trying to make legal arguments by proving mathematical things about functions.  Monolith tried that and fell pretty flat.  Nonetheless I'd like to mention something I think is interesting just from a mathematical point of view, which is that the Bill C-12 "child pornography" function is uncomputable.  This may be relevant because as I mentioned, advocates of child pornography laws seem to be trying really hard to force the legal system to evaluate functions instead of considering social beings.  The trouble is, if the legal system is a function evaluator, then we can quite easily construct a function it can't evaluate, because as computer scientists know, all function evaluators have weak spots, and the weak spots are usually quite easy to find.

Here's how it works.  You write a story that is a legal thriller.  Someone gets arrested, goes to court, lawyers on both sides make their arguments, the judge hands down a decision, maybe there's a jury involved or a couple of appeals, you know the drill.  The point is that in the story, some legal question is solidly decided - but the story as written does not actually say what the final decision was.  It's easy to write a story like that.  This paragraph is one; I, or some better writer, could easily add some details to make it a lot longer and more thrilling.

Well, then you write a second story that refers back to the first one.  This one's an erotic story, with lurid sex scenes, but just like the legal-thriller story, it leaves something to the imagination, and what it leaves to the imagination is pretty important - like whether the participants are consenting, or how old they are.  That's not hard to do either - Alice can say to Bob, "I'll sleep with you only if you win your case." Or maybe Carol's age in years can be stated as being the same as the paragraph number Justice Dave cited - without actually saying what the number was.  The second story is linked to the outcome in the first story, in such a way that it becomes a lot more questionable (no consent, participants under 18, etc.) if the accused was finally acquitted.

Finally, you link them in the opposite direction by having the legal case involve the second story.  The accused was accused of possessing a story and here it is.  Sure, something like that would be difficult in the real world because it creates a chicken-and-egg problem, but in fiction it's no problem.  Even without invoking any time travel, you can attach an epilogue describing, after the fact, what a remarkable coincidence it was that the court case actually unfolded as described in the prequel to the allegedly obscene story at issue in the court case.

That is, of course, just a very rough sketch, but anyone used to thinking mathematically and reading complicated fictional plots should see that there are no huge barriers to implementing it.  The bottom line is that it's possible, and not even particularly difficult, to construct a piece of text the possession of which is illegal under Bill C-12 if and only if its possession is legal.  The "legal to possess" function would be impossible to evaluate on such an input.  This is just the kind of thing computer scientists do with Turing machines all the time.
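
The circular construction can be reduced to a toy model, sketched here under heavy simplifying assumptions: each story is collapsed into a single boolean, and the function and variable names are hypothetical.  It models only the logical loop, not the statute or any real legal test.

```python
def consistent_verdicts() -> list[bool]:
    """Return every value of 'possession is legal' that agrees with the stories."""
    solutions = []
    for possession_is_legal in (True, False):
        # Story 1 (the thriller): the accused is acquitted exactly when
        # possessing the text is legal.
        acquitted = possession_is_legal
        # Story 2 (the erotic story): its content becomes illegal exactly
        # when the accused in story 1 was acquitted.
        content_is_illegal = acquitted
        # A consistent answer needs possession to be legal exactly when
        # the content is not illegal.
        if possession_is_legal == (not content_is_illegal):
            solutions.append(possession_is_legal)
    return solutions


verdicts = consistent_verdicts()
print(verdicts)  # prints []
```

Neither candidate answer survives the check, which is the sense in which a pure function evaluator has nothing to return for this input.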

Would it prove anything interesting to actually write that piece of text?  I think it probably wouldn't - because it would be instantly clear to anyone examining it seriously that it was a weird boundary case, and the court, which isn't required to behave like a Turing machine, would just discard the line of thinking that tries to answer the question inside the text, and would instead look at the social beings.  If I were defending the case they'd be asking "Okay, so why did Matthew write this anyway?" and as soon as they asked that, the question of whether I'm an evil child pornographer or not would become easy to answer.  Nonetheless, the supporters of Bill C-12 really seem to want it to be a function evaluated by a Turing machine.  If Bill C-12 were a function then it could be demolished by a counterexample like that one - a situation where the question cannot be answered.  Maybe if enough Conservatives want the court system to evaluate functions, then it will actually try to.  I'm scared to actually write the story because I don't think I can depend on the court system to escape from the Turing undecidability trap, and I think that's a shame.

Comments

Alun Jones from 199.73.1.1 at Thu, 30 Mar 2006 23:48:32 +0000:
It's not "metadata", that's "data about data".
It's more "paradata" - extrapolating from "paramilitary - like the military, but not actually a military", from the Greek meaning "beside".
It's a property of the data that is not expressible by the data itself. It can certainly be described by metadata, but of course, as with a label on the bottle, the metadata that aims to describe the data's paradata does not necessarily tell the truth. [Label a bottle of water as "poison" - does that turn the water into poison?]

I've also noticed that paradata - or colour - has another interesting property, that makes it stranger than quantum physics. Generate a random number. [Okay, scratch that. Generate a number randomly.] Is it equal to a number that has the "copyrighted" colour? No? Go back to the "Generate" step. If yes, then you have magically transferred the colour "copyrighted" to the randomly generated number. That colour wasn't there until you checked to see if the two numbers were equivalent.

Matthew Skala from 69.63.62.226 at Fri, 31 Mar 2006 00:47:14 +0000:
The term "metadata" is in current use in library science (and related fields) to describe data that describes other data. It may be attached to the data it describes, or stored separately. It may be that "paradata" is a better word for that, but too bad; the people who study the stuff have already decided what to call it, and they won't listen to you or me telling them to call it something else. Just try analyzing the etymology of "pedophile" and "homophobe" some time; neither word actually means what it means, if you see what I mean.

I draw a distinction between metadata, which is actual data stored somewhere, and what I'm calling Colour, which by definition isn't representable by bits. There are several reasons I call it Colour, by the way:

* Computer scientists, and other mathematicians, are accustomed to using colours like red and blue to describe abstract information about things. See, for instance, the "graph colouring" problem, which has thousands of applications very few of which have anything to do with visual perception of light wavelengths.

* Lawyers already use the word "colour" to describe something very similar to the concept I call Colour. See, for instance, concepts like "under colour of law".

* Colour instead of "color" is the Canadian spelling.

* Colour with a capital C instead of "colour" with a lowercase C, to underscore that I'm talking about a special abstract concept instead of literal colour. This is also a common convention in fields like math, and similar concepts (e.g. using nonstandard plural forms in Hebrew to indicate that you're using a word in a metaphoric instead of literal sense) go back more than two thousand years.

Paul from 63.161.128.197 at Sat, 03 Mar 2007 22:08:49 +0000:
I think you've highlighted the importance of the difference between the actual bits of a file and the ownership, history, or other relevant color involving the file. It really is a calculus of events. However I'm not sure it's a good idea to say that color is not information measured in bits, and that color is not metadata.

As a thought experiment, let's assume color does not have the same properties as information and thus exists without being written down anywhere. Further, we are concerned with whether a stream of numbers is randomly generated from a high entropy source. Lastly, the color of a randomly generated number affects how we treat it.

As far as I can tell, every decision we make regarding the randomly generated source can be arrived at by considering the metadata we have available about it. If there's absolutely no metadata about it, then even though it still has the same color as a high entropy source, we cannot treat it as such. We would make the same decisions without knowledge of the color of the randomly generated numbers. This contradicts the earlier assertion that color affects how we treat the numbers.

This argument can be extended to other things with color, such as ownership of MP3s and such, but fundamentally the bottom line is if metadata influences human behavior and "color that is not metadata" cannot, then "color that is not metadata" is a less useful concept. Perhaps we should consider color to be metadata.

Matthew Skala from 67.158.72.8 at Mon, 05 Mar 2007 00:24:20 +0000:
Paul: I wonder if you read the first article in the series, which this one is a follow-up to. One of the points I made in the first article is that metadata *could be wrong*, and you can never know whether the metadata is right or not just by looking at the values of its bits. Any metadata can be lying (the CD whose SCMS bits say "original copy" when it isn't), mistaken (the ripped audio file tagged "copyright-protected" by default, when it's of a public domain recording), or superseded by higher-level metadata (the metadata standards document that says "Here is what a valid metadata tag looks like:" followed by an example of one). So you can have metadata saying "this file was randomly generated", but that doesn't mean the file really was randomly generated. The quality of *really being* randomly generated is different from the quality of having metadata that *says* it's randomly generated. I use the term Colour to describe the hypothetical "real" metadata that may or may not match whatever visible metadata you have.

Your claim that that's not a useful concept is really one of the same points I'm making: Colour doesn't actually exist. It's not possible to make decisions on the "real" status of bits; all you can do is look at them and hope to deduce their status from what you see. However, there are powerful interests that very much want Colour to exist, even though that's not possible, and they refuse to take "no" for an answer, and that's part of what we're fighting about.

Luke from 67.101.0.59 at Tue, 20 Mar 2007 05:47:09 +0000:
Matthew: First of all, thanks for the excellent series of articles on a concept that is definitely difficult to understand. This entire set has brought things into sharp focus. I am finally able to hold the concept of Colour in my mind.

I think you may have misinterpreted Paul's post. I believe what he was trying to say was that because we can attach, in our minds, the property of Colour to data, it is metadata. Indeed, I think he was driving at the idea of Colour as the hypothetical "real" metadata you described at the end of your first paragraph. In short, what he was saying was that even though actual metadata, in the form of tangible 1's and 0's, cannot be guaranteed to accurately represent Colour, Colour can still exist as metadata within our minds. I think Paul's comment would be better suited to the original article, and I wish he had checked back for your response. I don't like articulating the ideas of others.

Paul from 159.215.117.130 at Wed, 09 May 2007 19:09:56 +0000:
Yes, we are saying roughly the same thing, however you qualify your statement in terms of bits: 'it's not possible to make a decision on the "real" color of the bits.' I am making a stronger statement that it's not possible to make a decision on the "real" color of anything... because like Luke almost says, your impression of color in your head is metadata, and a leaky abstraction over the metadata available to you.

We live in a world of concepts that don't actually exist, like human rights and sqrt(-1). They're tools and ultimately they're only as good as they are useful. So, for example, a system like Monolith may be flawed, but not because it doesn't address something that doesn't exist. That's really their intention: make the tool useless.

So while we're saying a lot of the same things, I think we have a slightly different take on it. I'm not 100% sure about your stance since some things appear contradictory, such as "Colour doesn't actually exist" and "Colour is something real." I assume you meant color _of bits_ doesn't exist, but I can't be sure.

Daniel from 67.55.23.76 at Thu, 12 Feb 2009 04:17:25 +0000:
Brilliant and enlightening (though my brain can *just* stretch around the thought experiment story you describe). Reminds me of John Hodgman's Literary Tone Detector: "No longer will you be wondering: Is this novel by a noted author dull, or simply boring? … Have you ever been reading an op-ed in the newspaper or a cartoon in a magazine or perhaps even this book and wondered to yourself: Is this supposed to be funny? The Hodgman Literary Tone Detector (HLTD) will answer that question for you."

Matt from 69.63.60.29 at Wed, 04 Mar 2009 03:39:29 +0000:
Funny you should mention that... my current research with the computational linguistics groups at the universities of Waterloo and Toronto actually involves writing a "stylistic annotator" that's meant to make some baby steps toward exactly that. It probably won't be able to (correctly) distinguish between "dull" and "boring," but we hope it'll at least identify some of the factors that contribute to those.


Copyright 2004, 2008 Matthew Skala