
Colour, social beings, and undecidability

Mon 9 Aug 2004 by mskala

Okay, it's been about two months since I posted my piece about colourful bits, and I really should have posted a follow-up before now, but better late than never.  First of all, here are ten other places that carried the story, in no particular order:

For the most part, people seemed to like what I had to say, and most of them understood it.  The reactions of the people who didn't understand what I was saying, or who understood at least part of it but didn't agree, were actually more interesting to me.  I drew a certain amount of criticism for referring to two classes of people as "lawyers" and "computer scientists".  It's true, I didn't mean all lawyers or all computer scientists.  Maybe if I had consistently referred to the viewpoints of law and computer science, instead of to the people holding those viewpoints, I would have given less offense.  I still think, though, that the two viewpoints I described are so widely held among, and so characteristic of, lawyers and computer scientists, that those terms are worth using.

I heard the opinion expressed several times that "Colour" was the wrong thing to call what I'm talking about.  People said "Colour is actually the same thing as X other thing, and we already understand X, so this discussion is boring and useless." Well, the reason I don't think it's boring and useless is that everyone who said that had a different idea of what the X was that Colour should be the same thing as.

Colour is not the same thing as "ownership" - ownership of copyright privileges is one kind of Colour, and it's the one that Monolith was trying to play with, but I gave examples of other things I want to include in Colour.  The quality of randomly generated numbers that distinguishes them from the output of a pseudorandom number generator is one kind of Colour that has nothing to do with ownership.  Another would be the quality of some image files that makes them illegal to possess under section 163.1 of the Criminal Code (child pornography).

Colour is not the same thing as "metadata" - as I described in my essay, you can attach metadata to your data, but the metadata could easily be mistaken, lying, or rendered incorrect by changes elsewhere in the universe.  For instance, if you have a CD with the SCMS bits saying "this is the original copy", that's all well and good.  What happens when I use my CD burner to create a bit-for-bit identical copy?  Then I have a CD with the SCMS bits saying "this is the original copy", but it isn't.  Colour is arguably information that could be described in metadata, but it's not the same thing as metadata.  If you try to make it the same thing as metadata, for instance with a law against attaching false metadata to things, then you're right back where you started, because now you're worrying about the Colour of the metadata.  Yes, you can express the answer to the question "Is the metadata accurate?" by encoding it as the number 1 or 0, but as soon as you do so, I can ask the question "Is that 1 or 0 accurate?" Colour could be called the n-th level meta-metadata that's always just out of reach.
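
The copied-CD example can be sketched in a few lines of Python.  This is only a toy model: the Disc class, its claims_original field, and the bit_for_bit_copy function are names invented for illustration, not a description of how any real disc format stores its copy-status bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disc:
    """Hypothetical disc: payload bytes plus a one-bit flag stored on the disc."""
    payload: bytes
    claims_original: bool  # metadata; it only records what the disc says about itself


def bit_for_bit_copy(disc: Disc) -> Disc:
    # A raw copy duplicates every bit, so the flag travels along unchanged.
    return Disc(payload=disc.payload, claims_original=disc.claims_original)


master = Disc(payload=b"some audio samples", claims_original=True)
burned = bit_for_bit_copy(master)

# By content alone, the two discs cannot be told apart.
print(master == burned)

# The copy still claims to be the original, and nothing on the disc contradicts it.
print(burned.claims_original)
```

Adding a second flag that says "the first flag is accurate" would not help, since the same copy operation would duplicate that flag too.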

Someone proposed that Colour is the same thing as "history", and I think that's almost right.  Thinking about the concept after writing the essay I posted, I thought of one way to make the idea rigorous that seems to work mathematically.  The thing is, law doesn't deal with numbers.  Law is permanently embedded in the physical universe, and it deals with events - subsets of the entire space-time extent of the universe.  Some of those events are "persons", and they have especially important status which I'll mention later.  Anyway, when we talk about you possessing a file, we're not talking about the bits of the file, just the ones and zeroes.  We are talking about the subset of the universe associated with that file.  Maybe some magnetized rust on your hard drive, maybe a little chunk of silicon with electric charges on it, and so on.

As computer scientists, we lump events into equivalence classes based on a "numerical value" function.  My file on my hard drive is the same file as your file on your flash card which is the same file as his file on his backup tape, because they all have the same numerical value.  The interchangeability and indistinguishability of files with the same numerical value are not only built into computer science, but built into computers themselves.  It is an important feature of my motherboard that I can equally well plug a CD-ROM drive or a hard disk into the same IDE port and they'll be treated the same to the extent possible, and it is an important feature of my operating system that when I type "less" to view a file, it will display a file from the hard disk the same way as it would display one from the CD-ROM. Part of the purpose of the operating system is to apply the "numerical value" filtering function.
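
As a rough illustration of lumping events into equivalence classes, here is a small Python sketch.  The list of copies and the particular encoding used by numerical_value are assumptions made up for the example; the point is only that the grouping looks at the bytes and ignores where they are stored.

```python
from collections import defaultdict


def numerical_value(data: bytes) -> tuple:
    # Pair the length with the integer value so that leading zero bytes
    # still distinguish one file from another.
    return (len(data), int.from_bytes(data, "big"))


# Hypothetical "events": a storage location together with the bytes found there.
copies = [
    ("my hard drive", b"hello"),
    ("your flash card", b"hello"),
    ("his backup tape", b"hello"),
    ("some other disk", b"world"),
]

# Group locations by numerical value; the location plays no part in the key.
classes = defaultdict(list)
for location, data in copies:
    classes[numerical_value(data)].append(location)

for value, locations in classes.items():
    print(value, locations)
```

The three copies of the same bytes land in one class, and nothing in the key records which medium, owner, or history each copy had.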

So computer science, for very good reasons, is almost exclusively a calculus of numerical values, which are homomorphic to events in the universe.  That's a math term; I mean it in its precise technical sense, but a non-technical approximation of its meaning would be that we can think about numerical values and come to conclusions that are correct about events in the universe too, without needing to think about the extra complication that is attached to events in the universe.

The trouble is, homomorphic images tend to throw out important details.  Computer science's "numeric value" image of the universe eliminates some uninteresting extra complication, but law is all about things under that "extra complication" heading.  Events in the universe have other attributes not included in their numerical values, and in particular, they have "causes" - other events with a special relationship to them.  The questions answered by Colour seem to be the same questions answered by the answer to "What events caused the event of this file's existence?" If you can describe those causative events, then you can answer the copyright, child porn, and randomness Colour questions rather neatly.

If we say that law is a calculus of events, with emphasis on the relation of "causality", then I think we have a description of law that bears some relation to what lawyers actually do, but which doesn't paint them as dangerous idiots from the computer-science point of view.  So I won't complain too loudly if you insist on saying "Colour is actually the same thing as history," "Colour is actually the same thing as cause," or "Colour is actually the same thing as origin."

Discussion of what I was or was not actually talking about raises another interesting Colour distinction:  not every sentence in that essay was something I actually believe.  Much of it was my characterization of one side or the other of a debate in which I'm claiming that neither side has the complete picture.  Some of my readers seemed to seize upon some of the sentences in the essay and argue against them without realizing that those sentences were not my complete position.

We could say that different parts of the essay had different Colours - some were "statements I agree with", some were "statements I agree with subject to limitations stated elsewhere", some were "statements I do not agree with", and so on; discussing it sensibly required the reader to recognize the Colour of each sentence, at least to some level.  I'd suggest that that's one example of Colour that is not adequately described by saying "Colour is cause, history, or origin" because the whole essay had essentially the same cause, history, and origin.  To make that definition fit you'd have to get into tricky questions of intent as to individual sentences, but maybe that's the way you have to go anyway.  It might be easier to say that Colour also includes some kind of element of "context".

The "Colour is actually the same thing as X" objection has a more elaborated form, though, and I found this one a little irritating because it exemplifies exactly the misunderstanding I was trying to clear up.  The line goes, "Colour is actually the same thing as X, and everyone knows that X exists and is important, except dangerous idiots.  Skala is slandering the computer scientists by calling them dangerous idiots who don't know about X!" I didn't hear this from anyone claiming to be a slandered computer scientist - only from people showing the behaviour of the people I referred to as "lawyers" in my essay.  As far as I could tell, they didn't consider that I might be saying the "lawyer" point of view was wrong; their annoyance was that I thought it was necessary to bother saying that the "lawyer" point of view was so obviously the only right one.  That's an interesting and useful rhetorical phenomenon - I managed to phrase my argument in such a way that people on both sides thought I was on their own side and attacking the others.

Well, yes, I said that computer scientists deny the existence of Colour.  But as I also said, that's not because there is something wrong with computer scientists.  Computer scientists deny the existence of Colour because, within the subset of the universe studied by computer science, Colour really does not exist.  They are neither mistaken nor lying.  Colour is not obviously real because it is NOT real; it's not something we could see if our vision were a little better; it is not something we can fake adequately; and you cannot import it from some other universe and expect it to continue to exist in the computer-science universe in any form you can understand.

A big part of my point is that computer scientists are not dangerous idiots; I am one myself, and we're right.  Yes, many computer scientists would do well to understand that our world of numbers isn't the only world, and that Colour exists in some other worlds; but really, the idiocy I think is more dangerous is the one that insists "Colour must exist in the computing world because it exists somewhere else, and the experts who say it doesn't exist in the computing world are just being difficult." Sorry, Colour does not exist in the computing world.  You must give up that dream.

But really, even if we could all agree on Colour, maybe that wouldn't be such a breakthrough.  An anonymous poster on LawMeme made a point I thought was insightful:  Colour is still the wrong question because the law isn't about files at all anyway.  The law is something like the "calculus of events" I mentioned above, but it's about persons - social beings - and questions of bits and what Colour they may be are just bookkeeping used to help answer the real questions, which are about persons.  Anonymous writes:

But to make any sense of law in general, you need to see social beings as the first class entities of the system, rather than the files which contain the works.  When viewed in this way, the Monolith discussion is a complete waste of time, because it discusses numerical properties of files, not the actions and intents of persons.  Thus it concentrates entirely on secondary entities, ignoring the primary.

Monolith said they'd created a file for which nobody could compute the Colour function so ha ha, the lawyers would all have to go away.  I said the lawyers could still evaluate the file's Colour because Colour is not a function.  But as Anonymous points out, there's a bigger problem with Monolith:  the lawyers don't even care what Colour the file is, because Colour is just their way of getting at the real questions, which are phrased in terms of social beings and their interactions.

I like the idea that law is a calculus of social beings and their interactions, because that makes a whole lot of seemingly knotty law-related questions just vanish.  Instead of arguing endlessly about whether a Web cache "reproduces" the file being cached or similar questions, we can focus on the social beings involved instead, and answer the real questions.

The trouble is, even the legal system does not necessarily always see itself in that light.  I think, in particular, of Subsection 163(5) of the Criminal Code, which is part of the Canadian obscenity law.  It says in its entirety, "For the purposes of this section, the motives of an accused are irrelevant." What does that mean?

I think it's an attempt to get around the idea that the law is about social beings.  That wording (especially in context, considering other things that have been said about obscenity in Canada recently) says to me that the statute is trying to be primarily about the allegedly obscene material; sure, if it's obscene then maybe someone will go to jail over it, but the court is supposed to be focusing its attention on "is the material obscene", not the social-being questions of who and why.  This law is not supposed to be about social beings - it's trying hard to be about Colour; and as I mentioned in my original essay, supporters of obscenity laws also seem to want Colour to be a function, so that a given sequence of bits will always be either legal or illegal regardless of context, history, or other aspects of Colour that can't be expressed as functions.

Fortunately, the law doesn't stop being about social beings just because someone wrote a statute that says "this statute isn't about social beings".  We saw that in R. v.  Sharpe - the statute was written to exclude a social analysis, and the Justices turned around and said, "We're going to look at the social context anyway, because that's our job!" They rejected a pure Colour analysis of the stories, and didn't even consider the Colour-is-a-function analysis.

Now, since I've established that Colour isn't a function and that Colour isn't even the end of the story anyway, I'm not sure how much value there is in trying to make legal arguments by proving mathematical things about functions.  Monolith tried that and fell pretty flat.  Nonetheless I'd like to mention something I think is interesting just from a mathematical point of view, which is that the Bill C-12 "child pornography" function is uncomputable.  This may be relevant because as I mentioned, advocates of child pornography laws seem to be trying really hard to force the legal system to evaluate functions instead of considering social beings.  The trouble is, if the legal system is a function evaluator, then we can quite easily construct a function it can't evaluate, because as computer scientists know, all function evaluators have weak spots, and the weak spots are usually quite easy to find.

Here's how it works.  You write a story that is a legal thriller.  Someone gets arrested, goes to court, lawyers on both sides make their arguments, the judge hands down a decision, maybe there's a jury involved or a couple of appeals, you know the drill.  The point is that in the story, some legal question is solidly decided - but the story as written does not actually say what the final decision was.  It's easy to write a story like that.  This paragraph is one; I, or some better writer, could easily add some details to make it a lot longer and more thrilling.

Well, then you write a second story that refers back to the first one.  This one's an erotic story, with lurid sex scenes, but just like the legal-thriller story, it leaves something to the imagination, and what it leaves to the imagination is pretty important - like whether the participants are consenting, or how old they are.  That's not hard to do either - Alice can say to Bob, "I'll sleep with you only if you win your case." Or maybe Carol's age in years can be stated as being the same as the paragraph number Justice Dave cited - without actually saying what the number was.  The second story is linked to the outcome in the first story, in such a way that it becomes a lot more questionable (no consent, participants under 18, etc.) if the accused was finally acquitted.

Finally, you link them in the opposite direction by having the legal case involve the second story.  The accused was accused of possessing a story and here it is.  Sure, something like that would be difficult in the real world because it creates a chicken-and-egg problem, but in fiction it's no problem.  Even without invoking any time travel, you can attach an epilogue describing, after the fact, what a remarkable coincidence it was that the court case actually unfolded as described in the prequel to the allegedly obscene story at issue in the court case.

That is, of course, just a very rough sketch, but anyone used to thinking mathematically and reading complicated fictional plots should see that there are no huge barriers to implementing it.  The bottom line is that it's possible, and not even particularly difficult, to construct a piece of text the possession of which is illegal under Bill C-12 if and only if its possession is legal.  The "legal to possess" function would be impossible to evaluate on such an input.  This is just the kind of thing computer scientists do with Turing machines all the time.
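
The circular structure of the two linked stories can be sketched as a toy model in Python.  This is a deliberate simplification under stated assumptions: it reduces the plot to two rules (the accused is acquitted exactly when the story is legal to possess, and the story is legal exactly when the accused is not acquitted), which makes it closer in form to the liar paradox than to a full Turing-machine argument.  The function names are invented for the example.

```python
def consistent(story_is_legal: bool) -> bool:
    """Check whether a proposed answer agrees with the plot of the linked stories."""
    # Rule 1 (the thriller): the accused is acquitted exactly when
    # possessing the story is legal.
    acquitted = story_is_legal
    # Rule 2 (the erotic story): an acquittal makes the content illegal.
    legal_according_to_plot = not acquitted
    return story_is_legal == legal_according_to_plot


# Try both possible answers; neither survives.
consistent_answers = [answer for answer in (True, False) if consistent(answer)]
print(consistent_answers)


# A naive evaluator that simply follows the cross-references never settles.
def legal_to_possess() -> bool:
    return not accused_is_acquitted()


def accused_is_acquitted() -> bool:
    return legal_to_possess()


try:
    legal_to_possess()
    evaluator_halted = True
except RecursionError:
    evaluator_halted = False

print(evaluator_halted)
```

Neither "legal" nor "illegal" is consistent with both rules, and an evaluator that chases the references runs until Python's recursion limit stops it.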

Would it prove anything interesting to actually write that piece of text?  I think it probably wouldn't - because it would be instantly clear to anyone examining it seriously that it was a weird boundary case, and the court, which isn't required to behave like a Turing machine, would just discard the line of thinking that tries to answer the question inside the text, and would instead look at the social beings.  If I were defending the case they'd be asking "Okay, so why did Matthew write this anyway?" and as soon as they asked that, the question of whether I'm an evil child pornographer or not would become easy to answer.  Nonetheless, the supporters of Bill C-12 really seem to want it to be a function evaluated by a Turing machine.  If Bill C-12 were a function then it could be demolished by a counterexample like that one - a situation where the question cannot be answered.  Maybe if enough Conservatives want the court system to evaluate functions, then it will actually try to.  I'm scared to actually write the story because I don't think I can depend on the court system to escape from the Turing undecidability trap, and I think that's a shame.


I appreciate that this is a late response, but it occurs to me that an interesting Computer Science analogy for "Colour" is "Type" (as in Type Theory). Bits don't have a type inherently; you attach a type to them for the benefit of readers of your source code (whether those readers are human, or computer). The type then propagates through the system, and in stricter type systems, can never be completely lost - if something starts out as "NotForDistribution", you can ensure that you retain that tag through the code, and explicitly handle it.

Once your program is compiled to machine language, the types have completely disappeared, and yet we still care whether this 64-bit quantity is an IEEE754 double, or a 64-bit unsigned integer, or 8 ISO 8859-1 characters, or a fragment of a UTF-8 string, or even a pointer to other data. This has correlations with the legal concept you describe as "Colour" - even though the machine sees a 64-bit quantity, we still care whether these bits were derived from a copyrighted work, or whether they're randomly generated, or whether they're new bits that "belong" to their creator.

Further extending the analogy, type systems can express quite complex conditions, and you can even have type systems that need human intervention to say "don't go here - if you do, you'll get stuck", just like handling laws, where there are cases which need a judge to apply their common sense.
Simon Farnsworth - 2010-05-17 05:37
When I moved this article over from an earlier version of the site code, the existing comments on the page were lost. There weren't enough of them and they weren't interesting enough for me to think it worthwhile to try to preserve them in some way, but: one of the things brought up in the old comments was that this Colour idea is a lot like the type systems in some programming languages. Perl (not normally a very strongly typed language) has a concept of "tainted" data when it's run in setuid mode, which expresses something very like the virus-infection Colour I described; other languages have similar concepts used for various purposes.

As you say, though, that concept is part of the language rather than inherent to the computer itself. It only exists as long as we're inside the particular language that supports it. That is a big part of the "Trusted Computing" concept, and things like "Palladium": special hardware support to extend this kind of type safety all the way to assembly language. On the other hand it goes directly against our basic model of computation, in the work of people like Turing and von Neumann, which hinges on bits ultimately being just bits, with any type or semantics we impose on them being external decisions we are free to change when we want to. You need to be able to override type in order to write things like general-purpose file compression utilities.
Matt - 2010-05-17 10:50
I think perhaps the word that best captures your notion of 'events that led to this event', 'causality', and 'history', is 'provenance'.

From Wikipedia:

Provenance, from the French provenir, "to come from", means the origin, or the source of something, or the history of the ownership or location of an object.[1] The term was originally mostly used for works of art, but is now used in similar senses in a wide range of fields, including science and computing. Typical uses may cover any artifact found in archaeology, any object in paleontology, certain documents (such as manuscripts), or copies of printed books. In most fields, the primary purpose of provenance is to confirm or gather evidence as to the time, place, and—when appropriate—the person responsible for the creation, production, or discovery of the object. This will typically be accomplished by tracing the whole history of the object up to the present. Comparative techniques, expert opinions, and the results of scientific tests may also be used to these ends, but establishing provenance is essentially a matter of documentation.
Dan Walkowski - 2011-06-08 16:02
I think that if you did write that story, and the legal system were a type of Turing machine, you would merely keep a succession of prosecutors very busy investigating the case against you.
A charge of "contempt of court" may be another possibility.
JJ - 2012-02-05 04:42
I don't think contempt of court is really a possibility just for writing the story. Contempt is about disobeying a court order, not about disobeying a statute. If as a result of some other court case I'd been ordered not to create child pornography, and then I created something for which it's mathematically impossible to prove whether it is or isn't child pornography, that might look like contempt; but the situation here is that *everybody*, not me specifically, has been ordered not to possess child pornography by Parliament, not by any court. Different branch, and we have no duty not to do things that are almost but not quite against statute, not even if we do them apparently for the purpose of testing the system.
Matt - 2012-02-05 07:47
"Nonetheless, the supporters of Bill C-12 really seem to want it to be a function evaluated by a Turing machine."
I think you're massively overthinking it. Wanting 'strict liability' laws and similar to exist does not mean that you want the legal system to function as a Turing machine.
Anonymous - 2016-08-27 07:00
It does when you're talking about strict liability for *possession of words*. Possessing those is very much different from possessing something physical like drugs, and the uncomputability of the functions that determine legality is one reason why.
Matt - 2016-08-27 07:49
