I saw a Web BBS posting recently in which the poster, who was a foreigner learning English as a second language, asked "Which is correct - 'based off' or 'based off of'?" The person asking the question can probably be forgiven because they don't know any better, and at least were smart enough to ask, but if you know me you'll probably be able to guess that the general agreement among the answers, that "based off of" is incorrect and you should say "based off" instead, caused me to consider the merits of a tri-provincial killing spree.
I will not apologize for being a prescriptivist. There are some usages that would be wrong even if all the other native speakers of English used them; and "based off" (with or without an "of") is such a usage. I'm willing to accept "different than" as an issue of formalism, and acceptable in speech or informal writing even though I do not use it myself; I'm willing to (very grudgingly) grant that persons from the United States of America may be allowed to say "anyways" as a regional dialect thing, even though it makes them sound illiterate; but "based off" is just completely unacceptable.
Nonetheless, from a scientific perspective and from the point of view of "know the enemy," it may be interesting to look seriously at the questions of who does say "based off," and when they started.
When I mentioned this issue to my mother, a former editor, she expressed surprise that anybody said "based off" - she didn't recall hearing or seeing it. On the other hand, when I read the TV Tropes Wiki, my impression is that "based off" is almost universal there, with "based on" a rare exception. I change it whenever I see it, but given the rate of editing it seems a certainty that instances of "based off" and "based off of" are being added to TV Tropes faster than I remove them.
A little bit of Web searching turns up a few other people expressing views like mine about "based off," and the general consensus seems to be that it's become popular only in the last five years or so, and primarily among young people. I heard a colleague (a linguist to boot) use it and also "anyways" in a formal academic lecture recently, but that seems to be unusual; it's mostly the uneducated who say "based off."
Why did they start, and why did it grow? At this point it's probably impossible to really know, but my guess is that it's tied up in the metaphor built into the phrase - which is probably also why I find this one much more annoying than other equally incorrect phrases. When you say "based on," you are invoking a metaphor. Saying "based off" discards that metaphor in favour of a syntactic analogy to an unrelated word; so saying it reveals you don't really know what the word "base" means.
Suppose you're erecting something like a statue. You don't want it to fall over. So you put it on top of something solid and stable - like a big block of stone. That solid, stable thing is the base of the statue. More generally, a base (as a noun) is a place of security: for instance, soldiers hope to be safe while they are in their fortified "base"; and a runner in baseball can't be tagged out while he is "on base." The same root that gives us the noun "base" also gives us, for instance, the adjective "basic" - used for things that are simple and strong and provide bases for other things. Then "base" can also be used as a verb to indicate putting something on a base - you "base" your statue on the block of stone; or (extending the metaphor to intangibles) you can "base" an argument on a concept, or create a new piece of artistic work "based on" some earlier work - that is, using it as a source of ideas or inspiration. The meaning of the phrase "based on" is, well, based on this metaphor of putting objects on top of other objects for stability. Using the preposition "on" is fundamental to the metaphor.
Suppose the statue is not on the base but off of the base. Well, then it's not based at all; it is baseless; it's not stable or secure; it is the opposite of based. In fact, we use the phrase "off base" to describe something that has no base or foundation - or a baseball runner who is vulnerable to the tag. The usage of "based off" where "based on" would be correct is off base.
But I think the purported justification for "based off" comes from a different line of thinking that goes like this: the speaker neither knows nor guesses the metaphor of solidly supported erections that links "based on" to the noun "base" meaning "place of security." Instead, they think of "to base" as a verb with no history, that means "to imitate" with no connection to any noun meaning of "base." Then they think it is a sort of politer version of "to rip off"; and so they think it should take the same preposition(s). Someone who would write "The Witch Hunter Robin opening is ripped off of a Madonna music video," thinks it's okay to write "The anime Suzumiya Haruhi no Yuuutsu is based off of a series of light novels." The latter anime is copied, it's imitative, but not in an unethical way - KyoAni had permission to use the source material and did so openly, so the anime is not a rip-off, but it's still a copy.
Note that that story nicely explains why some people say "based off of" instead of just "based off." The phrase "ripped off of" makes sense to me because "ripped off" is the verb (it's related to the noun "rip-off") and "of" is a preposition for linking an indirect object to that verb. Trying to hammer "base" into the hole left by "rip" when you don't want to suggest plagiarism yields "based off of."
So, yes, this does imply that if I were forced to accept one of "based off" or "based off of," I might prefer "based off of"; "based off" seems to be a further corruption from there. That is not the right answer to the language learner's question in my opening paragraph, though!
Tracking the cancer's growth
On the Web in general, it's common practice to use Google's number-of-hits estimates to examine questions like this - xkcd has based some amusing cartoons on that technique - but we can do better. A linguist at Brigham Young University has assembled something called COCA, the "Corpus Of Contemporary American English," and put up a search engine for not only it, but several other major corpora of English. These "corpora" are samples of language built specifically for linguistic research. Exactly what kind of samples they are, and how they're constructed, depends on the particular corpus - each one does things a little differently - but it's typical that a corpus will attempt to contain a randomly selected representative sample of the entire language, for instance including both spoken and written words, fiction and nonfiction, a variety of levels of discourse, within defined boundaries of dialect and historical time period, and so on. So counting occurrences in one of these is kind of like doing that Google search, except like actually scientific and stuff!
The COCA pushes you to register and prove that you're a "real" researcher in order to expand its per-user-per-day usage limits, and I don't particularly like that. It still does give me a bit of a thrill, though, that I actually do have the relevant qualifications. So I did the secret handshake, created an account, and ran some searches. I stuck the results in a Gnumeric spreadsheet and you can download it if you want to (or go to the site, register and do searches yourself, and get even more data than I included in my spreadsheet) but I'll summarize the interesting bits here.
The COCA, which is 410 million words attempting to sample uniformly from American English from 1990 to 2010, includes 56016 instances of "based on" and 17 of "based off," 8 of which are "based off of." Of those 17 instances, 12 (including 7 of the 8 "based off of"s) are in the "spoken" section of the corpus. There is one "based off of" in each of 1993, 1996, and 1998. All the other instances of "based off" (with or without an "of") are in 2004 or later.
So in this corpus, it looks like we can say:
- "based on" is much more popular;
- "based off of" seems to be older than "based off";
- "based off" without the "of" appears only from 2004 onward;
- "based off" and "based off of" are primarily used in speech, not in writing.
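If you want to check my arithmetic on those COCA numbers, here is a minimal sketch in Python. The counts are the ones quoted above; the constant and function names are my own inventions for illustration, not anything the COCA site provides.

```python
# Counts as quoted above from my COCA searches (1990 to 2010).
BASED_ON = 56016
BASED_OFF_TOTAL = 17       # "based off", with or without a following "of"
BASED_OFF_OF = 8           # the subset of those 17 that are "based off of"
SPOKEN_BASED_OFF = 12      # the subset of those 17 in the "spoken" section
SPOKEN_BASED_OFF_OF = 7    # the subset of the 8 "based off of"s that are spoken


def coca_breakdown():
    """Derive the secondary figures from the quoted counts (hypothetical helper)."""
    bare_based_off = BASED_OFF_TOTAL - BASED_OFF_OF
    spoken_bare = SPOKEN_BASED_OFF - SPOKEN_BASED_OFF_OF
    return {
        "bare_based_off": bare_based_off,
        "spoken_bare_based_off": spoken_bare,
        "written_total": BASED_OFF_TOTAL - SPOKEN_BASED_OFF,
        "spoken_share": SPOKEN_BASED_OFF / BASED_OFF_TOTAL,
        "on_to_off_ratio": BASED_ON / BASED_OFF_TOTAL,
    }


if __name__ == "__main__":
    for name, value in coca_breakdown().items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

On these counts, "based on" outnumbers "based off" by roughly 3300 to 1, and about 71 percent of the "based off" hits are in the spoken section. With only 17 hits in total, none of this is more than suggestive.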
Just as a reaction from my experience: I think "based off" is far more popular than its occurrences in this corpus suggest, and I think that highlights something about this corpus: it's a corpus of educated language. It's mostly published writings, such as books, newspapers, and magazines; and even the spoken component is transcripts of radio and television, where the speakers are, to be blunt, not idiots. If you go on the Net and read writings like Wikipedia, where many of the contributors actually are idiots, I think you'll see a lot more "based off"s. Corpora of uneducated language do exist, and it would be interesting to look at them for this.
The same search engine also offers access to the "Corpus Of Historical American English" or COHA, which is 400 million words covering 1810 to 2009, split roughly evenly per year. In that corpus, there's a total of 15306 instances of "based on" and 2 of "based off," both of which are in the year 2005. There are no instances of "based off of."
BYU's search engine also offers access to a database of TIME magazine articles, 100 million words from 1923 to 2006. It contains 7460 instances of "based on," none of "based off" or "based off of."
Finally, there's the British National Corpus, which is 100 million words of British English. It covers from the 1980s to 1993; the data is not so nicely broken down by year. It contains 11467 instances of "based on," none of "based off" or "based off of"; but before concluding that "based off" is uniquely American, we should bear in mind that it didn't start appearing in the American data at all until 1993, and became a lot more prevalent starting around 2004-2005; so it could simply be too new to appear in the British data even if British idiots are using it as much today as are American idiots.
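Since the corpora differ in size, raw counts aren't directly comparable; hits per million words is the usual way to normalize. Here is a rough sketch, assuming the round corpus sizes I quoted above are close enough, and with names I made up for illustration. Bear in mind that the COHA figure is an average over two centuries, so it says little about any particular decade.

```python
# (corpus name, size in millions of words, "based on" hits, "based off" hits)
# Sizes are the round figures quoted in the text, not exact word counts.
CORPORA = [
    ("COCA", 410, 56016, 17),
    ("COHA", 400, 15306, 2),
    ("TIME", 100, 7460, 0),
    ("BNC", 100, 11467, 0),
]


def per_million(hits, size_millions):
    """Occurrences per million words of corpus text."""
    return hits / size_millions


def rates():
    """Map each corpus name to (rate of "based on", rate of "based off")."""
    return {
        name: (per_million(on, size), per_million(off, size))
        for name, size, on, off in CORPORA
    }


if __name__ == "__main__":
    for name, (on_rate, off_rate) in rates().items():
        print(f"{name}: based on {on_rate:.2f}/M, based off {off_rate:.3f}/M")
```

That works out to about 137 per million for "based on" in COCA, about 115 in the BNC, 74.6 in TIME, and about 38 in COHA; "based off" manages about 0.04 per million in COCA.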
I was interested in "based off," but there are too few occurrences of it in the corpora (probably because, as I said, these are corpora of educated language) to make a nice chart. So instead I'll show you this chart I made of the prevalence of "based on" per million words. It looks like "based on" started to become popular in the 1830s - and prescriptivist Web loggers were probably as enraged by it then as I am by "based off" today.
UPDATE: I just spotted a "based off" in the December 10 Something Positive. It's spreading.