
Neural networks and the Unix philosophy

Sun 25 Sep 2022 by mskala

There are a number of directions from which we can look at current developments in deep neural networks and the issues I raised in my streamed comments on pirate AI. Here's a summary of the implications I see from the perspective of the Unix philosophy.

How things are right now

Here's how you might use a deep neural network generative model today: you visit a proprietary Web site, type a prompt into a form, and wait for the result to appear in your browser.

This cannot be built into an automated system of your own, either as a matter of policy (doing so is forbidden) or of technology (it's impractical to automate).

There are other options that can be automated via APIs, but they'll cost money and will keep you dependent on the third-party service and your network connection to it, as matters of both policy and technology.

Even to the extent you can download some of this software and run it locally, it will still be dependent on the network and on massive frameworks that specify the architecture of the entire installation. You can build your own thing, sort of, but your thing needs to fit within the framework or service.

What if

echo "Brian Kernighan slaying the dragon of monolithic software design, 19th Century engraving" | stbldiff | cjpeg > bk-dragon.jpg

This can be automated easily. Anyone with decent Unix command-line skills will know from a glance at the above command line what the "stbldiff" utility is for, and how to interface it with other software. They might guess that it probably has some options, which would be explained in the output of the "man stbldiff" command, but reading the man page is not even necessary to use it at this level. You can build your own thing, locally, on top of the "stbldiff" utility, not merely within it.
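
For instance, here is a purely hypothetical sketch of batch automation built on the imaginary "stbldiff" utility; "cjpeg" is the real JPEG encoder from libjpeg, while the prompts.txt file and the utility itself are assumptions for the example:

# one output image per line of prompts.txt (hypothetical stbldiff utility)
n=0
while IFS= read -r prompt; do
    n=$((n+1))
    printf '%s\n' "$prompt" | stbldiff | cjpeg > "image-$n.jpg"
done < prompts.txt

Nothing in that loop needs a framework, an account, or a network connection; it's just the ordinary shell plumbing Unix users already have.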

The difference is the Unix philosophy.

The Unix philosophy (McIlroy, summarized by Salus)

There have been multiple attempts to summarize what the "Unix philosophy" actually is. To a large extent it's something intuitive; we know what is Unix-y by how it feels rather than by having a specific list of rules. But here's a simple definition many people like: write programs that do one thing and do it well; write programs to work together; and write programs to handle text streams, because that is a universal interface.

Other aspects of the Unix philosophy

Here are some other points I think are especially relevant to AI, which tend to crop up in other statements of the Unix philosophy. Most are re-stated in varying wording by multiple people who write such definitions, and cannot be attributed to just one author each.

Write programs that do one thing and do it well

Positive example: sort. It just sorts the lines of text in a file. Maybe those lines are database records; but if you want to do some other database operation, you might use "join" or "grep," which are other programs. You wouldn't look for "sort" to also have joining and filtering features.
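
As a small, hedged illustration (the file names and contents here are invented): suppose orders.txt and customers.txt are plain text files whose first field is a customer ID. Each program below does only its one job, and the shell glues them together:

sort -k1,1 orders.txt > orders.sorted          # sort only sorts
sort -k1,1 customers.txt > customers.sorted
join orders.sorted customers.sorted | grep -i 'ontario'   # join only joins, grep only filters

None of the three utilities needs to know it is taking part in a larger database-like operation.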

Negative example: HuggingFace "transformers." In its creators' own words, "Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models [...] reduce your [] carbon footprint [...] in different modalities, such as Natural Language Processing[;] Computer Vision[;] Audio[; and] Multimodal[.]"

That's not doing one thing! Even to the extent we might claim that a library should be exempt from this point, transformers fails in the library role because it puts itself on top of other huge packages, creating extensive second-order and higher-order dependencies. Using transformers vastly increases the number of other tools you need to use in order to get any single job done. Its claimed advantages come from gluing together other large multi-packages which themselves glue together many smaller functions, so that transformers and all its dependencies together can solve many different problems: exactly the opposite of doing one thing. At best we might say that transformers is attempting to be something like a Unix distribution, and thus should not be compared to the scope of a single Unix utility. But unlike a Unix distribution, the tools inside transformers cannot be separated from the whole pile.

Putting "download" first in the description highlights that this package includes a dedicated network client, which is a complicated application in itself unrelated to deep learning models. Including a network client as a core feature makes transformers dependent on both the network in general and on HuggingFace's proprietary service. You need to maintain an account with them (and be subject to their authority on what you're allowed to do with it, and quite possibly pay a monthly fee) to use "transformers" to its full capability.

It says its function is to train pre-trained models, which is contradictory nonsense. Pre-trained models are exactly those models which are already trained. So what does "transformers" really do? In fact it seems to do all three of: training new models; fine-tuning existing models; and running models that are fully pre-trained and distributed through HuggingFace's proprietary service.

"Reducing your carbon footprint" is mere puffery with no close connection to the function of the transformers package. Why does that get mentioned in the highest-level description of the package's overall purpose? Are there really any specific technical decisions in the design of "transformers" that were made for the purpose of reducing carbon footprint? Is that really the purpose for which users download and attempt to use this package? Sort might be equally said to do it, because real implementations of sort use an efficient algorithm which will consume less energy than an hypothetical less-efficient sort utility that does not actually exist - although carbon footprint was of course not the original reason for choosing the more efficient algorithm.

Write programs to work together

Positive example: Autotools. When you run configure ; make ; make install, it'll try to guess where you want things to go. But you can also tell it where to put things. All reasonable package managers can build and install an Autotools package. Autotools can be on top if you want (because it can invoke arbitrary command-line programs) but it's also easy to keep something else on top, with Autotools doing only what it is told to do. Autotools can fit within something else; other things are not forced to fit within Autotools.
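
A minimal sketch of both modes of use (the paths are arbitrary examples):

./configure && make && make install    # let it guess where things should go

./configure --prefix="$HOME/local"     # or tell it exactly where
make
make DESTDIR=/tmp/stage install        # or stage the files for a package manager to collect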

Negative example: Python. Documentation for Python-based software of any significant size offers no method of installation except through specific package managers (e.g. "pip," "conda," and "npm," just to install PyTorch and its documentation), which need to be on top and force the use of the network (in general, and of specific proprietary services). They impose their architecture, directory structure, and dependencies on the entire system and cannot easily fit into an existing system. They "work together" with other software only by being the boss of it. In practice, users will end up using OS-level virtualization and "containers" to defeat these architectural limitations, at huge cost in wasted computing resources. (And, yes, "carbon footprint," if we must.)

Write programs to handle text streams

Positive example: pbmplus. Although it also has a more compact (and still ultra-simple) binary file format, the utilities in this package all support a plain text format, even for non-text graphic image files. Existing shell plumbing designed for text works perfectly for connecting the small and simple pbmplus utilities to create more complicated functions.
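
For example, the following is a complete image file (call it box.pbm) in the plain-text PBM format; any text editor or text-handling utility can create or modify it, and the usual pipeline converts it onward. The utility names are those of pbmplus/netpbm and vary slightly between versions:

P1
# a 6x5 bitmap: 1 = black pixel, 0 = white pixel
6 5
1 1 1 1 1 1
1 0 0 0 0 1
1 0 1 1 0 1
1 0 0 0 0 1
1 1 1 1 1 1

pnmscale 20 box.pbm | pnmtojpeg > box.jpg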

Negative example: tensorflow. It is not possible to run tensorflow without JSON-format (sort-of text-ish) configuration files. Models are stored in binary format with hardware-specific baked-in parameters (such as "sharding") which make them difficult to port even among different installations of tensorflow, let alone processing with non-tensorflow software. Data other than models is usually expected to come in or out in tensorflow-specific JSON-based formats, even when it is something like a transcript which could have an obvious text representation.

The Unix philosophy is usually stated in terms of what programs do - thus, handling text streams as input and output of the running code - but a closely related issue has to do with documentation. The documentation for sort is in a man page or info file, either of which is close enough to plain text that you can pipe it through a text utility like sort itself. Documentation for tensorflow is a loosely coordinated set of interlocking Web sites, "courses," "tutorials," and "notebooks," requiring network access and a graphical Web client with elaborate client-side scripting support.
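
A small, hedged example (exact man page contents, and whether "col -b" is needed to strip overstrike formatting, differ between systems):

man sort | col -b | grep -i 'numeric'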

Standalone operation (software architecture)

You can install sort without installing grep. As a consequence of "do one thing and do it well," each Unix package has few dependencies on other packages.

If you try to use sort and grep together, neither demands that you use a specific version of the other. As a consequence of "handle text streams," no utility has a reason to care exactly where its input came from or where its output is going.

Trying to do anything with a deep learning model requires installing at least one framework library, which will have a list of direct and indirect dependencies often running to hundreds of small modules. All of these will have version-compatibility issues. To run the framework at all, you must install all of those other things and get their versions right, even though you will probably never use most of them. Actually doing all that by hand is impractical, so package managers become necessary.

Long dependency lists might be waved away on the grounds that deep learning is an inherently complicated task and you simply need a lot of software to do it, but that really isn't true. You need a couple of tricky numerical algorithms for things like matrix arithmetic, possibly optimized for your hardware environment. The numerics library is one package that ought to be a few hundred kilobytes of code. Maybe you need an automatic differentiation package - which is tricky, and likely to have complicated relationships to other things, because it is meta-software that modifies and analyzes other software. But again, that should be one package. You also need gigabytes to terabytes of data, which depending on your application you might generate yourself or download from somewhere; but processing massive data does not necessitate hundreds of packages of code.

Structuring the code as one or more huge "frameworks" that purport to do everything for everybody, and that accomplish it by pulling in hundreds of other separately-maintained packages from a network repository, is an architectural decision forced by software engineering culture and not by the application domain.

Standalone operation (network)

Even though network connections are ubiquitous today, and Unix was one of the first operating systems to be networked, "do one thing" recognizes that communicating with the network is a complicated activity in itself, one that should be separated from any software whose function is not specifically network-related. The ability to do non-network things without a network should be seen as a Unix principle, and likely would have been stated as one if the issue had been more salient in the 1970s when this philosophy began to be formulated.

"Everything is a file," often cited as another Unix principle, means that any software which handles files can handle files stored on a network server through the operating system's support of networked filesystems - without network-specific code in the application-level software. "Handle text streams" means that any software which handles text streams can handle a stream from some other computer through a separate networking utility that presents a text-stream interface. Software that handles text streams can become a network service by connecting it to existing general-purpose server software. Small, self-contained programs ("do one thing") can be built locally just by running make without needing to automatically or manually download dependencies.

Use of frameworks that entail package managers, which in turn require network hits not only to the Internet in general but to specific services and locations in particular (requiring policy alignment with, and possibly monetary payments to, the operators of those servers), goes against this value and makes standalone or air-gapped operation impossible.

Online-only documentation similarly makes operation without the network impossible at the human level.

Avoid captive user interfaces

The concept of avoiding "captive user interfaces" is sometimes claimed to be part of the Unix philosophy, and tends to be illustrated by examples of systems which implement their own command-line interpreters to replace the Unix shell. Captive command line interfaces may have been a big problem once, but they sound quaint and irrelevant in this era of GUIs. However, this point gets at an important and still-relevant principle which I would describe as avoid forcing yourself on top.

A "captive user interface" limits users to the activities it was originally designed for, and thus the imaginations of its creators. More traditional Unix utilities, which do simple things without placing requirements on their environment, can be combined into arbitrarily complex functions of the user's choice and are ultimately more powerful because they enable the creation of new systems. They do not insist on talking to a human; they can equally well connect to other pieces of software.

Does a software package work within the system where it is put? Or does it force itself on top of other things, rearranging their architecture to fit within its own?

The Unix philosophy is to prefer the former approach. "Framework" packages implement the other one.

Also note the humans-only nature of services provided through script-heavy Web sites, which seem to be 2022's version of the old-fashioned "captive user interfaces."

Portability over efficiency

The equivalence of computing systems is central to computer science theory. Except for speed and memory concerns, a PC can emulate a Mac, but a Mac can also emulate a PC. As a consequence, PCs and Macs can be said to be equivalent: any piece of software that can run on a PC can also run on a Mac (by means of the emulator) and vice versa. It's possible to describe a family of "sufficiently advanced computers" that includes PCs, Macs, a lot of other kinds of real-life computer hardware, and most of the hypothetical future computers we might someday build. These all have fundamentally the same level of question-answering and software-running capability, because they all can emulate each other. The work of Alan Turing provides a more formal mathematical basis for statements about the capabilities of general-purpose computing devices.

There are practical limitations to the equivalence of different computers. Some computers really are more powerful than others in a way that matters to human users; the carve-out of "speed and memory concerns" above is doing a lot of heavy lifting. But there are also engineering techniques of portability intended to minimize any hard differences between computers. I can still run software today, on my 2020s-era Unix desktop PC, that was written decades ago for some of the first Unix-based computers, which were built very differently. I can do that partly because the old software was deliberately written to embody the value that a future person like me should be able to do so. Some tasks I might run on my own computer will run significantly more slowly than they would on a bigger and more expensive computer; but at a basic level all computers do the same thing, just at different speeds.

An important part of the Unix philosophy is to design on the assumption that software may run on a wide range of different computers. Dependencies on special features of particular hardware or operating systems should be avoided. Even if a specific platform offers features that can improve performance significantly for a given application, the Unix philosophy holds that those features are better left unused, or their use is better made optional and implemented only after the portable version is shown to work.

As Knuth wrote (later crediting Hoare, who disclaimed having originated it), premature optimization is the root of all evil.

Deep learning models by their nature can greatly benefit from special hardware support: GPU computation, which often implies dependence on nVidia hardware in particular, and TPU computation, which implies dependence on Google's online services. The benefits of the special hardware are real and may be so quantitatively great as to make a qualitative difference in the application domains where the software can be used. A strict commitment to complete portability is hard to maintain in the deep learning field. But we could still make some effort toward portability. We could still write code for generic "computers" first, only later specializing it to run as fast as possible on specific hardware. The existing abstraction layers built into numerics libraries often really do support hardware-platform portability already, if we would only choose not to disable them.

It is a matter of software engineering culture, not technical necessity, that so much deep learning code is written with dependence on nVidia and Google hardware mandated at the application level. Assuming the use of specific proprietary cloud services (and therefore of a network) fits into the same pattern. Putting the Unix philosophy, and portability in particular, first would mean choosing not to build in those assumptions.

Free software

Certainly not all Unix software is free in the sense of the free software movement, but software freedom is characteristic of the Unix philosophy. Free software can be defined by the "Four Freedoms": the freedom to run the program for any purpose; the freedom to study how the program works and to change it; the freedom to redistribute copies; and the freedom to distribute modified versions.

Despite the existence of proprietary non-free Unix software, large-scale recognition of the importance of these freedoms has supported projects like GNU. Through these projects it is possible to run a complete Unix system consisting entirely of packages that meet this standard.

But the Four Freedoms are not characteristic of deep-learning software.

Online services inherently break the "for any purpose" freedom. They invariably have explicit limitations on purpose of use via "terms of service"; implicit limitations stemming from architectures that preclude automation; and undisclosed, frequently changing hidden behaviour like DEI filtering, with unknowable and often serious consequences for applications. Even off-line, downloadable deep learning models come under non-free licenses that purport to limit the purposes for which they may be used. Non-free licenses are often put forward as necessary for "trust and safety" reasons.

Studying how deep learning software works is an activity of great interest at the present time; but it's made much harder by the widely accepted architectural deviations from the Unix philosophy described in earlier sections. You can't take something apart to study it, much less modify it, if it exists only on somebody else's computing hardware; nor if it can run locally but only as part of a huge "framework" that puts itself on top of all other code. Either of these reduces the model to a black box. Undisclosed updates and hidden behaviour of online services make reproducible research difficult or impossible.

Often a model is claimed to be "free" in a sneaky, impractical way: the code that runs it may be free, but running that code depends on using specific proprietary online services which place non-free requirements on users. And beyond the code itself, you may need non-free data like "weights," which you may not even be allowed to directly possess at all, let alone without agreeing to restrictive terms of use.

Redistributing unmodified copies is actually often permitted by copyright licenses on deep-learning software, to the extent you are allowed to have a copy in the first place. But if the model is only available through an API, then your freedom to share it consists of, well, you're allowed to tell other people to also subscribe to that service. Redistributing a modified version is usually about as possible as distributing an unmodified version, which is to say, impossible in many cases.

Where to go from here

I think the Unix philosophy is a worthwhile perspective for looking at deep learning work. Deep learning as it's practiced today goes very much against the Unix philosophy; it would be better if it were more in line with Unix principles; and Unix is worth promoting as good in itself. We should be doing more to bring deep learning into this tent.

That will not be easy, and the best opportunities may already be lost. Large corporate interests have a lot staked on exactly the disadvantages I see in the current approach, which from their point of view are advantages. They can and will use public worries about "AI safety" as excuses to mandate exactly the opposite of the Unix philosophy for AI, forever. Imagine a boot stomping on a GAN-generated picture of a face. The window to avert the coming regulatory capture is rapidly closing.

The obvious course of action for anyone who wishes to promote the values I think are important, is to simply build tools embodying those values. As a whole, that is not a small or easy project, and it is not easy to recruit anyone to work on such a project when much of it may seem to duplicate work the big corporate teams have already done. Any individual who wants to run a model like Stable Diffusion will find it easier to just download the framework code and the hundreds of dependencies via a package manager, tweak example code and follow "tutorials" until something basically runs, accept the usage limits, and work around the disadvantages. Even though only a small fraction of the huge steaming pile of bloated code is actually needed to run any one model, the small fraction is still a large enough amount of code that recreating it is a project of considerable size.

There is also a significant problem in that the Unix philosophy is no longer generally seen as important. People like me, who really care about it, are old-timers. If I announce a project to build a really standalone package for some deep learning task, the first question I'm going to get is where the Github link for it is, and the second is going to be how to install it with npm. Those are the tools programmers today think they need for all programming, and making "We do not use package managers" a cornerstone of what the project is all about makes it a non-entity from the point of view of most of those who would otherwise be the audience. It's a difficult sell, and it's not something on which a compromise is realistically possible, because it really is central to the purpose of a Unix-philosophy AI project.

Nonetheless, the Unix philosophy itself provides some solutions to the coordination problems, largely by showing them to be irrelevant. There really are small, standalone packages that would be useful and that a lone programmer could really build without needing buy-in from others. The "stbldiff" command-line utility I described above would be one of those. It is plausible that one person working alone could create and release that, as an Autotools tarball, in a reasonable amount of time. Maybe actually using the Stable Diffusion weights would not work because they are non-free, but it is reasonably plausible that EleutherAI or some similar project will release pretrained weights for something similar in the near future that really are free. And once one such thing existed, its obvious usefulness would be a powerful argument for why more such things need to exist.

Standalone format-conversion utilities - not themselves part of any framework - capable of translating models between different framework formats would be another thing the community needs, and they are within the range of a small (likely one-person) project. Even something like "file" for the files created by existing deep learning frameworks, to look at the weird proprietary artifacts they create, figure out how something is "sharded," and guess what other software would be needed to deal with it, would be quite useful. Some of that might even be possible as just some new configuration stanzas for the existing Unix "file" utility, without needing new code.
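
As a hedged sketch of that last idea (magic file syntax varies a little between versions of file, recent versions may already know this particular format, and the file names here are hypothetical): NumPy's ".npy" array files, which many frameworks use for weights and tensors, begin with a \223 byte followed by the text "NUMPY", so a small local magic file can teach file to recognize them; framework-specific checkpoint and shard formats could be handled the same way.

# local.magic: teach file(1) about one more opaque binary format
0    string    \223NUMPY    NumPy array data file

file -m local.magic mystery-weights.npy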

Existing "framework" packages are under at least partially free licenses, and although weights and specific models often are not, there is some possibility of a motivated programmer teasing out just the parts of a framework necessary for a given task, to create a much leaner task-specific package that includes all its own dependencies, can be downloaded in one piece without a package manager, and does something useful without trying to do everything. On the model side, some unrestricted-use models do exist, more large ones will eventually be created and released, and for small applications, it is reasonable for individuals and small teams to train their own.

My hope is that once some stuff of this kind exists, its obvious necessity and superiority over the current monoliths will attract increasing attention and encourage the creation of more small, simple utilities, eventually driving a popular shift away from monolithic "frameworks."
