Normalizing Your Way to a Security Breach

When logging into a website recently, I was asked to verify my identity by picking the address I had on file with the business from a list that included other street addresses. Something like:

  1. 741 Juniper Parkway
  2. 59 Hawthorne Circle
  3. 331 Ushers Rd
  4. 82036 Sterling Terrace
  5. 160 Galaxy Way

I bet you could have impersonated me. There’s a flaw in that list. Do you see it?

The address pulled from their records is normalized to USPS guidelines (“Rd” for “Road”) but all the others have the street type spelled out (“Circle,” etc.). This isn’t just a formatting quirk; it’s an information leak. By applying normalization inconsistently, the developers unintentionally disclosed which option was real.

In security, the devil is in the details or, in this case, the abbreviations. Authentication systems live or die on uniformity: inputs must be normalized consistently, or attackers get clues for free. It’s a classic case of why we don’t “roll our own”: it’s incredibly easy to get it 99% right and still be 100% wrong.
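To make the point concrete, here is a minimal sketch of the fix, assuming a tiny illustrative suffix table (the real USPS list, Publication 28, is far longer): run every displayed option through the same normalizer, so formatting can no longer distinguish the genuine record.

```python
# Sketch: normalize every address the same way before showing or comparing.
# The suffix map below is a tiny illustrative subset, not the full USPS
# Publication 28 list.
SUFFIXES = {
    "road": "Rd", "rd": "Rd",
    "circle": "Cir", "cir": "Cir",
    "parkway": "Pkwy", "pkwy": "Pkwy",
    "terrace": "Ter", "ter": "Ter",
    "way": "Way",
}

def normalize(address: str) -> str:
    """Apply one canonical form so no option stands out from the others."""
    words = address.split()
    last = words[-1].rstrip(".").lower()
    words[-1] = SUFFIXES.get(last, words[-1])
    return " ".join(words)

options = ["741 Juniper Parkway", "59 Hawthorne Circle", "331 Ushers Rd"]
print([normalize(a) for a in options])
# Every option now uses the same abbreviation style, so the real record
# no longer leaks through a formatting difference.
```

Whether the decoys are normalized to match the record or the record is de-normalized to match the decoys matters less than applying one rule to all five lines.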

 

Posted by on January 8, 2026 in Cybersecurity

 


DAGs in SQL

What is a DAG and Why Should You Care?

DAG stands for Directed, Acyclic Graph.  It sounds a bit obscure but it has many practical applications, especially in software design.  But what is it?  The first two words are just modifiers so, what is a graph?

Fundamentally, a graph is a mathematical abstraction.  I could tell you it is made up of nodes and edges but that’s still fairly abstract. You can visualize a graph as islands (the nodes) connected by bridges (the edges), or cities and the roads connecting them.  Indeed, one of the classic problems of computer science, the traveling salesman, is about just such a scene. The salesperson wants to visit their customers on all of the islands without visiting any island twice.  (The related puzzle of crossing every bridge exactly once is Euler’s famous Königsberg bridges problem.)  Is it possible?  Can you write a program to find such a route?  Or a program to show it’s not possible?

There are many variations on the problem. What if each bridge had a different toll and the goal is not to avoid repeats but to minimize total tolls paid?

Another interesting variation arises when the bridges or roads allow traffic to pass in only one direction. In the concrete, we recognize this as a one-way street.  In the abstract, it is a directed edge: an edge that can be used to go from A to B but not back from B to A.

A poor traveler with a bad map or a bad sense of direction might set out on a trip across these one-way bridges and come to realize they have been driving in circles (A to B to C and somehow back to A).  A good civil engineer might be able to set the direction of the bridges so that you couldn’t go in circles.  The graph corresponding to these islands and bridges has no cycles; it’s acyclic.

So, a directed, acyclic graph is one where the edges have direction and you can’t return to your starting point following the direction of the edges.

But why should you care?

Graphs are abstract but describe various real-world problems very well.  For example, consider the files and folders (or directories) on your computer.  We’re used to seeing these presented as an outline where you can navigate down from the root of the disk to a folder in the root, to another folder inside that, and so on.  An outline or organization like this is a “tree,” a special type of directed graph where there is only one path from the root to any leaf.  Picture a real tree and you see the same thing: for any leaf, there is only one path from root to trunk to limb to branch to twig to leaf.

For a more complex example, consider trying to classify animals that have frequent contact with people.  We’ll exclude animals seen on safari or in zoos. We can start by dividing them into Farm Animals and Pets.  Horse, pig, chicken: all farm animals. Goldfish, canaries, guinea pigs: all pets. What about dogs? Dogs are great pets but they also are used on farms to herd livestock. So is a dog a farm animal or a pet?  It’s both. You can’t really use a tree to draw these categories but you can use a DAG!  There are two paths from the root to dog: Frequent Contact → Farm Animals → Dog, and Frequent Contact → Pets → Dog.  (We could divide farm animals into food animals and work animals, or pets into cuddly and not cuddly but dogs still fit two categories.)  A dog is not a goldfish or a pig but because going from one category to a subcategory is directional, we’ll never be confused about that.
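As a small sketch (with the example’s categories as plain Python data), the two routes to Dog are easy to enumerate:

```python
# The animal DAG from the example as an adjacency list.
dag = {
    "Frequent Contact": ["Farm Animals", "Pets"],
    "Farm Animals": ["Horse", "Pig", "Chicken", "Dog"],
    "Pets": ["Goldfish", "Canary", "Guinea Pig", "Dog"],
}

def paths(graph, start, goal, prefix=()):
    """Yield every directed path from start to goal."""
    prefix = prefix + (start,)
    if start == goal:
        yield prefix
    for child in graph.get(start, []):
        yield from paths(graph, child, goal, prefix)

for p in paths(dag, "Frequent Contact", "Dog"):
    print(" → ".join(p))
# Two distinct paths reach Dog -- exactly the situation a tree cannot represent.
```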

A Simple DAG Database

Codd and others did a lot of theoretical work creating relational databases. Open source tools like MySQL and products like SQL Server embody decades (in aggregate, probably centuries) of labor implementing those theories. An edge is a relationship between two nodes, so why wouldn’t you use a relational database to store it? Why reinvent the wheel?

You might object that node and edge aren’t SQL types. But neither are bank account and product, and banking and e-commerce systems certainly use relational databases.

How do we represent a node in SQL? A table, naturally. What columns does the table have?  Names are convenient for people to refer to things but numeric IDs are somewhat better for computers. So, let’s say a node has an ID, a name, and (why not?) a description.

CREATE TABLE [dag].[DagNode](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [Description] [nvarchar](250) NULL)

An edge connects two nodes.  The ID we gave each node gives us an easy way to refer to it.  The natural way to represent an edge is a two-column table where each row holds two IDs: one for the parent (the source of the edge) and one for the child (the destination of the edge).

CREATE TABLE [dag].[DagEdge](
    [ParentID] [int] NOT NULL,
    [ChildID] [int] NOT NULL)

Leaves and Referential Integrity

The table creation above omits the referential integrity constraints in my live database. The DagEdge table in my database has foreign key constraints that make sure both IDs refer to an actual node.

But what about leaves? In graph theory, a leaf is simply a node with only one edge, but treating leaves as ordinary nodes has limitations in a database: the items we’re organizing in a DAG likely have different attributes than the name and description on a node. So, we create a new table for our leaves.

CREATE TABLE [dag].[DagLeaf](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL)

But then, we can’t use DagEdge to connect a node to a leaf because of the foreign key on ChildID. So, we create a new table to hold the node/leaf relationships.

CREATE TABLE [dag].[DagNodeLeaf](
    [NodeID] [int] NOT NULL,
    [LeafID] [int] NOT NULL)

And then we add foreign keys on NodeID and LeafID referring to DagNode and DagLeaf, respectively.

While the DagNode and DagEdge tables are generic, we need a data table and a relation table for every type of data we’re going to organize into DAGs.
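To see the whole scheme working end to end, here is a sketch of the same tables in SQLite via Python (my scripts target SQL Server; the table and column names follow the post but the sample rows are invented). The recursive CTE walks from a root node down through DagEdge and then out to the leaves:

```python
import sqlite3

# A SQLite sketch of the post's schema. Data is made up for illustration.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE DagNode (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Description TEXT);
CREATE TABLE DagEdge (
    ParentID INTEGER NOT NULL REFERENCES DagNode(ID),
    ChildID  INTEGER NOT NULL REFERENCES DagNode(ID));
CREATE TABLE DagLeaf (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE DagNodeLeaf (
    NodeID INTEGER NOT NULL REFERENCES DagNode(ID),
    LeafID INTEGER NOT NULL REFERENCES DagLeaf(ID));

INSERT INTO DagNode VALUES (1,'Frequent Contact',NULL),(2,'Farm Animals',NULL),(3,'Pets',NULL);
INSERT INTO DagEdge VALUES (1,2),(1,3);
INSERT INTO DagLeaf VALUES (10,'Dog');
INSERT INTO DagNodeLeaf VALUES (2,10),(3,10);
""")

# Recursive CTE: every node reachable from the root (ID 1), then the
# leaves attached to any of those nodes.
rows = con.execute("""
WITH RECURSIVE reachable(ID) AS (
    SELECT 1
    UNION
    SELECT e.ChildID FROM DagEdge e JOIN reachable r ON e.ParentID = r.ID)
SELECT DISTINCT l.Name
FROM reachable r
JOIN DagNodeLeaf nl ON nl.NodeID = r.ID
JOIN DagLeaf l ON l.ID = nl.LeafID
""").fetchall()
print(rows)   # Dog appears once even though two paths lead to it.
```

SQL Server’s recursive CTEs differ only in spelling (`WITH` instead of `WITH RECURSIVE`); the shape of the query is the same.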

Examples

Code supporting this post is available on GitHub. That includes a script to create the tables mentioned above (complete with indices and constraints), a script to populate them with some sample data, and a script full of sample queries to show how the data and functions behave. My examples are for SQL Server. Other databases of interest are MySQL and PostgreSQL. I’m happy to consider pull requests for those or others you may choose to implement.

Thanks

Thanks to CyberReef for allowing me to share this code. Something like it is used in the MobileWall product and related services. Hopefully, I didn’t add too many bugs as I tried to abstract the concepts.

Thanks, also, to my colleagues Carmen, Jessica, and Siva who reviewed the original code and provided valuable feedback.

 

Posted by on December 4, 2025 in Software techniques

 


Say It with Fonts

I once worked with a technical writer who had the ironic initials DOC. I was occasionally diverted from my software development tasks to help her with technical documentation and I learned a lot from her.

One of the things I learned was how effective it can be to use a well-selected font to convey information. Any writer (and most readers) see the need for and use of different fonts for headlines vs. body text, and to add emphasis in one way or another. But one thing we did in that environment that has shaped my technical writing ever since is use a special font for user interface text.

As a brief aside, it’s helpful to understand the difference between a typeface and a font.

  • A typeface (sometimes called a font family) is a design for letters, numbers, and symbols that has some unifying design goal. A typeface is often referred to by name. I grew up with Helvetica. All the cool kids are using Aptos now. You’d recognize Impact as the typeface used for text over photos in memes, even if you didn’t know the name. Other examples are Courier and Bookman.
  • A font is a typeface with specific properties like height (usually in points), weight (light, normal, bold), style (italic, roman), and sometimes width.

Thus the font “Courier, 12 pt. bold” is a 12-point-high, bold rendering of the Courier typeface.

Bonus fact: in typography, “roman” generally means upright. That is in contrast to “italic” (or oblique) text, which leans to the right. (Confusingly, Times New Roman is a typeface and you can certainly use Times New Roman, italic as a font.)

Back to using fonts in technical documentation: unlike content in newspapers and novels, technical documentation talks a lot about things you see on the screen and things you type in response. Often user input is rendered in a monospace typeface, one where every character has the same width. Many systems have good support for this. A common idiom for marking user input is backticks (`). Markdown does this, as do GitHub comments and even recent versions of Outlook Web Access. (If you work in raw HTML, you can think of the backticks as <code> tags.) A common style is to render the text between the backticks in monospace (often Courier), one point size smaller than the surrounding text, and bold. With this convention, `this is user input` is rendered something like this is user input.

The innovation I learned from DOC is to also use a specific font for user interface text. The font needs to be different enough from the surrounding text to stand out but not so different that it’s unattractive. The general rule I use is: a narrow version of the body type, one point size smaller, and bold. In modern Microsoft Word, the body type is Aptos 12 pt. so my UI text is Aptos Narrow, 11 pt., bold. This might look like “Type a value in the Username field.”

Creating a UI Text Style

Strangely, I have never come across a system that has this built in. But I have developed some workarounds.

Trac

When I first worked with Trac, I wrote a macro that applies UI-text formatting. I wanted something easy to type but with a sort of “quoting” vibe (like backticks are backward single quotes). I settled on double less-than (<<) and double greater-than (>>), which I intended to be reminiscent of guillemets («, »). With this convention, “Type a value in the <<Username>> field” is rendered like the last example. I still use this nearly every day.
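The macro itself is Trac-specific, but the substitution it performs can be sketched in a few lines of Python. The `uitext` class name here is invented for illustration, not part of Trac:

```python
import re

# Sketch of the <<...>> convention: wrap UI text in a styled span.
# The "uitext" class name is hypothetical; real styling lives in CSS.
UI = re.compile(r"<<(.+?)>>")

def render_ui_text(markup: str) -> str:
    return UI.sub(r'<span class="uitext">\1</span>', markup)

print(render_ui_text("Type a value in the <<Username>> field."))
# → Type a value in the <span class="uitext">Username</span> field.
```

The non-greedy `.+?` keeps two markers on one line from swallowing the text between them.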

Desktop Microsoft Word

In Microsoft Word, you don’t need to write any Python to create a new style.

  1. Find Styles in the Home ribbon.
    Styles section of Home ribbon in Microsoft Word
  2. Click the button on the right to expand the style box.
    Expanded Styles box in Microsoft Word
  3. Click Create a Style to open the Create New Style from Formatting form.
    Simple Create New Style from Formatting form in Microsoft Word
  4. Click Modify… to show more options in the form.
    Complete Create New Style from Formatting form in Microsoft Word
    • Enter a name of your choice.
    • Pick Character as Style type.
    • Pick Default Paragraph Font for Style based on.
    • Pick a font and size as appropriate. (Aptos Narrow works for recent versions of Microsoft Word where the default font is Aptos.)
    • Pick other options as desired.
  5. Click OK.

Now your new style is available to apply to any text in your document.

Microsoft Word Online

It’s not quite that easy in Office 365 (or whatever they are calling it this week). The online versions of Microsoft Office tools don’t have all the features of the desktop versions. However, the online and desktop versions work well together so you can add a style to an online document with the desktop tool. (No doubt this depends somewhat on what licenses you have and other details but this works for me.)

  1. In the online version of Word, drop down the Editing button and pick Open in Desktop.
    Editing menu in Microsoft Word online
  2. When prompted, confirm you want to Open Word.
    Confirmation dialog to open Word on the desktop
  3. Create the style as above (or do whatever other editing you want to do).
  4. Close the desktop app (that’s all, just close it!).
  5. Click Continue Here in the online app.
    Confirmation dialog to continue editing online
 

Posted by on November 12, 2025 in Uncategorized

 


Understanding Mobile Device Enrollment

They say there are only two hard problems in computer science:

  1. Naming things
  2. Invalidating cache contents
  3. Off-by-one errors

I was reminded of the first problem as my company recently struggled to communicate clearly among ourselves and with vendors about the types of device enrollment in Mobile Device Management (MDM) systems.

I couldn’t find any industry-standard terms that applied here. If I’m wrong, I’d be happy to hear about it. If I’m right, maybe others will find the following names and definitions useful.

Enrollment Classes 

For enterprises, Apple Business Manager, Android Zero Touch Enrollment, and Samsung Knox are reliable, large-scale methods of enrolling devices in an MDM.  However, only authorized resellers can add devices to those systems.  A business buying a fleet of phones from a carrier or major retailer can rely on the reseller setting the phones up for easy management.

However, an MVNO (mobile virtual network operator) or other service provider cannot always establish the “chain of custody” necessary to prove ownership and get devices into those systems.  There are other methods, but they have limitations. Apple and Google have different procedures and names for their device enrollment processes. Platform-agnostic terminology can make it easier to talk about the end state of an enrolled device regardless of platform. Toward that end, we defined three classes of enrollment.

  • Class A enrollment is the most secure and permanent. The MDM has nearly complete control of the device and after a factory reset the device will be automatically reenrolled in the MDM which can reestablish control. 
  • Class B enrollment is nearly as good.  The MDM has nearly complete control of the device, but a factory reset will disconnect the device from the MDM and require manual intervention to reenroll the device. 
  • Class C enrollment is useful but fairly weak. The MDM can control and monitor some aspects of the device, but the device holder has the ability to bypass the MDM controls and make changes that put the device or its user at risk. 

Class A 

Class A enrollment requires a device to be added to a zero-touch enrollment platform: Apple Business Manager (for iOS), Knox Mobile Enrollment (for Samsung), or Zero-Touch Enrollment (for Android, including Samsung). 

The zero-touch enrollment platforms are best suited for organizations managing large fleets of devices.  While some exceptions can be managed with effort, the usual path is that the organization buys devices from an “authorized reseller” who adds the devices to the platform for the organization. 

  • Advantages: Class A is sticky (even a factory reset doesn’t remove the device from the MDM) and easy (the devices are put in the portal by the reseller, and the organization doesn’t have to do anything to enroll them). 
  • Disadvantage: Class A is very difficult to add to existing devices. 

Class B 

Class B enrollment can be done to existing devices without concern for the zero-touch enrollment platforms. 

This is useful for organizations trying to onboard existing devices while maintaining tight control.  A Class B enrollment leaves the device “owned” by the MDM so it can enforce always-on VPN and other privileged policies. 

  • Advantages: Class B does not require the devices to be in a zero-touch enrollment platform, and it allows privileged policies to be enforced. 
  • Disadvantages: Class B requires touching every device, requires the device to be factory reset (“wiped”), and the device does not automatically reenroll after another factory reset. 

Class C 

Class C enrollment can be done to existing devices without the need to factory reset them. 

This can be useful for adding management to deployed devices.  However, because the device is not “owned” by the MDM, privileged policies like always-on VPN cannot be enforced. 

  • Advantage: Class C does not require a factory reset. 
  • Disadvantage: Class C does not prevent the user from disabling important policies like always-on VPN, or even from uninstalling the MDM client.  
 

Posted by on November 5, 2025 in Definitions

 


Test 25x Faster!

My very first professional programming project was debugging and completing a lab automation system written in BASIC. It ran on a desktop HP computer and controlled instruments and equipment through GPIB. The challenge was that it controlled experiments in “real time,” not the “really fast response” that “real time” often means, but rather by the clock. It would do something, wait a minute or 15 minutes, do the next thing, wait a while, and so on. The bugs struck fairly early in the run of the experiment, so I could start the program, take a short break, and come back to find it had crashed. I’d figure out what went wrong, fix it, and start the program again. But this time the program didn’t crash in the code I’d just fixed; it ran longer and crashed in 10 minutes instead of the previous five. Do you see where this is going? Eventually, I had to wait hours for the next crash, and the better the code got, the longer I had to wait to fix the next issue!

My current team also does work that has to happen by the clock. The software counts some things and takes certain actions at certain limits. The counts reset at the end of the period which might be an hour, a day, a week, or a month. We can fake the data that causes the counts to change, but we don’t want to wait around to see if a monthly action works as it should. And messing with the system clock to fool the software is messy. Fortunately, one of the most interesting aspects of the system (one that we need to test carefully and repeatedly to avoid regression) involves when things are supposed to happen at different time scales. Do things that happen at a small time scale interact appropriately with things that happen at a larger time scale?

Once we were confident that the system properly recognized the end of an hour, day, etc. (in core code that was unlikely to change), we sought a way to speed up testing of other features so we didn’t have a month-long test in our release process. What we realized is that an hour is 1/24 of a day and a day is around 1/30 of a month. So if our production system is primarily concerned with days and months, we can take production configuration, change days to hours and months to days, and test in roughly 1/25 of the time. An overnight test with this substitution effectively tests two weeks (14-18 days) of real execution that straddles a month boundary. A weekend-long test covers 3-4 months of real world cycling through the logic in the system! And we don’t have to disable NTP or play any other shenanigans with the system time.
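The substitution itself is mechanical. Here is a minimal sketch, assuming a configuration expressed as named period settings (the keys are hypothetical, not our actual config):

```python
# Sketch of the compression trick: map each period setting one step down.
# A day has 24 hours and a month has roughly 30 days, so each clock period
# shrinks by a factor of about 24-30 -- the post's "roughly 1/25".
SPEEDUP = {"month": "day", "day": "hour", "hour": "minute"}

def compress(config: dict) -> dict:
    """Return a test copy of the config with every period one step shorter."""
    return {name: SPEEDUP.get(period, period) for name, period in config.items()}

production = {"usage_reset": "month", "limit_check": "day"}
print(compress(production))
# A "month" of behavior now cycles in a day and a "day" in an hour, so an
# overnight run straddles what would have been a month boundary.
```

The point is that only the configuration changes; the code under test and the system clock stay untouched.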

Just don’t ask me what happens around Daylight Saving Time transitions.

 

Posted by on June 17, 2024 in Uncategorized

 


Everything Old is New Again

Imagine a data processing system that takes advantage of local computing resources to provide a rich user experience and robust data validation while offloading a central computing system. It presents complex forms composed of smart fields that prevent entry of invalid data. Much of the validation is done locally, based on attributes applied to the fields to provide prompt feedback. However, when necessary, input can be validated against lists of values retrieved from the central system. When the form is complete, the user submits the data to the central system as a package that gets processed all at once before returning a success indicator or a failure indicator with a possible list of error messages to guide revising the data for resubmission.

Some readers will think it obvious that the data processing system is the World Wide Web. HTML forms — especially when souped up with modern web frameworks — support complex data validation and submit fields to a web server to update centrally stored data. If you’ve ever bought anything from an eCommerce site, you’ve used this technology.

However, as I was writing the first paragraph, I was not thinking of PCs running web browsers and data centers full of web servers; I was describing mainframes and their terminals. Undoubtedly, the web browser provides a richer and more responsive user experience than a 3270 terminal (which, among other things, lacked graphics), but diagrams of communication between parts of the old and new systems are identical in all but details.

I can’t say if the designers of HTML modeled their form system on mainframe data entry but it kind of looks like it. If they didn’t, they might have made their job easier had they done so. The suggestion that a programmer should not reinvent the wheel is often interpreted as applying to recreating other contemporary technology with similar features. However, it also means being aware of the history of computer science and technology in sufficient detail that you can learn from historical systems to make your job easier and your system stronger.

 

Posted by on April 29, 2024 in Uncategorized

 

What Makes a Great Programmer

I once worked with a great programmer who had a sign on his wall that said:

The Three Qualities of a Great Programmer: Laziness, Impatience, and Hubris

All of those sound like negative attributes but, considered in the right light, they are very insightful. (The list traces back to Larry Wall’s Programming Perl.) I’ve thought of them often in the years since I read them on his wall.

Laziness

A great programmer is too lazy to perform simple, repetitive tasks and would rather spend two hours writing a script to do a task than 10 minutes doing the same task. (This always reminded me of a short story I once read about a young student too lazy to do their homework so they invented a machine to do it for them.)

Of course, if the task is only done once, automating it is not very practical. But if it’s preparing a weekly report or something, the scripting time is very well spent and pays for itself fairly quickly. And that ignores the fact that someone else can now run that script, can read it to learn about the process, etc.

Impatience

A great programmer is too impatient to wait for a slow-running program to finish, especially if that program does a repetitive task (see above); the programmer wants it to be fast so they can review the output and get on with other work.

Human interaction with a computer is incredibly sensitive to delay in response. If you’ve ever worked with a laggy mouse with a low battery, you know how it feels to have your actions take too long to take effect. Similarly, if you’re running a spreadsheet macro or small program, it has to respond in well under a second or you lose flow and get frustrated. If the code originally took 1.5 seconds to run, making it run in 8/10 of a second may not seem a big improvement, but it is the difference between a smooth workflow and a frustrating one, the difference between a tool you’ll come back to and come to rely on, and one you’ll set aside and not use.

Hubris

Maybe a great programmer isn’t hubristic. Maybe this is the humorous entry in the list. On the other hand, good software can save bad hardware. And it can do it after you ship. I’ve heard it said, “you can do anything in software” and I almost believe it.

A great programmer is likely to see a challenge and say, “I can do that!” Hubris? Confidence? Optimism? Maybe some of all of those. A meek programmer may look at the same problem, think it impossible, and not dive in. Without a certain measure of hubris, some of our greatest software systems might not exist. Their creators saw a challenge and believed they were up to it, and we all benefit from that.

 

Posted by on April 23, 2024 in Uncategorized

 

Three Rules for Managing a Software Team

The bus factor of a team is a colorful illustration of the risk of having unique resources. How many team members could get hit by a bus without crippling the team? Less gruesomely, how many could be out with an illness? Or, more kindly, how many could take vacation time at once? Early in my career, I learned that if you’re irreplaceable, you’re unpromotable. And as long as I’ve managed software development, I’ve been acutely aware of the team’s bus factor. I think I also had an inkling of something that I’ve only recently put into words, perhaps a corollary of the bus factor. I’ve started thinking of these considerations in terms of three rules.

No Singletons

In a software system, a singleton is a unique resource for which you work to make sure there is only a single instance. Having a high bus factor (low risk from having a team member unavailable) means not having a singleton on the team.

Rule 1: Nothing the team does should be done by only a single member of the team.

No Specialists

What I’ve recently realized is that a singleton may also be a specialist. Not only are they the only one who does a certain task, but they may spend so much time doing it that they aren’t involved in any other work of the team. If that task is no longer relevant (technology evolves), what do you do with that person who knows your business and processes but only contributes to one narrow part of it? The answer is that you shouldn’t put people in that awkward position. If you need a specialist, hire a consultant. But if you have someone on your team with a specialty, you owe it to the team and the individual member to cross train them on other things you do.

Rule 2: No one on the team should do only one thing the team is responsible for.

No Exceptions

I’ve mostly managed small teams where I’ve been an active contributor to the code base, even if only part time. In that sense, I’m not a specialist. And when I write code, I am not exempt from our usual process: someone reviews and approves my code before it goes into production. An experienced developer can point out errors in my approach or implementation, but even a relatively new developer can ask insightful questions from a naive perspective. I work to stay humble enough to accept input from the reviewers I lead.

But if I am not a specialist, neither should I be a singleton. I could get hit by a bus. Or get sick (as happened recently). Or take some time off. In my absence, someone else can lead a team meeting or make an informed decision and work goes on. I talk to my team frequently about business requirements and other constraints not only to guide their minute-to-minute development decisions but so that the team doesn’t have a leadership singleton.

Rule 3: The first two rules apply to everything, including team leadership.

 

Posted by on April 10, 2024 in Uncategorized

 


Language Shapes Thought

My wife is a public relations professional. She works daily with the English language. When reviewing or editing a colleague’s or client’s writing, she is constantly looking to see if it is clear and if it is correct. English has rules and while they may be looser than those imposed by computer languages, they are important to clear communication. We frequently discuss the irony that when I am reviewing code, I am doing the same thing but in various computer languages. I work to make sure the code clearly communicates intent to the computer and comments clearly communicate intent to other developers.

Some formality can be achieved with things like UML but, for the most part, my colleagues and I talk about code using English. We might say that two files in the same directory are “siblings.” Or that one node in a DAG is a cousin to another. These kinds of relationships are important and I’ve found myself realizing that there’s no easy way to refer to a parent’s sibling. Of course English has “aunt” and “uncle” but those gendered words don’t fit well in computer science.

I was reminded of this recently when I read A Psychologist Explains How The Language You Speak Manifests Your Reality in Forbes. It talks about how language shapes perception and what you can convey. In Mandarin, it seems, the word you choose for “aunt” conveys whether she is on your father’s or mother’s side, as well as whether she’s an aunt by birth or marriage. (They don’t say if there is a vague, gender-neutral word for “parent’s sibling.”) I was also intrigued by Bilingualism Is Reworking This Language’s Rainbow (in Scientific American) which discussed how some human languages are better than others for describing a range of colors.

Similarly, computer languages restrict what you can express easily, and in some cases limit what you can do at all. Early in my computer science education, I took a course called “Computer Languages.” It was a survey course designed to introduce students to varied languages. It covered APL, LISP, Fortran, and SNOBOL. The instructor drove home the strengths and weaknesses of the languages by having us use each language to solve a problem it was ill-suited for. We were tasked with solving the traveling salesman problem in Fortran. That is a classic illustration of the power of recursion, often used to demonstrate how LISP works. But the Fortran of that era did not support recursion!
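For a taste of what we were missing, here is the naturally recursive formulation of that exercise, written in Python since period Fortran couldn’t express it: try each unvisited city as the next stop and recurse on the rest. The three-city distance table is invented for illustration.

```python
# Brute-force traveling salesman by recursion. Distances are made up;
# real instances would be far larger (and exponential to solve this way).
dist = {
    ("A", "B"): 5, ("A", "C"): 9, ("B", "C"): 3,
    ("B", "A"): 5, ("C", "A"): 9, ("C", "B"): 3,
}

def shortest_tour(current, remaining, home):
    """Length of the cheapest route visiting all remaining cities, then home."""
    if not remaining:
        return dist[(current, home)]
    return min(
        dist[(current, city)] + shortest_tour(city, remaining - {city}, home)
        for city in remaining
    )

print(shortest_tour("A", {"B", "C"}, "A"))  # → 17
```

In a language without recursion, the same search must be rewritten with explicit stacks and loops, which is exactly the pain the exercise was designed to teach.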

It is said that to a man with a hammer, every problem looks like a nail. If Fortran were the only language in my toolbox, I could be forgiven for using it when presented with a problem better suited to LISP or C. But my toolbox contains more than a dozen languages. I can and do pick from a handful of modern candidates when choosing the tool for a new problem.

As a hiring manager, I’ve often said that I would prefer not to hire a developer who only knows one language. But even two similar languages are fairly limiting. I’d look for a compiled language and a scripting language. Or a procedural language and a declarative language. If you know C, you can get up to speed on C# fairly quickly. But if all your experience is in procedural languages, you’re likely to write a lot of loops in C# instead of using LINQ. If you know SQL, then LINQ feels natural. Like a Tsimane’ speaker borrowing azul from Spanish to describe blue, knowing multiple computer languages allows you to express more programs more clearly than you could otherwise.

Languages — human and computer — grow by borrowing from other languages. Speakers and programmers benefit from knowing more than one language, even if they routinely use just one. Go learn another language; whether it is your 3rd or 13th you’ll be a better programmer for it.

 

Posted by on March 20, 2024 in Uncategorized

 


Not So Intelligent

A long time ago I implemented a program that learned to play Tic Tac Toe. I was a new programmer and not particularly skilled, but I’d read an article about how someone had taught a pile of matchboxes to play and I thought I was at least that good. The article I’d read was in Scientific American but you can read about MENACE today on Wikipedia. The original work was “AI” in 1961! My program started out only “knowing” the allowable moves and as we played it “learned” strategy. After a dozen or so games, I couldn’t beat it. Even though I wrote it, I marveled at this program’s behavior.

Arthur C. Clarke famously said, “Any sufficiently advanced technology is indistinguishable from magic.” There are 958 possible final boards of Tic Tac Toe and more than 250,000 games (paths from an empty board to one of those finishes). This is on the border of a human’s ability to inspect and understand. The fact that I didn’t really know how my program kept me from winning Tic Tac Toe didn’t make it magic; it was just sufficiently complex that I couldn’t intuit its inner workings. (I was a novice programmer then, but decades later I still find wonder in this program.)

Like those matchboxes and my novice program, a lot of things these days are called “artificial intelligence” but how intelligent are they? I’d argue, “not very.”

Neural Networks

Many AI systems — including my program — are trained to achieve their goal. The system starts with some constraints and rules; then, by exposing it to data (games of Tic Tac Toe, for instance), you train it, or it learns, to behave in a way that seems intelligent.

One type of system that can be trained to seem intelligent is a neural network, based at least loosely on a limited understanding of how the brain works. With a neural network (NN), you repeatedly present inputs and the desired or expected outputs, and the network magically sets itself up to produce those outputs when the same or similar inputs are presented. (Please take a hint from the use of “magically” to realize that I’m being vague and general. I’m sure I’ve got details wrong.)
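
To make “train” concrete, here is about the smallest trainable unit there is: a single artificial neuron nudged toward the right answer on each example. This is a toy of my own, far simpler than any real network, but it is the same apply-input, compare-output, adjust loop.

```python
# One artificial "neuron" learning the AND function by repeated
# exposure to inputs and desired outputs.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]   # one weight per input
bias = 0.0
rate = 0.1       # how hard each mistake nudges the weights

for _ in range(20):                       # passes over the training data
    for (x1, x2), target in samples:
        out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        err = target - out                # 0 if right; +/-1 if wrong
        w[0] += rate * err * x1
        w[1] += rate * err * x2
        bias += rate * err

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
```

After a handful of passes the weights settle on values that compute AND, even though nothing in the code “knows” what AND means.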

A common demonstration of neural networks is classification of data like images or audio clips. After training, you might use a NN to try to tell if a sound came from a flute or a saxophone, or what kind of animal was in a picture. Say you had several (or several hundred) images of cats and dogs. You might try to train a neural network to discriminate between them.

The real test is trying the NN on a novel input, one it never saw during training.

What Went Wrong?

Somehow, the NN might have noticed that in the training images the cats all have a horizontal stroke at the bottom center but the dogs are all hollow. That fits the training data and leads to the same wrong conclusion with the test data.

Whatever the method, the NN focused on the wrong feature of the input and drew the wrong conclusion. When shown an image of a standing animal, it labeled it a dog. This is a classic example of GIGO: garbage in, garbage out.

Considering this small training set, we could try to fix the problem by adding standing cats and sitting dogs to the training images. Then the NN might focus on pointy tails or some other irrelevant feature and still reach the wrong conclusion. (Challenge: can you explain the difference between a cat and a dog well enough for another person following your directions to properly conclude the last image is a cat?)

Humans are great at pattern recognition and extrapolation, at least to the limits of our capacity. We can look at the data we trained the NN with and see things that might be wrong. But if you trained the NN on thousands of drawings (or thousands of photos!), it would be nearly impossible for a human to review the training data, determine the problem, and fix it. The larger the data set, the harder it is to tell what is wrong or to correct the problem.

Generative “AI”

If you play Scrabble or Wordle, you are likely familiar with the fact that “e” is the most common letter in English text. Different analyses show “t” or “a” second. As you might expect, “q” and “z” are fairly uncommon. What if you looked for the frequency of two-letter combinations? You might think “th” would be fairly common (indeed, it’s the most common) and something like “qz” fairly uncommon or absent. Things get interesting with three-character combinations; it turns out they embody a lot of the word-forming rules of the language. If you “randomly” generate text that adheres roughly to the same frequency of trigrams as the original language, you get something that is readable nonsense. Readable, because all the letter combinations look familiar to us and we can sound them out. Nonsense because there are very few actual words in the text.
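
The trigram trick is easy to try yourself. A minimal sketch (the corpus and names here are my own; a real demo would train on a book’s worth of text):

```python
import random
from collections import Counter, defaultdict

# Character-trigram model: count which letter follows each two-letter
# context, then generate text with roughly the same statistics.
corpus = ("the quick brown fox jumps over the lazy dog and then "
          "the dog chases the fox over the hill and through the thicket")

follow = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follow[a + b][c] += 1

def babble(length, seed="th", rng=random):
    out = seed
    while len(out) < length:
        ctx = out[-2:]
        if ctx not in follow:      # dead-end context: restart from the seed
            ctx = seed
        choices = follow[ctx]
        out += rng.choices(list(choices), list(choices.values()))[0]
    return out
```

The output sounds like English when read aloud but says nothing: readable nonsense, exactly as described above.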

What if, instead of the frequency of one letter following another, we considered the frequency of one word following another? If we had a large volume of text to train the system with, we could generate novel text from that training set.
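
Swapping letters for words gives a minimal word-level generator, a distant ancestor of today’s LLMs. Again a toy of my own devising:

```python
import random
from collections import Counter, defaultdict

# Word-level version of the same idea: record which word follows which,
# then walk the table to generate novel (if plotless) text.
training = ("the cat sat on the mat and the dog sat on the rug "
            "while the cat watched the dog").split()

next_word = defaultdict(Counter)
for a, b in zip(training, training[1:]):
    next_word[a][b] += 1

def generate(n_words, start="the", rng=random):
    words = [start]
    while len(words) < n_words:
        table = next_word[words[-1]]
        if not table:              # no known successor: start over
            table = next_word[start]
        words.append(rng.choices(list(table), list(table.values()))[0])
    return " ".join(words)
```

Every two-word sequence in the output appears somewhere in the training text, yet the whole is new. Scale the training set up by many orders of magnitude and add far more context than one preceding word, and you are headed toward an LLM.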

If you used all of Shakespeare’s plays as a training set and asked the system to generate some text, it could. The output might be fairly readable (though it likely wouldn’t have much of a plot). But it would be in Shakespeare’s English, not modern, with nary an acronym or neologism in sight. And it would be more like a play than the sonnets Shakespeare is also famous for.

While Shakespeare was prolific, his collected work is still a small part of a small library. And it is microscopic compared to all the text on the Internet: digitized books but also software user manuals, scientific papers in online journals, social media posts, and on and on. That huge volume of text is what is used to train large language models (LLMs). Once the LLM is trained with a large fraction of the Internet, you can ask it to generate novel content. This content will sometimes be gibberish but a really good system will produce text that seems quite coherent.

If an LLM’s training material includes racist rants on social media, there’s a chance it will generate text that reflects that bigotry. Does the LLM have a conscious bias against certain people? No, it’s not even conscious. But it can look that way. And it’s not a good look. Remember GIGO. The system reflects the strengths, weaknesses, and biases of the input. Do cats always lie down? Are plays always in Shakespeare’s voice?

A lot has happened since 1961, and many years since have been labeled the “year of AI.” Recent developments have led to AI techniques yielding more useful applications. Maybe that year has finally come. With AI as with many things, we should be mindful of creators’ intent and the systems’ effects, but let’s not cloak such applications in mystery.


Posted by on February 25, 2024 in Uncategorized