An Internet Protocol address is not a personal identifier

As the Internet has become more popular, many misconceptions about it have crept into public consciousness. Because it is large, complex, and incomprehensible from any single point-of-view, many people make the mistake of thinking that there is some kind of technical mastery behind it all, to make it all work. Nothing could be further from the truth. The Internet is messy, error-prone, and inefficient. It was designed right from its origins to be thus: “assume the network is unreliable” was a mantra of its early developers. One part of the Internet doesn’t know what another part is doing or what it is like, and it doesn’t need to know this information to function. Limited and imperfect knowledge is all that is required for communication to succeed. The Internet is a machine language for communication, and just like a natural human language it has rules of grammar and spelling and pronunciation that are broken all the time by its native speakers.

An Internet Protocol address, or IP address, is a sequence of ones and zeroes that identifies a node on a network. A node is a computer or a smartphone or some kind of intelligent device that can communicate. Knowing this address, an intelligent communicating device can calculate how to move data farther along in the network towards the final destination, and ultimately right up to the destination itself. That is a procedure called routing. IP addresses are the information, and routing is the action based upon this information.
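To make the split between information and action concrete, here is a minimal sketch in Python. The routing table and the next-hop names are invented for illustration. Real routers hold far larger tables and build them with routing protocols, but the principle of choosing the most specific matching prefix is the same.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> name of the next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
    ipaddress.ip_network("192.168.0.0/16"): "campus-router",
    ipaddress.ip_network("192.168.188.0/24"): "local-switch",
}


def as_bits(address: str) -> str:
    """Return the address as the sequence of ones and zeroes it really is."""
    ip = ipaddress.ip_address(address)
    return format(int(ip), f"0{ip.max_prefixlen}b")


def next_hop(address: str) -> str:
    """Routing decision: pick the longest prefix that contains the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


if __name__ == "__main__":
    print(as_bits("192.168.188.166"))
    print(next_hop("192.168.188.166"))
```

Nothing in the table or in the decision refers to a person. The only inputs are an address and a set of prefixes.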

Notice that in talking about an IP address I didn’t say anything about a human being who might be using the node on the network. I didn’t need to, because an IP address doesn’t have anything to do with a person. An IP address pertains to the task of networking devices.

The Internet works according to a layered model. Layers or functions define a type of action that must be taken for communication to occur. There are identifiers that are meaningful in describing what is going on at each layer. An IP address is an identifier for a node, and it is meaningful at the Internet layer that defines how packets are routed to their final destination. But that is just one of the functions that must be carried out for successful communication to occur.

There is no authentication between layers. This is referred to as a “stateless” mode of communication. Think of it like the connection between a person dropping a letter in a mailbox and the post office worker who collects all the letters to take to the sorting station. What is the connection between these two steps in communication? There isn’t one, immediately. Each function can operate while treating logically prior functions as a fait accompli. I can drop off my letter with only the vague awareness that it should be picked up later; the post office worker can collect the letter with only the vague awareness that it is properly addressed and has the right postage.

With the Internet, there is no direct connection between a packet of data moving across a network and a human being who may have initiated the sending of that data. That is because the Internet developed from earlier “store-and-forward” models of communication, which worked much like the post office in delivering paper mail.

Network communication can be initiated either by a human being operating a computer, or by the computer itself. From the point of view of the network, there is no way to know the difference. That is like saying that the post office can deliver mail where it was individually sent by a human being or sent in bulk by a machine, or it is like saying that the telephone company can route a call whether a human being is calling or if a machine is robo-dialing the call.

In summary, the layered model of the Internet can be described as follows. A process runs on a computer, and may be something like browsing a web page or sending an email. The “conversation” between computers at this layer can be described as a session, and it has identifying descriptors including port numbers. A packet of data containing a very small part of the conversation is then transferred to the networking layer, where an identifying number called an IP address is attached to it, to help route this piece of the conversation to its final destination. Now properly addressed, this chunk of data is passed to the data link layer, where it goes out an interface or port, exits the computer, and traverses a network, transiting a series of interfaces on its way to the final destination. There is an identifier at this level too, and it is commonly an Ethernet address that is associated with an interface. Finally, a signal is sent across a wire or through the air to physically reach another interface. Once at the final destination, the entire procedure is followed in reverse, until the corresponding application at the other end receives this small packet of data, which is one piece among many of the overall conversation.
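The nesting of layers can be sketched as a set of wrappers, each carrying only its own identifiers. This is an illustration in Python, not a real network stack. The class names, port numbers, and addresses are all made up for the example (the IP addresses come from ranges reserved for documentation).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Session level: the conversation is identified by port numbers."""
    src_port: int
    dst_port: int
    payload: bytes


@dataclass
class Packet:
    """Internet layer: the node is identified by IP addresses."""
    src_ip: str
    dst_ip: str
    segment: Segment


@dataclass
class Frame:
    """Data link layer: the interface is identified by an Ethernet address."""
    src_mac: str
    dst_mac: str
    packet: Packet


def encapsulate(payload: bytes) -> Frame:
    """Wrap a piece of the conversation on its way down the layers."""
    segment = Segment(src_port=49152, dst_port=80, payload=payload)
    packet = Packet(src_ip="192.0.2.10", dst_ip="198.51.100.7", segment=segment)
    return Frame(src_mac="02:00:00:00:00:01", dst_mac="02:00:00:00:00:02", packet=packet)


def decapsulate(frame: Frame) -> bytes:
    """Follow the procedure in reverse at the destination."""
    return frame.packet.segment.payload


if __name__ == "__main__":
    frame = encapsulate(b"GET / HTTP/1.1")
    print(frame)
    print(decapsulate(frame))
```

Every field in these wrappers names a port, a node, or an interface. None of them names a human being.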

An IP address is not a personal identifier, and on its own cannot connect a human being to data transmitted across a network. The Internet does not have such a thing as a personal ID, like a Canadian has a Social Insurance Number in interactions with the federal government. The only way to connect a person with Internet use is by taking a bottom-up approach to the network layers, and capturing all of the meta-data and data involved along the way.

Here is what it would take to convincingly prove that a person did anything on the Internet. First, prove that a person made use of a networked device by possessing and operating it. Then ascertain the identifying information at the data link layer for the port that sent and received packets of data. Then connect this meta-data with the IP address used to identify the node in the whole Internet. Then isolate the session layer information that identifies the particular conversation. Finally, reconstitute the pieces of the conversation, such that the actual data transmitted or received can be determined.

When the Internet began 40 years ago, each of the layers had a physical analogue and it was easy to picture what was going on. An operator was a real person; a node was a real computer; a port was a physical connection with a wire attached and there was only one address needed to identify it uniquely. Now, the trend of virtualization means that every level is more abstract. A node can be a virtual machine, which can be one of dozens or hundreds of computing/communicating computer-like objects inside the physical box. The physical port can manifest itself as any number of network presences, each one of which can have many IP addresses associated with it. From a static and human-shaped Internet, we have evolved to a dynamic Internet consisting mostly of machines talking to machines.

The amount of information that must be collected to tie an Internet Protocol address to the activities of a person using the Internet is very large, and it is computationally and monetarily expensive to record and store all of this. Logging data and meta-data, on its own, serves no security purpose. Only analyzing logged data serves a security purpose. This too has costs, which are substantial.

It is technically possible to record every telephone conversation. Capturing, storing, retrieving and analyzing all of this information would have large costs. It is technically possible to open everybody’s mail. Recording all the information contained in postal addresses and letters and then analyzing all this meta-data and data would have large costs. For the Internet, large data centres would have to be maintained to store all the captured meta-data and data, and large numbers of skilled analysts would have to be employed to make sense of it all.

The replacement of IPv4 with IPv6 does not change the nature of the Internet or the purpose of the Internet layer. The Internet is still a “best possible effort” relay system, and the Internet layer is still stateless, which is to say it does not have error corrections at this stage.

It is not true that IPv6 addresses are “persistent” in a way that IPv4 addresses are not. In fact, the opposite is the case. IPv6 addresses are transient, and more removed from human agency. When IPv4 addresses were introduced in 1982, they were fixed, unique addresses assigned by human beings to networked devices. As IPv4 evolved, ways of assigning addresses that were more automated began to be used, such as the Dynamic Host Configuration Protocol. With DHCP, pools of addresses are controlled by an Internet Service Provider, and assigned to connecting devices and revoked from disconnecting devices without human intervention. Now, with IPv6, we have a built-in scheme of stateless auto-configuration, where the device itself “makes up” its own interface identity, solicits a network prefix from a router, and concocts a complete 128-bit address thereby. An IPv6 address is a unique identity among 340 undecillion possibilities, and is firmly in the realm of autonomous technology: it is all about machines talking to machines. The future shape of the Internet is such that IPv6 addresses not only can be but perhaps should be random, unpredictable, and utterly removed from human agency.
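The address arithmetic of auto-configuration can be sketched in a few lines of Python. This shows only how a prefix and a self-chosen interface identifier combine into a 128-bit address. It assumes a 64-bit prefix and a purely random identifier, and it leaves out the actual exchange with the router and the check that no neighbour already uses the address. The function name and the documentation prefix are chosen for illustration.

```python
import ipaddress
import secrets

# Size of the whole IPv6 address space, about 3.4 x 10**38.
ADDRESS_SPACE = 2 ** 128


def autoconfigure(prefix: str) -> ipaddress.IPv6Address:
    """Combine a router-supplied /64 prefix with a made-up interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a 64-bit prefix")
    interface_id = secrets.randbits(64)  # the device invents its own identity
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


if __name__ == "__main__":
    # 2001:db8::/32 is reserved for documentation, so this prefix is hypothetical.
    print(autoconfigure("2001:db8:1:2::/64"))
    print(ADDRESS_SPACE)
```

Run it twice and the same device on the same network gets two different addresses, with no human choosing either of them.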

If one has in mind a single IP address fixed to one physical device, which is proven to be under the control of one identifiable human being, then one is thinking about the Internet as it was created in 1982. The Internet does not work that way now, and it will be even more removed from that in the future as it improves in scale with IPv6 and improves in security with strong encryption and authentication.

One might consider consolidating multiple sources of information. The Internet is a node-based network of networks, so let’s suppose we gather meta-data and data anywhere along the line from the source, through intermediate relays, to the final destination. The “store-and-forward” nature of the connection between the nodes is stateless, just like the connection between the layers in the network stack is stateless. Therefore, it is up to us to put the pieces together. A unique problem with coordinating information gathered from different nodes arises from the need to accurately establish a timeline. The “chain of custody” must be followed through time. Timekeeping with most computing and networking devices is not very good. The finest interval most devices comprehend is a millisecond. A gigabit network card can push minimum-size packets across a wire at a rate of nearly 1.5 million per second. That is roughly 1,500 packets within the most granular measurement of time built in to most computers. The Internet has a feature called the Network Time Protocol, but there is no requirement to run it. Most networked devices are going to have the wrong time, uncoordinated time, and a coarse measurement of time. Think of it like a grainy surveillance video in a store: it may record a robbery, but it may be unusable for the purposes of an investigation or a prosecution.
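The packet-rate arithmetic can be checked directly. The sketch below assumes standard Ethernet framing and the smallest legal frame, which gives the highest possible packet rate for a given link speed. The constant and function names are my own.

```python
# Bytes that occupy the wire for every minimum-size Ethernet frame.
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
MIN_FRAME_BYTES = 64       # smallest legal Ethernet frame
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames


def max_packets_per_second(link_bits_per_second: int) -> int:
    """Upper bound on the packet rate when every frame is minimum size."""
    bits_on_wire = (PREAMBLE_BYTES + MIN_FRAME_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bits_per_second // bits_on_wire


def packets_per_millisecond(link_bits_per_second: int) -> int:
    """How many packets can share a single millisecond timestamp."""
    return max_packets_per_second(link_bits_per_second) // 1000


if __name__ == "__main__":
    for name, speed in [("10 Mbit/s", 10_000_000), ("1 Gbit/s", 1_000_000_000)]:
        print(name, max_packets_per_second(speed), packets_per_millisecond(speed))
```

At ten megabits per second, about 14 packets can fall inside one millisecond. At a gigabit per second the number is close to 1,500, so a log with millisecond timestamps cannot put those packets in order.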

This is not to say that gathering information from many sources is impossible. It is problematic, though, and it is not a panacea for the substantial cost and effort required to convincingly tie an Internet Protocol address to the actions of a human being.

Trying to isolate Internet spying to a domestic context is impossible, because the Internet is by its very nature trans-national. The trends of virtualization and so-called “cloud computing” make this even more so. The connection between an IP address and a human being is a weak one, and the always indirect connection between what happens at the network level and what a person does has been made even more tenuous and remote by the explosive growth and speed of the 21st century Internet.

The Domain Name System

The Domain Name System (DNS) was first defined in 1983, and has been vital to the functioning of the Internet ever since. This is the scheme whereby Internet Protocol addresses, which are hard-to-remember numbers, are translated into easy-to-remember names. DNS is hierarchical in structure, global in scope, and works in an automated way so that most of the people who use browsers and email and other Internet applications have no idea that it is there, let alone understand how it works. It is enough that DNS is the magic glue so that someone types a URL into a browser or puts an address on an email, and it simply works.

A domain name is an address like www.example.com. An Internet Protocol address looks like 192.168.188.166 (IPv4) or 2001:f38:1f70::b99:df8:7148:6e8 (IPv6). Human beings are much better at remembering and using names, whereas computers are by nature number crunchers. DNS bridges the gap for human beings as users of computers, by translating name addresses to number addresses, and vice versa. It is essential that DNS do this in a way that is accurate, and which makes sense to people. Since its foundation in 1983, DNS has been very successful because of its accuracy and sensibility, and it has become a global classification scheme on a par with that triumph of nineteenth century standardization, the Universal Postal Union.

A big part of the common-sense validity of the Domain Name System comes from the concept of the top-level domain. A top-level domain is the largest-scale category of the name, giving the general sense of the kind of entity that has the name. The first top-level names were organizational, and applied only to the United States. They were .edu, .org, .net, .gov, and others. Educational institutions like universities belonged in .edu; non-profit, non-governmental organizations belonged in .org; Internet entities belonged in .net; governmental agencies belonged in .gov.

Until the early 1990s, the Internet was restricted from commercial exploitation, so the top-level domain of .com was effectively a joke. When this restriction was lifted and when the World Wide Web was invented by Sir Tim Berners-Lee, .com suddenly became serious, and registering a name under .com became an essential part of doing business and eventually of protecting trademarks.
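The hierarchy in a name, and the "vice versa" direction from number back to name, can both be shown without touching the network. In this Python sketch the two function names are my own. The reverse-lookup name comes from the standard library's ipaddress module.

```python
import ipaddress


def hierarchy(domain: str) -> list[str]:
    """List the zones from the top-level domain down to the full name."""
    labels = domain.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def reverse_name(address: str) -> str:
    """Name that DNS is asked about when translating a number back to a name."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(hierarchy("www.example.com"))
    print(reverse_name("192.168.188.166"))
    print(reverse_name("2001:f38:1f70::b99:df8:7148:6e8"))
```

An actual lookup needs a working resolver and a network connection. In Python, socket.getaddrinfo performs the name-to-number direction.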

The Internet became international in scope, and DNS expanded with it. Two-letter country codes derived from ISO standards became top-level domains.  Now, top-level domains like .ca for Canada and .ua for Ukraine and .za for South Africa and about 200 other designations competed with the traditional organizational ones like .com.  With the delegation of authority away from the United States-based university and military researchers, the clarity of DNS began to erode.  Should a Canadian company register under .ca or under .com, or should it register under both?  My own employer faced this decision, and was lucky to get its corporate trademark registered intact under both .ca and .com, but many other entities had to make compromises, with the result that a customer who is an Internet user does not have the old, comfortable assurances of where a name logically belongs.

Further weakening of DNS occurred with top-level country codes that belonged to states that were too weak to have a viable Internet, but who had two-letter combinations that because of their appearance were valuable as ersatz organizational domains.  For example, the South Pacific island nation of Tuvalu has the top-level country code of .tv. To the 10,000 people who live on Tuvalu, the letters “.tv” probably mean Tuvalu, but to the almost 7 billion other people who live on our planet the letters “.tv” bring to mind the word “television.”  Accordingly, the .tv top-level country code was exploited, early on, by television stations and similar TV-related entities, for their Internet presence, even though that presence had nothing to do with a small South Pacific island.

The original organizational top-level domains such as .edu and .net have since been expanded in number by the Internet Corporation for Assigned Names and Numbers (ICANN) and the Internet Assigned Numbers Authority (IANA). Names ending in .museum and .aero and .pro and others have joined the list, with varying rules of assignment and varying popularity in terms of adoption. Most recently, the top-level domain .xxx was sanctioned, intended for adult entertainment entities. The intent of all of this expansion has been to internationalize and to democratize the Domain Name System, but its practical effect has been to devalue domain names. Before, there was an artificial scarcity, and so a domain like “sex.com” was considered valuable enough that somebody spent 13 million dollars for the right to use it. Now, companies looking to protect trademarks or secure coveted labels are registering in dozens of top-level organizational and country code domains, but it is an investment with diminishing returns.

One reason that chasing after prestigious domain names is becoming a nostalgic pastime is that search engines are now the primary means by which Internet users are finding what they want.  It once was the case that someone using a web browser would guess at a likely name address, and enter it manually in the URL field.  It made sense for any business to want to have a named presence that was short, easy-to-remember, and made sense for the kind of entity that they were.  Very few people type URLs any more, or even know what they are.  Web users are clicking on links, and those links are pushed at them by search engines.  There is no longer the requirement that the domain name in the URL be short and meaningful.  DNS is still needed to resolve the domain name in a URL to an IP address, but the content of the domain name is of less importance for human eyes than it was before.

 

I said that DNS thrives because of accuracy and sensibility. I have addressed the issue of sensibility, arguing that the expansion of top-level domains and the increasing importance of search engines have eroded the primacy of short, meaningful domain names as the sine qua non of anyone’s presence on the Internet. What I would not have expected is the need to address the issue of accuracy. The honesty of the Domain Name System as a reliable means to translate a name to a number and vice versa is now under attack from an unexpected source: the United States government. Yes, the U.S. government, which created the Internet through the Defense Advanced Research Projects Agency, is now considering hobbling DNS, by forcing authoritative bodies that control name servers to falsify their records. The Stop Online Piracy Act (SOPA) went before the U.S. Congress last year, and it had the backing of lobbyists from the old-line entertainment industries that have seen their monopoly power of distribution badly eroded because of the Internet. The bill sought to bring under criminal law the civil law torts that organizations like the Motion Picture Association of America (MPAA) and the Recording Industry Association of America (RIAA) claim. SOPA would have forced search engines not to acknowledge an Internet presence that actually exists, and forced DNS servers not to do their jobs honestly, all at the behest of a “take-down order” issued by someone who claims rights to intellectual property. Thankfully, the immediate threat of SOPA was withdrawn, but the persistent threat of forcing DNS to tell lies remains.

This legalized hacking strikes at the very heart of the Internet.  By cutting away at the integrity of search engines and DNS, SOPA and legal measures like it look to do damage to the electronic network economy that has grown up over the past twenty years.  Laws like these are being made by people who do not understand the Internet (U.S. Representatives and Senators) at the behest of people who are afraid of the Internet (the old-line entertainment industry).  An honest DNS is a bystander victim in the battle of old media versus the Internet.  If DNS dies because of legislated corruption, that would be a shame, as it has been a tremendous success, and it made sense for a long time.  However, its demise would be the end of a long-term decline, because the Internet is evolving to new ways of connecting people with information by means of networks and computers.  My hope is that SOPA and similar legislative attacks on the Internet in the U.S. and Canada continue to “miss the mark,” because these are sledgehammer solutions to non-existent problems, and they cannot work because they don’t match what is true about the way the Internet works.  28 years after DNS was invented, some people in power are finally becoming aware of what it is.  They don’t understand it, they don’t like it, and they are afraid of it, and they may succeed in destroying DNS as an honest arbiter.  They are too late, because the Internet is moving on, beyond the United States and beyond its twentieth century foundational principles.  The Internet is dead — long live the Internet!

Pardon Alan Turing!

It is shocking to the very core to find out what happened to Alan Turing after the war. The achievements of Bletchley Park were kept in profound secrecy until the 1970s. No one knew the great debt of gratitude that we all owe to Alan Turing. Photo by Flickr User basegreen

Alan Turing is perhaps the greatest computer scientist who has ever lived, and among the greatest British scientists of the 20th century. He was a key figure among the Bletchley Park “boffins” who cracked the Enigma code used by Nazi Germany, and who built Colossus, the world’s first electronic and digital computer. Alan Turing and the Bletchley Park team are directly responsible for saving many merchant ships and warships and hundreds of lives in the Battle of the Atlantic, and they can be credited with making the war shorter than it otherwise would have been. Alan Turing is a war hero. As a computer scientist, he is the inventor of the deceptively simple Turing Test which is a benchmark for artificial intelligence to this day. He thought about the potential of electronic computers in a way that is so fundamental and revolutionary that it takes your breath away to try to grasp his concepts today. He pioneered the concepts of the algorithm and of a stored-program computer, and that is how we have come to understand electronic computing as a union of software and hardware.

Still waiting for Internet Protocol version 6

The Internet Protocol version 4 began in 1981, and eventually someone will send the last packet using it. I'll bet you they will send it from a horse-and-buggy North American ISP.

Sixteen years ago, the replacement for the Internet Protocol was made official. Years of work by groups of experts from around the world had solved the problems that were faced by the old way of doing things, which was called Internet Protocol version 4. For a short while, they called this grand collaborative effort IP “Next Generation” — lots of Star Trek fans in that bunch! — but eventually it was decided to call the new, improved set of rules “Internet Protocol version 6,” or IPv6 for short.

Digital vs. Analog or Consumers vs. Industry

To me, it is like the year 1900, when stable owners and buggy whip makers could force compliant lawmakers to enforce a 5 mile per hour speed limit on automobiles, or demand that a flag man walk in front of cars when they drove through town.

The technology to allow entertaining performances to be preserved and then experienced later, again and again, was invented in the nineteenth century. The photograph, the audio recording, and the motion picture are all nineteenth century sensations. Starting with these, there was some medium introduced between the artist and the audience. People could still go to plays and they could still listen to what was starting to be called live music, but they now could also go to a cinema and watch a movie, or they could buy a record and play it. A business was born to mediate between the artist and the audience, and it even came to be known collectively as “the media.”

Encryption as a munition

The World Wide Web is considered to be the “killer app” of the Internet. Because of Sir Tim Berners-Lee’s invention, computer networking moved from the world of the military, universities and research institutions, and into everybody’s home and business. Electronic commerce followed, but many people do not realise that in early days an important ingredient was missing from the mix. That missing ingredient was strong encryption, and it is only because we have it now, built in to every browser and every electronic commerce web server, that we are able to think about the Internet as a place to do business.

Computers and the Tragedy of the Commons

People who use computers and networks are very comfortable with discussing questions of how these resources are to be used, but are very uncomfortable discussing what they should be used for, or why. The technical mind-set prefers to dwell on considerations of means over ends, and computers are very much utilitarian objects. Computers are how-to devices, and computer users and administrators and programmers prefer to be how-to problem solvers using them.

Competitiveness and training

The economic logic of vendor-certified, instructor-led classroom training is compelling, but for some reason the lesson is not learned by employers.

The latest competitiveness index from the World Economic Forum is not good news for either Canada or the United States. In the latest rankings, Canada slipped two spots to number 12, and the United States slipped one notch to number five.

Paying twice to store data on disks

To store large amounts of data, we need to exceed the capacity of any one hard disk drive. The only way to do that is to have the data span multiple drives, but nevertheless look and behave as if it were one really big, imaginary drive.

Computer data storage is an increasingly lucrative business.  The amount of data that needs to be kept — reliably, persistently, and with easy access — is growing constantly.  Vendors who sell hard disk drives and the related “storage solutions” are keeping pace with this demand.  All at a price, of course!

The spectrum of programming

In the beginning, everyone who worked with computer software was a programmer.  It was the most natural thing.  Software is the lines of code that make the computer chip, connected to its peripheral devices, do binary arithmetic and thus “process data” in ways that we want.  To work with computers was to write those lines of code, in order to see what the results would be.  Everyone was a programmer.