Handout 34

Some Linux Fundamentals*


A Brief History of Linux

Linux is a freely distributable version of UNIX developed primarily by Linus Torvalds at the University of Helsinki in Finland. Linux was developed with the help of many UNIX programmers and wizards across the Internet, giving anyone with enough know-how and gumption the ability to develop and change the system. The Linux kernel uses no code from AT&T or any other proprietary source, and much of the software available for Linux is developed by the GNU project at the Free Software Foundation in Cambridge, Massachusetts. However, programmers all over the world have contributed to the growing pool of Linux software.

Linux was originally developed as a hobby project by Linus Torvalds. It was inspired by Minix, a small UNIX system developed by Andy Tanenbaum, and the first discussions about Linux were on the USENET newsgroup comp.os.minix. These discussions were concerned mostly with the development of a small, academic UNIX system for Minix users who wanted more.

The very early development of Linux dealt mostly with the task-switching features of the 80386 protected-mode interface, all written in assembly code. Linus writes,

``After that it was plain sailing: hairy coding still, but I had some devices, and debugging was easier. I started using C at this stage, and it certainly speeds up developement. This is also when I start to get serious about my megalomaniac ideas to make `a better Minix than Minix'. I was hoping I'd be able to recompile gcc under Linux some day...

``Two months for basic setup, but then only slightly longer until I had a disk-driver (seriously buggy, but it happened to work on my machine) and a small filesystem. That was about when I made 0.01 available [around late August of 1991]: it wasn't pretty, it had no floppy driver, and it couldn't do much anything. I don't think anybody ever compiled that version. But by then I was hooked, and didn't want to stop until I could chuck out Minix.''

No announcement was ever made for Linux version 0.01. The 0.01 sources weren't even executable: they contained only the bare rudiments of the kernel source, and assumed that you had access to a Minix machine to compile and play with them.

On 5 October 1991, Linus announced the first ``official'' version of Linux, version 0.02. At this point, Linus was able to run bash (the GNU Bourne Again Shell) and gcc (the GNU C compiler), but not very much else was working. Again, this was intended as a hacker's system. The primary focus was kernel development---none of the issues of user support, documentation, distribution, and so on had even been addressed. Today, the Linux community still seems to treat these ergonomic issues as secondary to the ``real programming''---kernel development.

Linus wrote in comp.os.minix,

``Do you pine for the nice days of Minix-1.1, when men were men and wrote their own device drivers? Are you without a nice project and just dying to cut your teeth on a OS you can try to modify for your needs? Are you finding it frustrating when everything works on Minix? No more all-nighters to get a nifty program working? Then this post might be just for you.

``As I mentioned a month ago, I'm working on a free version of a Minix-lookalike for AT-386 computers. It has finally reached the stage where it's even usable (though may not be depending on what you want), and I am willing to put out the sources for wider distribution. It is just version 0.02...but I've successfully run bash, gcc, gnu-make, gnu-sed, compress, etc. under it.''

After version 0.03, Linus bumped the version number up to 0.10, as more people started to work on the system. After several further revisions, Linus increased the version number to 0.95, to reflect his expectation that the system was ready for an ``official'' release very soon. (Generally, software is not assigned the version number 1.0 until it is theoretically complete or bug-free.) This was in March of 1992. Almost a year and a half later, in late December of 1993, the Linux kernel was still at version 0.99.pl14---asymptotically approaching 1.0. As of the time of this writing, the current kernel version is 1.1 patchlevel 52, and 1.2 is right around the corner.

Today, Linux is a complete UNIX clone, capable of running X Windows, TCP/IP, Emacs, UUCP, mail and news software, you name it. Almost all of the major free software packages have been ported to Linux, and commercial software is becoming available. Much more hardware is supported than in original versions of the kernel. Many people have executed benchmarks on 80486 Linux systems and found them comparable with mid-range workstations from Sun Microsystems and Digital Equipment Corporation. Who would have ever guessed that this ``little'' UNIX clone would have grown up to take on the entire world of personal computing?

System Features

Linux supports most of the features found in other implementations of UNIX, plus quite a few that aren't found elsewhere. This section is a nickel tour of the Linux kernel features.        

Linux is a complete multitasking, multiuser operating system (just like all other versions of UNIX). This means that many users can be logged into the same machine at once, running multiple programs simultaneously.  

The Linux system is mostly compatible with a number of UNIX standards (inasmuch as UNIX has standards) on the source level, including IEEE POSIX.1, System V, and BSD features. It was developed with source portability in mind: therefore, you are most likely to find commonly-used features in the Linux system which are shared across multiple implementations. A great deal of free UNIX software available on the Internet and elsewhere compiles on Linux out of the box. In addition, all source code for the Linux system, including the kernel, device drivers, libraries, user programs, and development tools, is freely distributable.

Other specific internal features of Linux include POSIX job control (used by shells such as csh and bash), pseudoterminals (pty devices), and support for national or customized keyboards using dynamically-loadable keyboard drivers. Linux also supports virtual consoles, which allow you to switch between multiple login sessions from the system console in text mode. Users of the ``screen'' program will find the Linux virtual console implementation familiar.

The kernel is able to emulate 387-FPU instructions itself, so that systems without a math coprocessor can run programs that require floating-point math instructions.  

Linux supports various filesystem types for storing data. Some filesystems, such as the ext2fs filesystem, have been developed specifically for Linux. Other filesystem types, such as the Minix-1 and Xenix filesystems, are also supported. The MS-DOS filesystem has been implemented as well, allowing you to access MS-DOS files on hard drive or floppy directly. The ISO 9660 CD-ROM filesystem type, which reads all standard formats of CD-ROMs, is also supported. We'll talk more about filesystems in Chapters 2 and 4.

Linux provides a complete implementation of TCP/IP networking. This includes device drivers for many popular Ethernet cards, SLIP (Serial Line Internet Protocol, allowing you to access a TCP/IP network via a serial connection), PLIP (Parallel Line Internet Protocol), PPP (Point-to-Point Protocol), NFS (Network File System), and so on. The complete range of TCP/IP clients and services is supported, such as FTP, telnet, NNTP, and SMTP.

The Linux kernel is developed to use the special protected-mode features of the Intel 80386 and 80486 processors. In particular, Linux makes use of the protected-mode descriptor-based memory management paradigm and many of the other advanced features of these processors. Anyone familiar with 80386 protected-mode programming knows that this chip was designed for a multitasking system such as UNIX (or, actually, Multics). Linux exploits this functionality.

The Linux kernel supports demand-paged loaded executables. That is, only those segments of a program which are actually used are read into memory from disk. Also, copy-on-write pages are shared among executables, meaning that if several instances of a program are running at once, they will share pages in physical memory, reducing overall memory usage.  

The kernel also implements a unified memory pool for user programs and disk cache. In this way, all free memory is used for caching, and the cache is reduced when running large programs.

Executables use dynamically linked shared libraries, meaning that executables share common library code in a single library file found on disk, not unlike the SunOS shared library mechanism. This allows executable files to occupy much less space on disk, especially those that use many library functions. There are also statically-linked libraries for those who wish to use object debugging or maintain ``complete'' executables without the need for shared libraries to be in place. Linux shared libraries are dynamically linked at run-time, allowing the programmer to replace modules of the libraries with their own routines.

About Linux's Copyright

Linux is covered by what is known as the GNU General Public License, or GPL. The GPL was developed for the GNU project by the Free Software Foundation. It makes a number of provisions for the distribution and modification of ``free software''. ``Free'' in this sense refers to freedom, not just cost. The GPL has always been subject to misinterpretation, and we hope that this summary will help you to understand the extent and goals of the GPL and its effect on Linux.

Originally, Linus Torvalds released Linux under a license more restrictive than the GPL, which allowed the software to be freely distributed and modified, but prevented any money changing hands for its distribution and use. On the other hand, the GPL allows people to sell and make profit from free software, but does not allow them to restrict the right for others to distribute the software in any way.

First, it should be explained that ``free software'' covered by the GPL is not in the public domain. Public domain software is software which is not copyrighted, and is literally owned by the public. Software covered by the GPL, on the other hand, is copyrighted to the author or authors. This means that the software is protected by standard international copyright laws, and that the author of the software is legally defined. Just because the software may be freely distributed does not mean that it is in the public domain.

GPL-licensed software is also not ``shareware''. Generally, ``shareware'' software is owned and copyrighted by the author, but the author requires users to send in money for its use after distribution. On the other hand, software covered by the GPL may be distributed and used free of charge.

The GPL also allows people to take and modify free software, and distribute their own versions of the software. However, any derived works from GPL software must also be covered by the GPL. In other words, a company could not take Linux, modify it, and sell it under a restrictive license. If any software is derived from Linux, that software must be covered by the GPL as well.

The GPL allows free software to be distributed and used free of charge. However, it also allows a person or organization to distribute GPL software for a fee, and even to make a profit from its sale and distribution. However, in selling GPL software, the distributor cannot take those rights away from the purchaser; that is, if you purchase GPL software from some source, you may distribute the software for free, or sell it yourself as well.

In the free software world, the important issue is not money. The goal of free software is always to develop and distribute fantastic software and to allow anyone to obtain and use it. In the next section, we'll discuss how this applies to the development of Linux.

The Design and Philosophy of Linux

Linux is primarily developed as a group effort by volunteers on the Internet from all over the world. Across the Internet and beyond, anyone with enough know-how has the opportunity to aid in developing and debugging the kernel, porting new software, writing documentation, or helping new users. There is no single organization responsible for developing the system. For the most part, the Linux community communicates via various mailing lists and USENET newsgroups. A number of conventions have sprung up around the development effort: for example, anyone wishing to have their code included in the ``official'' kernel should mail it to Linus Torvalds, who will test it and, as long as it doesn't break things or go against the overall design of the system, more than likely include it in the kernel.

The system itself is designed with a very open-ended, feature-minded approach. While the number of new features and critical changes to the system has diminished recently, the general rule is that a new version of the kernel will be released every few months (sometimes even more frequently than this). Of course, this is a very rough figure: it depends on several factors, including the number of bugs to be fixed, the amount of feedback from users testing pre-release versions of the code, and the amount of sleep that Linus has had this week.

Let it suffice to say that not every single bug has been fixed, and not every problem ironed out between releases. As long as the system appears to be free of critical or oft-manifesting bugs, it is considered ``stable'' and new revisions will be released. The thrust behind Linux development is not an effort to release perfect, bug-free code: it is to develop a free implementation of UNIX. Linux is for the developers, more than anyone else.

Linux network configuration (as well as most others)

Before you can configure TCP/IP, you need to determine the following information about your network setup. In most cases, your local network administrator can provide you with this information.

TCP/IP HISTORY

TCP/IP was originally developed as part of an experimental research project funded by the Department of Defense (DOD). During the Grenada invasion, it quickly became apparent that the various computers used by the different military branches could not talk to each other. The Army, Air Force and Navy each had networks and computer systems from different vendors. The DOD needed a network built to bridge the gap between the systems. One of the main goals of the DOD was to have a network robust enough to withstand damage to a single part of the network and to continue critical transmissions via alternate routes through the network to their final destination.

Although TCP/IP began as a government project, its development and evolution are actually the result of efforts by many different groups. As a result, TCP/IP is nonproprietary and available at no charge to end-users. The UNIX platform was the original home of TCP/IP, which is built into the UNIX operating system; however, other network operating systems support TCP/IP as well. TCP/IP is the communications protocol used to connect hosts on the Internet, and it has become the standard protocol for transmitting data over networks of all kinds. It is a key factor in the spread of the Internet.

TCP PROTOCOL OVERVIEW

TCP's function is to verify the correct delivery of data from the client to the server by providing reliable, stream-oriented connections. It also manages the data movement between applications on different computers. TCP is layered over IP and enhances the IP functionality in its weak areas.

Functions of TCP

IP PROTOCOL OVERVIEW

The Internet Protocol was developed to create a "Network of Networks" (the Internet). One device can provide the TCP/IP connection between all devices on a LAN (local area network) and the rest of the world. IP is responsible for moving a packet of data from one node to another, across their respective networks. This is accomplished by using a four-byte destination address assigned to each packet. This address is called an IP number, which will be described later.

IP is known as a "connectionless" protocol, which means that it does not reserve a path in the network for the duration of the connection. Before transmission, IP breaks down large chunks of data into more easily manageable IP packets for delivery across the network. Each of these packets then finds its own way across the network. When a packet arrives at an IP router, the router decides the best path along which to send it, so that it avoids network traffic and arrives at its destination.
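The splitting step described above can be pictured with a short Python sketch. It is a deliberate simplification, not the real IP fragmentation algorithm: the function name `packetize`, the `max_payload` parameter, and the use of plain byte offsets are assumptions made for illustration only.

```python
def packetize(data: bytes, max_payload: int):
    """Split data into (offset, chunk) pairs of at most max_payload bytes.

    Hypothetical helper: real IP packets carry a full header, and real
    fragmentation has more rules than this sketch shows.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        (offset, data[offset:offset + max_payload])
        for offset in range(0, len(data), max_payload)
    ]


if __name__ == "__main__":
    for offset, chunk in packetize(b"a large chunk of data", 8):
        print(offset, chunk)
```

Because each chunk records where it belongs in the original data, the pieces can travel independently and still be put back together at the far end.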

IP ADDRESSING

Just as the postal service uses a unique address to deliver mail to a specific location, IP uses a unique code, or address, in order to find a particular device on the network. Each network segment is assigned a unique number and each host on the network segment is assigned a unique number. Together the numbers become the address. An IP address is a four-byte value that is expressed by converting each byte into a decimal number between 0 and 255 and separating the bytes with periods. The leading part of the address designates the network, and the remainder designates an individual host on that network. For example, under the original classful scheme, the address 36.185.12.188 belongs to network 36 (a Class A network), and the remaining three bytes, 185.12.188, identify the host; a site may subdivide those bytes further to number its own subnets.
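The relationship between the four-byte value and its dotted-decimal form can be sketched in a few lines of Python. The function names here are made up for illustration; Python's standard `ipaddress` module offers the same conversion ready-made.

```python
def dotted_quad_to_int(address: str) -> int:
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {len(parts)}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"byte out of range: {octet}")
        value = (value << 8) | octet  # shift earlier bytes left, append this one
    return value


def int_to_dotted_quad(value: int) -> str:
    """Unpack a 32-bit integer into dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    packed = dotted_quad_to_int("36.185.12.188")
    print(hex(packed), int_to_dotted_quad(packed))
```

Each decimal field is simply one byte of the address, which is why no field can exceed 255.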

TCP/IP REFERENCE MODEL and the OSI MODEL

The TCP/IP reference model is based upon an open system model. Its functionality loosely corresponds to the OSI Model to provide standardization and meet basic functionality requirements. The OSI model is a modular, layered concept that is essential for interoperability among different vendors' products. The layered approach allows for the rework or replacement of a functional block of code without reworking the entire network. It also minimizes problems when adding new applications or interfaces. TCP/IP is also designed using the layered approach. The table below compares the seven layers of the OSI model with the TCP/IP reference model.

         

Layer  OSI           OSI Description             TCP/IP        TCP/IP Description
7      Application   End user services           *Application  Authentication, compression, end-user services
6      Presentation  Format of data
5      Session       Application communications  Transport     Data flow between systems
4      Transport     Assures delivery
3      Network       Routing                     Internet      Packet routing
2      Data Link     Transmit/receive packets
1      Physical      Physical medium

* TCP/IP's application layer handles the responsibilities of the application and presentation layers of the OSI model.

TCP/IP LAYERS

Application Layer

The application layer provides a set of interfaces for applications to gain access to networked services. This is the user's "window to the world". Some examples of these application protocols are: FTP for transferring files and SMTP for e-mail. Additionally, the application layer handles the data format information for networked communications by converting data into a generic format that can be understood and therefore processed by both the sending and the receiving applications. This is analogous to the functionality of the presentation layer in the OSI model.

Transport Layer

The TCP protocol provided in the transport layer of TCP/IP is responsible for establishing and maintaining communications between applications on a network. It must provide acknowledgment of receipt, perform flow control, ensure data integrity using sequencing and allow for the retransmission of packets if an error is detected. With the exception of flow control, these functions are located in the OSI model's session layer.

The transport layer comprises two standard transport protocols, TCP and UDP (User Datagram Protocol). As mentioned earlier, TCP provides a reliable, connection-oriented data stream. UDP, on the other hand, provides an unreliable datagram service and is considered connectionless. Depending upon the type of transmission required, either TCP or UDP may be used. TCP is mainly used when an application needs to transmit large amounts of data and needs assurance that the data will arrive correctly and in order. UDP is used for transmitting small amounts of data that do not require acknowledgment of receipt. Broadcasting and multicasting are other communications methods that frequently use UDP.

TCP is responsible for data recovery and integrity. It accomplishes this by providing sequence numbers on each packet. These sequence numbers are used on the receiving end to reassemble the original message into the right order. The sequence numbers are also used to detect errors in transmission and the subsequent data recovery and retransmission.
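As a rough illustration of how sequence numbers let a receiver restore the original order, here is a minimal Python sketch. It assumes, for simplicity, that each sequence number is the byte offset of the segment's first byte and that numbers never wrap around; the function name `reassemble` is hypothetical and this is not how any particular TCP implementation is written.

```python
def reassemble(segments, initial_seq=0):
    """Rebuild a byte stream from (sequence_number, payload) pairs.

    Segments may arrive in any order and may be duplicated. Returns the
    contiguous data recovered so far and the next sequence number the
    receiver is still waiting for.
    """
    buffered = {}
    for seq, payload in segments:
        if payload:
            # Keep the first copy; later duplicates are retransmissions.
            buffered.setdefault(seq, payload)

    data = bytearray()
    expected = initial_seq
    while expected in buffered:
        payload = buffered[expected]
        data.extend(payload)
        expected += len(payload)  # the next segment must start here
    return bytes(data), expected


if __name__ == "__main__":
    arrived = [(5, b"world"), (0, b"hello"), (0, b"hello")]
    print(reassemble(arrived))
```

If a segment is missing, the loop stops at the gap, and the returned sequence number tells the sender which data needs to be retransmitted.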

TCP provides flow control to keep a sender from transmitting faster than the receiver can accept, which also helps to limit congestion on the network. It does this by specifying an acceptable range of sequence numbers (a window) and keeping the data in transit within that range.
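The idea of an acceptable range of sequence numbers can be sketched as a simple calculation. This is an illustrative simplification that ignores sequence-number wraparound; the function and parameter names are assumptions, not part of any real TCP interface.

```python
def sendable_bytes(last_acked: int, next_to_send: int, window: int) -> int:
    """Return how many more bytes may be sent without leaving the window.

    last_acked   -- highest sequence number the receiver has acknowledged
    next_to_send -- sequence number of the next byte the sender would transmit
    window       -- size of the acceptable range, in bytes
    """
    in_flight = next_to_send - last_acked  # sent but not yet acknowledged
    return max(0, window - in_flight)


if __name__ == "__main__":
    print(sendable_bytes(last_acked=1000, next_to_send=1400, window=1000))
```

When the result reaches zero the sender must wait for an acknowledgment, which moves the range forward and frees room for more data.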

The TCP/IP transport layer provides multiplexing, which allows multiple network connections to take place simultaneously. It accomplishes this by specifying ports and including port numbers with the TCP/UDP data. Each process that uses TCP/IP must have a protocol port number. This port number references the location of an application or process on a particular device. Port numbers are 16-bit values, so they range from 0 to 65,535. Ports 0 through 1,023 are the most commonly used and are assigned by the Internet Assigned Numbers Authority.

An IP address coupled with a port number is known as a socket; a connection is identified by its pair of sockets, one at the source and one at the destination. Sockets are used by services and applications that need to establish a connection with another host.
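To see why port numbers make multiplexing possible, consider the following Python sketch, in which two connections from the same client host to the same server port are told apart only by the client's port number. The class names, the lookup table, and the addresses (taken from ranges reserved for documentation) are all invented for illustration.

```python
from typing import NamedTuple, Optional


class SocketAddress(NamedTuple):
    """An IP address coupled with a port number."""
    ip: str
    port: int


class Connection(NamedTuple):
    """A connection, identified by its source and destination sockets."""
    source: SocketAddress
    destination: SocketAddress


def demultiplex(table: dict, source: SocketAddress,
                destination: SocketAddress) -> Optional[str]:
    """Find which application owns the connection, or None if unknown."""
    return table.get(Connection(source, destination))


server = SocketAddress("192.0.2.10", 80)
client_a = SocketAddress("198.51.100.7", 40001)
client_b = SocketAddress("198.51.100.7", 40002)

table = {
    Connection(client_a, server): "first session",
    Connection(client_b, server): "second session",
}

if __name__ == "__main__":
    print(demultiplex(table, client_a, server))
    print(demultiplex(table, client_b, server))
```

Both sessions share the same pair of IP addresses and the same server port, yet incoming data can still be handed to the right one.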

Internet Layer

The Internet layer's responsibility is to route data within and among different networks using the IP protocol. Routers operate at this layer and are used to forward packets from one network, or segment, to another. IP is a connectionless protocol that does not expect an acknowledgment of receipt. Although this aspect makes IP unreliable, it allows for faster transmission. Acknowledgments and retransmissions are left to the TCP protocol in the layer above.

Each IP packet contains a source and destination address. If IP determines that the destination address is local, it looks up the hardware address of the destination machine and then forwards the information directly to its destination. If the destination address is remote, it checks its local routing table for a route to the destination host and sends the packet along that route. If a route does not exist in the local routing table, IP forwards the packet to the local default gateway, which will complete the routing.
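The three-way decision described above (deliver locally, follow a known route, or fall back to the default gateway) can be sketched with Python's standard `ipaddress` module. The function name, the shape of the routing table, and the example addresses are assumptions for illustration; choosing the most specific matching route is a common convention rather than something stated in this handout.

```python
import ipaddress


def next_hop(destination: str, local_network: str,
             routes: dict, default_gateway: str) -> str:
    """Decide where a packet for `destination` should be sent next.

    routes maps a network in prefix notation to the gateway that reaches it.
    """
    dest = ipaddress.ip_address(destination)

    # Local destination: deliver directly (after a hardware address lookup).
    if dest in ipaddress.ip_network(local_network):
        return destination

    # Remote destination: look for matching entries in the routing table.
    candidates = []
    for network, gateway in routes.items():
        net = ipaddress.ip_network(network)
        if dest in net:
            candidates.append((net.prefixlen, gateway))
    if candidates:
        # Assumption: prefer the most specific (longest prefix) route.
        return max(candidates, key=lambda entry: entry[0])[1]

    # No route known: hand the packet to the default gateway.
    return default_gateway


if __name__ == "__main__":
    table = {"10.0.0.0/8": "192.168.1.2", "10.1.0.0/16": "192.168.1.3"}
    for address in ("192.168.1.50", "10.1.2.3", "10.2.0.1", "203.0.113.9"):
        print(address, "->", next_hop(address, "192.168.1.0/24", table, "192.168.1.1"))
```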

Network Interface

The IP portion of TCP/IP ensures the seamless interface with non-TCP/IP network protocols by determining the appropriate header information to add to each frame. It then creates a frame in a suitable format for the type of network being used, such as Ethernet, Token Ring, or ATM. It then deposits the data into the frame and sends the frame into the network.


* Portions from here, here, and here.