A Static Web Server Architecture

Martin F. Johansen, 2023-12-10

A project that demonstrates well how progsbase works is a static web server. It takes HTTP requests on one side and reads blocks off a disk on the other side. Except for drivers, everything in between is handled in progsbase components.

A static web server is an extremely useful piece of software. It can serve all kinds of static data, such as web pages, images, audio and video. (Audio and video are usually served as fragments while they play for the listener or viewer.)

The software described here is used for this website, so do try it out to see how it performs: progsbase.com.

Components

Requests arrive at a web server as multiplexed, encrypted HTTP requests. Before such a request becomes a message requesting a static asset, three things have to be done: 1) each request must be demultiplexed, 2) SSL must be terminated by an encryption library, and 3) the HTTP request must be converted to a standard progsbase message. A standard progsbase message consists of a message length followed by the message.

Step 0.1: Demultiplexing

Multiplexing and demultiplexing are used to provide several virtual communication channels on a single channel. For a web server, the requests usually come from many individual computers. They arrive at a computer as addressed packets, which are assembled into a message for a virtual communication endpoint.

HTTP requests are sent over TCP. The address used by TCP is a 4-tuple: (src-ip, src-port, dst-ip, dst-port).
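As an illustration of demultiplexing by the 4-tuple, here is a minimal sketch. The class TcpDemux and its methods are hypothetical names made up for this example; they are not part of Apache or progsbase. The sketch also assumes that payloads arrive in order, whereas a real TCP stack reorders segments by sequence number.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; neither Apache nor progsbase exposes this class.
class TcpDemux {
    // The TCP connection address: (src-ip, src-port, dst-ip, dst-port).
    record Quad(String srcIp, int srcPort, String dstIp, int dstPort) {}

    // One byte stream per connection, keyed by its 4-tuple.
    private final Map<Quad, ByteArrayOutputStream> streams = new HashMap<>();

    // Appends a segment's payload to the stream of the connection it belongs to.
    // Assumes in-order delivery; real TCP reorders by sequence number.
    void deliver(Quad address, byte[] payload) {
        streams.computeIfAbsent(address, key -> new ByteArrayOutputStream())
               .write(payload, 0, payload.length);
    }

    // Returns the bytes assembled so far for one connection.
    byte[] assembled(Quad address) {
        ByteArrayOutputStream stream = streams.get(address);
        return stream == null ? new byte[0] : stream.toByteArray();
    }

    public static void main(String[] args) {
        TcpDemux demux = new TcpDemux();
        Quad a = new Quad("203.0.113.5", 40000, "198.51.100.1", 443);
        Quad b = new Quad("203.0.113.6", 40000, "198.51.100.1", 443);
        // Interleaved payloads from two clients end up in separate streams.
        demux.deliver(a, "GET ".getBytes(StandardCharsets.US_ASCII));
        demux.deliver(b, "HEAD /".getBytes(StandardCharsets.US_ASCII));
        demux.deliver(a, "/index.html".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(demux.assembled(a), StandardCharsets.US_ASCII));
    }
}
```

Two clients using the same source port are still kept apart, because their source IP addresses differ and the whole 4-tuple is the key.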

We need to provide virtual communication endpoints for the demultiplexed messages. Each endpoint can only handle a single request at a time. Apache will queue requests if all endpoints are busy, and it chooses which endpoint to use in a round-robin fashion.

To achieve this, we use the Apache components proxy_module, proxy_balancer_module and proxy_http_module with the settings below. In the example below, Apache will distribute the requests to four endpoints running on ports 8080-8083, each able to handle a single HTTP request at a time.

<VirtualHost *:443>
  ...
  <Proxy "balancer://mycluster">
    BalancerMember "http://127.0.0.1:8080"
    BalancerMember "http://127.0.0.1:8081"
    BalancerMember "http://127.0.0.1:8082"
    BalancerMember "http://127.0.0.1:8083"
  </Proxy>
  ProxyPass        "/" "balancer://mycluster/"
  ProxyPassReverse "/" "balancer://mycluster/"
</VirtualHost>

Step 0.2: SSL Termination

Our architecture does not include an SSL component. It could have, but the algorithms have not been made yet. Therefore, we will start off by getting Apache to terminate SSL and pass on an HTTP request to progsbase components.

To achieve SSL termination, we use the Apache component ssl_module with the settings below.

<VirtualHost *:443>
  ...
  SSLEngine on
  SSLCertificateFile /etc/ssl/private/progsbase.com.crt
  SSLCertificateKeyFile /etc/ssl/private/progsbase.com.key
  ...
</VirtualHost>

Step 0.3: GET Adapter

Now we have an HTTP message from a single source. An HTTP request is peculiar in that its header needs to be completely received and parsed before we can determine the length of the entire request. If there is a valid "Content-Length" header, then its value is the size of the body, the part following the header. If not, the body is empty.

We cannot use progsbase for the infrastructural part of this component. However, we can use it for the computational parts. First, the progsbase HTTP library's ParseHTTPRequest is used to parse the header and separate out the body. It returns true if the header is valid and the body has been completely received. Thus, it can be used when receiving the HTTP request. When it finally returns true, we have a complete request.
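To make the framing rule concrete, here is a minimal sketch of a completeness check. It is not the progsbase ParseHTTPRequest function, whose signature is not shown in this article; RequestFraming, expectedLength and isComplete are hypothetical names, and the sketch assumes that an invalid Content-Length value is treated the same as a missing one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; NOT the progsbase ParseHTTPRequest function.
class RequestFraming {
    // Returns the total request length in bytes once the header is complete,
    // or -1 if the blank line that ends the header has not arrived yet.
    static int expectedLength(byte[] received) {
        // ISO-8859-1 maps every byte to one char, so indices equal byte offsets.
        String text = new String(received, StandardCharsets.ISO_8859_1);
        int headerEnd = text.indexOf("\r\n\r\n");
        if (headerEnd < 0) return -1;
        int bodyLength = 0; // no valid Content-Length means an empty body
        for (String line : text.substring(0, headerEnd).split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                try {
                    int value = Integer.parseInt(line.substring(colon + 1).trim());
                    bodyLength = Math.max(0, value);
                } catch (NumberFormatException e) {
                    bodyLength = 0; // assumption: an invalid value counts as absent
                }
            }
        }
        return headerEnd + 4 + bodyLength;
    }

    // True when the header and the whole body have arrived.
    static boolean isComplete(byte[] received) {
        int expected = expectedLength(received);
        return expected >= 0 && received.length >= expected;
    }

    public static void main(String[] args) {
        byte[] partial = "POST /x HTTP/1.1\r\nContent-Length: 5\r\n\r\nabc"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isComplete(partial)); // body is still two bytes short
    }
}
```

A receive loop would call isComplete after each read from the socket and keep reading until it returns true.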

Here is the complete, parsed request:

public class HTTPRequest{
  public char [] host;
  public char [] method;
  public char [] url;
  public ByteArray content;
}

First, the host is used to tell which domain is requested. Using this, we can host multiple domains on a single web server, also called virtual hosting.

The method tells us the nature of the request. For our GET adapter, we only accept GET here.

The url tells us the identifier of the static asset being requested.

Finally, the content is ignored for GET requests.

In order to convert this to a plain request for a static asset, we need to do some further processing.

First, if the URL contains a '?', then that and what follows is discarded.

Secondly, if the URL ends with a slash '/', then 'index.html' is appended.

We are then left with a request of this kind:

public class GetRequest{
  public char [] filename;
}

We convert the filename from UTF-16 to UTF-8 using the UTF16ToUTF8 function of the progsbase Unicode library.
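The two URL rules and the encoding step can be sketched in a few lines. This is only an illustration: GetAdapterUrl, toFilename and toUtf8 are hypothetical names, String is used instead of char [] for brevity, and Java's built-in encoder stands in for the progsbase UTF16ToUTF8 function.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the GET adapter's URL handling.
class GetAdapterUrl {
    // Applies the two rules: drop '?' and what follows, then
    // append index.html if the URL ends with a slash.
    static String toFilename(String url) {
        int query = url.indexOf('?');
        if (query >= 0) url = url.substring(0, query);
        if (url.endsWith("/")) url = url + "index.html";
        return url;
    }

    // Java strings are UTF-16 internally; getBytes produces the UTF-8 encoding.
    // This stands in for the progsbase UTF16ToUTF8 function.
    static byte[] toUtf8(String filename) {
        return filename.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String filename = toFilename("/docs/?lang=en");
        System.out.println(filename + " (" + toUtf8(filename).length + " bytes)");
    }
}
```

The order of the two rules matters: a URL such as /docs/?lang=en only ends with a slash after the query part has been removed.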

We pass this request on to the next level. The response is a JSON message formatted as follows:

{
  "code": 200,
  "type": "text/plain",
  "contents": "AABBCC"
}

The code is an HTTP response code. The type is the MIME type of the contents. Finally, the contents is the hex-encoded data. We use the Bytes library's TextToBytesBase16WithCheck function to convert the contents into an array of bytes. If this function fails, a code of 500 is returned instead of the one given in the response.
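The decoding step with its fallback to 500 can be sketched as follows. This is not the Bytes library's TextToBytesBase16WithCheck function; HexContents and its methods are hypothetical names, and accepting lowercase hex digits is an assumption of this sketch.

```java
// Hypothetical sketch; NOT the progsbase TextToBytesBase16WithCheck function.
class HexContents {
    // Value of one hex digit, or -1 if the character is not a hex digit.
    // Assumption: both upper and lower case digits are accepted.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    // Decodes Base16 text; returns null if the text is not valid hex.
    static byte[] decodeChecked(String hex) {
        if (hex.length() % 2 != 0) return null;
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = hexValue(hex.charAt(2 * i));
            int lo = hexValue(hex.charAt(2 * i + 1));
            if (hi < 0 || lo < 0) return null;
            out[i] = (byte) (hi * 16 + lo);
        }
        return out;
    }

    // Keeps the given code when decoding succeeds, otherwise 500.
    static int effectiveCode(int code, String hexContents) {
        return decodeChecked(hexContents) != null ? code : 500;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCode(200, "AABBCC"));
        System.out.println(effectiveCode(200, "AAB"));
    }
}
```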

Step 1: Response Cache

We now finally have a message that is:

Just the required information and nothing more, and
Encoded as a simple message: message length + message.

We can now start on the actual functionality.

The GET adapter has a TCP port on one end and a Unix domain socket (an inter-process communication socket) on the other. Only such sockets are used for communication going downwards.

It is easy to make a socket from one computer available on another using an SSH tunnel, so this mechanism works for internal communication on a single computer, and for communication between computers.

The easiest and fastest way to process a request for a static asset is to have that asset in memory. We can use the progsbase component ResponseCache for this. It implements a generic response cache that will cache responses to requests.

When initializing this component, we give it two configuration parameters:

How many elements can be cached.
The maximum size of an element to cache.

Say we have 4 GiB of RAM available for caching. We can then ask it to cache a maximum of 32,768 assets of a maximum of 128 KiB each, since 32,768 * 128 KiB = 4 GiB. If the asset is cached, it is simply served from the cache. If not, a request is made to the next level.
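To illustrate the two parameters, here is a minimal sketch of such a cache. The internals of the real ResponseCache component are not described in this article, so the design below is an assumption: a fixed table where the hash of the request selects the slot, and a request that lands on a slot held by another request simply replaces it. All names are hypothetical; the counters only mimic the kind of figures an admin port might report.

```java
import java.util.function.Function;

// Hypothetical sketch; the real ResponseCache internals are not shown in this article.
class ResponseCacheSketch {
    private final String[] keys;
    private final byte[][] values;
    private final int maxElementSize;
    private final Function<String, byte[]> nextLevel; // stand-in for the component below
    int clientReqs, cachedReqs, duplicateHashReqs, overCacheMaxReqs;

    ResponseCacheSketch(int maxElements, int maxElementSize, Function<String, byte[]> nextLevel) {
        this.keys = new String[maxElements];
        this.values = new byte[maxElements][];
        this.maxElementSize = maxElementSize;
        this.nextLevel = nextLevel;
    }

    // Upper bound on the memory used for cached contents.
    static long worstCaseBytes(int maxElements, int maxElementSize) {
        return (long) maxElements * maxElementSize;
    }

    byte[] get(String request) {
        clientReqs++;
        int slot = Math.floorMod(request.hashCode(), keys.length);
        if (request.equals(keys[slot])) {
            cachedReqs++;
            return values[slot]; // served from memory
        }
        if (keys[slot] != null) duplicateHashReqs++; // slot held by another request
        byte[] response = nextLevel.apply(request);
        if (response.length > maxElementSize) {
            overCacheMaxReqs++; // too large to cache; serve without storing
        } else {
            keys[slot] = request;
            values[slot] = response;
        }
        return response;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseBytes(32768, 128 * 1024)); // the 4 GiB budget in bytes
        ResponseCacheSketch cache = new ResponseCacheSketch(8, 16, name -> name.getBytes());
        cache.get("/index.html");
        cache.get("/index.html");
        System.out.println(cache.cachedReqs + " of " + cache.clientReqs + " served from cache");
    }
}
```

With a fixed number of elements and a fixed maximum element size, the memory used for cached contents has a known upper bound, which is what makes the 4 GiB budget above enforceable.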

This component has a port for inquiring about how it is performing. We can do a request to this port using the following Linux command:

./send-txt | nc -U FileServerCachedAP0 | ./recv-txt | jq

Here, the two commands send-txt and recv-txt send and receive standard progsbase messages.

Let us look quickly at what they send and receive. A progsbase message starts with 15 text characters containing a JSON number, followed by that number of bytes. For send-txt above, the message is an empty request: a length of 0 with no bytes following it.

recv-txt then receives:

            103{"idles":1553468,"Client Reqs":8400,"Cached Reqs":5594,"Duplicate Hash Req":880,"Over Cache Max Req":0}
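The framing visible in the raw message above can be sketched as follows. ProgsbaseFraming, frame and unframe are hypothetical names, not the send-txt and recv-txt programs themselves. That the number is right-aligned and padded with spaces is inferred from the example above, where 103 is preceded by 12 spaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch of the message framing; not the send-txt/recv-txt programs.
class ProgsbaseFraming {
    static final int PREFIX = 15; // width of the length field in characters

    // Prepends the payload length as a right-aligned, space-padded number.
    static byte[] frame(byte[] payload) {
        byte[] prefix = String.format(Locale.ROOT, "%15d", payload.length)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[PREFIX + payload.length];
        System.arraycopy(prefix, 0, out, 0, PREFIX);
        System.arraycopy(payload, 0, out, PREFIX, payload.length);
        return out;
    }

    // Reads the length field and returns exactly that many payload bytes.
    static byte[] unframe(byte[] message) {
        String prefix = new String(message, 0, PREFIX, StandardCharsets.US_ASCII);
        int length = Integer.parseInt(prefix.trim());
        return Arrays.copyOfRange(message, PREFIX, PREFIX + length);
    }

    public static void main(String[] args) {
        byte[] empty = frame(new byte[0]);
        System.out.println("[" + new String(empty, StandardCharsets.US_ASCII) + "]");
    }
}
```

Framing an empty payload yields only the 15-character length field.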

So, the send command sends an empty request. Then, nc is used to pass this message onto the socket where the admin port is listening for Cache 0. (There are also caches 1, 2 and 3). Finally, the response is formatted by the JSON program jq:

{
  "idles": 1553468,
  "Client Reqs": 8400,
  "Cached Reqs": 5594,
  "Duplicate Hash Req": 880,
  "Over Cache Max Req": 0
}

The component has been idling for 1,553,468 seconds, which is almost 100% of the time. It has served 8,400 requests, of which 5,594 (about 67%) were served from the cache. 880 requests caused a duplicate hash value; a better hashing algorithm could reduce this number. No requests were over the maximum allowed size.

The program is run using systemd. The following command gives statistics about its resource consumption:

systemctl status [email protected]

It reports the following statistics: Up 2 weeks 3 days. CPU spent: 17min 37.486s. Memory in use 58.4M.

The other three caches report:

{
  "idles": 1553951,
  "Client Reqs": 7710,
  "Cached Reqs": 4925,
  "Duplicate Hash Req": 836,
  "Over Cache Max Req": 0
}

{
  "idles": 1553851,
  "Client Reqs": 7347,
  "Cached Reqs": 4631,
  "Duplicate Hash Req": 838,
  "Over Cache Max Req": 0
}

{
  "idles": 1553725,
  "Client Reqs": 7045,
  "Cached Reqs": 4450,
  "Duplicate Hash Req": 756,
  "Over Cache Max Req": 0
}

The response cache components are extremely useful and can be located around the world, close to where the users are. They do not need a disk to run, as they merely cache the responses in memory. Of course, the next component below could be a disk cache. In this example, however, it is not.

Step 2: Sequencer

The next component is a sequencer. It takes requests from the response caches and passes them on to a single file server, one after the other, hence the name. This component is important because it means that only a single source of the static assets is needed.

The progsbase component Sequencer is used for this.
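The idea of sequencing can be sketched as follows. This is not the progsbase Sequencer component; SequencerSketch and its methods are hypothetical names, and a Java lock stands in for whatever mechanism the real component uses to forward one request at a time.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical sketch; not the progsbase Sequencer component.
class SequencerSketch {
    private final UnaryOperator<String> fileServer; // stand-in for the single file server
    private final int[] clientReqs;                 // one request counter per client

    SequencerSketch(int clients, UnaryOperator<String> fileServer) {
        this.clientReqs = new int[clients];
        this.fileServer = fileServer;
    }

    // synchronized: only one request at a time is forwarded to the file server;
    // callers on other threads wait their turn.
    synchronized String request(int client, String filename) {
        clientReqs[client]++;
        return fileServer.apply(filename);
    }

    // Snapshot of the per-client counters.
    synchronized int[] clientReqs() {
        return clientReqs.clone();
    }

    public static void main(String[] args) {
        SequencerSketch sequencer = new SequencerSketch(4, name -> "contents of " + name);
        sequencer.request(0, "/index.html");
        sequencer.request(2, "/app.js");
        System.out.println(Arrays.toString(sequencer.clientReqs()));
    }
}
```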

Here, four caches are connected to it. Its admin port reports the following in response to this command:

./send-txt | nc -U SequencerAP | ./recv-txt | jq

{
  "idles": 1555503,
  "Client Reqs": [
    2807,
    2785,
    2716,
    2595
  ]
}

Here we can see that it has received 2807 requests from Cache 0, 2785 requests from Cache 1, etc. This is basically a result of the round robin from Apache earlier.

Systemd reports that it has used 22min 4.222s of CPU time and that it is using 57.4M of memory.

Step 3: File Server

We have now finally received the message at the file server component. We use the progsbase component FileServer for this. It takes a request for a file and looks up the file on disk.

In order to find the file on disk, the component uses a simple key-value file system. When the file has been read completely, it is encoded as Base16 (hex) and returned together with a status code of 200. If the file is not found, a status code of 400 is returned.

A MIME type is computed based on the file suffix. For example, if the file ends with .js, the MIME type is set to text/javascript.
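A suffix lookup of this kind can be sketched as follows. Only the .js mapping is taken from this article; the other entries are standard MIME types added for illustration, and the fallback to application/octet-stream for unknown suffixes is an assumption. MimeTypes and forFilename are hypothetical names, not the FileServer component's own.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the lookup table used by the FileServer component.
class MimeTypes {
    private static final Map<String, String> BY_SUFFIX = Map.of(
            "js", "text/javascript", // the mapping mentioned in the text
            "html", "text/html",
            "css", "text/css",
            "png", "image/png",
            "txt", "text/plain");

    // Looks up the MIME type from the text after the last dot.
    static String forFilename(String filename) {
        int dot = filename.lastIndexOf('.');
        String suffix = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Assumption: unknown suffixes fall back to a generic binary type.
        return BY_SUFFIX.getOrDefault(suffix, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(forFilename("/lib/app.js"));
    }
}
```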

The admin port of this component reports the following:

{
  "File Reqs": 10904,
  "File bytes": 72344197,
  "Idle": 1551318
}

It has handled 10,904 file requests with a total of 69 MiB of data, an average of about 6.5 KiB per file.

It has spent 25min 19.055s of CPU time and is currently using 52.9M RAM.

It is important to note that this component has no caches. This is something the architect decides to put above or below it if he considers it necessary.

The component below this is a disk, either real or virtual. The interface is the same in both cases. In the case discussed here, the disk is virtual.

Step 4: Virtual Disk Adapter

If the file server runs on the same server as the files, there is no need to use a virtual disk. However, using a virtual disk has several benefits:

We get the flexibility of putting a request cache between the File Server and the disk.
We could also put a sequencer if we wanted multiple disk servers.
We also get an administration port monitoring the performance of the virtual disk.
We could place the files on another server if we wanted a purely optimized disk server.

We use the progsbase component VirtualDisk. The virtual disk server and client are in VirtualDiskAdapter.

The virtual disk adapter simply forwards the call to an actual disk.

Let us have a look at its admin port. First, it has used 23min 12.482s of CPU time and is currently using 34.5M of RAM.

The following command gives the status of the component.

./send-txt | nc -U Disk1AP | ./recv-txt | jq

{
  "Entries Reqs": 23132,
  "Read Reqs": 1086374,
  "Write Reqs": 0,
  "Entry Size": 512,
  "Entries": 200000,
  "Idle": 1554702
}

There are three disk instructions. They are Entries, Read and Write. Recall that 10,904 file requests serving 69 MiB have been processed. That has caused Entries to be called 23,132 times, Read 1,086,374 times and Write zero times. That Write has been called zero times is not a surprise as this is a static web server.

The disk in question here has 200,000 entries of 512 bytes each. Notice that 1,086,374 * 512 = 556,223,488 bytes, or about 530.5 MiB, have been read, even though only 69 MiB of files have been served. This overhead is caused by the block size and the nature of the file system used.
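The arithmetic behind these figures can be checked with a few lines. The numbers are the ones reported by the admin ports above; the class and method names are made up for this example.

```java
// Hypothetical helper for checking the figures reported by the admin ports.
class DiskReadStats {
    // Total bytes read from the disk: one entry per Read request.
    static long bytesRead(long readReqs, long entrySize) {
        return readReqs * entrySize;
    }

    // Converts bytes to MiB.
    static double mib(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    // How many bytes were read from disk per byte of file data served.
    static double amplification(long bytesRead, long fileBytes) {
        return (double) bytesRead / fileBytes;
    }

    public static void main(String[] args) {
        long read = bytesRead(1_086_374L, 512L); // Read Reqs * Entry Size
        long served = 72_344_197L;               // File bytes from the file server
        System.out.printf("read: %d bytes (%.1f MiB)%n", read, mib(read));
        System.out.printf("served: %.1f MiB%n", mib(served));
        System.out.printf("amplification: %.1fx%n", amplification(read, served));
    }
}
```

By this measure, roughly 7.7 bytes are read from the disk for every byte of file data served.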

Again, notice that this component does not have any caching. This is because that is something an architect would put above or below it. In the case discussed here, the caching is simply the response caches in the beginning.

Resource usage

A major benefit of designing software this way is that one has complete control over the use of resources. Take, for example, the virtual disk above. It has been doing work for about 1,392/1,554,702 = 0.1% of the time. It would have been no problem to push this workload all the way up to 100%, which merely means that the component is never idle. There is one CPU, which processes one request at a time. The client above cannot send a new request before it has received the response.

Variants

If the work required demands more than 100%, then this is to be solved architecturally. In this example, one could use two disks with the same contents. Then half of the clients would contact each. This would double the capacity of the static web server.

Another solution would be to add several disk caches above the disk, so that most requests would be served from those caches instead.

As you can see, the basic architecture here brings endless possibilities to set up a software system to deal with any workload by design.

Optimizations

Progsbase natively does not have a byte type. If the hardware you are on supports bytes, then the Bytes library can be replaced by one that uses bytes. For example, the structure:

public class ByteArray{
  public double [] bytes;
}

Is replaced by:

public class ByteArray{
  public byte [] bytes;
}

Which means that all libraries now use bytes natively. Some functions that read and write these structures also need to be replaced, but in practice one simply replaces the Bytes library with an alternative library that has the correct optimized versions of all structures and functions.
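The relationship between the two representations can be sketched as follows. ByteArrayConversion and its methods are hypothetical names, not part of the Bytes library, and the sketch assumes that each double holds one whole byte value in the range 0 to 255.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the progsbase Bytes library.
class ByteArrayConversion {
    // Assumption: each double holds one whole byte value in the range 0..255.
    static byte[] toNative(double[] bytes) {
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = (byte) (int) bytes[i]; // 170.0 is stored as the bit pattern 0xAA
        }
        return out;
    }

    // Java bytes are signed, so mask with 0xFF to get back 0..255.
    static double[] fromNative(byte[] bytes) {
        double[] out = new double[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] portable = {170, 187, 204}; // AA BB CC
        byte[] compact = toNative(portable);
        System.out.println(Arrays.toString(fromNative(compact)));
    }
}
```

In Java, a double occupies eight bytes and a byte occupies one, so the native representation needs about an eighth of the memory for the same data.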

While this might seem like a trivial optimization, the principle behind it scales well to further hardware assisted optimizations.