HARD-CORE
High Performance, Low Cost
Web Content Hosting Service
May 1998
Original Version: April 1997
[Reformatted for HTML5 and Archive.org links: 29 December 2017]
This is the result of ongoing research based upon contacts throughout
the industry.
Please consider this design and send comments to the author.
This was written in response to a request for qualified, technical
opinions “from people in low places on the ‘net.”
The content of this document is nonproprietary.
Introduction
For a high performance, low cost, Web content hosting service, consider
the following design.
The computer industry's products and the hosting marketplace's
activities impose certain constraints. Those constraints drive site
designers toward common conclusions and specific designs; the resulting
consensus among the industry's working systems administrators is
summarized here.
This document is for anyone planning a Web content hosting facility.
Executives, managers, systems administrators, and other technical and
non-technical people alike should find something worthwhile.
Overview
This is a multiple computer architecture hosting environment. Multiple
computers provide enhanced performance, security, integrity, privacy, and
robustness for a shared hosting facility.
The specific hardware, software, and operating system choices presented
are merely components and can be substituted with products deemed
appropriate at the time you implement your service.
For cost-effectiveness purposes, this design promotes using free
operating systems and free software. These are quality works of software
with various corporations controlling and backing the development.
Most importantly, the selections made here represent the best options to
find fully pretrained people to staff the roles in platform development,
systems administration, help desk, on-call support, and if necessary,
customization.
Likewise, many potential customers-- or the webmasters acting on the
customers' behalf-- are already familiar with the selections made. This
makes for a more appealing service from a marketing perspective.
This design is for low- to medium-range customers without needs
for private, dedicated servers (for things like databases or additional Web
services).
Dedicated hosting customers may, however, benefit from similar
configurations. (e.g., Offer dedicated HTTP hosts but shared,
auxiliary anonymous FTP services.)
Additional services not detailed here include streaming audio, chat
rooms, and interactive multiplayer games. Such items should be considered
straightforward extensions to this design.
Criteria
We assume the following criteria, based upon the company's bottom line
and a staff required to work only regular business hours.
- Low initial capital investment for equipment and personnel.
- Low operational overhead for equipment and personnel.
- Ability to support multiple architectures of run-time software,
permitting customers to mix-and-match.
- Industry standard, off-the-shelf products.
- Relatively easy to find, hire, and keep the systems administrators
and support staff.
- Rapid installation of new server hardware, operating system, server
software, customers, and upgrades.
- Robustness of operation (minimal downtime and maintenance).
- Rapid problem detection.
- Reasonably fault-tolerant.
- Rapid recovery after any problem.
- Reasonably secure against cracking.
- Ease of replicating site for disaster recovery models.
Features
The server farm for content hosting can have these features.
- One base configuration for all servers.
- Low cost PC hardware (compared to traditional servers).
- Reduced headcount requirements, compared to staffing each platform
needed to support specific software architectures (e.g., Solaris/SPARC,
NT/Intel, AIX/PowerPC).
- High-performance hardware, operating system, and software.
- Multi-processor hardware and operating system, if desired.
- A widely used server operating system with which entry-level
admin/support staff are likely to be familiar.
- Built-in network security features of the operating system kernel.
- Built-in performance throttling of HTTP server software.
- Multiple customers per server.
- Easy migration of customers between servers.
- Redundancy of customer files through incremental mirroring.
- Recovery of a total failure by one HTTP server within minutes
with no impact to customer files.
- Recovery of a total failure by one file server within minutes with
customer files restored, losing at most changes from the previous one hour.
- Tape back-up is optional and necessary only for historic archiving
or physical site redundancy.
General Design & Operational Policies
With the above criteria and factoring in the above issues and
considerations, this design represents one implementation of the general
consensus within our industry.
- “Front-end” versus “back-end” servers would be used.
- Front-end servers run the HTTP server software.
- Back-end servers manage the customers' files.
- Front-end servers reside on the public Internet.
- Back-end servers reside on a non-routeable, isolated network shared
with only the front-end servers.
Possibly, admin/support staff workstations could access the isolated
network from within the company's internal LAN by way of firewalls.
- Using clusters of server farms will permit unlimited
scalability.
By grouping a specific number of front-end servers with specific
back-end servers, scalability then becomes a matter of replicating this
“cluster” for each set of few hundred or few thousand customers.
This has the additional benefit of isolating network traffic to
potentially different network segments through the use of switches or
routers.
Note:
The exact number of front-end to back-end
servers in a single cluster and how many customers could be supported by a
single cluster should be determined in the preliminary implementation phase.
Hardware performance specifications change too quickly to predict such
metrics now.
- Front-end servers contain disks for only the operating system files
and virtual memory swapping.
- Back-end servers are file servers using NFS which contain all the
customer files.
- At each level (i.e., front versus back separately) servers
are configured 100% identically to peers.
This includes everything except customer specific items such as
virtual interface IP numbers and which file systems are mounted.
- At each level (i.e., front versus back separately) one or
more spare servers will be on-line and ready to be activated remotely in
case of any problem with another server.
- At each level (i.e., front versus back separately) all spare
servers will mirror the configurations of all peer active servers.
This will enable any back-up server to replace any active server
immediately.
Additionally, tape back-ups are required only for historic archives or
for physical site redundancy.
- At the back-end, each back-up server will mirror the full contents
(customer files and configurations) of the active servers continually
throughout each day and evening.
This minimizes losses of data in the most severe crashes and enables any
back-up server to replace any active server immediately.
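As one sketch of how that continual mirroring might be scheduled (the tool choice, interval, hostname, and paths here are illustrative assumptions, not part of the original design), a back-up file server could pull incremental updates via cron:

```
# /etc/crontab fragment on a back-end back-up server (illustrative):
# pull an incremental mirror of one active file server's customer tree
# every 20 minutes; rsync transfers only the changed files.
0,20,40 * * * *  root  rsync -a --delete active-fs1:/export/customers/ /export/customers/
```

Running the mirror in small, frequent increments keeps each transfer short, per the incremental-processing policy below.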
- All servers will have operating system files checked for
integrity/corruption continually throughout each day and evening.
Cryptographic checksums (e.g., MD5) will be generated on all
system files as a Tripwire-like mechanism.
This also assists in narrowing the time frame in which any cracking
activity may have occurred.
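A minimal sketch of such a Tripwire-like check follows. The `md5sum` tool and the baseline location are assumptions (BSD systems of the era ship `md5` instead), and in practice the baseline should live on read-only media:

```shell
#!/bin/sh
# Generate, then later verify, MD5 checksums for one segment of the
# system files; run "check" from cron against each segment in turn.
integrity() {     # usage: integrity init|check <dir> <baseline-file>
  mode=$1; target=$2; baseline=$3
  case "$mode" in
    init)   # record a checksum for every file under the target directory
      (cd "$target" && find . -type f -exec md5sum {} \;) > "$baseline" ;;
    check)  # recompute and compare; report any mismatch
      (cd "$target" && md5sum -c --status "$baseline") \
        || echo "ALERT: integrity mismatch under $target" ;;
  esac
}
```

Each invocation covers only one directory tree, keeping the duration and computational load of any single run small.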
- All periodic and/or continual processes (such as mirroring and file
integrity checking) will be performed in small increments.
By using shorter periods between mirroring updates, each transfer is
reduced in both duration and instantaneous network load.
By integrity-checking smaller segments of the file system, each
invocation is reduced in both duration and computational load.
Likewise, especially for any international customers, there is no single
high-impact activity during anyone's prime-time hours.
Servers Configuration Specifics
- Use hardware based upon the latest Pentium (or equivalent) chip.
- Use a minimum of 128 MB RAM.
- Use a pair of 10/100-base-T FastEthernet controllers (for PCI bus).
Front-end servers have separate network connections for the Internet
versus the server farm LAN (connected only with front/back-end servers).
Back-end servers have separate network connections for the server farm
LAN (connected only with front/back-end servers) versus the company internal
LAN connection via one-way firewall.
- Use “Ultra” SCSI disk controllers and drives. Only use
IDE, EIDE, or ATAPI for CD-ROM or other distribution media.
The Adaptec
2940-Ultra SCSI disk controller with Ultra SCSI disks is highly recommended
for all servers.
- Use FreeBSD (or
OpenBSD) Unix operating system.
This is a very stable, robust implementation of Unix which is freely
usable, accessible, and distributable (but is not public domain). There is
a not-for-profit corporation which controls the software development. (This
is also the case for OpenBSD,
Linux, and
NetBSD operating systems.)
In many qualified people's technical/professional opinions, FreeBSD's
(and OpenBSD's) quality matches many commercial flavors of Unix
including SunOS/Solaris, HP/ux, and AIX.
The FreeBSD development branch now includes symmetric multi-processing
(SMP) support for machines with multiple CPUs. This feature should be
available in the “current” and “stable” releases in the near future.
- Use Apache HTTP
server.
This is a very stable, robust HTTP server which is freely usable,
accessible, and distributable (but is not public domain). There is a
not-for-profit corporation which controls the software development.
A commercial version (functional superset) which supports HTTPS/SSL is
the Stronghold server from C2 Net.
(Their software is released with cryptography within the US and in Europe,
so strong encryption may be used world-wide without interference from US
export laws.)
In many qualified people's technical/professional opinions, Apache's
quality surpasses many commercial servers including those from
Netscape.
(There are rumors mentioning the inclusion of a “configuration server”
in the near-future. This would be much like Netscape's “administration
server” or “GUI.”)
- Configure Apache HTTP server for access-throttling.
This will limit the maximum load of each customer's HTTP server
daemon, minimizing overall impact on the shared host.
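A hedged httpd.conf sketch of what such throttling might look like under Apache 1.x follows; the specific directives and limits are assumptions, since stock Apache of this era offers server-wide connection caps and per-CGI resource limits rather than a true per-customer bandwidth throttle:

```
# Cap total daemon load server-wide.
MaxClients       150       # maximum simultaneous connections
KeepAliveTimeout 5         # release idle connections quickly

# Limit resources consumed by one customer's CGI processes.
<VirtualHost www.customer1.example>
    DocumentRoot /export/customers/customer1/htdocs
    RLimitCPU    30 60     # soft/hard CPU-seconds per CGI process
    RLimitNPROC  10 20     # soft/hard process count per userid
</VirtualHost>
```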
- Configure the operating system kernel to filter IP network packets
of everything except for HTTP (port 80), HTTPS (port 443), and the
login/file-transfer method of your choice: SSH, Telnet, rlogin, etc.
This way, regardless of any lack of filtering by the routers or
firewalls, you are protected.
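Under FreeBSD, this kernel-level filtering could be expressed with ipfw rules along these lines (interface name and rule numbers are illustrative; port 22 is shown for the SSH login method):

```
# Public interface of a front-end server: admit only the served protocols.
ipfw add 100 allow tcp from any to any 80  in via fxp0 setup   # HTTP
ipfw add 200 allow tcp from any to any 443 in via fxp0 setup   # HTTPS
ipfw add 300 allow tcp from any to any 22  in via fxp0 setup   # SSH
ipfw add 400 allow tcp from any to any established             # replies
ipfw add 500 deny  ip  from any to any in via fxp0             # all else
```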
- Use only SSH
(secure shell) using the RSA
encryption for all login and file transfers.
See “Secure customer access” below.
- As a general configuration guideline, use x back-end servers for
every 3x front-end servers. That is, if there are 90 front-end servers,
you should have 30 back-end servers.
The exact numbers will have to be determined at the preliminary
implementation phase.
Physical Site Specifics
- Use a console terminal server for remote access of any server's
console.
- Use a remote access switch for power-cycling any server.
- Use dedicated hosts for monitoring the servers' activity with
self-contained capabilities to send text pages to admin/support staff.
- Use dedicated hosts for development and testing of the service.
- All hosts used for anything other than directly serving customers
(i.e., everything mentioned in this section) should not be in
the domain name service (DNS) tables and should be on separate networks.
This will defeat “daemon dialing” or “war dialing” attempts to
locate such support servers.
Related Web Services
Additional services beyond the basic HTTP traffic of the Web are
optional for a basic service offering. If customer demand is high enough,
these additional, related services may be supported while still fitting
within the above criteria.
- “Front-end” versus “back-end” servers would be used, similarly
to the HTTP servers.
- Mail spooling/forwarding may be handled on separate, dedicated mail
servers.
- Anonymous FTP services may be handled via separate, dedicated FTP
servers.
- Front-end mail servers would be identified through specific domain
name service (DNS) configuration.
- Mail service is handled in two stages, as is commonly done for
publicly accessible servers.
The security procedures are supported by the
TIS
fwtk
program, smapd.
- Front-end FTP servers would use the same back-end servers as the HTTP
servers but would instead mount the remote file system as read-only.
This way, despite being a separate server from the one that runs the HTTP
software, the FTP site may include access to the HTTP DocumentRoot
without any replication and still be secure.
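In FreeBSD /etc/exports syntax, the dual-mode export might look like this (hostnames and paths are placeholders):

```
# Back-end file server: HTTP front-ends get read/write,
# FTP front-ends get the same customer tree read-only.
/export/customers -maproot=nobody     http-fe1 http-fe2 http-fe3
/export/customers -ro -maproot=nobody ftp-fe1 ftp-fe2
```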
- Front-end servers reside on the public Internet.
- Back-end servers reside on a non-routeable, isolated network.
- Front-end servers permit only console login.
- Front-end servers use minimal operating system configurations.
- Front-end servers contain no users other than standard Unix
administrative and daemon userids.
- Front-end servers use a CD-ROM based “live” file system for all
operating system files.
This prevents any possibility of operating system file corruption.
Custom CDs should be used. A CD-ROM writer unit (approximately $600.00)
is required on at least one firewalled, administrative workstation.
Files containing passwords, hostname, and IP number assignments may be
stored on a write-protected floppy disk so as to permit updates without
cutting a new CD.
Considering such minimal operating system configurations, the slow
nature of a CD-ROM device should have no performance impact after boot-up.
- Front-end servers use local disks for temporary incoming spools.
- Front-end servers behave as file servers to back-end servers which, in
turn, would perform any processing required (e.g., for mail).
Examples:
Front-end mail servers would use local disks only for temporary
spooling of in-bound mail. Back-end mail servers would treat front-end mail
servers as NFS servers. Back-end mail servers perform all normal mail
processing.
Front-end FTP servers would use local disks only for temporary spooling
of uploaded (“contributed”) files. Back-end file servers would treat
front-end FTP servers as NFS servers. Back-end file servers periodically
(e.g., hourly) migrate uploaded/contributed files into customers'
file systems.
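The periodic migration step can be sketched as a small shell routine invoked hourly from cron (the function name and directory layout are illustrative):

```shell
#!/bin/sh
# Move contributed files out of a front-end FTP spool (NFS-mounted on
# the back-end) into the owning customer's file system.
migrate_uploads() {   # usage: migrate_uploads <spool-dir> <customer-dir>
  spool=$1; dest=$2
  for f in "$spool"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an empty glob
    mv "$f" "$dest"/
  done
}
```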
For security purposes and integrity of the servers, restrict what the
customers may do on the server and how they may do it.
- Provide a secure mechanism to upload files.
See “Secure Customer Access” below.
- Provide a graphical user interface (GUI) for manipulating their
directories and files.
This GUI should include mechanisms for compression, archiving
(e.g., tar or zip), and extracting, as well as file
renaming, changing access permissions, deleting files, and adding/removing
directories.
Currently, no such application exists in the public domain. If
development and maintenance of a custom program is not possible,
remote login is then required.
See “Secure Customer Access” below.
- If login access is to be permitted, provide a restricted login
shell.
Such a restricted shell should be limited to their own directory tree
(i.e., chroot'd).
Depending upon how secure you want the site, the following would be the
maximum for a service offering.
This configuration is highly recommended.
Customers would not be severely impacted, and those who understand that
Internet servers have inherent security risks will be receptive to this
configuration.
- Require all customers to use the secure shell,
SSH.
- MS Windows (3.11, 95, NT), Mac, OS/2, Amiga, and most flavors of Unix
are supported.
(See
http://www.cs.hut.fi/ssh/#portability.)
- Require all customers to use SSH configured with
RSAREF2.
This is already common practice at some hosting firms.
(There are
versions available for international customers so no export control
issues are of concern.)
- Customers must then present an initial SSH/RSA Public Key Identity
upon sign-up for each userid, and only those identities may login and upload
files.
- Customers may log in and upload files from any Internet address provided
they have their “Identity” on their workstation-- a very small text file.
- Customers may change their Public Key Identities anytime they choose.
- Customers use 'scp' (SSH's secure version of the 'rcp' remote copy
program) to transfer files.
This program permits on-the-fly compression, supports recursive copying
of entire directory trees, and is much more efficient than FTP.
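Typical customer-side transfers would then look like the following (the hostname and userid are placeholders):

```
scp -C newpage.html customer1@www.example.net:htdocs/   # compressed copy
scp -r -C htdocs customer1@www.example.net:             # whole tree
```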
- If login is to be permitted, customers use 'ssh' (SSH's secure
version of the 'rsh' remote shell program) to login.
There is little reason to permit a customer to log in to a shared
hosting server when sufficient alternatives are provided.
The customer's shell should be limited to their own directory tree
(i.e., chroot'd).