HARD-CORE
High Performance, Low Cost
Web Content Hosting Service

4 September 1997

Daniel Pezely

Written upon the request of IBM/Advantis Internet Services Integration.
Please consider this design and send comments to the author.




Introduction

For a high performance, low cost, Web content hosting service, consider the following design.

The computer industry's products and the hosting marketplace's activities dictate certain factors. These factors drive site designers to common conclusions, and those conclusions force specific designs. The resulting consensus within our industry, among the actual systems administrators themselves, is summarized here.

This design is for low- to medium-range customers who do not need dedicated servers, databases, or additional Web services.

Dedicated hosting customers may, however, have similar configurations. Additional services not detailed here include streaming audio, chat rooms, and interactive multiplayer games. Such items should be considered straightforward extensions to this design.




Criteria

We assume the following criteria, based upon the company's bottom line and the goal of its employees not having to work overtime.

  1. Low initial capital investment for equipment and personnel.
  2. Low operational overhead for equipment and personnel.
  3. Industry standard, off-the-shelf products.
  4. Relatively easy to find, hire, and retain systems administrators and support staff.
  5. Rapid installation of new server hardware, operating system, server software, customers, and upgrades.
  6. Robustness of operation (minimal downtime and maintenance).
  7. Rapid problem detection.
  8. Reasonably fault-tolerant.
  9. Rapid recovery after any problem.
  10. Reasonably secure against cracking.



Features

The server farm for content hosting can have these features.

  1. One base configuration for all servers.
  2. Low cost PC hardware (compared to traditional servers).
  3. High-performance hardware, operating system, and software.
  4. Multi-processor hardware and operating system, if desired.
  5. A widely used server operating system with which entry-level admin/support staff are likely to be familiar.
  6. Built-in network security features of the operating system kernel.
  7. Built-in performance throttling of HTTP server software.
  8. Multiple customers per server.
  9. Easy migration of customers between servers; see the sketch following this list.
  10. Redundancy of customer files through incremental mirroring.
  11. Recovery from a total failure of one HTTP server within minutes, with no impact on customer files.
  12. Recovery from a total failure of one file server within minutes, with customer files restored, losing at most the previous hour's changes.
  13. Tape back-up is optional and necessary only for historic archiving or physical site redundancy.
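
As an illustration of features 8 and 9, each customer can be bound to a virtual interface IP number and an IP-based Apache virtual host on a shared front-end server; migrating that customer to another server is then simply a matter of moving the interface alias, the NFS mount, and the virtual host entry. The following is a minimal sketch; the addresses, device name, host names, and paths are hypothetical examples.

    # On the front-end server: bring up a virtual interface for the
    # customer and mount that customer's files from a back-end server.
    ifconfig fxp0 inet 192.0.2.10 netmask 255.255.255.255 alias
    mount -t nfs backend1:/export/customers/example /www/example

    # Corresponding httpd.conf entry (Apache 1.x, IP-based virtual host):
    <VirtualHost 192.0.2.10>
        ServerName   www.example.com
        DocumentRoot /www/example/htdocs
        TransferLog  /var/log/httpd/example-access_log
    </VirtualHost>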



General Design & Operational Policies

Factoring in the above criteria, issues, and considerations, this design represents one implementation of the general consensus within our industry.

  1. "Front-end" versus "back-end" servers would be used.
  2. Front-end servers run the HTTP server software.
  3. Back-end servers manage the customers' files.
  4. Front-end servers reside on the public Internet.
  5. Back-end servers reside on a non-routeable, isolated network shared with only the front-end servers.
    Possibly, admin/support staff workstations could access the isolated network from within the company's internal LAN by way of firewalls.
  6. Using clusters of server farms will permit practically unlimited scalability.
    By grouping a specific number of front-end servers with specific back-end servers, scalability then becomes a matter of replicating this "cluster" for each set of few hundred or few thousand customers.
    This has the additional benefit of isolating network traffic to potentially different network segments through the use of switches or routers.
    Note: The exact number of front-end to back-end servers in a single cluster and how many customers could be supported by a single cluster should be determined in the preliminary implementation phase. Hardware performance specifications change too quickly to predict such metrics now.
  7. Front-end servers contain disks for only the operating system files and virtual memory swapping.
  8. Back-end servers are file servers using NFS which contain all the customer files.
  9. At each level (i.e., front versus back separately) servers are configured 100% identically to peers.
    This includes everything except customer-specific items such as virtual interface IP numbers and which file systems are mounted.
  10. At each level (i.e., front versus back separately) one or more spare servers will be on-line and ready to be activated remotely in case of any problem with another server.
  11. At each level (i.e., front versus back separately) all spare servers will mirror the configurations of all peer active servers.
    This will enable any back-up server to replace any active server immediately.
    Additionally, tape back-ups are required only for historic archives or for physical site redundancy.
  12. At the back-end, each back-up server will mirror the full contents (customer files and configurations) of the active servers continually throughout each day and evening.
    This minimizes data loss in even the most severe crashes and enables any back-up server to replace any active server immediately. See the mirroring sketch following this list.
  13. All servers will have their operating system files checked for integrity/corruption continually throughout each day and evening.
    Cryptographic checksums (e.g., MD5) will be generated on all system files as a "trip-wire" mechanism; see the trip-wire sketch following this list.
    This also assists in narrowing the time frame in which any cracking activity may have occurred.
  14. All periodic and/or continual processes (such as mirroring and file integrity checking) will be performed in small increments.
    Shorter intervals between mirroring runs reduce each transfer in both duration and peak network demand.
    Checking smaller segments of the file system per run reduces each invocation in both duration and computational load.
    Likewise, especially for any international customers, there is no single high-impact activity during anyone's prime-time hours.
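
The incremental mirroring of items 12 and 14 can be driven from cron on each back-end spare. This is a minimal sketch, assuming the rsync utility is installed; the host name, paths, and slicing of the customer file systems are hypothetical examples.

    # /etc/crontab excerpt on a back-end spare server.  Each hour,
    # mirror one slice of the active server's customer file systems,
    # so no single transfer monopolizes the farm LAN.
    0  * * * *  root  rsync -a --delete active1:/export/customers/a-f/ /export/customers/a-f/
    20 * * * *  root  rsync -a --delete active1:/export/customers/g-p/ /export/customers/g-p/
    40 * * * *  root  rsync -a --delete active1:/export/customers/q-z/ /export/customers/q-z/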
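
The integrity checking of item 13 can likewise be run incrementally. The following is a sketch only; the baseline directory, slice argument, and alert address are hypothetical, and in practice the baseline checksums belong on read-only media.

    #!/bin/sh
    # Trip-wire sketch: verify one slice of the operating system files
    # per invocation against MD5 checksums recorded at installation.
    SLICE=${1-/usr/bin}
    BASELINE=/var/db/tripwire/`echo $SLICE | tr / _`.md5
    find $SLICE -type f | sort | xargs md5 > /tmp/tripwire.$$
    cmp -s /tmp/tripwire.$$ $BASELINE ||
        mail -s "trip-wire: $SLICE changed" admin-pager < /tmp/tripwire.$$
    rm -f /tmp/tripwire.$$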



Server Configuration Specifics

  1. Use hardware based upon the Pentium Pro, 200 MHz (or equivalent) chip.
  2. Use a minimum of 128 MB RAM.
  3. Use a pair of 10/100Base-T Fast Ethernet controllers (for the PCI bus).
    Front-end servers have separate network connections for the Internet versus the server farm LAN (connected only with front/back-end servers).
    Back-end servers have separate network connections for the server farm LAN (connected only with front/back-end servers) versus the company internal LAN connection via one-way firewall.
  4. Use "Ultra" SCSI disk controllers and drives. Definately do not use IDE/EIDE/ATAPI for disks.
    The Adaptec 2940-Ultra SCSI disk controller with Ultra SCSI disks are highly recommended for all servers.
  5. Use FreeBSD Unix operating system.
    This is a very stable, robust implementation of Unix which is freely usable, accessible, and distributable (but is not public domain).
    In the technical/professional opinion of many qualified people, FreeBSD's quality matches many commercial flavors of Unix, including SunOS/Solaris, HP-UX, and AIX.
    The FreeBSD development SNAPSHOT now includes Symmetric Multi-Processing (SMP) support for machines with multiple Pentium (P5) or Pentium Pro (P6) CPUs. This feature should be available in the "current" and "stable" releases within six months.
  6. Use Apache HTTP server.
    This is a very stable, robust HTTP server which is freely usable, accessible, and distributable (but is not public domain).
    In the technical/professional opinion of many qualified people, Apache's quality surpasses many commercial servers, including those from Netscape.
    This software supports SSL for secure commerce.
    (There are rumors mentioning the inclusion of a "configuration server" in the near-future. This would be much like Netscape's "administration server.")
  7. Configure the Apache HTTP server for access-throttling; see the httpd.conf sketch following this list.
    This will limit the maximum load of each customer's HTTP server daemon, minimizing overall impact on the shared host.
  8. Configure the operating system kernel to filter IP network packets, admitting nothing except HTTP/HTTPS (ports 80, 443) and the login/file-transfer method of your choice; see the ipfw sketch following this list.
    This way, regardless of any lack of filtering by the routers or firewalls, you are protected.
  9. Use only SSH (secure shell) with RSA encryption for all logins and file transfers.
    See "Secure customer access" below.
  10. As a general configuration guideline, use X back-end servers and 3X front-end servers. That is, if there are 90 front-end servers, you should have 30 back-end servers.
    The exact numbers will have to be determined at the preliminary implementation phase.
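
For item 7, one approach is to run a separate httpd daemon per customer, bound to that customer's virtual IP number, each with its own load ceiling. This is a minimal sketch using standard Apache 1.x directives; the address, names, paths, and limits are hypothetical examples.

    # httpd-example.conf: one customer's daemon with its own ceiling.
    BindAddress  192.0.2.10
    Port         80
    ServerName   www.example.com
    DocumentRoot /www/example/htdocs

    # Throttling: cap the simultaneous child processes this daemon may
    # spawn, so one busy customer cannot starve the shared host.
    MaxClients            20
    MaxRequestsPerChild 1000
    Timeout              300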
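
For item 8, FreeBSD's kernel packet filter, ipfw, can enforce the policy directly on each front-end server (the kernel must be built with the IPFIREWALL option). A deliberately minimal sketch, assuming SSH on its standard port 22:

    # /etc/rc.firewall excerpt for a front-end server: admit only
    # HTTP, HTTPS, and SSH; deny all other IP traffic at the kernel.
    fwcmd=/sbin/ipfw
    ${fwcmd} -f flush
    ${fwcmd} add allow tcp from any to any 80  setup
    ${fwcmd} add allow tcp from any to any 443 setup
    ${fwcmd} add allow tcp from any to any 22  setup
    ${fwcmd} add allow tcp from any to any established
    ${fwcmd} add deny  ip  from any to any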



Physical Site Specifics

  1. Use a console terminal server for remote access of any server's console.
  2. Use a remote access switch for power-cycling any server.
  3. Use dedicated hosts for monitoring server activity, with self-contained capabilities to send text pages to admin/support staff; see the sketch following this list.
  4. Use dedicated hosts for development and testing of the service.
  5. All hosts used for anything other than directly serving customers (i.e., everything mentioned in this section) should not be in the domain name service (DNS) tables and should be on separate networks.
    This will defeat "daemon dialing" or "war dialing" attempts to locate such support servers.
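
A monitoring host of the sort described in item 3 needs very little: poll each server and page the on-call staff when one stops answering. A minimal sketch, run from cron every few minutes; the host names and pager gateway address are hypothetical examples.

    #!/bin/sh
    # Poll each farm server; page on-call staff for any that is down.
    for host in www1 www2 www3 nfs1 nfs2
    do
        if ! ping -c 3 $host > /dev/null 2>&1
        then
            echo "$host not answering at `date`" |
                mail -s "ALERT: $host down" oncall-pager@example.com
        fi
    done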



Related Web Services

Additional services beyond the basic HTTP traffic of the Web are optional for a basic service offering. If customer demand is high enough, these additional, related services may be supported while still fitting within the above criteria.

  1. "Front-end" versus "back-end" servers would be used, similarly to the HTTP servers.
  2. Mail spooling/forwarding may be handled on separate, dedicated mail servers.
  3. Anonymous FTP services may be handled via separate, dedicated FTP servers.
  4. Front-end mail servers would be identified through specific domain name service (DNS) configuration.
  5. Mail service is handled in two stages, as is commonly done for publicly accessible servers.
    The security procedures are supported by smapd, from the TIS Firewall Toolkit (fwtk).
  6. Front-end FTP servers would use the same back-end servers as the HTTP servers but would instead mount the remote file system as read-only; see the mount sketch following this list.
    This way, despite being a separate server from the one running the HTTP software, the FTP site may include access to the HTTP DocumentRoot without any replication and still be secure.
  7. Front-end servers reside on the public Internet.
  8. Back-end servers reside on a non-routeable, isolated network.
  9. Front-end servers permit only console login.
  10. Front-end servers use minimal operating system configurations.
  11. Front-end servers contain no users other than standard Unix administrative and daemon userids.
  12. Front-end servers use a CD-ROM based "live" file system for all operating system files.
    This prevents any possibility of operating system file corruption.
    Custom CDs should be used. A CD-ROM writer unit (approximately $600.00) is required on at least one firewalled, administrative workstation.
    Files containing passwords, hostname, and IP number assignments may be stored on a write-protected floppy disk so as to permit updates without cutting a new CD.
    Given such minimal operating system configurations, the slow CD-ROM device should have no performance impact after boot-up.
  13. Front-end servers use local disks for temporary incoming spools.
  14. Front-end servers behave as file servers to back-end servers which, in turn, would perform any processing required (e.g., for mail).
    Examples: Front-end mail servers would use local disks only for temporary spooling of in-bound mail. Back-end mail servers would treat front-end mail servers as NFS servers. Back-end mail servers perform all normal mail processing.
    Front-end FTP servers would use local disks only for temporary spooling of uploaded ("contributed") files. Back-end file servers would treat front-end FTP servers as NFS servers. Back-end file servers periodically (e.g., hourly) migrate uploaded/contributed files into customers' file systems.
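
The read-only mount of item 6 is a single line of configuration. A sketch, with hypothetical host and path names:

    # /etc/fstab entry on a front-end FTP server: expose the customers'
    # files, including each HTTP DocumentRoot, without write access.
    backend1:/export/customers  /ftp/pub/customers  nfs  ro  0  0

    # Or, mounted by hand:
    mount -t nfs -o ro backend1:/export/customers /ftp/pub/customers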



Customer Access Policy

For security purposes and the integrity of the servers, restrict what customers may do on the server and how they may do it.

  1. Provide a secure mechanism to upload files. See "Secure Customer Access" below.
  2. Provide a graphical user interface (GUI) for customers to manipulate their directories and files.
    This GUI should include mechanisms for compression, archiving (e.g., tar or zip), and extracting, as well as file renaming, changing access permissions, deleting files, and adding/removing directories.
    Currently, no such application exists in the public domain. If development and maintenance of a custom program is not possible, remote login is then required.
    See "Secure Customer Access" below.
  3. If login access is to be permitted, provide a restricted login shell; see the sketch following this list.
    Such a restricted shell should be limited to the customer's own directory tree (i.e., chroot'd).
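
Preparing a chroot'd login environment amounts to populating a small directory tree per customer and pointing the password file at a restricted shell. This is a sketch only: all paths are hypothetical, and the wrapper shell named below is assumed to be a small setuid-root program that performs the chroot(2) and drops privileges before executing /bin/sh.

    # Populate a minimal tree; FreeBSD's /bin utilities are statically
    # linked, so copying the binaries is sufficient.
    mkdir -p /www/example/bin /www/example/etc /www/example/htdocs
    cp /bin/sh /bin/ls /bin/mv /bin/rm /bin/cp /www/example/bin

    # /etc/passwd entry using the hypothetical restricted wrapper:
    # example:*:2001:2001:Example Corp:/www/example:/usr/local/bin/chroot-sh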



Secure Customer Access

Depending upon how secure you want the site to be, the following represents the most restrictive configuration for a service offering.

This configuration is highly recommended.

Customers would not be severely impacted, and those who understand that Internet servers have inherent security risks will be receptive to this configuration.

  1. Require all customers to use the secure shell, SSH.
  2. MS Windows (3.11, 95, NT), Mac, OS/2, Amiga, and most flavors of Unix are supported. (See http://www.cs.hut.fi/ssh/#portability.)
  3. Require all customers to use SSH configured with RSAREF2.
    This is already common practice at some hosting firms.
    (There are versions available for international customers, so export control issues are of no concern.)
  4. Customers must then present an initial SSH/RSA Public Key Identity upon sign-up for each userid, and only those identities may login and upload files.
  5. Customers may log in and upload files from any Internet address, provided their "Identity" (a very small text file) is on their workstation.
  6. Customers may change their Public Key Identities anytime they choose.
  7. Customers use 'scp' (SSH's secure version of the 'rcp' remote copy program) to transfer files; see the usage sketch following this list.
    This program permits on-the-fly compression, supports recursive copying of entire directory trees, and is much more efficient than FTP.
  8. If login is to be permitted, customers use 'ssh' (SSH's secure version of the 'rsh' remote shell program) to log in.
    There is little reason to permit a customer to log in to a shared hosting server when sufficient alternatives are provided.
    The customer's shell should be limited to their own directory tree (i.e., chroot'd).
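
From the customer's side, the whole procedure in items 4 through 7 looks like the following sketch. The user, host, and path names are hypothetical examples.

    # Generate an RSA identity once (SSH 1.x); send the public half,
    # ~/.ssh/identity.pub, to the hosting service at sign-up.
    ssh-keygen

    # Upload an entire site: -C compresses on the fly, -r recurses
    # through the whole directory tree.
    scp -C -r htdocs/ example@www1.example.net:/www/example/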