HARD-CORE
High Performance, Low Cost
Web Content Hosting Service
May 1998
Original Version: April 1997
[Reformatted for HTML5 and Archive.org links: 29 December 2017]
This is the result of ongoing research based upon contacts throughout
the industry.
Please consider this design and send comments to the author.
This was written in response to a request for qualified, technical
opinions “from people in low places on the ‘net.”
The content of this document is nonproprietary.
Introduction
For a high performance, low cost, Web content hosting service, consider
the following design.
The computer industry's products and the hosting marketplace's
activities impose certain constraints. Those constraints drive site
designers toward common conclusions and specific designs; the resulting
consensus among the industry's working systems administrators is
summarized here.
This document is for anyone planning a Web content hosting facility.
Executives, managers, systems administrators, and other technical and
non-technical people alike should find something worthwhile.
Overview
This is a multiple computer architecture hosting environment. Multiple
computers provide enhanced performance, security, integrity, privacy, and
robustness for a shared hosting facility.
The specific hardware, software, and operating system choices presented
are merely components and can be substituted with products deemed
appropriate at the time you implement your service.
For cost-effectiveness purposes, this design promotes using free
operating systems and free software. These are quality works of software
with various corporations controlling and backing the development.
Most importantly, the selections made here represent the best options to
find fully pretrained people to staff the roles in platform development,
systems administration, help desk, on-call support, and if necessary,
customization.
Likewise, many potential customers-- or the webmasters acting on the
customers' behalf-- are already familiar with the selections made. This
makes for a more appealing service from a marketing perspective.
This design is for low- to medium-range customers without needs
for private, dedicated servers (for things like databases or additional Web
services).
Dedicated hosting customers may, however, benefit from similar
configurations. (e.g., Offer dedicated HTTP hosts but shared,
auxiliary anonymous FTP services.)
Additional services not detailed here include streaming audio, chat
rooms, and interactive multiplayer games. Such items should be considered
straightforward extensions to this design.
Criteria
We assume the following criteria, based upon the company's bottom line
and a staff required to work only regular business hours.
- Low initial capital investment for equipment and personnel.
- Low operational overhead for equipment and personnel.
- Ability to support multiple architectures of run-time software,
permitting customers to mix-and-match.
- Industry standard, off-the-shelf products.
- Relatively easy to find, hire, and keep the systems administrators
and support staff.
- Rapid installation of new server hardware, operating system, server
software, customers, and upgrades.
- Robustness of operation (minimal downtime and maintenance).
- Rapid problem detection.
- Reasonably fault-tolerant.
- Rapid recovery after any problem.
- Reasonably secure against cracking.
- Ease of replicating site for disaster recovery models.
Features
The server farm for content hosting can have these features.
- One base configuration for all servers.
- Low cost PC hardware (compared to traditional servers).
- Reduced headcount requirements, compared to staffing each platform
needed to support specific software architectures (e.g., Solaris/SPARC,
NT/Intel, AIX/PowerPC).
- High-performance hardware, operating system, and software.
- Multi-processor hardware and operating system, if desired.
- A widely used server operating system with which entry-level
admin/support staff are likely to be familiar.
- Built-in network security features of the operating system kernel.
- Built-in performance throttling of HTTP server software.
- Multiple customers per server.
- Easy migration of customers between servers.
- Redundancy of customer files through incremental mirroring.
- Recovery of a total failure by one HTTP server within minutes
with no impact to customer files.
- Recovery of a total failure by one file server within minutes with
customer files restored, losing at most changes from the previous one hour.
- Tape back-up is optional and necessary only for historic archiving
or physical site redundancy.
General Design & Operational Policies
With the above criteria and factoring in the above issues and
considerations, this design represents one implementation of the general
consensus within our industry.
- “Front-end” versus “back-end” servers would be used.
- Front-end servers run the HTTP server software.
- Back-end servers manage the customers' files.
- Front-end servers reside on the public Internet.
- Back-end servers reside on a non-routeable, isolated network shared
with only the front-end servers.
Possibly, admin/support staff workstations could access the isolated
network from within the company's internal LAN by way of firewalls.
- Using clusters of server farms will permit unlimited
scalability.
By grouping a specific number of front-end servers with specific
back-end servers, scalability then becomes a matter of replicating this
“cluster” for each set of few hundred or few thousand customers.
This has the additional benefit of isolating network traffic to
potentially different network segments through the use of switches or
routers.
Note:
The exact number of front-end to back-end
servers in a single cluster and how many customers could be supported by a
single cluster should be determined in the preliminary implementation phase.
Hardware performance specifications change too quickly to predict such
metrics now.
- Front-end servers contain disks for only the operating system files
and virtual memory swapping.
- Back-end servers are file servers using NFS which contain all the
customer files.
- At each level (i.e., front versus back separately) servers
are configured 100% identically to peers.
This includes everything except customer specific items such as
virtual interface IP numbers and which file systems are mounted.
- At each level (i.e., front versus back separately) one or
more spare servers will be on-line and ready to be activated remotely in
case of any problem with another server.
- At each level (i.e., front versus back separately) all spare
servers will mirror the configurations of all peer active servers.
This will enable any back-up server to replace any active server
immediately.
Additionally, tape back-ups are required only for historic archives or
for physical site redundancy.
- At the back-end, each back-up server will mirror the full contents
(customer files and configurations) of the active servers continually
throughout each day and evening.
This minimizes losses of data in the most severe crashes and enables any
back-up server to replace any active server immediately.
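As one sketch of how that continual mirroring might be scheduled (the tool choice, interval, hostname, and paths here are illustrative assumptions, not part of the original design), a back-up file server could pull incremental updates via cron:

```
# /etc/crontab fragment on a back-end back-up server (illustrative):
# pull an incremental mirror of one active file server's customer tree
# every 20 minutes; rsync transfers only the changed files.
0,20,40 * * * *  root  rsync -a --delete active-fs1:/export/customers/ /export/customers/
```

Running the mirror in small, frequent increments keeps each transfer short, per the incremental-processing policy below.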
- All servers will have operating system files checked for
integrity/corruption continually throughout each day and evening.
Cryptographic checksums (e.g., MD5) will be generated on all
system files as a Tripwire-like mechanism.
This also assists in narrowing the time frame in which any cracking
activity may have occurred.
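A minimal sketch of such a Tripwire-like check follows. The `md5sum` tool and the baseline location are assumptions (BSD systems of the era ship `md5` instead), and in practice the baseline should live on read-only media:

```shell
#!/bin/sh
# Generate, then later verify, MD5 checksums for one segment of the
# system files; run "check" from cron against each segment in turn.
integrity() {     # usage: integrity init|check <dir> <baseline-file>
  mode=$1; target=$2; baseline=$3
  case "$mode" in
    init)   # record a checksum for every file under the target directory
      (cd "$target" && find . -type f -exec md5sum {} \;) > "$baseline" ;;
    check)  # recompute and compare; report any mismatch
      (cd "$target" && md5sum -c --status "$baseline") \
        || echo "ALERT: integrity mismatch under $target" ;;
  esac
}
```

Each invocation covers only one directory tree, keeping the duration and computational load of any single run small.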
- All periodic and/or continual processes (such as mirroring and file
integrity checking) will be performed in small increments.
By using shorter periods between mirroring updates, each transfer is
reduced in both duration and instantaneous network load.
By integrity-checking smaller segments of the file system, each
invocation is reduced in both duration and computational load.
Likewise, especially for any international customers, there is no single
high-impact activity during anyone's prime-time hours.
Servers Configuration Specifics
- Use hardware based upon the latest Pentium (or equivalent) chip.
- Use a minimum of 128 MB RAM.
- Use a pair of 10/100-base-T FastEthernet controllers (for PCI bus).
Front-end servers have separate network connections for the Internet
versus the server farm LAN (connected only with front/back-end servers).
Back-end servers have separate network connections for the server farm
LAN (connected only with front/back-end servers) versus the company internal
LAN connection via one-way firewall.
- Use “Ultra” SCSI disk controllers and drives. Only use
IDE, EIDE, or ATAPI for CD-ROM or other distribution media.
The Adaptec
2940-Ultra SCSI disk controller with Ultra SCSI disks is highly recommended
for all servers.
- Use FreeBSD (or
OpenBSD) Unix operating system.
This is a very stable, robust implementation of Unix which is freely
usable, accessible, and distributable (but is not public domain). There is
a not-for-profit corporation which controls the software development. (This
is also the case for OpenBSD,
Linux, and
NetBSD operating systems.)
In many qualified people's technical/professional opinions, FreeBSD's
(and OpenBSD's) quality matches many commercial flavors of Unix
including SunOS/Solaris, HP/ux, and AIX.
The FreeBSD development branch now includes symmetric multi-processing
(SMP) support for machines with multiple CPUs. This feature should be
available in the “current” and “stable” releases in the near future.
- Use Apache HTTP
server.
This is a very stable, robust HTTP server which is freely usable,
accessible, and distributable (but is not public domain). There is a
not-for-profit corporation which controls the software development.
A commercial version (functional superset) which supports HTTPS/SSL is
the Stronghold server from C2 Net.
(Their software is released with cryptography within the US and in Europe,
so strong encryption may be used world-wide without interference from US
export laws.)
In many qualified people's technical/professional opinions, Apache's
quality surpasses many commercial servers including those from
Netscape.
(There are rumors mentioning the inclusion of a “configuration server”
in the near-future. This would be much like Netscape's “administration
server” or “GUI.”)
- Configure Apache HTTP server for access-throttling.
This will limit the maximum load of each customer's HTTP server
daemon, minimizing overall impact on the shared host.
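A hedged httpd.conf sketch of what such throttling might look like under Apache 1.x follows; the specific directives and limits are assumptions, since stock Apache of this era offers server-wide connection caps and per-CGI resource limits rather than a true per-customer bandwidth throttle:

```
# Cap total daemon load server-wide.
MaxClients       150       # maximum simultaneous connections
KeepAliveTimeout 5         # release idle connections quickly

# Limit resources consumed by one customer's CGI processes.
<VirtualHost www.customer1.example>
    DocumentRoot /export/customers/customer1/htdocs
    RLimitCPU    30 60     # soft/hard CPU-seconds per CGI process
    RLimitNPROC  10 20     # soft/hard process count per userid
</VirtualHost>
```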
- Configure the operating system kernel to filter IP network packets
of everything except for HTTP (port 80), HTTPS (port 443), and the
login/file-transfer method of your choice: SSH, Telnet, rlogin, etc.
This way, regardless of any lack of filtering by the routers or
firewalls, you are protected.
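Under FreeBSD, this kernel-level filtering could be expressed with ipfw rules along these lines (interface name and rule numbers are illustrative; port 22 is shown for the SSH login method):

```
# Public interface of a front-end server: admit only the served protocols.
ipfw add 100 allow tcp from any to any 80  in via fxp0 setup   # HTTP
ipfw add 200 allow tcp from any to any 443 in via fxp0 setup   # HTTPS
ipfw add 300 allow tcp from any to any 22  in via fxp0 setup   # SSH
ipfw add 400 allow tcp from any to any established             # replies
ipfw add 500 deny  ip  from any to any in via fxp0             # all else
```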
- Use only SSH
(secure shell) using the RSA
encryption for all login and file transfers.
See “Secure customer access” below.
- As a general configuration guideline, use x back-end servers for
every 3x front-end servers. That is, if there are 90 front-end servers,
you should have 30 back-end servers.
The exact numbers will have to be determined at the preliminary
implementation phase.
Physical Site Specifics
- Use a console terminal server for remote access of any server's
console.
- Use a remote access switch for power-cycling any server.
- Use dedicated hosts for monitoring the servers' activity with
self-contained capabilities to send text pages to admin/support staff.
- Use dedicated hosts for development and testing of the service.
- All hosts used for anything other than directly serving customers
(i.e., everything mentioned in this section) should not be in
the domain name service (DNS) tables and should be on separate networks.
This will defeat “daemon dialing” or “war dialing” attempts to
locate such support servers.
Related Web Services
Additional services beyond the basic HTTP traffic of the Web are
optional for a basic service offering. If customer demand is high enough,
these additional, related services may be supported while still fitting
within the above criteria.
- “Front-end” versus “back-end” servers would be used, similarly
to the HTTP servers.
- Mail spooling/forwarding may be handled on separate, dedicated mail
servers.
- Anonymous FTP services may be handled via separate, dedicated FTP
servers.
- Front-end mail servers would be identified through specific domain
name service (DNS) configuration.
- Mail service is handled in two stages, as is commonly done for
publicly accessible servers.
The security procedures are supported by the
TIS
fwtk
program, smapd.
- Front-end FTP servers would use the same back-end servers as the HTTP
servers but would instead mount the remote file system as read-only.
This way, despite being a separate server from the one that runs the HTTP
software, the FTP site may include access to the HTTP DocumentRoot
without any replication and still be secure.
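In FreeBSD /etc/exports syntax, the dual-mode export might look like this (hostnames and paths are placeholders):

```
# Back-end file server: HTTP front-ends get read/write,
# FTP front-ends get the same customer tree read-only.
/export/customers -maproot=nobody     http-fe1 http-fe2 http-fe3
/export/customers -ro -maproot=nobody ftp-fe1 ftp-fe2
```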
- Front-end servers reside on the public Internet.
- Back-end servers reside on a non-routeable, isolated network.
- Front-end servers permit only console login.
- Front-end servers use minimal operating system configurations.
- Front-end servers contain no users other than standard Unix
administrative and daemon userids.
- Front-end servers use a CD-ROM based “live” file system for all
operating system files.
This prevents any possibility of operating system file corruption.
Custom CDs should be used. A CD-ROM writer unit (approximately $600.00)
is required on at least one firewalled, administrative workstation.
Files containing passwords, hostname, and IP number assignments may be
stored on a write-protected floppy disk so as to permit updates without
cutting a new CD.
Considering such minimal operating system configurations, the slow
nature of a CD-ROM device should have no performance impact after boot-up.
- Front-end servers use local disks for temporary incoming spools.
- Front-end servers behave as file servers to back-end servers which, in
turn, would perform any processing required (e.g., for mail).
Examples:
Front-end mail servers would use local disks only for temporary
spooling of in-bound mail. Back-end mail servers would treat front-end mail
servers as NFS servers. Back-end mail servers perform all normal mail
processing.
Front-end FTP servers would use local disks only for temporary spooling
of uploaded (“contributed”) files. Back-end file servers would treat
front-end FTP servers as NFS servers. Back-end file servers periodically
(e.g., hourly) migrate uploaded/contributed files into customers'
file systems.
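The periodic migration step can be sketched as a small shell routine invoked hourly from cron (the function name and directory layout are illustrative):

```shell
#!/bin/sh
# Move contributed files out of a front-end FTP spool (NFS-mounted on
# the back-end) into the owning customer's file system.
migrate_uploads() {   # usage: migrate_uploads <spool-dir> <customer-dir>
  spool=$1; dest=$2
  for f in "$spool"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an empty glob
    mv "$f" "$dest"/
  done
}
```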
For security purposes and integrity of the servers, restrict what the
customers may do on the server and how they may do it.
- Provide a secure mechanism to upload files.
See “Secure Customer Access” below.
- Provide a graphical user interface (GUI) for manipulating their
directories and files.
This GUI should include mechanisms for compression, archiving
(e.g., tar or zip), and extracting, as well as file
renaming, changing access permissions, deleting files, and adding/removing
directories.
Currently, no such application exists in the public domain. If
development and maintenance of a custom program is not possible,
remote login is then required.
See “Secure Customer Access” below.
- If login access is to be permitted, provide a restricted login
shell.
Such a restricted shell should be limited to their own directory tree
(i.e., chroot'd).
Depending upon how secure you want the site, the following would be the
maximum for a service offering.
This configuration is highly recommended.
Customers would not be severely impacted, and those who understand that
Internet servers have inherent security risks will be receptive to this
configuration.
- Require all customers to use the secure shell,
SSH.
- MS Windows (3.11, 95, NT), Mac, OS/2, Amiga, and most flavors of Unix
are supported.
(See
http://www.cs.hut.fi/ssh/#portability.)
- Require all customers to use SSH configured with
RSAREF2.
This is already common practice at some hosting firms.
(There are
versions available for international customers so no export control
issues are of concern.)
- Customers must then present an initial SSH/RSA Public Key Identity
upon sign-up for each userid, and only those identities may login and upload
files.
- Customers may log in and upload files from any Internet address provided
they have their “Identity” on their workstation-- a very small text file.
- Customers may change their Public Key Identities anytime they choose.
- Customers use 'scp' (SSH's secure version of the 'rcp' remote copy
program) to transfer files.
This program permits on-the-fly compression, supports recursive copying
of entire directory trees, and is much more efficient than FTP.
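Typical customer-side transfers would then look like the following (the hostname and userid are placeholders):

```
scp -C newpage.html customer1@www.example.net:htdocs/   # compressed copy
scp -r -C htdocs customer1@www.example.net:             # whole tree
```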
- If login is to be permitted, customers use 'ssh' (SSH's secure
version of the 'rsh' remote shell program) to login.
There is little reason to permit a customer to log in to a shared
hosting server when sufficient alternatives are provided.
The customer's shell should be limited to their own directory tree
(i.e., chroot'd).