HARD-CORE
High Performance, Low Cost
Web Content Hosting Service
4 September 1997
Daniel Pezely
Written upon the request of IBM/Advantis Internet Services Integration.
Please consider this design and send comments to the author.
Introduction
For a high performance, low cost, Web content hosting service, consider
the following design.
The computer industry's products and the hosting marketplace's
activities dictate certain factors. These factors drive site designers to
common conclusions, and those conclusions force specific designs. The
general consensus within our industry, as expressed by the systems
administrators themselves, is summarized here.
This design is for low- to medium-range customers who do not
need dedicated servers, databases, or additional Web services.
Dedicated hosting customers may, however, have similar configurations.
Additional services not detailed here include streaming audio, chat rooms
and interactive multiplayer games. Such items should be considered
straightforward extensions to this design.
Criteria
We assume the following criteria, based upon the company's bottom line
and the goal of not requiring employees to work overtime.
- Low initial capital investment for equipment and personnel.
- Low operational overhead for equipment and personnel.
- Industry standard, off-the-shelf products.
- Relatively easy to find, hire, and keep the systems administrators
and support staff.
- Rapid installation of new server hardware, operating system, server
software, customers, and upgrades.
- Robustness of operation (minimal downtime and maintenance).
- Rapid problem detection.
- Reasonably fault-tolerant.
- Rapid recovery after any problem.
- Reasonably secure against cracking.
Features
The server farm for content hosting can have these features.
- One base configuration for all servers.
- Low cost PC hardware (compared to traditional servers).
- High-performance hardware, operating system, and software.
- Multi-processor hardware and operating system, if desired.
- A widely used operating system for the servers which entry-level
admin/support staff are likely to be familiar with.
- Built-in network security features of the operating system kernel.
- Built-in performance throttling of HTTP server software.
- Multiple customers per server.
- Easy migration of customers between servers.
- Redundancy of customer files through incremental mirroring.
- Recovery from a total failure of one HTTP server within minutes
with no impact to customer files.
- Recovery from a total failure of one file server within minutes with
customer files restored, losing at most the previous hour's changes.
- Tape back-up is optional and necessary only for historic archiving
or physical site redundancy.
General Design & Operational Policies
Given the above criteria and considerations, this design represents
one implementation of the general consensus within our industry.
- "Front-end" versus "back-end" servers would be used.
- Front-end servers run the HTTP server software.
- Back-end servers manage the customers' files.
- Front-end servers reside on the public Internet.
- Back-end servers reside on a non-routeable, isolated network shared
with only the front-end servers.
Possibly, admin/support staff workstations could access the isolated
network from within the company's internal LAN by way of firewalls.
- Using clusters of server farms will permit unlimited
scalability.
By grouping a specific number of front-end servers with specific
back-end servers, scalability then becomes a matter of replicating this
"cluster" for each set of few hundred or few thousand customers.
This has the additional benefit of isolating network traffic to
potentially different network segments through the use of switches or
routers.
Note: The exact number of front-end to back-end
servers in a single cluster and how many customers could be supported by a
single cluster should be determined in the preliminary implementation phase.
Hardware performance specifications change too quickly to predict such
metrics now.
- Front-end servers contain disks for only the operating system files
and virtual memory swapping.
- Back-end servers are file servers using NFS which contain all the
customer files.
- At each level (i.e., front versus back separately) servers
are configured 100% identically to peers.
This includes everything except customer-specific items such as
virtual interface IP numbers and which file systems are mounted.
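As an illustration (host names, paths, and the customer identifier are
hypothetical), a front-end server's per-customer configuration amounts to
little more than /etc/fstab entries such as:

    # customer cust042's files, served by back-end host back1 over the farm LAN
    back1:/export/customers/cust042   /www/cust042   nfs   rw,bg,intr   0   0

Apart from entries like this and the matching virtual interface IP number,
every server's configuration would be identical to its peers.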
- At each level (i.e., front versus back separately) one or
more spare servers will be on-line and ready to be activated remotely in
case of any problem with another server.
- At each level (i.e., front versus back separately) all spare
servers will mirror the configurations of all peer active servers.
This will enable any back-up server to replace any active server
immediately.
Additionally, tape back-ups are required only for historic archives or
for physical site redundancy.
- At the back-end, each back-up server will mirror the full contents
(customer files and configurations) of the active servers continually
throughout each day and evening.
This minimizes losses of data in the most severe crashes and enables any
back-up server to replace any active server immediately.
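One way to drive this mirroring, offered only as a sketch (it assumes the
freely available rsync utility is installed; host names and paths are
hypothetical), is an hourly cron entry on each spare back-end server that
pulls changes from its active peer:

    # /etc/crontab fragment on a spare back-end server
    # copy only files that changed since the last run from active peer back1
    15  *  *  *  *  root  rsync -a --delete back1:/export/customers/ /export/customers/

Because only changed files move across the isolated LAN, each run stays
small, which is what makes the one-hour recovery target in the criteria
above achievable.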
- All servers will have operating system files checked for
integrity/corruption continually throughout each day and evening.
Cryptographic checksums (e.g., MD5) will be generated on all
system files as a "trip-wire" mechanism.
This also assists in narrowing the time frame in which any cracking
activity may have occurred.
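A minimal sketch of such a check (paths and the alert alias are arbitrary;
a dedicated tool such as Tripwire could be substituted):

    #!/bin/sh
    # compare current checksums of system files against a known-good baseline
    BASE=/var/db/md5.baseline     # generated once from a clean installation
    NOW=/var/tmp/md5.now
    find /bin /sbin /etc /usr/bin /usr/sbin -type f -exec md5 {} \; > $NOW
    if ! cmp -s $BASE $NOW ; then
        diff $BASE $NOW | mail -s "integrity alert: `hostname`" support-pager
    fi

Running this from cron and keeping each day's output narrows the window in
which any tampering could have happened.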
- All periodic and/or continual processes (such as mirroring and file
integrity checking) will be performed in small increments.
By using shorter intervals between mirroring updates, each transfer moves
less data, reducing both its duration and its momentary demand on network
capacity.
By checking smaller segments of the file system on each invocation, each
check takes less time and imposes a lighter computational load.
Likewise, especially for any international customers, there is no single
high-impact activity during anyone's prime-time hours.
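For example, the periodic work might be staggered across each hour like
this (the directory groupings and the check-slice wrapper script are
hypothetical; check-slice would simply run the checksum comparison sketched
earlier over the directories named as its arguments):

    # /etc/crontab fragment: one small slice of work per invocation
    05  *  *  *  *  root  /usr/local/sbin/check-slice /bin /sbin
    20  *  *  *  *  root  /usr/local/sbin/check-slice /etc
    35  *  *  *  *  root  /usr/local/sbin/check-slice /usr/bin /usr/sbin
    50  *  *  *  *  root  /usr/local/sbin/check-slice /usr/local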
Server Configuration Specifics
- Use hardware based upon the Pentium Pro, 200 MHz (or equivalent) chip.
- Use a minimum of 128 MB RAM.
- Use a pair of 10/100Base-T Fast Ethernet controllers (for the PCI bus).
Front-end servers have separate network connections for the Internet
versus the server farm LAN (connected only with front/back-end servers).
Back-end servers have separate network connections for the server farm
LAN (connected only with front/back-end servers) versus the company internal
LAN connection via one-way firewall.
- Use "Ultra" SCSI disk controllers and drives.
Definitely do not use IDE/EIDE/ATAPI for disks.
The Adaptec 2940-Ultra SCSI disk controller with Ultra SCSI disks is
highly recommended for all servers.
- Use the FreeBSD Unix operating system.
This is a very stable, robust implementation of Unix which is freely
usable, accessible, and distributable (but is not public domain).
In many qualified people's technical/professional opinions, FreeBSD's
quality matches many commercial flavors of Unix including
SunOS/Solaris, HP/ux, and AIX.
The FreeBSD development SNAPSHOT now includes Symmetric Multi-Processing
(SMP) support for machines with multiple Pentium (P5) or Pentium Pro (P6)
CPUs. This feature should be available in the "current" and "stable"
releases within six months.
- Use the Apache HTTP server.
This is a very stable, robust HTTP server which is freely usable,
accessible, and distributable but is not public domain.
In many qualified people's technical/professional opinions, Apache's
quality surpasses many commercial servers including those from
Netscape.
This software supports SSL for secure commerce.
(There are rumors mentioning the inclusion of a "configuration server"
in the near-future. This would be much like Netscape's "administration
server.")
- Configure Apache HTTP server for access-throttling.
This will limit the maximum load of each customer's HTTP server
daemon, minimizing overall impact on the shared host.
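One possible configuration, offered as a sketch only (the directives shown
are standard in Apache 1.2, but the values and the virtual host name are
illustrative, and per-customer bandwidth limits would need an add-on
module):

    # httpd.conf fragment
    # cap the total pool of server processes for the shared host
    MaxClients 150
    # bound the resources any one customer's CGI programs may consume
    <VirtualHost www.customer-example.com>
        DocumentRoot /www/cust042/htdocs
        # soft and hard limits: CPU seconds, then number of processes
        RLimitCPU   30 60
        RLimitNPROC 10 20
    </VirtualHost>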
- Configure the operating system kernel to filter IP network packets
of everything except for HTTP (80, 443) and the login/file-transfer method
of your choice.
This way, regardless of any lack of filtering by the routers or
firewalls, you are protected.
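A sketch of such a filter using FreeBSD's ipfw, run at boot time (the
interface names fxp0 for the public side and fxp1 for the server farm LAN
are assumptions, as is SSH on port 22 for the login/file-transfer method):

    # allow everything on the isolated server-farm interface (NFS, mirroring)
    ipfw add 100   allow ip  from any to any via fxp1
    # accept new connections from the Internet only for HTTP, HTTPS, and SSH
    ipfw add 1000  allow tcp from any to any 80 setup via fxp0
    ipfw add 1100  allow tcp from any to any 443 setup via fxp0
    ipfw add 1200  allow tcp from any to any 22 setup via fxp0
    # let packets belonging to established TCP sessions through
    ipfw add 2000  allow tcp from any to any established
    # drop everything else
    ipfw add 65000 deny ip from any to any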
- Use only SSH (secure shell) with RSA encryption for all logins and
file transfers.
See "Secure Customer Access" below.
- As a general configuration guideline, use X back-end servers and
3X front-end servers. That is, if there are 90 front-end servers,
you should have 30 back-end servers.
The exact numbers will have to be determined at the preliminary
implementation phase.
Physical Site Specifics
- Use a console terminal server for remote access of any server's
console.
- Use a remote access switch for power-cycling any server.
- Use dedicated hosts for monitoring the servers' activity, with
self-contained capabilities to send text pages to admin/support staff.
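Such a monitor could be as simple as this shell fragment run from cron
(host names and the pager e-mail alias are hypothetical; a real deployment
would also fetch a test page from each HTTP server):

    #!/bin/sh
    # run periodically on the dedicated monitoring host
    for h in www1 www2 www3 back1 back2 ; do
        if ! ping -c 3 $h > /dev/null 2>&1 ; then
            echo "$h is not responding to ping" | mail -s "ALERT: $h" oncall-pager
        fi
    done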
- Use dedicated hosts for development and testing of the service.
- All hosts used for anything other than directly serving customers
(i.e., everything mentioned in this section) should not be in
the domain name service (DNS) tables and should be on separate networks.
This will defeat "daemon dialing" or "war dialing" attempts to
locate such support servers.
Related Web Services
Additional services beyond the basic HTTP traffic of the Web are
optional for a basic service offering. If customer demand is high enough,
these additional, related services may be supported while still fitting within
the above criteria.
- "Front-end" versus "back-end" servers would be used, similarly
to the HTTP servers.
- Mail spooling/forwarding may be handled on separate, dedicated mail
servers.
- Anonymous FTP services may be handled via separate, dedicated FTP
servers.
- Front-end mail servers would be identified through specific domain
name service (DNS) configuration.
- Mail service is handled in two stages, as is commonly done for
publicly accessible servers.
The security procedures are supported by smapd, part of the TIS Firewall
Toolkit (fwtk).
- Front-end FTP servers would use the same back-end servers as the HTTP
servers but would instead mount the remote file system as read-only.
This way, despite being a separate server from the one running the HTTP
software, the FTP site may include access to the HTTP DocumentRoot
without any replication and still be secure.
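For example (host and path names hypothetical), the FTP front-end mounts
the same volume the HTTP front-ends use, but read-only:

    # read-only NFS mount of the customer's Web content for anonymous FTP
    mount -t nfs -o ro back1:/export/customers/cust042 /var/ftp/pub/cust042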
- Front-end servers reside on the public Internet.
- Back-end servers reside on a non-routeable, isolated network.
- Front-end servers permit only console login.
- Front-end servers use minimal operating system configurations.
- Front-end servers contain no users other than standard Unix
administrative and daemon userids.
- Front-end servers use a CD-ROM based "live" file system for all
operating system files.
This prevents any possibility of operating system file corruption.
Custom CDs should be used. A CD-ROM writer unit (approximately $600.00)
is required on at least one firewalled, administrative workstation.
Files containing passwords, hostname, and IP number assignments may be
stored on a write-protected floppy disk so as to permit updates without
cutting a new CD.
Given such a minimal operating system configuration, the slow nature of a
CD-ROM device should have no performance impact after boot-up.
- Front-end servers use local disks for temporary incoming spools.
- Front-end servers behave as file servers to back-end servers which, in
turn, would perform any processing required (e.g., for mail).
Examples: Front-end mail servers would use local disks only for temporary
spooling of in-bound mail. Back-end mail servers would treat front-end mail
servers as NFS servers. Back-end mail servers perform all normal mail
processing.
Front-end FTP servers would use local disks only for temporary spooling
of uploaded ("contributed") files. Back-end file servers would treat
front-end FTP servers as NFS servers. Back-end file servers periodically
(e.g., hourly) migrate uploaded/contributed files into customers'
file systems.
For the security and integrity of the servers, restrict what the
customers may do on the server and how they may do it.
- Provide a secure mechanism to upload files.
See "Secure Customer Access" below.
- Provide a graphical user interface (GUI) for manipulating their
directories and files.
This GUI should include mechanisms for compression, archiving
(e.g., tar or zip), and extracting, as well as file
renaming, changing access permissions, deleting files, and adding/removing
directories.
Currently, no such application exists in the public domain. If
development and maintenance of a custom program is not possible,
remote login is then required.
See "Secure Customer Access" below.
- If login access is to be permitted, provide a restricted login
shell.
Such a restricted shell should be limited to their own directory tree
(i.e., chroot'd).
Secure Customer Access
Depending upon how secure you want the site, the following would be the
maximum for a service offering.
This configuration is highly recommended.
Customers would not be severely impacted, and those who understand that
Internet servers have inherent security risks will be receptive to this
configuration.
- Require all customers to use the secure shell,
SSH.
- MS Windows (3.11, 95, NT), Mac, OS/2, Amiga, and most flavors of Unix
are supported.
(See http://www.cs.hut.fi/ssh/#portability.)
- Require all customers to use SSH configured with
RSAREF2.
This is already common practice at some hosting firms.
(There are
versions available for international customers so no export control
issues are of concern.)
- Customers must then present an initial SSH/RSA Public Key Identity
upon sign-up for each userid, and only those identities may log in and upload
files.
- Customers may log in and upload files from any Internet address, provided
they have their "Identity" (a very small text file) on their workstation.
- Customers may change their Public Key Identities anytime they choose.
- Customers use 'scp' (SSH's secure version of the 'rcp' remote copy
program) to transfer files.
This program permits on-the-fly compression, supports recursive copying
of entire directory trees, and is much more efficient than FTP.
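For example (userid, host name, and paths are illustrative), a customer
could publish an entire local directory tree with compression enabled in
one command:

    scp -r -C public_html cust042@www1.hosting-example.com:public_html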
- If login is to be permitted, customers use 'ssh' (SSH's secure
version of the 'rsh' remote shell program) to log in.
There is little reason to permit a customer to log in to a shared
hosting server when sufficient alternatives are provided.
The customer's shell should be limited to their own directory tree
(i.e., chroot'd).