May 1998
Original Version: April 1997
Daniel Joseph Pezely
This is the result of on-going research based upon contacts within the entire industry. Please consider this design and send comments to the author. This was written in response to a request for qualified, technical opinions ``from people in low places on the 'net.'' The content of this document is nonproprietary.
For a high performance, low cost, Web content hosting service, consider the following design.
The computer industry's products and the hosting marketplace's activities dictate certain factors. These factors drive site designers to common conclusions, and those conclusions force specific designs. The general consensus among the working systems administrators in our industry is summarized here.
This document is for anyone planning a Web content hosting facility.
Executives, managers, systems administrators, and technical and non-technical
people alike should all find something worthwhile.
This is a multiple computer architecture hosting environment. Multiple computers provide enhanced performance, security, integrity, privacy, and robustness for a shared hosting facility.
The specific hardware, software, and operating system choices presented are merely components and can be substituted with products deemed appropriate at the time you implement your service.
For cost-effectiveness, this design promotes using free operating systems and free software. These are quality works of software, with various corporations controlling and backing their development.
Most importantly, the selections made here represent the best options for finding already-trained people to staff the roles in platform development, systems administration, help desk, on-call support, and, if necessary, customization.
Likewise, many potential customers, or the webmasters acting on their behalf, are already familiar with the selections made. This makes for a more appealing service from a marketing perspective.
This design is for low- to medium-range customers who do not need private, dedicated servers (for things like databases or additional Web services).
Dedicated hosting customers may, however, benefit from similar configurations (e.g., offer dedicated HTTP hosts but shared, auxiliary anonymous FTP services).
Additional services not detailed here include streaming audio, chat
rooms, and interactive multiplayer games. Such items should be considered
straightforward extensions to this design.
We assume the following criteria, based upon the company's financial bottom line and upon its employees being required to work only regular business hours.
The server farm for content hosting can have these features.
With the above criteria and factoring in the above issues and considerations, this design represents one implementation of the general consensus within our industry.
Possibly, admin/support staff workstations could access the isolated network from within the company's internal LAN by way of firewalls.
By grouping a specific number of front-end servers with specific back-end servers, scalability then becomes a matter of replicating this ``cluster'' for each set of a few hundred or a few thousand customers.
This has the additional benefit of isolating network traffic to potentially different network segments through the use of switches or routers.
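The replication arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the customers-per-cluster and servers-per-cluster figures are placeholder assumptions, since the real numbers have to come from your own measurements.

```python
import math


def clusters_needed(total_customers, customers_per_cluster):
    """Return how many identical clusters are needed for the customer base."""
    if customers_per_cluster <= 0:
        raise ValueError("customers_per_cluster must be positive")
    return math.ceil(total_customers / customers_per_cluster)


def farm_size(total_customers, customers_per_cluster,
              front_ends_per_cluster, back_ends_per_cluster):
    """Return cluster and server counts for a farm built from replicated clusters."""
    clusters = clusters_needed(total_customers, customers_per_cluster)
    return {
        "clusters": clusters,
        "front_end_servers": clusters * front_ends_per_cluster,
        "back_end_servers": clusters * back_ends_per_cluster,
    }


# Placeholder figures only: 2,500 customers, 1,000 per cluster,
# 4 front-end and 2 back-end servers in each cluster.
plan = farm_size(2500, 1000, 4, 2)
```

Because every cluster is identical, growth is planned by changing a single input (the customer count) rather than by redesigning the farm.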
This includes everything except customer specific items such as virtual interface IP numbers and which file systems are mounted.
This will enable any back-up server to replace any active server immediately.
Additionally, tape back-ups are required only for historic archives or for physical site redundancy.
This minimizes losses of data in the most severe crashes and enables any back-up server to replace any active server immediately.
Cryptographic checksums (e.g., MD5) will be generated on all system files as a ``trip-wire'' mechanism.
This also assists in narrowing the time frame in which any cracking activity may have occurred.
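A minimal sketch of such a trip-wire, assuming Python is available on an administrative host, might look like the following. It records an MD5 digest for every file under a directory and compares a later scan against that baseline. The function names and the manifest layout are illustrative and do not describe any existing tool.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symbolic links and special files; only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest


def compare_manifests(baseline, current):
    """Return sorted lists of added, removed, and changed relative paths."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(path for path in baseline
                     if path in current and baseline[path] != current[path])
    return added, removed, changed
```

Any non-empty result from `compare_manifests` is the alarm. The baseline is only trustworthy if it is kept where an intruder on the monitored server cannot rewrite it, for example on write-protected media.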
By using shorter time periods between mirroring/updates, each transfer is reduced in both duration and its immediate demand on network capacity.
By integrity checking smaller segments of the file system at a time, each invocation is reduced in both duration and computational load.
Likewise, especially for any international customers, there is no single high-impact activity during anyone's prime-time hours.
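One way to picture this staggering is as a round-robin assignment of file system segments to time slots across the day. The sketch below is an assumption about how such a schedule could be generated; the function name, the choice of hourly slots, and the example segment paths are all hypothetical.

```python
def stagger_segments(segments, slots_per_day=24):
    """Spread file system segments across evenly spaced daily slots, round-robin."""
    if slots_per_day <= 0:
        raise ValueError("slots_per_day must be positive")
    schedule = {slot: [] for slot in range(slots_per_day)}
    for index, segment in enumerate(segments):
        # Consecutive segments land in consecutive slots, wrapping around.
        schedule[index % slots_per_day].append(segment)
    return schedule


# Hypothetical segments, spread over three slots for brevity.
example = stagger_segments(["/bin", "/etc", "/sbin", "/usr/bin", "/usr/lib"], 3)
```

Each slot then carries only a fraction of the total work, so no single hour bears the whole mirroring or checking load.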
Front-end servers have separate network connections for the Internet versus the server farm LAN (connected only with front/back-end servers).
Back-end servers have separate network connections for the server farm LAN (connected only with front/back-end servers) versus the company internal LAN connection via one-way firewall.
The Adaptec 2940-Ultra SCSI disk controller with Ultra SCSI disks is highly recommended for all servers.
This is a very stable, robust implementation of Unix which is freely usable, accessible, and distributable (but is not public domain). There is a not-for-profit corporation which controls the software development. (This is also the case for OpenBSD, Linux, and NetBSD operating systems.)
In many qualified people's technical/professional opinions, FreeBSD's (and OpenBSD's) quality matches many commercial flavors of Unix including SunOS/Solaris, HP-UX, and AIX.
The FreeBSD development branch now includes symmetric multi-processing (SMP) support for machines with multiple CPUs. This feature should be available in the ``current'' and ``stable'' releases in the near future.
This is a very stable, robust HTTP server which is freely usable, accessible, and distributable (but is not public domain). There is a not-for-profit corporation which controls the software development.
A commercial version (functional superset) which supports HTTPS/SSL is the Stronghold server from C2Net. (Their software is released with cryptography within the US and in Europe, so strong encryption may be used world-wide without interference from US export laws.)
In many qualified people's technical/professional opinions, Apache's quality surpasses many commercial servers including those from Netscape.
(There are rumors mentioning the inclusion of a ``configuration server'' in the near future. This would be much like Netscape's ``administration server'' or ``GUI.'')
This will limit the maximum load of each customer's HTTP server daemon, minimizing overall impact on the shared host.
This way, regardless of any lack of filtering by the routers or firewalls, you are protected.
See ``Secure customer access'' below.
The exact numbers will have to be determined at the preliminary implementation phase.
This will defeat ``daemon dialing'' or ``war dialing'' attempts to locate such support servers.
Additional services beyond the basic HTTP traffic of the Web are optional for a basic service offering. If customer demand is high enough, these additional, related services may be supported while still fitting within the above criteria.
The security procedures are supported by the TIS fwtk program, smapd.
This way, despite being a separate server from the one that runs the HTTP software, the FTP site may include access to the HTTP DocumentRoot without any replication and still be secure.
This prevents any possibility of operating system file corruption.
Custom CDs should be used. A CD-ROM writer unit (approximately $600.00) is required on at least one firewalled, administrative workstation.
Files containing passwords, hostname, and IP number assignments may be stored on a write-protected floppy disk so as to permit updates without cutting a new CD.
Considering such minimal operating system configurations, there should be no performance impact after boot-up due to the slow nature of a CD-ROM device.
Front-end mail servers would use local disks only for temporary spooling of in-bound mail. Back-end mail servers would treat front-end mail servers as NFS servers. Back-end mail servers perform all normal mail processing.
Front-end FTP servers would use local disks only for temporary spooling of uploaded (``contributed'') files. Back-end file servers would treat front-end FTP servers as NFS servers. Back-end file servers periodically (e.g., hourly) migrate uploaded/contributed files into customers' file systems.
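The periodic migration step could be sketched as below. This is an illustration under stated assumptions, not a description of an existing tool: it assumes the front-end spool is visible to the back-end server as an NFS mount containing one directory per customer, and that uploads are placed in a per-customer `incoming` directory. The function name and both layouts are hypothetical.

```python
import os
import shutil


def migrate_uploads(spool_root, customers_root, incoming_dir="incoming"):
    """Move spooled uploads into each known customer's incoming directory.

    Returns the list of 'customer/filename' entries that were moved.
    """
    moved = []
    for customer in sorted(os.listdir(spool_root)):
        source_dir = os.path.join(spool_root, customer)
        target_base = os.path.join(customers_root, customer)
        # Ignore spool entries that do not correspond to a known customer.
        if not os.path.isdir(source_dir) or not os.path.isdir(target_base):
            continue
        target_dir = os.path.join(target_base, incoming_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(os.listdir(source_dir)):
            source = os.path.join(source_dir, name)
            target = os.path.join(target_dir, name)
            # Move only regular files, and never overwrite an existing file.
            if (os.path.isfile(source) and not os.path.islink(source)
                    and not os.path.exists(target)):
                shutil.move(source, target)
                moved.append(os.path.join(customer, name))
    return moved
```

Run from a scheduler on the back-end server (hourly, in the example above), this keeps the front-end disks limited to short-lived spool data. Files that would collide with an existing name are deliberately left in the spool for an administrator to review.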
For security purposes and integrity of the servers, restrict what the customers may do on the server and how they may do it.
This GUI should include mechanisms for compression, archiving (e.g., tar or zip), and extracting, as well as file renaming, changing access permissions, deleting files, and adding/removing directories.
Currently, no such application exists in the public domain. If development and maintenance of a custom program is not possible, remote login is then required. See ``Secure Customer Access'' below.
Such a restricted shell should be limited to their own directory tree (i.e., chroot'd).
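A custom file-management program would need an equivalent confinement of its own. The sketch below is not chroot itself but an application-level path check that such a program might apply before touching any file; the function name is hypothetical, and a real deployment would still want operating system level isolation as well.

```python
import os


def resolve_inside(customer_root, requested):
    """Resolve a customer-supplied path, refusing anything outside customer_root."""
    root = os.path.realpath(customer_root)
    # Treat the request as relative to the customer's root, even if it starts with "/".
    candidate = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    # realpath collapses ".." and follows symbolic links before the comparison.
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError("path escapes the customer's directory tree")
    return candidate
```

A request such as `../other-customer/index.html` is rejected with `PermissionError`, while `docs/index.html` resolves to a path beneath the customer's own root.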
Depending upon how secure you want the site, the following would be the maximum for a service offering.
This configuration is highly recommended.
Customers would not be severely impacted, and those who understand that Internet servers have inherent security risks will be receptive to this configuration.
This is already common practice at some hosting firms.
(There are versions available for international customers so no export control issues are of concern.)
This program permits on-the-fly compression, supports recursive copying of entire directory trees, and is much more efficient than FTP.
There is little reason to permit a customer to log in to a shared hosting server when sufficient alternatives are provided.
The customer's shell should be limited to their own directory tree (i.e., chroot'd).