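This post is based on the classic Linux Journal article "Large-Scale Web Site Infrastructure and Drupal". Translated passages follow the content rather than the exact wording, so consult the original article wherever something seems off.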
Setting up a Drupal Web site is pretty simple these days, until it gets popular, then you need to bring out the big guns and start finding and fixing the performance bottlenecks. In this article, we show some of the techniques that can allow your Drupal Web site to scale to the grandiose levels you originally hoped for.
When Twitter experiences an outage, users see the infamous “fail whale” error message, an illustration of twit-birds struggling to hoist a sleeping cartoon whale into the air along with the words “Too many tweets! Please wait a moment and try again.” It happens so often, Twitter has a much-heralded illustration for it.
Not too long ago, many readers may remember Facebook going down for days at a time. True, those sites are dealing with extraordinary levels of traffic, but smaller sites often face the same problems.
How come? First, Web sites are no longer a collection of static pages. Nowadays, Web sites combine social-networking features with highly customized content for individual users, meaning most pages have to be assembled on the fly.
Second, content is changing—rich media, on-line advertising, video, telephony. There’s more than text forcing its way through the pipe, and network traffic only continues to grow. Addressing this tandem of complexity and load is the bane of many growing social-media Web sites’ existence. What follows are some clever ways to address this whale of a problem.
Surprisingly, the solutions to most scaling problems are frequently the same, regardless of the technology upon which the site was built. Lullabot (the parent company of this article’s authors) is a Drupal development company, meaning that most of our experience is centered around the typical LAMP stack (Linux, Apache, MySQL and PHP), although most techniques are universal, and some of the most advanced performance software is platform-neutral.
Server Infrastructure
[Figure 1: server architecture diagram]
One of the main factors in scaling a Web site is, of course, the hardware (Figure 1). System administrators always can throw more hardware at a problem and solve it at least temporarily, if they have the resources to do so. Quite a few services can be put in place before this needs to be done, and developers can selectively optimize the application by reducing or optimizing queries. Nevertheless, when it comes to sheer numbers of users and bandwidth over a short amount of time, there almost always comes a point where it’s necessary to include hardware in the mix. That’s why it is important to have your hardware infrastructure planned in a way that it rapidly can scale upward on a traffic spike, and back down when your traffic recedes.
A typical setup, whether virtual or dedicated, usually includes multiple Web servers, multiple database servers and sometimes even separate caching servers, all behind a load balancer that distributes traffic between machines. Depending on its processor speed and the amount of available memory, a Web server or database server often can double as the caching server, because caching services usually require fewer resources than Apache or MySQL.
Although distributing traffic across multiple Web servers, or Web heads, can be a quick win, it can introduce problems with managing file uploads. If requests are being distributed round-robin by the load balancer, a user may upload a file on one server but then be switched to a different Web server after the upload, which doesn’t have the newly uploaded file. To solve this problem, a file server also is added into the mix. The file server is usually some form of NAS (Network Attached Storage) or an NFS (Network File System) mount that allows the application to share files between machines. Each Web head will have a copy of the application stored in the Web root, but when it comes to the files that are uploaded or changed often by the users of the application, an NFS mount connects all the servers to a shared file location.
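The upload problem is easy to reproduce in miniature. The following is only a sketch in Python, not part of the original article: the names WebHead, RoundRobinBalancer and upload_then_view are hypothetical, and a plain in-memory set stands in for the NFS or NAS mount. It shows a file uploaded through one Web head being invisible to the next one in the rotation unless both write to the same shared location.

```python
from itertools import cycle


class WebHead:
    """A Web server with its own local disk and an optional shared mount."""

    def __init__(self, name, shared=None):
        self.name = name
        self.local_files = set()
        # Hypothetical stand-in for an NFS/NAS mount: one set shared by all heads.
        self.shared = shared

    def upload(self, filename):
        # Store on the shared mount when there is one, otherwise on local disk.
        target = self.shared if self.shared is not None else self.local_files
        target.add(filename)

    def has_file(self, filename):
        if self.shared is not None and filename in self.shared:
            return True
        return filename in self.local_files


class RoundRobinBalancer:
    """Hands out Web heads in strict rotation, one per request."""

    def __init__(self, heads):
        self._heads = cycle(heads)

    def next_head(self):
        return next(self._heads)


def upload_then_view(heads, filename):
    """Send an upload request, then a view request, through the balancer.

    Returns (head that took the upload, head that served the view,
    whether the viewing head could find the file).
    """
    balancer = RoundRobinBalancer(heads)
    uploader = balancer.next_head()
    uploader.upload(filename)
    viewer = balancer.next_head()
    return uploader.name, viewer.name, viewer.has_file(filename)


# Without shared storage the second Web head cannot see the upload.
print(upload_then_view([WebHead("web1"), WebHead("web2")], "avatar.png"))
# prints ('web1', 'web2', False)

# With one shared location mounted on every Web head, it can.
mount = set()
print(upload_then_view([WebHead("web1", mount), WebHead("web2", mount)], "avatar.png"))
# prints ('web1', 'web2', True)
```

In a real deployment the shared set would be a directory that every Web head mounts, while the application code itself stays in each machine's own Web root.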
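Caching Strategy
The other major factor in building a high-performance Web site is, of course, the software. Caching is a key ingredient in meeting later demands for scalability and high concurrency. Caching mechanisms are not mutually exclusive, and well-performing sites usually combine several of them. Most types of caching aim either to reduce the amount of disk access required or to serve code that already has been compiled to bytecode, which runs faster because it is closer to machine language.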
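Both goals, fewer disk reads and reuse of compiled bytecode, can be illustrated with a small Python sketch. This is an analogy under stated assumptions rather than how any particular PHP or Drupal cache works: FileCache, BytecodeCache and demo are hypothetical names, and Python's built-in compile() and exec() stand in for an opcode cache.

```python
import os
import tempfile


class FileCache:
    """Keep file contents in memory so repeated requests skip the disk."""

    def __init__(self):
        self._contents = {}
        self.disk_reads = 0

    def read(self, path):
        if path not in self._contents:
            with open(path, "r", encoding="utf-8") as fh:
                self._contents[path] = fh.read()
            self.disk_reads += 1  # counted only on a cache miss
        return self._contents[path]


class BytecodeCache:
    """Compile each script once and reuse the resulting code object."""

    def __init__(self, files):
        self._files = files
        self._code = {}
        self.compilations = 0

    def run(self, path):
        if path not in self._code:
            source = self._files.read(path)
            self._code[path] = compile(source, path, "exec")
            self.compilations += 1  # counted only on a cache miss
        namespace = {}
        exec(self._code[path], namespace)  # run the cached bytecode
        return namespace


def demo(requests=100):
    """Serve the same page many times and report how much work was repeated."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "page.py")
        with open(script, "w", encoding="utf-8") as fh:
            fh.write("body = 'Hello, ' + 'visitor'\n")
        files = FileCache()
        pages = BytecodeCache(files)
        bodies = [pages.run(script)["body"] for _ in range(requests)]
    return bodies[-1], files.disk_reads, pages.compilations


print(demo())  # prints ('Hello, visitor', 1, 1)
```

A hundred requests cost one disk read and one compilation. The two caches are stacked rather than competing, which matches the point that caching mechanisms are not mutually exclusive.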