Friday, November 6, 2009

Zend Performance

Let me first start off by saying that the Zend Framework has been very good to us.

It enabled us to build a kick-ass application in a relatively short amount of time. On top of that, we followed the conventions from Zend and PEAR and essentially have a very maintainable piece of software which I don't hate looking at every day (which is, as one can imagine, a huge plus).

The other day our servers were overwhelmed with the rising traffic and I started profiling my application through Xdebug. Initially I tried to use Zend Studio and the Zend_Debugger but Zend doesn't like my (awesome) operating system (FreeBSD) and only provides Linux and Windows extensions. Xdebug, while being free and awesome in general, doesn't know this prejudice. :-)

On this project we currently run with 100,000 visitors per day on average; our peak is Sunday night, when we get a ton more traffic than usual. We run the latest PHP (5.2.6 at this time), etc. The software comes from FreeBSD ports, there are no magic secret patches. I'm picky about the modules I compile and load, but the list is far from optimized.

In our defense, we just relaunched over the summer, and we are a team of four in total, of which only two people write code. Since we started off slow with 60,000-80,000 visitors per day after the relaunch, we never really had a chance or need to optimize and tried to avoid all premature optimization.

We currently use 50-some Zend classes. I wish I could provide a better number, but as you may know, the Zend Framework is only bundled as a whole and figuring out which classes are all in the mix is tricky. So the 50 is an estimate based mostly on grepping through our own class code.

On the server we run Apache 1.3. Currently we have a total of four webservers: two older ones (dual core, 6 GB RAM, slow 7.5k rpm disks) and two newer models (eight cores, 6 GB RAM, faster 15k rpm disks). The backend consists of two powerful workhorses with eight cores, more RAM than the frontends, and a lot of disk (at 12k rpm each).

Prior to starting the quest for performance, one of our older servers was able to handle ten (10) requests/second at peak; now we are at 42 requests/second (give or take a few). As for page loading time, a few optimizations took us from 340 ms to 76 ms in no time (all figures according to Xdebug). So I feel like we are on the right track to Getting rich with PHP. (Where's my Lexus at? :-))

We benchmarked using Apache Bench (at a moderate ab -n 1000 -c 100 http://url) and Siege, which are both really awesome tools and provide you with an instant DoS attack on your own servers. I might add that you are better off running those tools from "localhost" rather than remotely, as you might otherwise trigger your provider's IDS/snort/DoS protection.
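To compare runs before and after a change, it helps to pull the one number that matters out of ab's report. A minimal sketch, assuming ab prints its usual "Requests per second:" summary line; ab_rps is a hypothetical helper name of mine, not something that ships with ab:

```shell
#!/bin/sh
# Hypothetical helper: read an ab report on stdin and print only the
# requests-per-second figure from its summary line.
ab_rps() {
    awk -F: '/^Requests per second/ { split($2, f, " "); print f[1] }'
}

# Usage (http://url is the same placeholder as above):
#   ab -n 1000 -c 100 http://url | ab_rps
```

Run it once before and once after each tweak and note the two numbers next to each other.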

Here are a few things that helped us. The suggestions are in no particular order, and I should add that whatever is applicable to my situation doesn't have to work for you. Also, my number game could be off, so if you have suggestions on how to improve, please comment or drop me a line.

1) APC
It cannot be stressed enough: please run APC. Don't just keep the default settings; adjust them, and also check apc.slam_defense, apc.write_lock and apc.stat.

We had APC before, but I felt like I needed to mention it on the list of things.

Also, apc_fetch, apc_store etc. are great ways to add little caches throughout your application.

And they require almost zero time to implement. I'd suggest you use apc_fetch/apc_store directly rather than wrapping the Zend_Cache layer around them, which (IMHO) provides little added value and just adds more class code around the obvious.

2) Adjust PHP's realpath_cache_size and realpath_cache_ttl settings
This helps, somewhat.

3) Get rid of require_once, use __autoload (and the Zend_Loader)

This might be a hassle during development, because require_once evaluates each include, thus letting you know if it finds a parse error, and also where.

With include_once (which Zend_Loader essentially uses), it's a bit tricky at times. A good idea here would be a phing task (or some other script) which strips out or replaces require_once when you deploy your application to production.

Removing require_once in favour of __autoload shows one of the biggest performance improvements in my entire application - I shaved off roughly 220 milliseconds by removing about 15 (or so) calls to require_once in my bootstrap.php file. And that's with APC enabled, and a decent sized realpath_cache_size (and realpath_cache_ttl).

Beyond weird coding conventions (I shall bitch about those in another blog post), require_once is also the number one performance killer in the entire Zend_* code base. The before/after is amazing: without any of the other enhancements from this list, just by stripping require_once out of our ZendFramework "install", we went from 9-10 requests/second to 27 requests/second.

Use the following shell script to strip them:

grep -rl require_once . | grep -v svn |grep -v Loader | xargs perl -pi~ -e 's/require_once/#require_once/'

4) Zend_Loader
I know, I just recommended using the Zend_Loader, but with no offense, the Zend_Loader is not so great when it comes to general performance. Obviously I did not write it and really no offense meant, but it does some really weird stuff on the inside, and I am not sure what the use-case is. But I am sure there is one. ;-)

In order to preserve the API, I extended Zend_Loader and started overriding methods such as Zend_Loader::_securityCheck(), which runs a regular expression on the name of each file you feed to __autoload/Zend_Loader.

On top of that I switched to using the Zend_Loader only for models and controllers, but not for Zend_ and Company_ classes. Since Zend (and we) essentially follow the great PEAR coding standard in regard to one class per file and a very explicit naming scheme, all you have to do in your __autoload is the following:

function __autoload($className) {
    // PEAR naming: Foo_Bar_Baz lives in Foo/Bar/Baz.php on the include_path
    include_once str_replace('_', '/', $className) . '.php';
}

Now, that would be the bare minimum; our loader looks slightly more complicated. I haven't stopped there, and we are still in the process of "dumbing" it down even further, but so far it has saved us between 5 and 15 ms per page.

5) Cache DB results, avoid queries!

Those tricky, tricky DB queries.

Even though our DB backends mostly idle even when we get beat with traffic, there are a few things to keep in mind.

One of them is - DB queries are really expensive. And by queries I am not talking about the "SELECT * FROM foo" part, but rather about opening a connection to another server, sending the query, receiving the result and so on. By caching just one of those, we gained roughly another 20 ms on the frontpage. And it's not a very complex query either.

I remember looking puzzled when ahem... I was presented with the code that pulls a status message on each request to the homepage, but I had forgotten about this already and only noticed it again when it popped up in Xdebug with a notable amount of milliseconds.

6) Zend_Db_Table

Zend_Db_Table is very easy to use; in fact most of our models wrap around a couple of tables, and that's why we have a bunch of them. What I did not realize (but thanks to JamesG@#zftalk now I do) is that the metadata the class uses to provide all those nifty interfaces is generated on each request. That's a DESCRIBE TABLE in the background, which is pure overhead.

Zend_Db_Table_Abstract::setDefaultMetadataCache() to the rescue.

7) Apache

7a)
I sometimes hate Apache, but I also can't live without it.

Over the past years, I have tried all sorts of things in the webserver market - Lighttpd and nginx with php-cgi (fastcgi) seem to be no fun. A commercial solution such as Resin or Zeus has never been an option either.

I've always come back to Apache (1.3) for the simple fact that Apache and PHP are really so tightly integrated that nothing ever will go wrong.

Remember that guy Nik who claimed that Apache/PHP sometimes fail and deliver the sourcecode to the browser (because Facebook obviously failed to configure Apache)? Well, that doesn't happen - ever. The only problem with Apache is that Apache and client(browser)-communication is a bitch.

Nginx to the rescue! Fast install, easy to configure (don't let the Russian FAQ scare you, the nginx.conf-dist will teach you all you need!) - just chain your Apache to localhost:8080 and let Nginx proxy all requests to it and your Apaches move from "lockf" status, to "run" and "accept" always.

Whenever Apache receives a request from a slow client, it has to wait until that client is done reading all of the response. While waiting, your 30 MB Apache sits there unable to do anything else. With nginx in the mix, Apache sends the response to nginx as fast as it can, thus having more time to take care of what it's supposed to do for you - PHP.

Judging from my poor benchmarks, nginx increases the number of requests by a factor of six or seven. It's amazing and I never expected it to have such a great impact. It also doesn't eat away at resources, so beware of the Russians! :-)

Taking all "optimizations" into account, Apache 1.3 proxied by Nginx can now handle over 3000 requests/second (ab -n 10000 -c 1000 http://url).

7b)
Then there are the obvious tweaks: for example, check out your default Apache install and unload all the modules and extensions you never use anyway.

For example, we never show any of those HTTP authentication boxes, so why would we need the *_auth_* modules? We don't use user directories, so why load mod_userdir? Our Apache does not log, so why load mod_log_config? Or my favorite: mod_status.

Make sure mod_status is really disabled because otherwise that's one very, very expensive operation you got right there, with each request.

A good idea is to check top, unload, and look again:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
42601 root        1  96    0  7092K  2324K CPU1   1   0:00  1.00% top
36242 www         1  20    0   114M 33304K lockf  1   0:22  0.29% httpd
38251 www         1  20    0   114M 32184K lockf  0   0:08  0.29% httpd
42579 www         1  20    0   114M 28016K lockf  1   0:02  0.29% httpd
37975 www         1  20    0   115M 34688K lockf  1   0:18  0.24% httpd
36344 www         1  20    0   115M 34036K lockf  1   0:18  0.24% httpd

Take into account (thanks, Jan) that the SIZE column is not the real size. It's more like the theoretical size, including whatever Apache could use if it had to. You want to look at RES (resident) instead, because that's what is in memory right now.

Another smart move is to put all the rules from .htaccess into your server configuration, because otherwise Apache searches for various (!) .htaccess files with each request and tries to evaluate the rules you have in there.
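To see what unloading a module actually buys you, it can help to add up RES over all httpd processes before and after. A rough sketch, assuming top output in the column layout shown above (RES in the seventh column, with a K suffix); httpd_res_total_kb is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: sum the RES column (in kilobytes) over every httpd
# line of top output read from stdin. Assumes RES is field 7 and ends in "K".
httpd_res_total_kb() {
    awk '$NF == "httpd" { sub(/K$/, "", $7); total += $7 } END { print total + 0 }'
}

# Usage: paste or pipe a top snapshot into the function, e.g.
#   httpd_res_total_kb < top-snapshot.txt
```

For the five httpd processes in the listing above this adds up to 162228 KB, so roughly 158 MB resident in total.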

Imagine this:

/htdocs/foo/bar/index.html

In this request Apache will look for .htaccess in the following directories:
/htdocs
/htdocs/foo
/htdocs/foo/bar

Turn it off (AllowOverride None) instead and move on, because when you move all your directives into httpd.conf (or similar), at least they get evaluated only once, when the server process starts up.
From a deployment perspective it's nicer to have .htaccess, because all you need to do is re-deploy one file instead of editing a server config and restarting the server, but this really pays off. With certain APC settings you will need to restart the server anyway. Also: "No pain, no gain!"
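The per-request directory walk can be made visible with a few lines of shell. This is only an illustration of the lookup order described above, not Apache's actual code; htaccess_dirs is a hypothetical name, and like the example it starts at the top-level directory (a real server begins wherever AllowOverride first permits overrides):

```shell
#!/bin/sh
# Hypothetical helper: list every directory in which a .htaccess file would
# be looked up for the given file path, from the top directory downwards.
htaccess_dirs() {
    dir=$(dirname "$1")
    out=""
    while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        out="$dir
$out"
        dir=$(dirname "$dir")
    done
    printf '%s' "$out"
}

htaccess_dirs /htdocs/foo/bar/index.html
```

Every extra directory level is one more file lookup per request, which is why deep document trees suffer the most.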

Thursday, October 29, 2009

REPLACE in PostgreSQL

I've recently made the switch from MySQL to PostgreSQL for some of my more complicated database projects, such as the ad matching discussed in my last post. PostgreSQL has a much richer feature set (although MySQL 5 is catching up). One feature I couldn't find in PostgreSQL, however, was the REPLACE extension offered by MySQL, where rows with a duplicate primary key replace the existing rows instead of causing errors. Fortunately, PostgreSQL's advanced rules system allows the creation of an equivalent feature.

For the sake of this example we'll use a very simple table called 'map' with two fields: 'key' and 'value'. Not surprisingly, key is the primary key.

Let's further assume that every insert into this table should actually be a replace. Then all we need is this simple statement and we're done:

-- Merge rule: an INSERT on an existing key becomes an UPDATE
CREATE OR REPLACE RULE merge_map AS
    ON INSERT TO map
    WHERE EXISTS (SELECT 1 FROM map WHERE map.key = NEW.key)
    DO INSTEAD
        UPDATE map SET value = NEW.value WHERE map.key = NEW.key;

That's it! So how does it work? First, we create a rule to be processed whenever a row is inserted into the map table. Then, we use the subquery EXISTS(SELECT 1 FROM map WHERE key=NEW.key) to test for a row on the same primary key. If there isn't a row, the query proceeds as a normal INSERT. But if there is a row, the query is converted into an UPDATE that sets the value instead of trying to insert it. The 'NEW.var' means whatever values we were trying to INSERT.

This is why I'm starting to become a PostgreSQL fan.

Sunday, October 11, 2009

How to convert dmg file to an iso image

I found this out a while ago when looking for info on reading .DMG files on Windows or Linux boxes. I found out it was not possible, and I wasn't too happy. You see, my iMac has no CD-R drive, just a CD-ROM. This tip creates ISO images from DMG images, so they can be burned elsewhere. To convert the file to an ISO image, type the following command at your terminal window:

hdiutil convert /path/to/filename.dmg -format UDTO -o /path/to/savefile.iso

Replace /path/to/filename.dmg with the path and name of the existing .DMG file, and replace /path/to/savefile.iso with the desired path and name for the converted image. Note that hdiutil may append a .cdr extension to the output name (giving savefile.iso.cdr); if it does, simply rename the file so it ends in .iso.

This then creates an ISO image burnable in Nero on Windows, or pretty much anything on Windows or Linux that will burn ISOs. I just converted a DMG image as a test, and it took a while -- it only converted at about 1 MByte per second, but I only have a 333 MHz iMac G3, so speed-wise that may actually be good.

Wednesday, August 19, 2009

Even if you use OpenOffice, you might still want all the Microsoft TrueType fonts so that documents created using Word or PowerPoint look as they were supposed to when you open them with OpenOffice. Also, with the Microsoft fonts installed, web browsing will be better, since pages will look as the designer originally intended. Most webpages are designed with Microsoft fonts in mind, and their stylesheets specify these fonts. On Linux, when the specified fonts are not available on your computer, they are replaced with generic equivalents. With these fonts installed, you will see the page as it was designed. To install the fonts, all you need to do in Ubuntu is to install the msttcorefonts package. Instructions for installation are given below.

The Truetype Microsoft fonts provided by the package include:

  • Andale Mono
  • Arial Black
  • Arial (Bold, Italic, Bold Italic)
  • Comic Sans MS (Bold)
  • Courier New (Bold, Italic, Bold Italic)
  • Georgia (Bold, Italic, Bold Italic)
  • Impact
  • Times New Roman (Bold, Italic, Bold Italic)
  • Trebuchet (Bold, Italic, Bold Italic)
  • Verdana (Bold, Italic, Bold Italic)
  • Webdings

Installing Microsoft Truetype fonts on Ubuntu

You can install the MS core fonts by installing the msttcorefonts package. To do this, enable the “Universe” component of the repositories. This is done by default in Feisty. After you do that, use the following command from the command line:

$ sudo apt-get install msttcorefonts

This will give you the core fonts, but if there are other TrueType fonts that you want installed, it is as easy as copying the font files to the ~/.fonts/ directory.

After installing new fonts, you will have to log out and log in again to be able to see and use the new fonts. If you want to avoid this, you can regenerate the fonts cache by issuing the following command:

$ sudo fc-cache -fv

Sunday, June 14, 2009

Enable and Disable Ubuntu Root Password

Ubuntu is one of the few Linux distributions out there that does not enable the root account. If you want to do something with root permission on the console, you have to type sudo before the command.


“sudo” means “superuser do”. “sudo” will prompt for “Password:”; enter your own user password.

As you may have noticed, the Ubuntu installation did not ask for a root password, as you might be used to from the installation process of other Linux distributions. Because of this, your root account is inactive.

If you want to enable the root account (which is not recommended), enter the following command:

$ sudo passwd root

This will prompt for a new root password, and once you confirm it, you can start using the root account to log in.

If you want to disable the root account in Ubuntu, you need to lock it by using the following command:

$ sudo passwd -l root

If you want to work on a root console, it is better to use the following command:

$ sudo -i

Sunday, June 7, 2009

How do I turn off my swap partition to check whether performance improves? This step is recommended for machines with plenty of RAM that are used as desktops or test machines; under no circumstances do I recommend it for machines in production. The idea is to squeeze out a little performance, but the machine has to be monitored so that no strange problems appear.
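A rough sketch of how such an experiment might look on Linux. The helper name swap_used_kb is my own, and it assumes a /proc/meminfo-style file with SwapTotal and SwapFree lines; swapoff and swapon need root, so those steps are left as comments:

```shell
#!/bin/sh
# Hypothetical helper: print how much swap is in use (in kB), read from a
# /proc/meminfo-style file (defaults to /proc/meminfo itself).
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# Typical session (privileged steps are commented out on purpose):
#   swap_used_kb        # whatever is in swap now must fit back into RAM
#   sudo swapoff -a     # turn off all swap areas
#   ...use the machine and keep an eye on memory with free -m or top...
#   sudo swapon -a      # turn swap back on when the test is over
```

Checking the figure first matters because swapoff has to move everything currently in swap back into memory.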

Thursday, June 4, 2009

How to grep in files by specific extension

For example, let's search for "*.php" files.

We should have something like this:

find ./ -name "*.php"

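Now let's grep for the specific string "$GLOBALS" in a specific PHP file, test.php. Note the single quotes: inside double quotes the shell would expand $GLOBALS as one of its own variables before grep ever sees it.

grep -n -F '$GLOBALS' test.php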

Now let's combine what we have learned:

find ./ -name "*.php" -type f -exec grep -H -n -F '$GLOBALS' {} \;

The above command searches the current directory for PHP files and looks inside them for "$GLOBALS".

-H -n means show me the filename and line number; -F treats the search string as fixed text rather than a regular expression.
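A caveat on quoting, sketched under the assumption of a POSIX shell: inside double quotes the shell treats $GLOBALS as one of its own variables and expands it (normally to nothing) before grep runs, so a search string like this needs single quotes. The -F flag (fixed string) and the helper name grep_php are my own choices for this illustration:

```shell
#!/bin/sh
# Double quotes let the shell expand $GLOBALS; single quotes keep it literal.
unset GLOBALS
printf 'double=[%s] single=[%s]\n' "$GLOBALS" '$GLOBALS'

# Hypothetical helper: search all *.php files below directory $2 for the
# fixed string $1, printing filename and line number for each match.
grep_php() {
    find "$2" -name '*.php' -type f -exec grep -H -n -F -- "$1" {} +
}

# Usage: grep_php '$GLOBALS' .
```

With an empty pattern grep matches every line of every file, which is easy to mistake for a very popular variable.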

This should work.

For best results use GUI editor with search in files :-)

For more info

man grep