The lead developer at Mosaic, Brighton with a passion for web application development and motorcycles.
External Link: PHP extension writing: PHP Extensions Made Eldrich
Since I wrote my 15 Excellent Resources for PHP Extension Development post in September last year, Kristina Chodorow of 10gen (MongoDB) has written an excellent four-part article on writing PHP extensions on her blog, Snail in a Turtleneck.
Recently (well, in a loose sense anyway) I needed to build a document bank in PHP for a client at Mosaic. It was a fairly involved application with various public and private APIs for integration into the client's network of websites.
The core PHP code was written on top of the Agavi framework and various PHP libraries for extracting text and metadata from documents. One of the major features the client required was for the system to detect similar files, to prevent unintentional duplicates from making it into the document bank.
The idea was that this document bank would be the one central resource for all of the documents written and managed by the organisation. Duplicates or near-duplicates would, of course, make this a pointless exercise. So I turned to Stack Overflow for some pointers, but came up empty.
After some research and much searching of the web, I came across an open source package called ssdeep, written by Jesse Kornblum. I found it through his research paper, Identifying almost identical files using context triggered piecewise hashing.
Installing Gearman is pretty easy as there are packages for it in Ubuntu:
sudo apt-get install gearman libgearman-dev
The development headers (libgearman-dev) are only required if you need to compile a library for your programming language, such as a PHP extension. To install the PHP module you would run:
sudo pecl install channel://pecl.php.net/gearman-0.7.0
If you have trouble with the above step then it is probably because you are running an older version of Ubuntu. In this case take a look at my previous post Getting gearman to install on Ubuntu.
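Once the extension is installed, a worker is just a PHP script that registers callables with gearmand. The following is a minimal sketch rather than part of the original setup: the `uppercase` job name and the `uppercase_job()` and `run_worker()` functions are hypothetical, and it assumes gearmand is listening on its default port, 4730, on localhost. `GearmanWorker::addServer()`, `addFunction()` and `work()` are methods of the PECL gearman extension.

```php
<?php
// Hypothetical job handler. Gearman passes in a GearmanJob; any object with a
// workload() method works here, which keeps the handler easy to test.
function uppercase_job($job): string
{
    return strtoupper($job->workload());
}

// Registers the handler and serves jobs. Requires the gearman extension and a
// running gearmand, so it is not called automatically.
function run_worker(string $host = '127.0.0.1', int $port = 4730): void
{
    $worker = new GearmanWorker();
    $worker->addServer($host, $port);
    $worker->addFunction('uppercase', 'uppercase_job');

    // work() blocks until it has handled a job and returns false on failure.
    while ($worker->work()) {
        // Loop round to wait for the next job.
    }
}
```

Calling `run_worker();` at the bottom of a CLI script keeps a worker connected, and it would then be counted among the capable workers for the `uppercase` function.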
Moving on to mod_gearman_status: an Apache module that shows the status of jobs and their associated workers.
Firstly, let's ensure that the system has the build-essential package installed:
sudo apt-get install build-essential
We now need to install the Apache2 development headers which, assuming you installed the standard Ubuntu Apache2 package, will be the prefork edition. If you don't understand this, don't worry; you can just continue below.
sudo apt-get install apache2-prefork-dev
Now download the module's C file and run the following command to build it and install it as an Apache module:
sudo apxs2 -c -i mod_gearman_status.c
Apache now needs to be told how to load the module and what configuration settings to use. In /etc/apache2/mods-available you need to create two files.
/etc/apache2/mods-available/gearman_status.load:
LoadModule gearman_status_module /usr/lib/apache2/modules/mod_gearman_status.so
/etc/apache2/mods-available/gearman_status.conf:
<IfModule mod_gearman_status.c>
    <Location /gearman-status>
        SetHandler gearman_status
    </Location>
</IfModule>
Enable the module with the following command:
sudo a2enmod gearman_status
Restart Apache to load the module:
sudo service apache2 restart
Now you can visit http://example.org/gearman-status in your browser, where example.org is your server's address.
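If you would rather read the queue from a script than from the Apache page, gearmand itself answers a plain-text `status` command on its port: one tab-separated line per registered function (function name, jobs in the queue, running jobs, capable workers), terminated by a line containing a single `.`. The sketch below parses that reply. The function names and array keys are hypothetical, and it assumes gearmand on the default port 4730.

```php
<?php
// Parse gearmand's reply to the "status" admin command into
// [function => ['queued' => int, 'running' => int, 'workers' => int]].
function parse_gearman_status(string $reply): array
{
    $functions = [];
    foreach (explode("\n", $reply) as $line) {
        $line = rtrim($line, "\r");
        if ($line === '.') {
            break; // end-of-list marker
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 4) {
            continue; // blank or unexpected line
        }
        [$name, $queued, $running, $workers] = $fields;
        $functions[$name] = [
            'queued'  => (int) $queued,
            'running' => (int) $running,
            'workers' => (int) $workers,
        ];
    }
    return $functions;
}

// Ask a live gearmand for its status; needs a reachable server.
function fetch_gearman_status(string $host = '127.0.0.1', int $port = 4730): array
{
    $socket = fsockopen($host, $port, $errno, $errstr, 5.0);
    if ($socket === false) {
        throw new RuntimeException("Cannot reach gearmand: $errstr ($errno)");
    }
    stream_set_timeout($socket, 5);
    fwrite($socket, "status\n");

    $reply = '';
    // Read until the terminating "." line, or until the connection closes or times out.
    while (($line = fgets($socket)) !== false) {
        $reply .= $line;
        if (rtrim($line, "\r\n") === '.') {
            break;
        }
    }
    fclose($socket);
    return parse_gearman_status($reply);
}
```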
External Link: PHP Hangs When Fed 2.2250738585072011e-308
A pretty horrible bug: when you assign the number 2.2250738585072011e-308 to a variable, PHP will hang on 32-bit builds for Linux or Windows. This also affects values passed in through $_GET and $_POST, so it could be exploited on some PHP sites.
For example, the following code will hang an affected PHP build:
$var = 2.2250738585072011e-308;
Or, if a page is given a GET parameter such as page.php?param=2.2250738585072011e-308, either of the following will hang:
$var = $_GET['param'] + 1;
//OR
$var = (double)$_GET['param'];
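Until PHP itself is upgraded to a release that fixes the bug, one stopgap is to refuse request input containing the offending digits before any code converts it to a number. The check below works on the raw strings, so it never triggers the conversion itself. It is a crude heuristic sketch, not an official patch: the function names are hypothetical, and it only looks for the digit run 22250738585072011 after removing dots, so it will not catch every possible way of writing a problematic value.

```php
<?php
// True if the raw string contains the digits of 2.2250738585072011e-308.
// Dots are stripped first so variants like 0.22250738585072011e-307 match too.
function looks_like_php_float_dos(string $value): bool
{
    return strpos(str_replace('.', '', $value), '22250738585072011') !== false;
}

// Walk request data (including nested arrays such as ?a[b]=...) without
// ever casting a value to a float.
function input_contains_float_dos(array $input): bool
{
    foreach ($input as $value) {
        $suspicious = is_array($value)
            ? input_contains_float_dos($value)
            : looks_like_php_float_dos((string) $value);
        if ($suspicious) {
            return true;
        }
    }
    return false;
}
```

It would be called at the very top of a front controller, e.g. `if (input_contains_float_dos($_GET) || input_contains_float_dos($_POST)) { http_response_code(400); exit; }`.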
More discussion is available at http://news.ycombinator.com/item?id=2066084
Quite often when you are working with legacy code you will come across a mess of globals. Every single method will make use of the same global instance of the database class for example. So where do you begin to work with this massive impediment?
Logging is a great way to see which methods and classes are being used by your application, and where. To achieve this you would normally need to add a logging call to every method in the code base, which would be incredibly tedious and time consuming.
This is where a proxy object can be implemented to save time and centralise the logging functions. The basic idea of a proxy object is that it will be instantiated in place of the actual class and the proxy will delegate any calls through to the original class. For the purposes of this example the original class will be called Database and the proxy object will be called LazyLoadingProxy.
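A minimal sketch of such a proxy follows, assuming PHP 7.1 or later. Because the proxy declares no public methods besides its constructor, every call such as `$db->query()` falls through to the magic `__call()` method, which logs the class, method and call site, creates the real `Database` object on first use (hence the lazy loading) and forwards the call. The stand-in `Database::query()` method, the static construction counter and the logger callable are hypothetical and exist only for illustration.

```php
<?php
// Stand-in for the legacy class; the real Database would talk to a DB server.
class Database
{
    public static $constructed = 0;

    public function __construct()
    {
        self::$constructed++;
    }

    public function query(string $sql): string
    {
        return 'result of: ' . $sql;
    }
}

// Stands in for the real object, creates it on first use and logs every call.
class LazyLoadingProxy
{
    private $className;
    private $constructorArgs;
    private $instance = null;
    private $logger;

    public function __construct(string $className, array $constructorArgs = [], ?callable $logger = null)
    {
        $this->className = $className;
        $this->constructorArgs = $constructorArgs;
        // Default to PHP's error log; pass your own callable to log elsewhere.
        $this->logger = $logger ?? function (string $message): void {
            error_log($message);
        };
    }

    // Build the wrapped object only when a method is first called on it.
    private function getInstance()
    {
        if ($this->instance === null) {
            $this->instance = new $this->className(...$this->constructorArgs);
        }
        return $this->instance;
    }

    // Any call to a method the proxy does not define lands here.
    public function __call(string $method, array $arguments)
    {
        // The first stack frame carrying a file and line is the calling code.
        $location = 'unknown location';
        foreach (debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 3) as $frame) {
            if (isset($frame['file'], $frame['line'])) {
                $location = $frame['file'] . ':' . $frame['line'];
                break;
            }
        }
        ($this->logger)(sprintf('%s::%s called from %s', $this->className, $method, $location));

        return $this->getInstance()->$method(...$arguments);
    }
}
```

In legacy code the proxy can then replace the shared global, for example `$GLOBALS['db'] = new LazyLoadingProxy('Database');`, so every call made through it is logged without touching the calling code. One trade-off: anything that type-hints on `Database` or checks `instanceof Database` will reject the proxy, since it is not a subclass.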

When working in a team it is very useful to have a central web server with multiple environments and a configuration as close to the live server as possible. This can be a bit of a nightmare, though, if you need to set up a new VirtualHost container in Apache every time a new project is brought on or a developer wants to work on a version of the site in their own environment.
The good news is that this can all be handled automatically, and new sites can be set up by simply adding a new directory to the file system. There are at least two ways of getting this going: the first is Apache's mod_vhost_alias module and the second uses mod_rewrite. I prefer the second method as it is more flexible and it allows you to tap into the ability of mod_rewrite to introduce environment variables and redirect requests (this is particularly useful for robots.txt, as you'll see).
The Apache2 manual has a very good page dedicated to overcoming this problem, but I will share all the settings I am using, including the ones you will need to stop Google et al. from crawling the sites served from your staging environment.
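The mapping that such a mod_rewrite setup performs can be sketched in PHP: take the Host header, lowercase it, strip a known suffix and use what is left as a directory name, but only if that directory exists. The `.staging.example.org` suffix, the one-directory-per-site layout and both function names are hypothetical; on a real server the equivalent logic lives in mod_rewrite rules rather than PHP.

```php
<?php
// Hypothetical scheme: <site>.staging.example.org -> <baseDir>/<site>
function resolve_document_root(string $host, string $baseDir, string $suffix = '.staging.example.org'): ?string
{
    $host = strtolower($host);
    // Drop any port, e.g. "shop.staging.example.org:8080".
    $host = preg_replace('/:\d+$/', '', $host);
    if (substr($host, -strlen($suffix)) !== $suffix) {
        return null; // not a staging host
    }
    $site = substr($host, 0, -strlen($suffix));
    // Only allow simple directory names so the Host header cannot escape $baseDir.
    if (!preg_match('/^[a-z0-9-]+$/', $site)) {
        return null;
    }
    $root = rtrim($baseDir, '/') . '/' . $site;
    // Adding a new site is just a matter of creating this directory.
    return is_dir($root) ? $root : null;
}

// A blanket-disallow robots.txt of the kind a staging host might serve.
function staging_robots_txt(): string
{
    return "User-agent: *\nDisallow: /\n";
}
```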
Installing via the pecl command can be a pain on Red Hat. First of all, you will need to install the php-devel package:
yum install php-devel
Then you will need to ensure that the PEAR/PECL installer is at the latest version, so as root run:
pear update-channel pear.php.net
pear upgrade pear
You may need to force pear to upgrade itself by using:
pear upgrade --force pear
I had to use the --force option because my version of PEAR was so old that the installer thought my version of Archive_Tar might not be up to muster. It was, however.
With all this in place you are ready to attempt to install your chosen extension:
pecl install ssdeep
If you get something like the following back:
/usr/bin/phpize: /tmp/ssdeep/build/shtool: /bin/sh: bad interpreter: Permission
Then it is likely that your temporary directory is mounted noexec for safety, which means that you cannot execute scripts within the /tmp directory. To test this, put a simple bash script into your /tmp directory and make it executable with chmod +x. I used the following bash script:
#!/bin/bash
echo "SIMON"
If you do not get SIMON back when you execute the file, but an error like “/bin/sh: bad interpreter: Permission” then the directory is set to noexec.
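As an alternative to the test script, on Linux the options of every mounted filesystem are listed in /proc/mounts, so a noexec flag can be spotted directly. The PHP sketch below finds the mount that contains a given path (the longest matching mount point) and checks its options. The function names are hypothetical, and it ignores the octal escaping /proc/mounts uses for mount points containing spaces.

```php
<?php
// Return the options of the mount that contains $path, or null if none matches.
// $mounts is text in /proc/mounts format: device, mount point, type, options, ...
function mount_options_for(string $path, string $mounts): ?array
{
    $bestOptions = null;
    $bestLength = -1;
    $target = rtrim($path, '/') . '/';

    foreach (explode("\n", $mounts) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 4) {
            continue; // blank or malformed line
        }
        [, $mountPoint, , $options] = $fields;
        $prefix = rtrim($mountPoint, '/') . '/';
        // The longest mount point prefixing the path applies; ">=" lets a later
        // mount on the same point (stacked on top) override an earlier one.
        if (strpos($target, $prefix) === 0 && strlen($prefix) >= $bestLength) {
            $bestLength = strlen($prefix);
            $bestOptions = explode(',', $options);
        }
    }
    return $bestOptions;
}

function is_noexec(string $path, string $mounts): bool
{
    $options = mount_options_for($path, $mounts);
    return $options !== null && in_array('noexec', $options, true);
}
```

On a live system this would be called as `is_noexec('/tmp', file_get_contents('/proc/mounts'))`.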
There are a few ways to overcome this, the easiest being to remount /tmp with exec just for the install:
mount -o remount,exec /tmp
pecl install ssdeep
mount -o remount,noexec /tmp
There are a variety of other ways to get this working documented in the Media Temple wiki pages if the above technique does not work for you.
Whilst developing a PHP extension recently, I spent quite a bit of time researching exactly how to create an extension, best practices, and the DocBook format the PHP manual uses for documenting extensions.
By the time I finished writing the extension I had found some very good resources both on the web and in print.
Online Articles or Websites:
Printed Books:
Information on Documenting PHP and PECL Extensions:
Also worth a mention is the ability of the new PEAR2 Pyrus installer to generate a skeleton for you to build your PECL extension upon - see the generate-ext section of the manual.
Pyrus can also be used to package your extension up for release using the pickle command.
For more help there is a PECL mailing list and an IRC channel (#php.pecl on EFnet), both of which are listed in the support section of the PECL website.
Whilst not strictly about PHP extensions, the book Autotools: a practitioner's guide to Autoconf, Automake and Libtool by John Calcote, published on Free Software Magazine, is very useful reading for understanding the build process involved in extension writing.
There is also the GNU Manual for Autoconf and GNU Autoconf, Automake, and Libtool by Gary V. Vaughan et al.
External Link: The PHP ssdeep Extension is Now in PECL
This means you can now install it easily by simply running:
sudo pecl install ssdeep
There is also proper documentation in the PHP manual which can be found at php.net/ssdeep.
For more information on the extension either see the PECL project page or the ssdeep PHP/PECL extension’s homepage.
External Link: php_ssdeep Fuzzy Hashing PHP Extension
Updated 16/9: php_ssdeep is now in PECL so I have updated this post to reflect that.
On a recent project I needed a fast way to compare documents for likeness and return a percentage match. After much research and one unanswered Stack Overflow question, I came across Jesse Kornblum's ssdeep utility, which is intended for computer forensics tasks such as looking for signatures in files when hunting rootkits. All the technical details of fuzzy hashing are described in his 2006 journal article Identifying almost identical files using context triggered piecewise hashing.
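To give a feel for how context triggered piecewise hashing works, here is a deliberately simplified PHP sketch. A rolling sum over the last few bytes decides where each piece ends, so piece boundaries follow the content rather than fixed offsets, and a local edit only disturbs the pieces around it. Each piece is then hashed and reduced to one base64 character. Unlike ssdeep, it uses a plain rolling sum and `crc32()` instead of ssdeep's own hash functions, a fixed block size (ssdeep derives it from the input size) and a plain Levenshtein distance for comparison, so its signatures are not compatible with ssdeep's. The function names and parameter defaults are hypothetical.

```php
<?php
// Toy context triggered piecewise hash. NOT compatible with ssdeep.
function toy_ctph(string $data, int $blockSize = 64, int $window = 7): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $signature = '';
    $pieceStart = 0;
    $rolling = 0;
    $length = strlen($data);

    for ($i = 0; $i < $length; $i++) {
        // Rolling sum over the last $window bytes: add the new byte, drop the oldest.
        $rolling += ord($data[$i]);
        if ($i >= $window) {
            $rolling -= ord($data[$i - $window]);
        }
        // Context trigger: a piece ends wherever the rolling value hits this residue.
        if ($rolling % $blockSize === $blockSize - 1) {
            $piece = substr($data, $pieceStart, $i - $pieceStart + 1);
            $signature .= $alphabet[crc32($piece) & 63]; // 6 bits per piece
            $pieceStart = $i + 1;
        }
    }
    // Hash whatever is left after the last trigger point.
    if ($pieceStart < $length) {
        $signature .= $alphabet[crc32(substr($data, $pieceStart)) & 63];
    }
    return $signature;
}

// Score two toy signatures from 0 (unrelated) to 100 (identical).
// Note: before PHP 8, levenshtein() only accepts strings up to 255 characters.
function toy_ctph_compare(string $a, string $b): int
{
    if ($a === '' && $b === '') {
        return 100;
    }
    $maxLen = max(strlen($a), strlen($b));
    return (int) round(100 * (1 - levenshtein($a, $b) / $maxLen));
}
```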
I was using a wrapper around the ssdeep binary written in pure PHP, but I recently decided to write my first PHP extension by tying into the fuzzy hashing API ssdeep provides. The result is now a PECL extension, and the BSD-licensed source code is hosted on PHP.net's SVN.
There are full instructions in the PHP manual and on the php_ssdeep site for installing the extension and using the functions it provides. If you end up using this extension or its code, I would be very interested to find out more about your project.
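To show how these functions fit into duplicate detection: with the extension loaded, `ssdeep_fuzzy_hash()` produces a signature for a string and `ssdeep_fuzzy_compare()` scores two signatures from 0 to 100. The sketch below checks a new document's signature against previously stored ones. The helper name, keeping signatures in an array keyed by document ID, and the default threshold of 75 are all hypothetical; the comparison function is injectable so the logic can be exercised without the extension.

```php
<?php
/**
 * Return [documentId => score] for stored signatures that score at least
 * $threshold against $candidateHash, best match first.
 *
 * $compare defaults to ssdeep_fuzzy_compare() from the PECL ssdeep extension.
 */
function find_near_duplicates(string $candidateHash, array $storedHashes, int $threshold = 75, ?callable $compare = null): array
{
    $compare = $compare ?? 'ssdeep_fuzzy_compare';
    $matches = [];
    foreach ($storedHashes as $id => $storedHash) {
        $score = $compare($candidateHash, $storedHash);
        if ($score >= $threshold) {
            $matches[$id] = $score;
        }
    }
    arsort($matches); // highest score first, keys preserved
    return $matches;
}

// Typical use with the extension installed:
// $hash  = ssdeep_fuzzy_hash($extractedText);
// $dupes = find_near_duplicates($hash, $signaturesFromDatabase);
```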