Simon Holywell

On a recent project I needed a fast way to compare documents for likeness and return a percentage match. After much research and one unanswered Stack Overflow post, I came across Jesse Kornblum's ssdeep utility. It is intended for computer forensics tasks such as looking for signatures in files when hunting rootkits. All the technical details of fuzzy hashing are described in his 2006 journal article Identifying almost identical files using context triggered piecewise hashing.

I had been using a pure PHP wrapper around the ssdeep binary, but I recently decided to write my first PHP extension by tying into the fuzzy hashing API that ssdeep provides. The result is a compiled .so PHP extension (module) and BSD-licensed source code hosted on GitHub.

The readme file on GitHub gives full instructions for either installing the pre-compiled .so or building your own from source. If you end up using this extension or its code, I would be very interested to hear about your project.
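The article describes the technique in full. As a rough illustration of what context triggered piecewise hashing does, here is a heavily simplified Python sketch. It is not ssdeep's implementation and its signatures are not compatible with ssdeep's. A rolling hash over the last few bytes decides where to cut the input into pieces (the "context trigger"). Each piece is reduced to a single base64 character, and two signatures are compared by edit distance. The fixed block size of 64, the additive rolling hash, MD5 as the piece hash and the 0 to 100 scoring formula are all illustrative assumptions. Real ssdeep chooses its block size from the input length and uses its own hash functions.

```python
import hashlib

WINDOW = 7  # number of bytes the rolling hash looks at (illustrative choice)
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _piece_char(piece: bytes) -> str:
    # Keep 6 bits of a digest so each piece becomes one base64 character
    # (MD5 is a stand-in here, not what ssdeep uses)
    return B64[hashlib.md5(piece).digest()[0] & 0x3F]


def piecewise_signature(data: bytes, block_size: int = 64) -> str:
    """Cut data wherever the rolling hash hits the trigger value; hash each piece."""
    signature = []
    window_sum = 0   # simple additive rolling hash over the last WINDOW bytes
    piece_start = 0
    for i, byte in enumerate(data):
        window_sum += byte
        if i >= WINDOW:
            window_sum -= data[i - WINDOW]  # slide the window forward
        # Context trigger: the cut depends only on local content, not on the offset
        if window_sum % block_size == block_size - 1:
            signature.append(_piece_char(data[piece_start:i + 1]))
            piece_start = i + 1
    if piece_start < len(data):
        signature.append(_piece_char(data[piece_start:]))  # trailing piece
    return "".join(signature)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(sig_a: str, sig_b: str) -> int:
    """Map edit distance onto a 0 to 100 match percentage (illustrative formula)."""
    if not sig_a and not sig_b:
        return 100
    longest = max(len(sig_a), len(sig_b))
    return round(100 * (1 - edit_distance(sig_a, sig_b) / longest))
```

You would compare two documents with `similarity(piecewise_signature(doc_a), piecewise_signature(doc_b))`. The cut points depend only on the last few bytes. An edit in one part of a document therefore changes only the pieces around it, and the rest of the signature still lines up. That is what makes the percentage match meaningful.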

A PHP wrapper for the unix at command

A project I am working on at the moment requires time-delayed job queues. Having found nothing that manages them properly, I decided to wrap `at` in a PHP class. This gives you simple methods to add, list and remove jobs from the `at` queue using object oriented code.

The code is very simple and reasonably well documented, so with the help of the examples you should be on your way quickly. The class can, of course, also be used from the command line. This makes it easier to write batch scripts in PHP, for example to add a collection of predefined `at` jobs.
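The wrapper itself is PHP, but the idea carries over to other languages. This Python sketch shells out to the standard `at`, `atq` and `atrm` commands. Its class and method names are hypothetical and are not the API of PHP-at-Job-Queue-Wrapper. A job is added by piping the command to `at` on stdin and reading back the job number it reports. Jobs are listed by parsing `atq` output and removed with `atrm`. The command runner is injectable, so the parsing can be exercised without a running atd.

```python
import re
import subprocess
from typing import Callable, List, Tuple

# A runner takes argv plus stdin text and returns (stdout, stderr)
Runner = Callable[[List[str], str], Tuple[str, str]]


def run_command(argv: List[str], stdin: str) -> Tuple[str, str]:
    """Run a real command, e.g. ["at", "-q", "a", "now + 1 minute"]."""
    proc = subprocess.run(argv, input=stdin, capture_output=True, text=True, check=True)
    return proc.stdout, proc.stderr


class AtQueue:
    """Hypothetical object oriented wrapper around at, atq and atrm."""

    JOB_ADDED = re.compile(r"job (\d+) at (.+)")

    def __init__(self, queue: str = "a", runner: Runner = run_command):
        self.queue = queue    # single-letter at queue name
        self.runner = runner  # injectable so it can be tested without atd

    def add(self, command: str, when: str) -> int:
        """Schedule a shell command; `when` uses at's time syntax, e.g. 'now + 5 minutes'."""
        # at reads the job from stdin and reports "job N at <time>", usually on stderr
        out, err = self.runner(["at", "-q", self.queue, when], command + "\n")
        match = self.JOB_ADDED.search(err + out)
        if not match:
            raise RuntimeError("could not parse at output: " + (err or out))
        return int(match.group(1))

    def list_jobs(self) -> List[Tuple[int, str]]:
        """Return (job id, rest of the atq line) for each job in this queue."""
        out, _ = self.runner(["atq", "-q", self.queue], "")
        jobs = []
        for line in out.splitlines():
            if not line.strip():
                continue
            job_id, rest = line.split(None, 1)
            jobs.append((int(job_id), rest.strip()))
        return jobs

    def remove(self, job_id: int) -> None:
        """Delete a queued job by its number."""
        self.runner(["atrm", str(job_id)], "")
```

For example, `AtQueue().add("php /path/to/script.php", "now + 1 hour")` returns the new job's number. That number can later be passed to `remove()`.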

I feel the important features of `at` are covered, but if you want to wrap any of its other functions, please fork the code and submit a patch or a pull request. For more information on what `at` can do, run `man at` in your console or visit Edinburgh University's `man at` page.

You can get the code from my repository on GitHub: PHP-at-Job-Queue-Wrapper.

If you have any trouble getting the `at` daemon (atd) running, my previous post, If you are having problems getting Ubuntu atd running, may help you debug it.

Getting the Gearman PHP PECL package to build on Ubuntu is problematic, with many unaccounted-for dependency issues.

I only made a couple of changes when following the instructions from JSJoy. As I am running Karmic rather than Lucid, I changed the apt-get sources to:

deb http://ppa.launchpad.net/gearman-developers/ppa/ubuntu karmic main
deb-src http://ppa.launchpad.net/gearman-developers/ppa/ubuntu karmic main

My sources file was also located at /etc/apt/sources.list and not /etc/sources.list as stated in the original post from JSJoy.

Drop Cap with PHP Regular Expression

This is a simple regular expression I wrote to convert the first letter of an article into a drop cap. It wraps the first letter in a span tag with the class drop-cap, and you can then style that span however you like with CSS. It also skips over any HTML-encoded characters or tags at the beginning of the article. As a result, it always highlights the first letter of the content rather than any HTML formatting.

http://gist.github.com/469749
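The gist holds the PHP original. As a hedged sketch of the same idea, here is a Python version built on the `re` module; it is not a port of the gist's exact pattern. It skips any leading whitespace, tags and HTML entities (named, decimal or hex) and then wraps the first letter it finds. The `add_drop_cap` name and its `css_class` parameter are illustrative. Literal punctuation, such as a typed quotation mark, is not skipped, and in that case the text is returned unchanged.

```python
import re

# A leading run of whitespace, HTML tags (<...>) and entities (&amp; &#8220; &#x201C;),
# followed by the first real letter of the content ([^\W\d_] = any Unicode letter)
DROP_CAP = re.compile(
    r"^((?:\s|<[^>]*>|&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);)*)([^\W\d_])"
)


def add_drop_cap(html: str, css_class: str = "drop-cap") -> str:
    """Wrap the first letter of the article text in <span class="drop-cap">."""
    return DROP_CAP.sub(
        lambda m: '{}<span class="{}">{}</span>'.format(m.group(1), css_class, m.group(2)),
        html,
        count=1,
    )
```

The alternatives in the skip group each begin with a different character (whitespace, `<` or `&`), so the pattern cannot backtrack badly on long runs of leading markup.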

A simple and somewhat dirty script for backing up Tumblr and/or Disqus to an SQLite database via their APIs. It now handles backing up Disqus comments to SQLite as well.

I have knocked together a very simple and somewhat dirty PHP CLI script to download copies of an entire Tumblr blog through the Tumblr API. I have imaginatively called it Tumblr Backup PHP, and I will be adding extra features as and when I can. The first new feature on the list will be the ability to back up the associated Disqus comments at the same time.

This script was developed on Windows, so you will need to update the PHP binary location at the top of the run.php file. More installation information is available in the readme file on GitHub.

Essentially, the script uses php-rest-api by Jason Tan to download all your blog posts from the Tumblr API in 50-post chunks. It then uses Idiorm by Jamie Matthews to save the posts into an SQLite database.
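php-rest-api and Idiorm are PHP libraries, so this is not the script's code. It is a Python sketch of the same chunked download-and-save loop, using the standard `sqlite3` module. `fetch_page` is a hypothetical stand-in for the HTTP call to the Tumblr API and is assumed to return a list of post dicts, each with an `"id"`, for a given offset and count. The `posts` table layout is also an assumption.

```python
import json
import sqlite3
from typing import Callable, Dict, List

CHUNK_SIZE = 50  # posts are requested from the API 50 at a time

# Hypothetical signature: fetch_page(start, num) -> list of post dicts with an "id" key
FetchPage = Callable[[int, int], List[Dict]]


def backup_posts(fetch_page: FetchPage, db: sqlite3.Connection) -> int:
    """Download every post in CHUNK_SIZE pages and store them in SQLite; return the count."""
    db.execute("CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    start = 0
    saved = 0
    while True:
        posts = fetch_page(start, CHUNK_SIZE)
        if not posts:
            break  # ran past the last post
        with db:  # commit each chunk in its own transaction
            db.executemany(
                "INSERT OR REPLACE INTO posts (id, body) VALUES (?, ?)",
                [(str(p["id"]), json.dumps(p)) for p in posts],
            )
        saved += len(posts)
        if len(posts) < CHUNK_SIZE:
            break  # a short page means this was the final chunk
        start += CHUNK_SIZE
    return saved
```

Each chunk is committed in its own transaction, so an interrupted run keeps everything fetched so far. `INSERT OR REPLACE` means the backup can simply be run again without creating duplicate rows.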

All you need to do is update the configuration file with your blog's details and then execute run.php in your console/command prompt. You will need SQLite, cURL and PHP5 to run the script. You do not need a web server; just install PHP on your machine and run the script from the command line, as you would a Python or Ruby script.

There are currently no allowances for timeouts when accessing the API, and there is no restore script yet.

The next release of Agavi will have initial support for running applications on the Microsoft Windows Azure platform, as well as a database adapter for the new ext/sqlsrv driver to communicate with Microsoft SQL Server and support for the IIS7 web server, which now finally has a very nice

Agavi Form Population Filter

Attaching the population filter without using form IDs (suitable where the form's action parameter points back to the same page the form is on)

http://gist.github.com/294742

Using form IDs to link the pre-population to a particular form

http://gist.github.com/294743

The above hints were created with help from IRC.

Netbeans and Remote XDebug

To get Netbeans to listen for browser-initiated debug sessions, follow these steps:

  1. Go to Project Properties > Run Configuration > Advanced > Debug URL, choose Do not open a web browser and save. (You may want to set up Path Mapping, but it works for me without it.)
  2. In the projects listing, right-click your project and choose Debug. This starts Netbeans listening for connections.
  3. In your web browser, you can now access your website: http://www.example.org?XDEBUG_SESSION_START=netbeans-xdebug
  4. The debug information will now appear in the debug log area of Netbeans.
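In step 2, Netbeans is simply waiting for Xdebug to connect back to it over the DBGp debugging protocol. As a rough Python sketch of that listening side (not Netbeans' implementation), the code below accepts one connection and reads the first packet Xdebug sends. That packet consists of the XML length in ASCII digits, a NUL byte, an XML `init` element and then another NUL. Port 9000 is assumed because it was Xdebug 2's default; Xdebug 3 defaults to 9003. The `idekey` attribute should carry the netbeans-xdebug value from the URL. The function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def split_packet(buf: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (xml, rest) if buf holds a complete DBGp packet, else None."""
    head, sep, tail = buf.partition(b"\0")
    if not sep:
        return None
    length = int(head)            # byte length of the XML body
    if len(tail) < length + 1:    # XML body plus its trailing NUL
        return None
    return tail[:length], tail[length + 1:]


def parse_init(xml_bytes: bytes) -> Dict[str, str]:
    """Return the attributes of the debugger engine's opening <init> element."""
    root = ET.fromstring(xml_bytes)
    # The tag is namespaced, e.g. '{urn:debugger_protocol_v1}init'
    if not root.tag.endswith("init"):
        raise ValueError("expected an init packet, got " + root.tag)
    return dict(root.attrib)


def wait_for_session(port: int = 9000) -> Dict[str, str]:
    """Block until a debugger engine connects, then return its init attributes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("", port))
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            buf = b""
            packet = split_packet(buf)
            while packet is None:
                chunk = conn.recv(4096)
                if not chunk:
                    raise ConnectionError("engine closed before sending init")
                buf += chunk
                packet = split_packet(buf)
            return parse_init(packet[0])
```

A real IDE would go on to send DBGp commands such as breakpoints and stepping. This sketch stops after the handshake, which is enough to confirm that the browser URL is triggering a debug session.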

(Paraphrased and expanded from an answer on Stack Overflow.)

Agavi PHP Framework Resources

Agavi Framework

Backed by Bitextender, Agavi is a very secure and helpful open source (LGPL) MVC framework. Core development is headed by David Zülke (Wombert) and Felix Gilcher (certainly in the IRC channel!). It can take some time to get the hang of the framework, so I have put together all the resources I use or have used to help you get started.

Documentation Resources:

Support Resources: