JavaScript Sudoku Solver using jQuery

Earlier in the week I took the time to write an HTML/JavaScript sudoku solver. The code could be used either to cheat at sudoku or as the basis for an online sudoku game.

Either way, have fun: http://www.bizzeh.com/solver/
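The post doesn't describe how the linked solver works, so the following is only a sketch of one common approach, plain depth-first backtracking, and not necessarily what the solver at bizzeh.com does. The grid layout (an array of 81 numbers, row by row, with 0 for an empty cell) and the names `solveSudoku` and `canPlace` are my own assumptions.

```javascript
// Solves a 9x9 sudoku in place by depth-first backtracking.
// grid: array of 81 numbers, row-major, 0 = empty cell. Returns true if solved.
function solveSudoku(grid) {
  const empty = grid.indexOf(0);
  if (empty === -1) return true; // no empty cells left: solved

  const row = Math.floor(empty / 9);
  const col = empty % 9;

  for (let digit = 1; digit <= 9; digit++) {
    if (canPlace(grid, row, col, digit)) {
      grid[empty] = digit;
      if (solveSudoku(grid)) return true;
      grid[empty] = 0; // undo and try the next digit
    }
  }
  return false; // no digit fits this cell: backtrack
}

// True if digit does not already appear in the cell's row, column or 3x3 box.
function canPlace(grid, row, col, digit) {
  const boxRow = row - (row % 3);
  const boxCol = col - (col % 3);
  for (let i = 0; i < 9; i++) {
    if (grid[row * 9 + i] === digit) return false; // same row
    if (grid[i * 9 + col] === digit) return false; // same column
    const r = boxRow + Math.floor(i / 3);
    const c = boxCol + (i % 3);
    if (grid[r * 9 + c] === digit) return false; // same 3x3 box
  }
  return true;
}
```

`solveSudoku(grid)` fills `grid` in place and returns `false` if the puzzle has no solution. Trying digits in order can be slow on very hard puzzles; filling the empty cell with the fewest candidates first is the usual improvement.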


Reset The Internet

The following is an example of the sort of work it would take to back up all the data on the internet, destroy the current implementation, and create a new one.

Where to start?

Assuming that all parties involved in something as large-scale as resetting the entire internet have agreed to do it, at no extra cost to the customer (you and me), we would first need to consider how, and where, all the data currently on the internet would be backed up.

The sheer scale of the internet as it currently stands is hard to imagine. According to the second-quarter 2007 Domain Name Industry Brief published by VeriSign, a provider of digital infrastructure for the networked world, the number of domain names registered globally now totals more than 138 million.

The largest top-level domains (TLDs) by total registrations are .com, .de (Germany), .net, .uk (United Kingdom), .cn (China) and .org.

Backing up the domain name database

Let us work out how much space a raw database would need to store just the domain name, two domain name servers and, for argument's sake, a 32-bit signed integer holding the Unix timestamp of when the domain was registered. For now we will ignore all the other information a domain name carries. Say we stick to the 63-character maximum originally set out for a domain name label and use 8-bit characters.

63 characters x 8 bits per character = 504 bits

Now we add on the 32 bits for the date first registered.

504 + 32 = 536 bits

We also need room to store the addresses of two domain name servers, anywhere from 0.0.0.0 to 255.255.255.255. We could use a text field that holds 15 characters for each one, or we can use another 32-bit integer for each name server and store the address in its packed numeric format.
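A minimal sketch of the second option, assuming IPv4 addresses only: each dotted quad is packed into an unsigned 32-bit integer (4 bytes) instead of up to 15 characters of text. The function names are hypothetical.

```javascript
// Packs a dotted-quad IPv4 address into an unsigned 32-bit integer (4 bytes)
// instead of storing it as text of up to 15 characters.
function ipv4ToUint32(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error("Not an IPv4 address: " + ip);
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error("Invalid octet in " + ip);
    }
    value = value * 256 + Number(part); // shift left one byte, add the octet
  }
  return value; // 0 .. 4294967295
}

// Unpacks the integer back into the familiar dotted-quad text form.
function uint32ToIpv4(value) {
  return [24, 16, 8, 0].map((shift) => (value >>> shift) & 255).join(".");
}
```

For example, 192.168.1.1 becomes 3232235777. In PHP, the built-in `ip2long()` and `long2ip()` do the same conversion.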

536 + (32 x 2) = 600 bits

So we have 600 bits per row. Now we need to find out what that means in real terms for storing 138 million domain names.

600 x 138,000,000 = 82,800,000,000 bits

82,800,000,000 / 8 = 10,350,000,000 bytes

10,350,000,000 / 1024 / 1024 ≈ 9,870.5 MB
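The same back-of-envelope arithmetic as a small script, so the field sizes can be tweaked. The constant names are my own; the values are the ones assumed above.

```javascript
// Back-of-envelope size of the minimal domain table described above.
const NAME_CHARS = 63;      // longest label, one 8-bit character each
const BITS_PER_CHAR = 8;
const TIMESTAMP_BITS = 32;  // signed Unix timestamp of first registration
const NS_BITS = 32;         // one packed IPv4 address per name server
const NAME_SERVERS = 2;
const DOMAINS = 138000000;

const rowBits = NAME_CHARS * BITS_PER_CHAR + TIMESTAMP_BITS + NS_BITS * NAME_SERVERS;
const totalBits = rowBits * DOMAINS;
const totalBytes = totalBits / 8;
const totalMB = totalBytes / 1024 / 1024; // MB here means 1024 x 1024 bytes

console.log(rowBits);             // 600
console.log(totalBytes);          // 10350000000
console.log(totalMB.toFixed(1));  // 9870.5
```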

So we have over 9,000 MB of data in those fields alone. The central database would also have to store the owner, the owner's address, the technical contact and the administrative contact, as well as the ISP that was used to register the domain name, the last renewal date and various other details. With this extra data, the figure would grow considerably.

Backing up the web

To give some context to the size of just the World Wide Web (only one of the services that run on the internet), the BBC currently has 5.34 million pages indexed in Google, ranging from its homepage at 120 KB to a media-rich news page of around 250 KB. To keep everyone happy, we will take the midpoint between the two (185 KB) as the average page size:

185 KB x 5,340,000 = 987,900,000 KB ≈ 964,746 MB ≈ 942 GB

Say we were to back up all that data onto a set of DVDs. A standard DVD claims 4.7 GB of storage on the label, but its usable capacity is closer to 4.4 GB. To keep things simple, we assume the BBC has not compressed any of its data, but has packaged it into a set of 4.4 GB files (multi-part RAR or tar archives) so that we do not run into the problems caused by storing lots of small files.

942 GB / 4.4 GB ≈ 214.1 DVDs

Since we can't have 0.1 of a DVD, we would need 215 DVDs simply to back up bbc.co.uk, and that assumes nothing goes wrong with any of the discs. Adding a 5% recovery record in WinRAR should be enough to repair minor damage to the files, and pushes the 942 GB up to 989 GB.

989 GB / 4.4 GB ≈ 224.8 DVDs

Now we need 225 DVDs to back up the BBC with some sort of recovery record in case of data corruption from a bad copy or a scratched disc.
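The BBC estimate, scripted the same way. It uses the page count of 5,340,000 from the calculation above, treats 1 GB as 1024 x 1024 KB as the post does, and rounds every partial disc up to a whole one. The constant names are my own.

```javascript
// Reproduces the BBC estimate: pages x average size, then DVDs needed.
const PAGES = 5340000;
const AVG_PAGE_KB = 185;  // midpoint of 120 KB and 250 KB
const DVD_GB = 4.4;       // usable capacity of a "4.7 GB" DVD
const RECOVERY = 0.05;    // 5% recovery record

const totalKB = PAGES * AVG_PAGE_KB;
const totalGB = totalKB / 1024 / 1024;
const withRecoveryGB = totalGB * (1 + RECOVERY);

// A partial disc still needs a whole disc, so round up.
const dvds = Math.ceil(totalGB / DVD_GB);
const dvdsWithRecovery = Math.ceil(withRecoveryGB / DVD_GB);

console.log(totalGB.toFixed(0), dvds);                    // 942 215
console.log(withRecoveryGB.toFixed(0), dvdsWithRecovery); // 989 225
```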

To be continued…


PHP Data Optimisation/PHP Function Optimisation

Using the keyword generator code from the previous post, I have written a small benchmarking script to test it, with the array in get_filter_words set up to hold 2,200 keywords.

The test code is as follows:

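// Benchmark: time 100 calls of get_valid_keywords(), repeated 3 times.
// Assumes get_valid_keywords() from the previous post is already included.
$text = strip_tags(file_get_contents('http://www.bizzeh.com/'));

for($x = 0; $x < 3; $x++) {
  $start = microtime(true);
  for($y = 0; $y < 100; $y++) {
    $ar = get_valid_keywords($text);
  }
  $end = microtime(true);

  // parentheses make the subtraction happen before the concatenation
  echo(($end - $start) . "<br/>");
}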

Using the original get_filter_words function, the average execution time (in seconds, for each batch of 100 calls) is:

33.2730691433

I decided to optimise this function slightly to improve performance:

function get_filter_words() {
  static $words;
  if(empty($words)) $words = array('000', ..., 'zwölf' );
  return $words;
}

Declaring $words as static lets it persist for the rest of the script instead of being discarded each time the function returns. Because we now only build the array when it is empty, the 2,200-word array is created once rather than on every call, which saves a lot of script time:

9.74925804138
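JavaScript has no function-level `static`, but a closure gives the same build-once behaviour. This is a sketch only; `makeGetFilterWords` and the `buildWords` callback (standing in for the 2,200-word array literal) are hypothetical names.

```javascript
// Returns a getter that builds the word list on its first call and then
// hands back the same cached array, like PHP's static $words.
function makeGetFilterWords(buildWords) {
  let words = null; // survives between calls inside the closure
  return function getFilterWords() {
    if (words === null) words = buildWords(); // built only on the first call
    return words;
  };
}

// Usage: the builder runs once, however many times the getter is called.
let buildCount = 0;
const getFilterWords = makeGetFilterWords(() => {
  buildCount++;
  return ["and", "are", "the"]; // stand-in for the real word list
});
for (let i = 0; i < 100; i++) getFilterWords();
console.log(buildCount); // 1
```

One small difference from the PHP version: the cache checks for `null` rather than emptiness, so a list that happens to be empty is not rebuilt on every call, which an `empty($words)` check would do.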


PHP Keyword Generator and Keyword Density Generator

First, we create an array of bad words that we want to filter out, i.e. the most common words in the most common languages, such as “and”, “or” and “are”.

We also need a minimum word length to check against. Most search engines ignore words shorter than 3 characters, so that is the limit we will use here.

Next, we split the string of text into an array of words on commas and whitespace, and check each word against our list and the length limit. If a word is not in the bad-words list and is at least 3 characters long, we add it to the valid array with a count of 1, or increase its count by 1 if it is already there.

Once we have cycled through all the words, we sort the array from the highest count to the lowest and return it.

We now have an array of words ordered by how often they appear in the given string.

function get_filter_words() {
  $words = array('000', ..., 'zwölf' );
  return $words;
}

function is_valid_keyword($word) {
  $common_words = get_filter_words();

  // keep words of 3+ characters that are not in the filter list
  return strlen($word) >= 3 && !in_array($word, $common_words);
}

function get_valid_keywords($words) {
  $word_ret = array();

  // accept either an array of words or a string to split on whitespace/commas
  if(is_array($words)) {
    $word_arr = $words;
  } else {
    $word_arr = preg_split("/[\s,]/", $words, -1, PREG_SPLIT_NO_EMPTY);
  }

  foreach($word_arr as $word) {
    if(is_valid_keyword($word)) {
      // first sighting starts at 1, later sightings add 1
      if(!isset($word_ret[$word])) {
        $word_ret[$word] = 1;
      } else {
        $word_ret[$word]++;
      }
    }
  }

  // most frequent first, keeping the word => count keys
  arsort($word_ret, SORT_NUMERIC);

  return $word_ret;
}
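The same steps in JavaScript, as a sketch rather than a port of the exact PHP: a `Set` replaces `in_array()` for the filter lookup, and the density figure (each word's count divided by the total number of words in the text, filtered or not) is my assumption about what keyword density should mean here, since the PHP above only returns counts. The function name and the `minLength` parameter are hypothetical.

```javascript
// Splits text on whitespace/commas, drops short and filtered words,
// counts the rest and returns them most frequent first with a density.
function getValidKeywords(text, filterWords, minLength = 3) {
  const filter = new Set(filterWords); // O(1) lookups instead of a linear scan
  const words = text.split(/[\s,]+/).filter((w) => w !== "");
  const counts = new Map();

  for (const word of words) {
    if (word.length < minLength || filter.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // largest count first
    .map(([word, count]) => ({ word, count, density: count / words.length }));
}
```

Calling `getValidKeywords(text, badWords)` returns an array of `{ word, count, density }` objects, ordered like the PHP version's sorted array.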

Note that you will need to find your own bad-word list.


Automated Search Engine Submission via Sitemap Ping

If you want to automate submitting your new websites to the major search engines, try the sitemap.xml ping services. Just replace [FULL SITEMAP URL] with the URL-encoded full URL of your sitemap, then send a GET request to each ping service:

http://www.google.com/webmasters/tools/ping?sitemap=[FULL SITEMAP URL]
http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=[FULL SITEMAP URL]
http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=[FULL SITEMAP URL]
http://submissions.ask.com/ping?sitemap=[FULL SITEMAP URL]
http://www.bing.com/webmaster/ping.aspx?siteMap=[FULL SITEMAP URL]
http://api.moreover.com/ping?u=[FULL SITEMAP URL]
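A sketch of the ping step, assuming Node.js 18+ or a browser (where `fetch` is built in). `encodeURIComponent` does the URL encoding, and the endpoint list is copied from above. Several of these services no longer accept pings, so the script records the result of each request rather than assuming success. The function names are hypothetical.

```javascript
const PLACEHOLDER = "[FULL SITEMAP URL]";
const PING_TEMPLATES = [
  "http://www.google.com/webmasters/tools/ping?sitemap=[FULL SITEMAP URL]",
  "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=[FULL SITEMAP URL]",
  "http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url=[FULL SITEMAP URL]",
  "http://submissions.ask.com/ping?sitemap=[FULL SITEMAP URL]",
  "http://www.bing.com/webmaster/ping.aspx?siteMap=[FULL SITEMAP URL]",
  "http://api.moreover.com/ping?u=[FULL SITEMAP URL]",
];

// Substitutes the URL-encoded sitemap address into each template.
function buildPingUrls(sitemapUrl, templates = PING_TEMPLATES) {
  const encoded = encodeURIComponent(sitemapUrl);
  return templates.map((t) => t.replace(PLACEHOLDER, () => encoded));
}

// Sends a GET request to each ping URL and collects the HTTP status
// (or the error) for each one, so dead endpoints don't stop the rest.
async function pingAll(sitemapUrl) {
  const results = [];
  for (const url of buildPingUrls(sitemapUrl)) {
    try {
      const res = await fetch(url);
      results.push({ url, status: res.status });
    } catch (err) {
      results.push({ url, error: String(err) });
    }
  }
  return results;
}
```

For example, `pingAll("http://www.example.com/sitemap.xml").then(console.log)` prints one entry per service.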

Load and Parse Large XML Files in PHP

PHP is usually limited to somewhere between 16 MB and 128 MB of RAM per script (the memory_limit setting). So what happens if you want to parse a 1.1 GB file of exported product data (over 500,000 products) without hitting that limit?

At first this seemed an almost impossible task, because parsing the file into a tree requires the entire XML document in memory.

Usually you would run something such as file_get_contents() and then parse the returned contents, but this would load the entire 1.1 GB of XML into memory and put you well beyond most PHP memory limits.

What you need to do instead is read the XML in small chunks (the example below uses 128 KB chunks) and feed them to PHP's event-based XML parser one by one. This way you can work through your XML file quickly while steering clear of the PHP memory limit.

set_time_limit(0);
define('__BUFFER_SIZE__', 131072);
define('__XML_FILE__', 'pf_1360591.xml');

function elementStart($p, $n, $a) {
  //handle opening of elements
}

function elementEnd($p, $n) {
  //handle closing of elements
}

function elementData($p, $d) {
  //handle cdata in elements
}

$xml = xml_parser_create();

xml_parser_set_option($xml, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_parser_set_option($xml, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($xml, XML_OPTION_SKIP_WHITE, 1);

xml_set_element_handler($xml, 'elementStart', 'elementEnd');
xml_set_character_data_handler($xml, 'elementData');

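// Read the file 128 KB at a time and feed each chunk to the parser;
// the third argument tells xml_parse() when the final chunk has been reached.
$f = fopen(__XML_FILE__, 'r');
if($f) {
  while(!feof($f)) {
    $content = fread($f, __BUFFER_SIZE__);

    if(!xml_parse($xml, $content, feof($f))) {
      // stop on malformed XML and report where the parser gave up
      die(sprintf('XML error: %s at line %d',
        xml_error_string(xml_get_error_code($xml)),
        xml_get_current_line_number($xml)));
    }

    unset($content);
  }
  fclose($f);
}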
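To show why the chunks have to go through one parser that remembers its state, rather than each chunk being parsed on its own, here is a toy JavaScript counter for `<product` start tags that copes with a tag split across two chunks, which is the kind of bookkeeping `xml_parse()` does internally. It is not an XML parser (it ignores comments, CDATA and everything else), the tag name `product` is an assumption about the feed, and `makeTagCounter` is a hypothetical name.

```javascript
// Toy illustration only: counts <tagName ...> start tags in a stream of text
// chunks, carrying a small tail over so a tag split across chunks still counts.
function makeTagCounter(tagName) {
  const needle = "<" + tagName;
  let carry = "";
  let count = 0;
  // The character after "<product" must end the name, so "<products" is skipped.
  const isBoundary = (ch) => ch === ">" || ch === "/" || /\s/.test(ch);

  return {
    feed(chunk) {
      const text = carry + chunk;
      let i = text.indexOf(needle);
      while (i !== -1) {
        const after = i + needle.length;
        if (after >= text.length) {
          carry = text.slice(i); // can't see the next character yet: wait
          return;
        }
        if (isBoundary(text[after])) count++;
        i = text.indexOf(needle, i + 1);
      }
      // Keep just enough of the end to hold a partially received "<product".
      carry = text.slice(Math.max(0, text.length - (needle.length - 1)));
    },
    get count() {
      return count;
    },
  };
}
```

In Node.js the chunks could come from `fs.createReadStream(path, { encoding: "utf8", highWaterMark: 131072 })`, which reads 128 KB at a time like the PHP example, so memory use stays flat however large the file is.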

Imitate target=_blank with jQuery

You can replace target=_blank with jQuery, Prototype or plain JavaScript if you like. Here is the jQuery code I came up with to do just that.

jQuery

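$(function() {
	// rel~="external" matches the whole word, so rel="noexternal" is not caught
	$('a[rel~="external"]').on('click', function() {
		var w = window.open(this.href);
		// window.open() returns null when a popup blocker stops the new window
		if(!w) alert("Boo! A popup blocker stopped our window from opening");
		return false; // cancel the normal navigation in this window
	});
});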
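The post mentions that plain JavaScript works too. Here is a sketch of that version, assuming a modern browser (`Element.closest`, `addEventListener`): a single delegated click listener does the job of the jQuery handler, and the two small helpers (hypothetical names) are split out so they can be checked without a page.

```javascript
// True if a rel attribute holds token as a whole word, like the CSS
// selector a[rel~="external"] (so "noexternal" does not match).
function relHasToken(rel, token) {
  return (rel || "").split(/\s+/).includes(token);
}

// Opens href with openWindow (window.open in a browser) and calls notify
// if nothing opened, which is what happens when a popup blocker steps in.
function openExternal(href, openWindow, notify) {
  const w = openWindow(href);
  if (!w) {
    notify("Boo! A popup blocker stopped our window from opening");
    return false;
  }
  return true;
}

// Browser wiring: one delegated listener covers every link on the page,
// including links added later. Skipped where there is no DOM (e.g. Node).
if (typeof document !== "undefined") {
  document.addEventListener("click", (event) => {
    const link = event.target.closest ? event.target.closest("a[rel]") : null;
    if (!link || !relHasToken(link.getAttribute("rel"), "external")) return;
    event.preventDefault(); // cancel the normal navigation in this window
    openExternal(link.href, (url) => window.open(url), (msg) => alert(msg));
  });
}
```

Delegation means links inserted after page load are handled too, without rebinding.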

I don’t want to accept cookies

A lot of people are paranoid about cookies, and not without reason, but the simple fact is that cookies are how you create persistence over a stateless protocol such as HTTP. I’ve heard all the arguments and all the debates on the subject, and this is how we’re doing it. The only information that can be stored in a cookie is information you have already handed over, and it’s impossible for more information to be gathered than what you have already supplied.
If you don’t want to use cookies, you don’t have to. You can still use most websites, but you will not be able to use some of the advanced features, since they require logging in and having your login information stored on your PC. If your paranoia requires you not to use cookies, this is the sacrifice you’ll have to make.


Appliance World Online

This website is one of my earlier large projects, created for a company in the Manchester area, and it was the basis and drive for creating the DCom ecommerce system. It took roughly four weeks to build from start to finish. It includes a full product information system, a shopping cart, and a user registration/login system with a user control panel, so that registered users can keep a wish list of items they want to buy in the future and view their previous orders. It also has a unique, infinitely recursive category system, which allows as many categories in the tree, nested as deeply, as you would ever need. The website also features a linking system that, unlike other ecommerce solutions, lets you place any one product under as many categories as you wish.
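The post doesn't say how DCom stores its categories. One common way to get unlimited nesting plus products in several categories is sketched below, purely as an illustration: each category records its parent, and a separate link table holds one row per product/category pair. The data and names are made up.

```javascript
// Hypothetical sample data: each category points at its parent (null = top level).
const categories = [
  { id: 1, parentId: null, name: "Kitchen" },
  { id: 2, parentId: 1, name: "Cooking" },
  { id: 3, parentId: 2, name: "Ovens" },
  { id: 4, parentId: null, name: "Laundry" },
];
// One row per (product, category) pair, so a product can sit in many categories.
const productCategories = [
  { productId: 100, categoryId: 3 },
  { productId: 100, categoryId: 4 },
];

// Walks up the parentId links to build a path such as Kitchen > Cooking > Ovens.
function breadcrumb(categoryId, cats = categories) {
  const byId = new Map(cats.map((c) => [c.id, c]));
  const path = [];
  let c = byId.get(categoryId);
  while (c) {
    if (path.length > cats.length) throw new Error("Category loop detected");
    path.unshift(c.name);
    c = byId.get(c.parentId);
  }
  return path;
}

// Recursively collects every category below categoryId, at any depth
// (assumes the data really is a tree, i.e. has no loops).
function descendantIds(categoryId, cats = categories) {
  const ids = [];
  for (const c of cats) {
    if (c.parentId === categoryId) ids.push(c.id, ...descendantIds(c.id, cats));
  }
  return ids;
}

// All category ids a product is linked to.
function categoriesForProduct(productId, links = productCategories) {
  return links.filter((l) => l.productId === productId).map((l) => l.categoryId);
}
```

Because nesting is stored as a parent pointer rather than a fixed set of levels, the depth is limited only by the data.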


Umbrella Property Rentals

Umbrella Property Rentals required a system that could not only display all the properties they had to offer on their own website, but also integrate with several third parties such as Vebra and Rightmove. Once a property is added, the Google Geocoding API is used to look up its address, find the latitude and longitude coordinates, and plot them on an interactive Google Map.
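A sketch of the geocoding step, assuming Google's Geocoding web service (`https://maps.googleapis.com/maps/api/geocode/json`), which takes an `address` and an API `key` and returns matches carrying `geometry.location.lat` and `lng`. The function names are hypothetical, and `fetch` needs Node.js 18+ or a browser.

```javascript
// Builds a Geocoding API request URL for a property address.
function geocodeUrl(address, apiKey) {
  const params = new URLSearchParams({ address, key: apiKey });
  return "https://maps.googleapis.com/maps/api/geocode/json?" + params.toString();
}

// Pulls the first match's coordinates out of a Geocoding API JSON response,
// or returns null when the service found nothing (or reported an error).
function extractLatLng(response) {
  if (response.status !== "OK" || !response.results.length) return null;
  const { lat, lng } = response.results[0].geometry.location;
  return { lat, lng };
}

// Fetches and decodes in one step.
async function geocode(address, apiKey) {
  const res = await fetch(geocodeUrl(address, apiKey));
  return extractLatLng(await res.json());
}
```

The returned `{ lat, lng }` pair is what a Google Maps marker needs, so it can be stored with the property when it is added rather than looked up on every page view.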

