Saturday, 29 January 2011

So, it works.

After another little while pottering away this afternoon, I've managed to get the code for the big GA into some kind of shape. All the essentials are there, as it were:
- chromosomes and genes
- a basic fitness function
- a basic parent-selection method
- a basic crossover system
- a basic mutation system

I've knocked together a rough initial dataset, and it's generating the sort of results that I'd expect; trending from worse to better. This is good :-)

Now we move onto the actual tricky part, which is working in the various other bits of the requirements, like the quality of service restrictions and so on. First up will be fixing the fitness function to be a bit more standardised.

Step 1: types of QoS attributes
First thing to consider is whether we should have different methods of combining attributes over the service. My initial reaction is "yes", and here's why:
For something like "cost", all we need to do to work out the total cost of a particular set of services is to add up the cost of each of them. Easy. But for something like "availability", expressed as a percentage, we can't really just add up all the availability scores, can we? That'd produce an oddly skewed result at the end.
eg: we have three services, each of which has a 90% chance of being available. If we add those up, we get an availability score of 270 for the operation. We could divide that by the number of services, but that just takes us back to an average of 90% -- in fact, the aggregate availability (the chance that all three services will be available during the invocation) is only about 73%.
So, which will serve the algorithm better - the average availability of the services, or the actual computed aggregate availability for the whole set? I'm not sure. Possibly I should test both :-)
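For the record, here's the difference sketched in Python (the function names are my own, not from the actual GA code):

```python
# The two aggregation styles, sketched in Python (function names are mine).

def total_cost(costs):
    """Additive attribute: the composition's cost is just the sum."""
    return sum(costs)

def mean_availability(avails):
    """The average: three 90%-available services come out at 0.9."""
    return sum(avails) / len(avails)

def aggregate_availability(avails):
    """The chance that *every* service is up: 0.9 ** 3 is roughly 0.729."""
    result = 1.0
    for a in avails:
        result *= a
    return result

print(round(mean_availability([0.9, 0.9, 0.9]), 3))       # 0.9
print(round(aggregate_availability([0.9, 0.9, 0.9]), 3))  # 0.729
```

The average flatters the composition; the product tells you what'll actually happen when you invoke the whole chain.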

Step 2: weighting factors
Also, we should take into account a couple of other factors when calculating the fitness of a chromosome. The first is a user-defined weighting (eg: if the user would rather the algorithm brought down the cost of the service composition than the execution time). The second is a dynamic weighting to try and meet any hard requirements (eg: if the maximum cost of the service composition is set to £10, and none of the current solutions meet that, then the weighting factor assigned to cost should be increased).
What this means is that I'll have to take the fitness calculation out of the chromosome itself and move it into the main population object; that way I can store a set of weighting factors that can be changed quickly and easily during execution.
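Something along these lines, perhaps -- a hypothetical Python sketch (all the names are invented) of a population object that owns both the weights and the fitness calculation:

```python
# Hypothetical sketch: fitness lives in the population, not the chromosome,
# so the weighting factors can be adjusted on the fly. Names are my own.

class Population:
    def __init__(self, weights):
        # eg: {"cost": 0.5, "response_time": 0.3, "availability": 0.2}
        self.weights = weights

    def fitness(self, chromosome):
        # Assumes chromosome.attributes() returns normalised scores in [0, 1],
        # with higher always better (cost already inverted).
        scores = chromosome.attributes()
        return sum(self.weights[k] * scores[k] for k in self.weights)

    def boost_weight(self, attribute, factor=1.5):
        # Dynamic weighting: if no solution meets, say, the £10 cost cap,
        # turn up the pressure on cost, then renormalise so weights sum to 1.
        self.weights[attribute] *= factor
        total = sum(self.weights.values())
        self.weights = {k: v / total for k, v in self.weights.items()}
```

The renormalising step is just one option; you could equally let the weights drift and only compare fitnesses within a generation.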

Thursday, 27 January 2011

Note to self

NB: this will likely make no sense, unless you've recently seen the code for the genetic algorithm :-)

Ooh: idea.

What about dynamically altering the multiplying factors for the different quality of service attributes, based around the current best solution?


Wednesday, 19 January 2011

More than meets the eye

I've been working recently on a series of XSLT conversions for a big XML data set, and I'm really impressed with the stuff you can pull off with it.

The job's been to take a whole load of bibliographic information from the backend of a publishing system and convert it to something which can be brought into a design package to quickly put together a catalogue. Of course, nothing is as easy as it should be, and there are a lot of little hurdles to this which - a year or so ago - would have had me running off, tail between my legs. (Actually - it *did* have me running away from the problem: I did try to write something to solve this, and entirely failed!)

But now, with a bit more thought and general XML experience, I've managed to find ways around all those problems, and the biggest one left is that there's a certain amount of inconsistency in the data itself. (It was all imported from some other system, and some of the data there is well over fifteen years old. And wrong.)

Generally, though, I'm amazed at the power of XSL stylesheets and what they can do to a piece of XML. Pretty much anything, really. You can even create functions within the stylesheet like any other kind of language, which is incredibly useful.

However, there are some caveats. (There always are.)

Now, as with many things, there are different versions of the stylesheet implementation, and they come with different functionality. Sadly, the design package in use (InDesign) will only support version 1.0. Which is a shame, because all the really cool features (the functions mentioned above, and some of the date/time functions as well) only came in with version 2.0.

However, if you've got a bunch of data in XML format and need to repurpose it for something a little different, I'd recommend checking out the possibilities of XSLT. There are plenty of free tools available and it's not too hard to get your head round it. And it might just open up a whole new purpose for your data - or in this case, save a whole rakeload of time. And those are two very noble purposes.

Friday, 14 January 2011

Quality of Service

OK, so today we're going to be thinking about the way you can look at Quality of Service, as regards Web Services. This is a pretty cool area, and can get kinda complex when you start chaining together services, too, so it's worth thinking about.

Right, when we're looking at Web Services (WS), we don't necessarily just mean the easy ones like OpenStreetMap, where you just send out your command (where's the nearest coffee-shop?*) to their server, and get back a bunch of data in return. Nope. What we're looking at is the wider world of WS, where instead of sending a command to OpenStreetMap, we could look up a directory of different services, and see that perhaps a whole selection of them offer searches for nearby coffee shops. Brilliant! We have a choice! So, how do we choose which service to query?

This is where quality of service comes in. In that directory service, all those WS that are listed will have given their quality of service attributes - how much the service costs, how quickly the service will guarantee to respond to your request, any security standards they can support for data transfer, and so on.

This is all well and good and relatively simple - you just pick the service that matches any criteria you have, and if it's unavailable, you choose another one. Simple. But it gets more complicated, as I said, when you start chaining services together. Let's say that you want to know how much the coffee at the nearest coffee shop costs, and we'll imagine that there are web services which (a) locate the nearest coffee shop and (b) keep track of the prices at coffee shops. Let's even get carried away and imagine that there are several services which do both.

Now we have to look at both services put together - the service composition - and total up the quality of service for both of them. So if we want the reply to come back within 10 seconds, we'll need to make sure to pick two services whose response times, when added together, come in below 10 seconds.
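Totting that up might look something like this in Python (all the numbers are invented; note that availability combines by multiplication, since every service in the chain has to be up):

```python
# Totting up QoS for a two-service chain (all numbers invented).
finder = {"cost": 0.02, "response_time": 4.0, "availability": 0.99}
pricer = {"cost": 0.05, "response_time": 7.0, "availability": 0.95}

composition_cost = finder["cost"] + pricer["cost"]                    # ~0.07
composition_time = finder["response_time"] + pricer["response_time"]  # 11.0s
# Availability multiplies: both services must be up for the chain to work.
composition_avail = finder["availability"] * pricer["availability"]   # ~0.94

print(composition_time <= 10.0)  # False: this pairing breaks the 10s limit
```

So even though each service individually looks fine, the pair together misses the 10-second requirement and has to be rejected.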

This is still pretty simple stuff, but let's now imagine a situation where this service composition contains ten or more services, all chained together to form one super-service. Each of those ten services that make it up might have ten or twenty different options to choose from out there on the web -- with twenty candidates for each of ten services, that's 20^10, or over ten trillion, possible combinations -- which means we end up with a variability in the overall quality of service that, frankly, makes your head spin.

In the algorithm I'm writing to try and help pick out the best combination of services to chain together, I'm going to have to come up with some way of measuring quality of service meaningfully.

To start with, I'm going to be limiting the QoS factors I'm dealing with to the basic few: cost, response time and service availability. This gives a nice balance between positive and negative factors, but limits the complexity to something that should be easy enough to handle. So the next step is to try and decide how to store these. Obviously an array will give nice fast data access, but is it flexible enough to survive in the real world, where we might end up dealing with hundreds of QoS factors? Time for another thinking cap, I guess!
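For what it's worth, here's the trade-off sketched in Python (the layout and numbers are invented for illustration):

```python
# Fixed tuple: fast and compact, but the index-to-attribute mapping is
# implicit -- adding a new QoS factor means touching every bit of code.
service_a = (0.05, 7.0, 0.95)  # (cost, response_time, availability)

# Dict: a little slower, but new factors slot in without reshuffling anything.
service_b = {"cost": 0.05, "response_time": 7.0, "availability": 0.95}
service_b["throughput"] = 120.0  # a new factor, added on the fly

print(service_a[0], service_b["cost"])
```

The dict-style layout costs a lookup per access, but it's the one that survives contact with "hundreds of QoS factors".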

* I'm trying to cut down on my coffee intake. Does it show? ;-)

Wednesday, 12 January 2011

Fitness

No, it's not a New Year's Resolution that I'm talking about today. It's a set of thoughts that I'm having about how to measure the fitness of a solution to a difficult problem.

Let's imagine that we have a set of four tasks to do, and ten workers who can do them. Each worker has a skill rating at each of those four tasks, a cost to employ them to do the task, a reliability rating to see if they actually show up for the task, and maybe a few other attributes as well.

We're going to start by randomly assigning workers to tasks, and see how good those solutions are. So how do you measure the fitness of a given solution to the task at hand?

Well, we'll start by assigning positive or negative qualifiers to the different attributes of the workers:
- Skill rating and reliability rating: these should clearly be maximised for any given task.
- Cost: this should clearly be minimised for any given task.

Then we'll let the manager decide which of the attributes is the most important, either by simply ranking them in order, or by assigning percentages to each one (eg: they could decide that the priority should be 50% based on cost, 40% based on reliability, and 10% based on skill rating).

And now comes the hard part: normalisation! Let's say that we measure the time taken to complete the task in hours, the cost to complete the task in pounds, and the reliability of the worker as a percentage. How do we unite those completely different measurements?

Well, I've a few ideas, and they start with some kind of normalisation.

1) Normalise each value and then multiply by prioritisation:
- Look over all of the values for a particular attribute (say skill), and divide each of them by the highest one. We'll end up with the highest-skilled worker rated with a skill of 1, and everyone else rated between 0 and 1.
- Repeat that for all of the other attributes available.
- Multiply each of the attributes by the ranking factor, or the priority percentage.
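Option 1 in Python, with some invented skill figures and priority percentages:

```python
# Option 1 sketched in Python -- skill figures and percentages invented.
skills = [3, 7, 10, 5]  # one value per worker
priorities = {"skill": 0.1, "reliability": 0.4, "cost": 0.5}

def normalise(values):
    """Divide everything by the maximum: the best worker scores 1.0."""
    top = max(values)
    return [v / top for v in values]

weighted_skill = [priorities["skill"] * s for s in normalise(skills)]
print([round(w, 3) for w in weighted_skill])  # [0.03, 0.07, 0.1, 0.05]
```

For a minimised attribute like cost you'd flip it first - eg divide the minimum by each value, so the cheapest worker scores 1.0.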

2) Distance from average:
- Again, we'll pick a particular attribute, and then calculate the average (mean) value of that attribute.
- Now, we'll calculate the distance from that average for every worker.
- Then, we'll normalise those distances to lie in the range -1 to +1.
- Finally, we can apply the priority factors.
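And option 2, on the same invented skill figures:

```python
# Option 2 sketched in Python -- same invented skill figures as before.
skills = [3, 7, 10, 5]

def distance_from_mean(values):
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]
    # Scale by the largest absolute deviation so results sit in -1..+1.
    scale = max(abs(d) for d in deviations)
    return [d / scale for d in deviations]

scores = distance_from_mean(skills)
print([round(s, 3) for s in scores])  # [-0.867, 0.2, 1.0, -0.333]
```

(If every worker had identical scores the scale would be zero, so a real version would need to guard against that.) The interesting difference is that option 2 gives below-average workers negative scores, so they actively drag a solution's fitness down rather than just contributing less.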

Which of these would give a better set of results? Any gut feeling answers on a postcard?! In the meanwhile, I'm going to have a poke around in the literature to see if there are any recommendations there...