Sunday, 18 December 2011
OK, so I finally cracked and bought a real textbook on Hibernate, rather than trying to muddle my way through a mass of random online tutorials and Java debugger statements. And, even more excitingly, it came as a Kindle file. Marvellous.*
Harnessing Hibernate is the title, and so far it's been very useful. I wouldn't recommend it to anyone except a near-total Hibernate beginner: you'd want to know a bit of Java and roughly what Hibernate is trying to achieve, and preferably you'll have seen a bit of Hibernate code flying around, just to get your eye in, as it were.
It's a little out of date now: all the examples are built using Ant rather than Maven. But still, I can't really criticise that - I still have regular and ongoing battles with Maven, and the Ant stuff means that it's been dead easy for the authors to introduce small sections at a time, and slowly upgrade from feature to feature over the chapters.
Overall, I'm impressed so far - but we'll reserve the real judgement for when I try to bring Hibernate into my own projects!
* I don't own a Kindle (yet) but the Kindle app for Windows does what it says on the tin. Not too sure about trying to read books on an Android phone but sure, I'll give it a go some day. For now, screen reading is grand. And yet another reason why a second monitor is an essential development tool, not just a nice extra.
Monday, 5 December 2011
Test Driven Development - Review 1
OK, so.
I have my first few classes down using the new test-driven method. It's been written about a thousand times before (at least), so I shan't bore you with the details, but a nice quick summary is here (linky!).
I like it, actually. It hasn't been getting in the way too much, and I've gotten a lot further with the project this time - when I've been writing tests about what to expect from the code before touching the code itself - than the first time I tried it, when I ended up bogged down in complexity and confusion.
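To give a flavour of the rhythm, here's roughly what "test first" looks like in practice: a minimal sketch using JUnit 4, where the Vector2D class is a made-up stand-in for whatever the game actually needs (think vector physics).

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class Vector2DTest {

        // Written *before* Vector2D.add() exists: the test pins down the
        // behaviour we expect, then we write just enough code to pass it.
        @Test
        public void addingTwoVectorsSumsTheirComponents() {
            Vector2D a = new Vector2D(1.0, 2.0);
            Vector2D b = new Vector2D(3.0, -1.0);
            Vector2D sum = a.add(b);
            assertEquals(4.0, sum.getX(), 0.0001);
            assertEquals(1.0, sum.getY(), 0.0001);
        }
    }

Red, green, refactor, repeat.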
Coverage of the code by the tests is still nice and high (thanks in no small part to EclEmma!) and motivation is also pretty good.
No real demonstrable product to show yet, though... that'll be a while before we've anything that even remotely resembles a game system, and I haven't even looked at an interface yet. Lots of fun to come.
Note to self: should also put together a website.
Progress so far: 1,172 instructions; 1,090 instructions covered by tests.
Monday, 28 November 2011
Something... something... something... test-driven.
OK, so it's been a good long while since the last post to this - which, in turn, was probably a post muttering about how long it had been since the one before.
However, this time, I have a plan. There's a wee idea I've had kicking around for a little while, and I'm finally going to actually give it a shot. What's more, I'm going to have a stab at putting all this talk of "test driven development" into actual use.
The basic idea is to write a board game: a wargame simulating starship combat. It's going to try to incorporate actual approximations of vector physics, and generally as much sci-fi goodness as I can cram in, before it falls apart under its own weight (I am at least realistic about my chances of finishing this!).
What I'll try to do is keep a running tally of the tests that I've written and gotten to pass, and maybe a log of the difficulties I find on the way. Fun for all the family!
Sunday, 28 August 2011
An ode to the Acorn Electron, or: Thoughts on Computing in Education
*beep*
Acorn Electron
BASIC
>
Ahh, nostalgia. I think my folks still have the first computer I ever used somewhere, up in the attic. An Acorn Electron, with a whole thirty-two kilobytes of RAM, and no disk drives - you had to plug in a temperamental tape deck and wait fifteen agonising minutes loading a game before you found out that you'd set something up wrong and you'd have to start all over again. But I loved it. And nearly as much as the games (Citadel! Palace of Magic! Frenzy!), I loved the idea that you could write your own programs for it, too. I spent hours trying to write my own versions of the games that I played, by typing in programs that I found in magazines, and then spent just as many hours trying to figure out which of the semi-colons or quotation marks I'd typed wrongly.
I sometimes stop to think about how I would have turned out had I been born just a few years later. The Electron, by booting straight into this (admittedly crude) programming environment, really made the point that you were supposed to try and do your own thing with the computer rather than just accepting what other people had written.
I don't know a great deal about the current state of computing in education, but it's something that I should find out more about. I remember that IT lessons were adventures in controlling a turtle onscreen, or (if we were lucky) controlling actual Lego connected to the serial port. It really opened your eyes to the possibilities of computers as tools. I'd like to think that there are some quick and easy programming languages for kids still out there. Whatever happens in the future, we can't just have computing lessons being restricted to learning how to use the office software suite du jour. What a way to kill any enthusiasm.
There's no shortage of resources for programming education, and many of them are free: look at Greenfoot, for Java. It hides a lot of the complexity of the language and wraps it in a graphical interface, which is a great way to start.
The problem is, of course, that to teach Java, you need to know some Java in the first place. And how do we get that kind of knowledge imparted? How many teachers are there in schools with any kind of programming knowledge? Not nearly enough. Following on from that, these programs also need to be installed, set up, and maintained. Who's going to take care of that? Much easier to just teach kids how to use the internet, a word processor and a spreadsheet.
Perhaps a special week-long code camp could be run for children with any kind of interest in programming: a few classes in the summer holidays covering the very basics, with pointers on where to look for further information. It might whet someone's appetite, and - who knows - might produce the programmers of tomorrow.
Tuesday, 1 February 2011
Another note to self
Right - it's probably time to take a serious look at normalising the fitness factors of a genetic algorithm; otherwise you end up with absolutely mental fitness values that aren't a great deal of use to anyone. :/
Or I might just need to figure out a different way of picking parents.
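For what it's worth, one "different way" that sidesteps the normalisation problem entirely is tournament selection: it only ever compares fitness values, never sums them, so it doesn't care how mental the raw numbers get. A rough sketch, with Chromosome and getFitness() standing in for the real code:

    import java.util.List;
    import java.util.Random;

    public class TournamentSelector {

        private final Random random = new Random();

        // Pick a few chromosomes at random and return the fittest of them.
        // Only comparisons are involved, so the scale of the raw fitness
        // values makes no difference at all.
        public Chromosome select(List<Chromosome> population, int tournamentSize) {
            Chromosome best = null;
            for (int i = 0; i < tournamentSize; i++) {
                Chromosome candidate =
                        population.get(random.nextInt(population.size()));
                if (best == null || candidate.getFitness() > best.getFitness()) {
                    best = candidate;
                }
            }
            return best;
        }
    }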
Saturday, 29 January 2011
So, it works.
After another little while pottering away this afternoon, I've managed to get the code for the big GA into some kind of shape. All the essentials are there, as it were:
- chromosomes and genes
- a basic fitness function
- a basic parent-selection method
- a basic crossover system
- a basic mutation system
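For anyone who hasn't seen how those pieces fit together, one generation of a basic GA looks roughly like the following. This is a sketch only -- Population and Chromosome here are stand-ins, not my actual classes:

    // One generation: select parents, cross them over, mutate the child,
    // and repeat until the next population is full.
    public Population evolve(Population current) {
        Population next = new Population();
        while (next.size() < current.size()) {
            Chromosome mum = current.selectParent();   // parent selection
            Chromosome dad = current.selectParent();
            Chromosome child = mum.crossoverWith(dad); // crossover
            child.mutate();                            // mutation
            next.add(child);
        }
        return next;
    }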
I've knocked together a rough initial dataset, and it's generating the sort of results that I'd expect: trending from worse to better. This is good :-)
Now we move onto the actual tricky part, which is working in the various other bits of the requirements, like the quality of service restrictions and so on. First up will be fixing the fitness function to be a bit more standardised.
Step 1: types of QoS attributes
First thing to consider is whether we should have different methods of combining attributes across the service composition. My initial reaction is "yes", and here's why:
For something like "cost", all we need to do to work out the total cost of a particular set of services is to add up the cost of each of them. Easy. But for something like "availability", expressed as a percentage, we can't really just add up all the availability scores, can we? That'd produce an oddly skewed result at the end.
eg: we have three services, each of which has a 90% chance of being available. If we add those up, we get an availability score of 270 for the operation. We could divide that by the number of services to get back to the average availability, but that still only takes us to 90% -- in fact, the aggregate availability (the chance that all three services will be available during the invocation) is only 73%.
So, which will serve the algorithm better - the average availability of the services, or the actual computed aggregate availability for the whole set? I'm not sure. Possibly I should test both :-)
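In fact, trying both is only a few lines of Java. Here's the arithmetic from the example above, spelled out:

    public class AvailabilityDemo {
        public static void main(String[] args) {
            // Three services, each 90% available.
            double[] availability = { 0.9, 0.9, 0.9 };

            double sum = 0.0;
            double aggregate = 1.0;
            for (double a : availability) {
                sum += a;       // the naive "just add them up" approach
                aggregate *= a; // chance that *all three* are up at once
            }
            double average = sum / availability.length;

            System.out.println("average:   " + average);   // roughly 0.9, i.e. 90%
            System.out.println("aggregate: " + aggregate); // roughly 0.729, i.e. 73%
        }
    }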
Step 2: weighting factors
Also, we should take into account a couple of other factors when calculating the fitness of a chromosome: partly a user-defined weighting (eg: if they would rather the algorithm brought the cost of the service composition down, rather than the execution time), and partly a dynamic weighting to try and meet any particular requirements (eg: if the maximum cost of the service composition is set to £10, and none of the current solutions match that, then the weighting factor assigned to the cost of the service composition should be increased).
What this means is that I'll have to take the fitness calculation of a chromosome out of the chromosome itself, and into the main population object; that way I can store a series of weighting factors that can be changed quickly and easily throughout the execution time.
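As a sketch of that idea (the method and attribute names are invented for illustration), the population ends up owning something like this:

    import java.util.Map;

    public class Population {

        // One weight per QoS attribute ("cost", "responseTime", ...).
        // The user's preferences set these to start with; the dynamic
        // adjustment then nudges a weight upwards whenever its hard
        // requirement isn't being met by any current solution.
        private final Map<String, Double> weights;

        public Population(Map<String, Double> initialWeights) {
            this.weights = initialWeights;
        }

        // Fitness now lives on the population, not the chromosome, so a
        // single change to the weights re-scores every chromosome.
        public double fitnessOf(Chromosome c) {
            double fitness = 0.0;
            for (Map.Entry<String, Double> entry : weights.entrySet()) {
                fitness += entry.getValue() * c.scoreFor(entry.getKey());
            }
            return fitness;
        }
    }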
Thursday, 27 January 2011
Note to self
NB: this will likely make no sense, unless you've recently seen the code for the genetic algorithm :-)
What about dynamically altering the multiplying factors for the different quality of service attributes, based around the current best solution?
Wednesday, 19 January 2011
More than meets the eye
I've been working recently on a series of XSLT conversions for a big XML data set, and I'm really impressed with the stuff you can pull with that.
The job's been to take a whole load of bibliographic information from the backend of a publishing system and convert it to something which can be brought into a design package to quickly put together a catalogue. Of course, nothing is as easy as it should be, and there are a lot of little hurdles to this which - a year or so ago - would have had me running off, tail between my legs. (Actually - it *did* have me running away from the problem: I did try to write something to solve this, and entirely failed!)
But now, with a bit more thought and general XML experience, I've managed to find ways around all those problems, and the biggest issue left is a certain amount of inconsistency in the data itself. (It was all imported from some other system, and some of the data there is well over fifteen years old. And wrong.)
Generally, though, I'm amazed at the power of XSL stylesheets and what they can do to a piece of XML. Pretty much anything, really. You can even create functions within the stylesheet, just as in any other kind of language, which is incredibly useful.
However, there are some caveats. (There always are.)
Now, as with many things, there are different versions of the XSLT specification, and they come with different functionality. Sadly, the design package in use (InDesign) will only support version 1.0. Which is a shame, because all the really cool features (the functions mentioned above, and some of the date/time functions as well) only came in with version 2.0.
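The good news, if you're in Java-land, is that an XSLT 1.0 processor ships with the JDK itself, behind the javax.xml.transform API. A minimal runner looks something like this (the file names are placeholders, obviously):

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class RunTransform {
        public static void main(String[] args) throws Exception {
            // Compile the stylesheet, then apply it to the source XML.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource("catalogue.xsl"));
            transformer.transform(new StreamSource("books.xml"),
                                  new StreamResult("catalogue-output.xml"));
        }
    }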
However, if you've got a bunch of data in XML format and need to repurpose it for something a little different, I'd recommend checking out the possibilities of XSLT. There are plenty of free tools available and it's not too hard to get your head round it. And it might just open up a whole new purpose for your data - or in this case, save a whole rakeload of time. And those are two very noble purposes.
Friday, 14 January 2011
Quality of Service
OK, so today we're going to be thinking about the way you can look at Quality of Service, as regards Web Services. This is a pretty cool area, and can get kinda complex when you start chaining together services, too, so it's worth thinking about.
Right, when we're looking at Web Services (WS), we don't necessarily just mean the easy ones like OpenStreetMap, where you just send out your command (where's the nearest coffee shop?*) to their server, and get back a bunch of data in return. Nope. What we're looking at is the wider world of WS, where instead of sending a command to OpenStreetMap, we could look up a directory of different services, and see that perhaps a whole selection of them offer searches for nearby coffee shops. Brilliant! We have a choice! So, how do we choose which service to query?
This is where quality of service comes in. In that directory service, all those WS that are listed will have given their quality of service attributes - how much the service costs, how quickly the service will guarantee to respond to your request, any security standards they can support for data transfer, and so on.
This is all well and good and relatively simple - you just pick the service that matches any criteria you have, and if it's unavailable, you choose another one. Simple. But it gets more complicated, as I said, when you start chaining services together. Let's say that you want to know how much the coffee at the nearest coffee shop costs, and we'll imagine that there are web services which (a) locate the nearest coffee shop and (b) keep track of the prices at coffee shops. Let's even get carried away and imagine that there are several services offering each of them.
Now we have to look at both services put together - the service composition - and total up the quality of service for both of them. So if we want the reply to come back within 10 seconds, we'll need to make sure to pick two services whose response times, when added together, come in below 10 seconds.
This is still pretty simple stuff, but let's now imagine a situation where this service composition contains ten or more services, all chained together to form one super-service. Each of those ten services that make it up might have ten or twenty different options to choose from out there on the web, which means that we end up with a variability in the overall quality of service that, frankly, makes your head spin.
In the algorithm I'm writing to try and help pick out the best combination of services to chain together, I'm going to have to come up with a method of measuring quality of service in some meaningful way.
To start with, I'm going to be limiting the QoS factors I'm dealing with to the basic few: cost, response time and service availability. This gives a nice balance between positive and negative factors, but limits the complexity to something that should be easy enough to handle. So the next step is to try and decide how to store these. Obviously an array will give nice fast data access, but is it flexible enough to survive in the real world, where we might end up dealing with hundreds of QoS factors? Time for another thinking cap, I guess!
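To make the aggregation rules concrete, here's a first stab in Java. It's a sketch only, and it ducks the array-versus-flexible question by simply hard-coding the three factors:

    import java.util.List;

    public class QoS {
        final double cost;          // to be minimised; adds up over a chain
        final double responseTime;  // to be minimised; adds up over a chain
        final double availability;  // to be maximised; between 0 and 1, and
                                    // multiplies, as every service must be up

        QoS(double cost, double responseTime, double availability) {
            this.cost = cost;
            this.responseTime = responseTime;
            this.availability = availability;
        }

        // Total QoS for a chain of services invoked in sequence.
        static QoS aggregate(List<QoS> composition) {
            double cost = 0.0, time = 0.0, avail = 1.0;
            for (QoS q : composition) {
                cost += q.cost;
                time += q.responseTime;
                avail *= q.availability;
            }
            return new QoS(cost, time, avail);
        }
    }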
* I'm trying to cut down on my coffee intake. Does it show? ;-)
Wednesday, 12 January 2011
Fitness
No, it's not a New Year's Resolution that I'm talking about today. It's a set of thoughts that I'm having about how to measure the fitness of a solution to a difficult problem.
Let's imagine that we have a set of four tasks to do, and ten workers who can do them. Each worker has a skill rating at each of those four tasks, a cost to employ them to do the task, a reliability rating to see if they actually show up for the task, and maybe a few other attributes as well.
We're going to start by randomly assigning workers to tasks, and see how good those solutions are. So how do you measure the fitness of a given solution to the task at hand?
Well, we'll start by assigning positive or negative qualifiers to the different attributes of the workers:
Skill rating; reliability rating: these should clearly be maximised for any given task.
Cost: this should clearly be minimised for any given task.
Then we'll let the manager decide which of the attributes is the most important, either by simply ranking them in order, or by assigning percentages to each one. (eg: they could decide that the priority should be 50% based on cost, 40% based on reliability, and 10% based on skill rating)
And now comes the hard part: normalisation! Let's say that we measure the time taken to complete the task in hours, the cost to complete the task in pounds, and the reliability of the worker as a percentage. How do we unite those completely different measurements?
Well, I've a few ideas, and they start with some kind of normalisation.
1) Normalise each value and then multiply by prioritisation:
- Let's look over all of the values for a particular attribute (say skill), and then divide all of those values by the highest one. We'll end up with the highest-skilled worker being rated with a skill of 1, and everyone else rated between 0 and 1.
- Repeat that for all of the other attributes available.
- Multiply each of the attributes by the ranking factor, or the priority percentage.
2) Distance from average:
- Again, we'll pick a particular attribute, and then calculate the average (mean) value of that attribute.
- Now, we'll calculate the distance from that average for every worker.
- Then, we'll normalise those distances to lie in the range -1 to +1.
- Finally, we can apply the priority factors.
Which of these would give a better set of results? Any gut feeling answers on a postcard?! In the meanwhile, I'm going to have a poke around in the literature to see if there are any recommendations there...
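To pin idea 1 down, here's what "normalise then prioritise" might look like for a single attribute (for a minimised attribute like cost you'd flip the values first, but that's a detail):

    // Idea 1: divide every value by the attribute's maximum, so the best
    // worker scores 1.0 and the rest fall between 0 and 1, then scale by
    // the manager's priority for this attribute (e.g. 0.5 for "50%").
    public static double[] normaliseAndWeight(double[] values, double priority) {
        double max = values[0];
        for (double v : values) {
            if (v > max) {
                max = v;
            }
        }
        double[] scores = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scores[i] = (values[i] / max) * priority;
        }
        return scores;
    }

Idea 2 would swap the divide-by-max for a distance-from-the-mean calculation, rescaled into -1 to +1, before applying the same priority factor.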