Monday, November 14, 2011

Centroids

Here's another one from my clustering code, slightly modified for explanatory purposes. This post was spurred by a conversation with a couple of friends when we were talking about programming, who said that programming languages mostly had the same things (i.e. everything old is new again). While I'm not completely in opposition to that, I think that some of the newer languages (read Ruby) do have a lot to offer when it comes to compactness.

Here the problem is to find the centroid of a number of points. Basically, we want the averages of all the first values of the points, the second values of the points, etc. Let's show an example ...

If we have [[x0, y0, z0], [x1, y1, z1], ... [xn, yn, zn]] as our points then the centroid would be the point given by [sum(x's) / n, sum(y's)/n, sum(z's) / n] (and here I'm too lazy to figure out how to create the capital Sigma for sum). Think about how you might solve this in your favorite programming language. In C/C++ anyway, you'd need to know in advance how many points and how many dimensions (the example above uses three, but it could be more or less). In Java, you don't need that, but I don't think you could do something like this ...



def centroid(x)
x[0].zip(*x[1..x.length-1]).map { |v| v.inject(:+) / v.length.to_f }
end


OK, so what's going on here? First up we're going to take the first element of the array (above the [x0, y0, z0]) and zip it with the rest of the array with *x[1..x.length-1]. This should give us, once again from the example above [[x0, x1, ...xn], [y0, y1, ... yn], [z0, z1 ... zn]]. So basically, a new array with all the x's gathered together, the y's, etc. Now comes map. Here, we'll take each element, which is itself an array and then use inject to run through each of the items, say [x0, x1, ... xn] and sum them together. Finally, we'll divide by the length of the element to get an average. If it's hard to see this, break it into pieces and then run it through using irb. It should be a bit easier to see there.

So, what's the point here. Well, essentially you could this in any programming language or in fact any Turing machine, but the language itself can make it easier freeing you to do other things or harder.

Let me know if you have any questions or comments.

No comments:

Post a Comment