Not quite multidimensional scaling, but close...

Started by
-1 comments, last by kirkd 13 years, 5 months ago
I'm using a self-organizing map (SOM) to model very high dimensional spaces in the chemistry domain, and I want to get a little more information out of the resulting map than just which data points belong to which nodes. The short version of the story is this:

Once the map is trained, I can easily plot each of the nodes to a regular grid. The map is trained using a square grid and each node of the grid has a dual representation - 1) each node has a position in the grid, (X,Y), and 2) each node occupies a position in the high-dimensional space, a vector of weights, W. Here's a picture to simplify things a bit:



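[Image: the SOM nodes in the high-dimensional space (left) and on the regular grid (right), with the data point of interest marked by a red X]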

Once the map is trained, I typically show the grid representation on the right of the picture above, and you can find all the data points that are closest to a particular node. More information is embedded here, though: I could potentially see how a data point is positioned relative to all the other nodes as well. For example, the red X represents a data point of interest whose closest node is the center node, but which is closer to the lower right than to the upper left.
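To make the "closest node plus everything else" idea concrete, here is a minimal sketch. The 3x3 map, the 4-dimensional weights, and the data point are all made up for illustration, and `high_dim_distances` is a hypothetical helper name, not part of any SOM library.

```python
import math

# Hypothetical 3x3 SOM: each node has a grid position (x, y) and a weight
# vector W. The 4-dimensional weights are invented purely for illustration.
nodes = [
    {"grid": (x, y), "w": [float(x), float(y), float(x + y), float(x * y)]}
    for y in range(3)
    for x in range(3)
]


def high_dim_distances(point, nodes):
    """Euclidean distance from the data point to every node's weight vector."""
    return [math.dist(point, n["w"]) for n in nodes]


point = [1.2, 1.1, 2.4, 1.3]  # made-up data point of interest
d = high_dim_distances(point, nodes)

# The usual SOM assignment: the single closest node.
bmu = min(range(len(nodes)), key=d.__getitem__)

# The extra information: all nodes ranked by distance, which shows which
# side of the map the point leans toward.
ranked = sorted(range(len(nodes)), key=d.__getitem__)

print("closest node:", nodes[bmu]["grid"])
print("nodes by increasing distance:", [nodes[i]["grid"] for i in ranked])
```

The ranking is the part that normally gets thrown away when only the closest node is reported.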

What I would like to do is to find an appropriate mapping for that data point on the regular grid on the right. I'm considering the problem this way -

Construct the vector dSOM where dSOM-i = distance between the data point and the i-th node of the SOM, using the 2-dimensional grid positions.

Construct the vector dHIGHDIM where dHIGHDIM-i = distance between the data point and the i-th SOM node in the high-dimensional space on the left.

Normalize both vectors.

Define a stress function S = || dSOM - dHIGHDIM ||, with || . || being the 2-norm.

Find the (X,Y) for the data point that minimizes the stress function. Note that neither the (X,Y) positions of the SOM nodes nor their representations in the high-dimensional space change. I only need to solve for the (X,Y) of the data point of interest.
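The steps above can be sketched roughly as follows. This is only one way to do it, under some assumptions: "normalize" is taken to mean scaling each distance vector to unit 2-norm, and the minimizer is a simple coarse scan followed by a pattern search, chosen here just to keep the example dependency-free (any general-purpose optimizer could be swapped in). The names `normalize`, `stress`, and `place_point` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit 2-norm (one possible reading of 'normalize')."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def stress(xy, grid_pos, d_high):
    """S = || dSOM - dHIGHDIM ||_2 for a candidate map position xy."""
    d_som = normalize([math.dist(xy, g) for g in grid_pos])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d_som, d_high)))


def place_point(point, grid_pos, weights, coarse=20, tol=1e-6):
    """Find (X, Y) minimizing the stress; node positions and weights stay fixed."""
    d_high = normalize([math.dist(point, w) for w in weights])
    xs = [g[0] for g in grid_pos]
    ys = [g[1] for g in grid_pos]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    # Coarse scan over the bounding box of the grid to pick a starting point.
    candidates = [
        (x_lo + (x_hi - x_lo) * i / coarse, y_lo + (y_hi - y_lo) * j / coarse)
        for i in range(coarse + 1)
        for j in range(coarse + 1)
    ]
    best = min(candidates, key=lambda c: stress(c, grid_pos, d_high))
    best_s = stress(best, grid_pos, d_high)

    # Pattern search: try axis-aligned steps, halve the step when none helps.
    step = max(x_hi - x_lo, y_hi - y_lo) / coarse
    while step > tol:
        improved = False
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (best[0] + dx, best[1] + dy)
            s = stress(cand, grid_pos, d_high)
            if s < best_s:
                best, best_s, improved = cand, s, True
        if not improved:
            step /= 2
    return best, best_s


# Made-up 3x3 map with 4-dimensional weights, and a made-up data point.
grid_pos = [(float(x), float(y)) for y in range(3) for x in range(3)]
weights = [[x, y, x + y, x * y] for x, y in grid_pos]
xy, s = place_point([1.2, 1.1, 2.4, 1.3], grid_pos, weights)
print("map position:", xy, "stress:", s)
```

Because the search is local after the coarse scan, it can settle in a local minimum of the stress; a finer scan or several starting points would be a cheap safeguard.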

Does this seem like a remotely reasonable approach? Any alternate ideas or suggestions?

-Kirk

