Article Updates... And Search Can Be Challenging

Posted by Michael Tanczos, 24 April 2013 · 614 views

First off, I am utterly blown away by how the community is coming together to post articles. One of my biggest hopes for this approach is that we can start exposing the very best articles available on the internet in one place. At the end of the day, learning is the most important thing for all of us, and while Stack Overflow is great for short answers, there isn't really a good research counterpart for game developers.

This idea reflects a certain reality: people are going to write about what interests them, and a clinical treatment of a topic under rigid constraints isn't necessarily all that appealing. So my challenge instead is to automate building the connections between articles, which, given the state of information retrieval, is certainly doable without inventing any new IR algorithms.

The ultimate goal is to start scouring the internet for game development resources, which are so commonly scattered across blogs in varying formats, and reproduce them in a common article format for everyone to use freely. If we can expose all the connections between articles, the result should be a nice information archive with some permanence, unlike articles that simply drop off the net once their authors close their WordPress accounts.

My first tests have involved using Apache Solr to create an article index. Out of the box, it has turned out to be quite versatile for this purpose. Over the past few days I did get a little carried away, though: I extracted all the document term vectors from Solr's underlying Lucene index and ran an exhaustive cosine similarity check across them. To make that practical, I wrote a job for Amazon's Elastic MapReduce service, which turned a problem that was going to take roughly a week to process into one that would take roughly six hours.
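The exhaustive check described above boils down to scoring every pair of documents. Below is a minimal single-machine sketch in Python, assuming each article's term vector has already been exported as a plain dict mapping each term to a weight (raw term frequency or tf-idf). It does not touch the Solr or Lucene APIs, and the function names are hypothetical. With n articles it performs n(n-1)/2 comparisons, which is why spreading the work across many machines pays off.

```python
import math
from itertools import combinations


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    # Loop over the shorter vector when computing the dot product
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(w * long_.get(term, 0.0) for term, w in short.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def all_pairs_similarity(
    docs: dict[str, dict[str, float]], top_k: int = 5
) -> dict[str, list[tuple[float, str]]]:
    """Score every pair of documents and keep each one's top_k neighbours."""
    neighbours: dict[str, list[tuple[float, str]]] = {doc_id: [] for doc_id in docs}
    # combinations() yields each unordered pair exactly once: n * (n - 1) / 2 pairs
    for (id_a, vec_a), (id_b, vec_b) in combinations(docs.items(), 2):
        score = cosine(vec_a, vec_b)
        neighbours[id_a].append((score, id_b))
        neighbours[id_b].append((score, id_a))
    # Highest score first
    return {
        doc_id: sorted(pairs, reverse=True)[:top_k]
        for doc_id, pairs in neighbours.items()
    }
```

For example, with hypothetical vectors for two component-entity articles and one Perlin noise article, `all_pairs_similarity(docs, top_k=1)` pairs the two entity-system articles with each other. One way to distribute this, again only a sketch, is to have each mapper score a slice of the pairs and a reducer merge the per-document neighbour lists.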

The funny thing is that after doing all that, I discovered a feature in Solr that, once turned on, meets about 75% of my immediate needs. The next challenge will be finding a way to build document clusters.
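There are many ways to build document clusters; k-means and hierarchical clustering are common choices. One of the simplest, sketched here purely as an illustration, links any two articles whose similarity score clears a threshold and treats each connected group as a cluster, using a union-find structure. It assumes pairwise scores like the ones an all-pairs cosine pass produces; the 0.5 threshold and the function name are hypothetical.

```python
def threshold_clusters(
    doc_ids: list[str],
    scored_pairs: list[tuple[str, str, float]],
    threshold: float = 0.5,
) -> list[list[str]]:
    """Cluster documents as connected components of the 'similar enough' graph."""
    parent = {doc_id: doc_id for doc_id in doc_ids}

    def find(doc_id: str) -> str:
        # Walk up to the root of this document's set, halving the path as we go
        while parent[doc_id] != doc_id:
            parent[doc_id] = parent[parent[doc_id]]
            doc_id = parent[doc_id]
        return doc_id

    # Merge the two documents' sets whenever their similarity clears the threshold
    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect members by root; sort for a deterministic result
    clusters: dict[str, list[str]] = {}
    for doc_id in doc_ids:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(members) for members in clusters.values())
```

Because this is effectively single-link clustering, a chain of moderately similar articles can pull two unrelated topics into one cluster, so the threshold would need tuning against real data.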

The approach I'm currently aiming for may involve the community training some sort of machine-learning model on what an ideal set of articles for a particular subject looks like; the model would then classify other articles that closely match it. That way, if you create a class of documents called "Component Entity Systems", you would hopefully be able to find all the articles that belong to it.
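One classic way to get this behaviour is a nearest-centroid (Rocchio-style) classifier: the community tags a handful of example articles for each class, the examples' normalized term vectors are averaged into a centroid, and any new article whose cosine similarity to that centroid clears a threshold is assigned to the class. This is a sketch of that swapped-in technique, not a description of what the site actually uses; the class names, the 0.3 threshold, and the function names are hypothetical.

```python
import math


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a term vector to unit length so long articles don't dominate."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(examples: list[dict[str, float]]) -> dict[str, float]:
    """Average the unit-length vectors of the community-labelled example articles."""
    total: dict[str, float] = {}
    for vec in examples:
        for term, w in normalize(vec).items():
            total[term] = total.get(term, 0.0) + w
    return {t: w / len(examples) for t, w in total.items()}


def classify(
    article: dict[str, float],
    centroids: dict[str, dict[str, float]],
    threshold: float = 0.3,
) -> list[str]:
    """Return every class whose centroid is similar enough to the article."""
    return sorted(
        name for name, c in centroids.items() if cosine(article, c) >= threshold
    )
```

Returning every class above the threshold, rather than only the best match, lets a single article land in more than one subject list.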

At the end of the day, a single list of articles can still be very useful if it is well organized. That is challenging to do automatically.

In my ideal world, the site would begin to morph as you use it, showing you more of what matches your tastes. Apache Mahout offers something like that, but I haven't experimented with it much yet.
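Mahout's recommender library includes item-based collaborative filtering, where two articles count as similar when the same people read them. The following is a small pure-Python illustration of that idea, not Mahout's API: it assumes a hypothetical mapping from each user to the set of article ids they have read, measures item similarity as cosine similarity over binary "who read it" vectors, and ranks unread articles by their total similarity to what the user has already read.

```python
import math
from collections import defaultdict


def recommend(reads: dict[str, set[str]], user: str, top_n: int = 3) -> list[str]:
    """Item-based collaborative filtering over 'who read what' data."""
    # Invert the data: for each article, the set of users who read it
    readers: dict[str, set[str]] = defaultdict(set)
    for reader, articles in reads.items():
        for article in articles:
            readers[article].add(reader)

    def similarity(a: str, b: str) -> float:
        # Cosine similarity of the two articles' binary reader vectors
        overlap = len(readers[a] & readers[b])
        if not overlap:
            return 0.0
        return overlap / math.sqrt(len(readers[a]) * len(readers[b]))

    seen = reads.get(user, set())
    scores: dict[str, float] = {}
    for candidate in readers:
        if candidate in seen:
            continue
        # Score each unread article by its total similarity to everything already read
        score = sum(similarity(candidate, read) for read in seen)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]
```

A user who has only read an introductory entity-system article would be steered toward the other articles that entity-system readers went on to read, with no content analysis at all.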




In my ideal world, as you use the site the site would begin to morph to show you more stuff that matches up with your tastes.

Hopefully you don't go too crazy with this. As often as I post/talk/ruminate/pontificate/wildly-gesticulate about Perlin noise and random stuff, I'd sure hate it if that were the only kind of article I ever saw. I have almost zero interest in most of what might be called "graphics programming", but it would still be beneficial if a few such articles crossed my view every so often.

What I'm referring to is more of a recommendation engine, which gives you the stuff you are most likely to be interested in. Stack Overflow has a front page that does that once you start subscribing to tags, and that's something I'd like to do more of here. Right now our front page does that based on your forum participation. But it could be wrong: if you are a pro who participates in the beginner forum a lot, you are likely to see a ton of beginner posts, which may not actually interest you. Then again, because you answer so many of them, that may be what you'd end up looking at anyway if you were just randomly browsing the forums.
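A tag-subscription front page like this could be sketched as follows: rank articles by how many of the user's subscribed tags they carry, and, to address the concern about only ever seeing one topic, reserve every few slots for an article outside those tags. This is a hypothetical illustration; the `explore_every` parameter, the function name, and the assumption that articles arrive newest first are all inventions for the example.

```python
from collections import deque


def front_page(
    articles: list[tuple[str, set[str]]],
    subscribed: set[str],
    size: int = 10,
    explore_every: int = 4,
) -> list[str]:
    """Rank by subscribed-tag overlap, but keep every Nth slot for other topics.

    `articles` is assumed to be ordered newest first.
    """
    matched: list[tuple[int, str]] = []
    others: list[str] = []
    for article_id, tags in articles:
        overlap = len(tags & subscribed)
        if overlap:
            matched.append((overlap, article_id))
        else:
            others.append(article_id)
    # sort() is stable, so newest-first order survives among equal overlaps
    matched.sort(key=lambda pair: -pair[0])
    ranked = deque(article_id for _, article_id in matched)
    explore = deque(others)

    page: list[str] = []
    while len(page) < size and (ranked or explore):
        explore_slot = (len(page) + 1) % explore_every == 0
        # Use an off-profile article on explore slots, or whenever ranked runs out
        source = explore if (explore_slot and explore) or not ranked else ranked
        page.append(source.popleft())
    return page
```

With a Perlin noise fan's subscriptions, most slots go to noise and procedural articles, but every `explore_every`-th slot surfaces something like a shadow-mapping article, which is roughly the "a few such articles every so often" behaviour asked for above.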

Check out http://prediction.io/; it's another prediction platform.
