Trivia game using internet as the question source

6 comments, last by Servant of the Lord 8 years, 3 months ago

I'm toying with the idea of building a trivia game, but instead of making the questions myself, or outsourcing them, I'm thinking of having the game go out onto the internet and create the questions itself.

I haven't decided whether the questions will be multiple choice (choose A, B, C or D), fill in the blanks (you have to actually type the answer), or some other question type. Multiple choice makes for a better game (IMO), but I have no idea where I will get the three fake answers from. Fill in the blanks would be much easier to program, but it makes for a slower game because people have to type. It's also very tricky because there are usually multiple ways to spell things, and it would be nice to allow minor spelling mistakes.

Also, instead of using the entire internet as a question source, I might restrict the game to Wikipedia only. That way I can make certain assumptions about the data format, which might make things a lot easier. It would also give me a really easy way to have categories (e.g. Arts, History, Geography) because Wikipedia articles are already grouped that way.

So, for example, I'm on Wikipedia now and I just clicked "Random article". That took me to the page on Heathrow Airport:

https://en.wikipedia.org/wiki/Heathrow_Airport

Let's look at the first sentence in the article:

Heathrow Airport (IATA: LHR, ICAO: EGLL) is a major international airport in west London, England.

Suppose I wanted to use this for a trivia question. There are lots of ways I could make a question from this sentence, some ways no doubt being harder to accomplish than others.

Example 1: multiple choice

________ Airport (IATA: LHR, ICAO: EGLL) is a major international airport in west London, England.

A:) Edinburgh

B:) Bristol

C:) Heathrow

D:) Newcastle

How could I accomplish something like this? My question generator would have to know that Heathrow is an airport, and would have to look up three fake airports to display alongside the correct answer. Going even further, these are all airports in the UK, to make it a bit trickier. I have no idea how to write a program that does something like this. Would it require some fancy artificial intelligence, or am I missing something?
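The genuinely hard part is knowing *which* things are "the same kind" as Heathrow. Once you have such a list, assembling the A to D options is simple. Here is a minimal sketch, assuming the list of same-kind items already exists (it is hard-coded below purely for illustration; `make_multiple_choice` is a hypothetical helper, not part of any library):

```python
import random

def make_multiple_choice(answer, same_kind, n_fake=3, rng=random):
    """Return the correct answer plus n_fake distractors in random order,
    or None if there aren't enough other items of the same kind."""
    # Everything of the same kind except the answer itself is a candidate fake.
    pool = sorted(set(same_kind) - {answer})
    if len(pool) < n_fake:
        return None  # caller should give up on this topic and pick another
    options = rng.sample(pool, n_fake) + [answer]
    rng.shuffle(options)
    return options

# Hard-coded for illustration; a real generator would need to fetch this list.
uk_airports = ["Heathrow", "Gatwick", "Edinburgh", "Bristol", "Newcastle", "Manchester"]
options = make_multiple_choice("Heathrow", uk_airports, rng=random.Random(1))
for letter, option in zip("ABCD", options):
    print(f"{letter}:) {option}")
```

Returning `None` instead of raising makes it easy for the caller to skip topics that don't have enough plausible fakes.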

Example 2: fill in the blanks

________ Airport (IATA: LHR, ICAO: EGLL) is a major international airport in west London, England.

-> Player must type "Heathrow" to win

This seems much easier to program, but in my opinion it's not as fun a game. I would also need an algorithm that accepts minor spelling mistakes, so that, for example, "Hethrow" is accepted. I don't think this algorithm is that difficult, but it is a consideration. There could also be cases where a completely different answer, spelled completely differently, is also correct. For example, if Heathrow also had a second, unofficial name, I might want to accept that answer as well.
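One common way to tolerate typos is edit (Levenshtein) distance: count the single-character insertions, deletions and substitutions needed to turn the guess into an accepted answer, and accept it if that count is small. Alternate names are handled by checking the guess against a list of answers. This is a sketch; the 20% tolerance and the "London Heathrow" alias are illustrative assumptions:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def is_accepted(guess, answers, max_ratio=0.2):
    """Accept the guess if it is close to any accepted answer (official name or alias).
    Allowed typos scale with answer length: 20% of its characters, but at least 1."""
    g = guess.strip().lower()
    for ans in answers:
        a = ans.lower()
        allowed = max(1, int(len(a) * max_ratio))
        if levenshtein(g, a) <= allowed:
            return True
    return False

print(is_accepted("Hethrow", ["Heathrow", "London Heathrow"]))  # True
print(is_accepted("Gatwick", ["Heathrow", "London Heathrow"]))  # False
```

Scaling the tolerance with answer length keeps short answers strict (one typo) while giving long names a bit more slack.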

Backing up a bit, how do I even choose "Heathrow" as the guessword from this sentence? How could the question generator know that Heathrow is the most interesting and fun word to blank out? After all, words are just words to the program. What if, instead of Heathrow, it chose "major" as the guessword? That would be a really stupid and frustrating question. I'm starting to think this program might be way too difficult to build.

Would I need a database or library that classifies each word as a noun, adjective, verb, etc? That might help with choosing which word to blank out.
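A part-of-speech tagger (NLTK's `pos_tag`, for instance) could filter out words like "major", but a cheaper heuristic may be enough on Wikipedia: the first sentence of an article usually names the article's own subject, so you can blank out the title. The sketch below assumes a small, hypothetical hand-maintained list of generic words ("airport", "river", ...) that stay visible as a hint; it is a heuristic, not a general solution:

```python
import re

# Hypothetical, hand-maintained list of words that describe *what kind* of thing
# an article is about rather than naming it; they are left visible as a hint.
GENERIC_WORDS = {"airport", "river", "university", "castle", "of", "the"}

def blank_title(sentence, title, generic=GENERIC_WORDS):
    """Blank the distinctive part of the article title in the article's first
    sentence. Returns (question, answer), or None if no suitable word is found."""
    distinctive = [w for w in title.split() if w.lower() not in generic]
    if not distinctive:
        return None
    answer = " ".join(distinctive)
    pattern = r"\b" + re.escape(answer) + r"\b"  # whole words only
    if re.search(pattern, sentence) is None:
        return None
    return re.sub(pattern, "________", sentence, count=1), answer

sentence = ("Heathrow Airport (IATA: LHR, ICAO: EGLL) is a major international "
            "airport in west London, England.")
question, answer = blank_title(sentence, "Heathrow Airport")
print(question)  # starts with "________ Airport (IATA: LHR, ..."
print(answer)    # Heathrow
```

Because the title is known from the page itself, the program never has to "understand" the sentence to avoid blanking out "major".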

Any ideas on this topic would be appreciated. I'm hoping I don't need a PhD in linguistics to pull this off!


There's actually a kid from Singapore who made a website that does exactly this for a science fair.

It takes an input and generates questions based on it. I don't recall what it was called, but the math behind it was insane.


Would I need a database or library that classifies each word as a noun, adjective, verb, etc? That might help with choosing which word to blank out.

Any ideas on this topic would be appreciated. I'm hoping I don't need a PhD in linguistics to pull this off!
This is squarely in Web 2.0 territory, I think, where websites annotate words with meaning. I don't know anything about it, though.

If you manage to turn it into a question, I think the next problem is the trustworthiness of the information. Say you find "1+1=3". I know that everything on the Internet must be true; the trouble is, it contradicts itself at times :p

Another issue is perhaps the amount of niche information, e.g. "A Bézier curve of degree n can be converted into a Bézier curve of degree n + 1 with the same shape." (https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Higher-order_curves) How many people would find that interesting?


So let people vote for how good the trivia question they got is, and the system will regulate itself.
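For voting to actually regulate the question pool, a raw average is a poor score: one upvote gives a perfect 100%. A widely used alternative (swapped in here, not something from this thread) is to rank by the lower bound of the Wilson score interval, which stays low until a question has enough votes. A minimal sketch, assuming simple up/down votes and a 95% confidence level (z = 1.96):

```python
import math

def wilson_lower_bound(upvotes, total, z=1.96):
    """Lower bound of the Wilson score interval for the true 'good question' rate.
    With few votes the bound stays low, so one lucky upvote can't top the chart."""
    if total == 0:
        return 0.0
    p = upvotes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom

def rank_questions(votes):
    """votes maps question id -> (upvotes, downvotes); returns ids, best first."""
    return sorted(votes, key=lambda q: wilson_lower_bound(votes[q][0], sum(votes[q])),
                  reverse=True)

print(rank_questions({"q1": (1, 0), "q2": (80, 20), "q3": (2, 8)}))  # ['q2', 'q1', 'q3']
```

The game could then serve mostly top-ranked questions, plus a few unrated ones so new questions get a chance to collect votes.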




Or go back a step and let people submit questions in the first place. I realize that doesn't fit the original plan to source questions from the Internet, but I fear that doing a good job of that would require building strong AI. Crowd-sourcing the questions would be much easier.

There are actually several web services out there already that provide trivia questions, but this is probably not what you want to do. I guess the way to go would be to make your own web service that does some kind of data mining to determine what kind of thing your topic is, then finds other things of the same kind.

In your first example you randomly found Heathrow Airport, and the description tells you that it is an "international airport". This becomes even more apparent if you use the Wikimedia API to parse the result: you can find which categories the page falls under and only search within those categories. So you could do a wiki search for international airports, filter it to the ones that are in the UK but not in London, and pick three of these as your alternative answers.
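Wikipedia exposes this through the MediaWiki Action API at https://en.wikipedia.org/w/api.php. The sketch below uses its real `prop=categories` module (which categories a page is in) and `list=categorymembers` module (which pages a category contains). It is a sketch under stated assumptions: it ignores result continuation, so only the first batch of members is returned; the User-Agent string is a placeholder you should replace with your own contact details; and the `fetch` parameter exists only so the parsing can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks API clients to send a descriptive User-Agent; this one is a placeholder.
HEADERS = {"User-Agent": "TriviaSketch/0.1 (contact: you@example.com)"}

def api_url(**params):
    """Build a query URL; formatversion=2 returns 'pages' as a plain JSON list."""
    params.update(format="json", formatversion="2")
    return API + "?" + urllib.parse.urlencode(params)

def fetch_json(url):
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def categories_of(title, fetch=fetch_json):
    """Visible (non-hidden) category titles of one article."""
    data = fetch(api_url(action="query", prop="categories", titles=title,
                         clshow="!hidden", cllimit="max"))
    pages = data["query"]["pages"]
    return [c["title"] for c in pages[0].get("categories", [])]

def members_of(category, fetch=fetch_json):
    """Article titles in a category (first batch only; a real version
    would follow the 'continue' field to get the rest)."""
    data = fetch(api_url(action="query", list="categorymembers", cmtitle=category,
                         cmtype="page", cmlimit="max"))
    return [m["title"] for m in data["query"]["categorymembers"]]
```

For example, `categories_of("Heathrow Airport")` should return the article's visible category titles; passing a suitable one to `members_of` lists sibling articles you could filter and draw the three fake answers from.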

Obviously you are going to end up with some pages that just won't work; for example, you might find an airport in a country that only has a single airport. In that case, ditch that result and try again.
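The "ditch it and try again" step is just a bounded retry loop around whatever builds a question. In this sketch, `pick_random_topic` and `try_build` are hypothetical hooks into the rest of the generator, and the topic data is made up:

```python
def generate_question(pick_random_topic, try_build, max_attempts=20):
    """Draw random topics until try_build returns a question (anything but None).
    The attempt limit stops the game from looping forever on bad data."""
    for _ in range(max_attempts):
        question = try_build(pick_random_topic())
        if question is not None:
            return question
    raise RuntimeError(f"no usable question after {max_attempts} attempts")

# Made-up data: the first topic has no same-kind siblings, so it gets skipped.
siblings = {
    "Hypothetical Airport": [],
    "Heathrow Airport": ["Gatwick Airport", "Bristol Airport", "Edinburgh Airport"],
}
topics = iter(["Hypothetical Airport", "Heathrow Airport"])

def try_build(topic, n_fake=3):
    fakes = siblings[topic]
    return (topic, fakes[:n_fake]) if len(fakes) >= n_fake else None

print(generate_question(lambda: next(topics), try_build))
```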

You need to play around with the API and massage your algorithm until you end up with something that is suitable.

If you really wanted to get fancy, you could write and train a neural net to pick decent trivia questions, which I think is what the kid from Singapore did for his science fair entry.


Backing up a bit, how do I even choose "Heathrow" as the guessword from this sentence? How could the question generator know that Heathrow is the most interesting and fun word to blank out? After all, words are just words to the program. What if, instead of Heathrow, it chose "major" as the guessword? That would be a really stupid and frustrating question. I'm starting to think this program might be way too difficult to build.

Would I need a database or library that classifies each word as a noun, adjective, verb, etc? That might help with choosing which word to blank out.

No, there is no need to do this. You don't have to work with raw text alone. Using the Wikimedia API you can categorise things, list things, combine things and exclude things. You probably don't actually need to parse any text manually.
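Once each category has been fetched as a set of article titles, "combine" and "exclude" reduce to ordinary set operations. A small sketch; `candidates` is a hypothetical helper and the category contents are hard-coded stand-ins for real API results:

```python
def candidates(include, exclude=()):
    """Titles that are in every category in `include` and in none in `exclude`.
    Each category is a set of article titles."""
    if not include:
        return set()
    result = set.intersection(*include)  # always a new set, inputs are untouched
    for category in exclude:
        result -= category
    return result

# Hypothetical category contents, standing in for real API results.
uk_airports = {"Heathrow Airport", "London City Airport", "Edinburgh Airport",
               "Bristol Airport", "Newcastle Airport"}
london_airports = {"Heathrow Airport", "London City Airport"}

print(sorted(candidates([uk_airports], exclude=[london_airports])))
# ['Bristol Airport', 'Edinburgh Airport', 'Newcastle Airport']
```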


Your real task here is getting to grips with the wiki API.


Another issue is perhaps the amount of niche information, e.g. "A Bézier curve of degree n can be converted into a Bézier curve of degree n + 1 with the same shape." (https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Higher-order_curves) How many people would find that interesting?

In this case you could make sure that you only ever search within specific categories in the first place. Most people wouldn't find maths questions suitable for a general-purpose trivia quiz, so you can build your algorithm around Trivial Pursuit-style categories rather than just any random Wikipedia article.
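A simple way to enforce this is a whitelist mapping each game category to the Wikipedia categories you consider fair game; any article that matches none of them is skipped. The mapping and category names below are hypothetical, and a real version would likely need to walk up the category tree, since articles usually sit in very specific leaf categories:

```python
# Hypothetical whitelist: game category -> Wikipedia categories judged suitable
# for a general-audience quiz. Names are illustrative, not verified.
GAME_CATEGORIES = {
    "Geography": {"Category:Airports in England", "Category:Capitals in Europe"},
    "History": {"Category:Presidents of the United States"},
}

def game_category_for(article_categories, mapping=GAME_CATEGORIES):
    """Return the first game category whose whitelist overlaps the article's
    categories, or None if the article should be skipped (e.g. a maths article)."""
    cats = set(article_categories)
    for game_cat, allowed in mapping.items():
        if cats & allowed:
            return game_cat
    return None

print(game_category_for(["Category:Airports in England"]))  # Geography
print(game_category_for(["Category:Curves"]))               # None
```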

Thanks for the ideas!

I'm not against letting people submit questions, but that seems like it would only work once the game is big and successful, not in the early stages.

I'm also liking the idea of using an API/service to get questions. This one is pretty cool - I think it's all the Jeopardy questions: http://jservice.io/

I would want to find several services like that and get questions from them at random, which keeps things interesting and also protects against the game halting if one service is down.
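Picking a random source and falling back to the others on failure could be sketched like this. Each source is assumed to be a zero-argument callable (a hypothetical wrapper around one service) that returns a question or raises on failure:

```python
import random

def fetch_from_any(sources, rng=random):
    """Try question sources in random order and return the first question obtained.
    Random order spreads load across services; trying the rest covers outages."""
    order = list(sources)
    rng.shuffle(order)
    errors = []
    for source in order:
        try:
            return source()
        except Exception as exc:  # a real version would catch narrower network errors
            errors.append(exc)
    raise RuntimeError(f"all {len(order)} question sources failed: {errors}")

def service_down():
    raise OSError("service unavailable")

def service_up():
    return {"question": "This 27th president enjoyed golf", "answer": "Taft"}

print(fetch_from_any([service_down, service_up])["answer"])  # Taft
```

Collecting the errors rather than discarding them makes it easier to log which services are failing.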

Lots to think about, thanks!

This one is pretty cool - I think it's all the Jeopardy questions: http://jservice.io/

While the knowledge itself is not copyrighted, the wording of the question is automatically copyrighted. You can't infringe upon Jeopardy's labor.

For example, that service gave me the question:

Difficulty: 800
Category: presidential pastimes
Question: Even when pushing 300 pounds, this 27th president enjoyed playing golf & tennis
Answer: Taft
The phrasing of the question is copyrighted, and the question (of any wording) combined with that difficulty value and category wording would be arguably copyrighted.
