JSON/CouchDb/Lucene searching via the web

Started by
8 comments, last by Dan Violet Sagmiller 11 years, 2 months ago

Engine = Unity3D

Language = C#

Data Provider Service = Cloudant (JSON, CouchDb, Lucene)

I'm constructing a city builder game, which does not require low network latency for most things, and the client will connect directly to the database service.

I can read and write JSON to and from the service, and to and from objects.

What I have not figured out how to do is search a Lucene database. Everything I'm finding involves installing a Java library and working with that. Isn't there a way to do this by protocol? Is there a particular POST/GET structure or JSON object I need to pass in to make searching work?

I guess I'm also trying to make this work like SQL.

I have Players, and I can get their info, but now I would like to get all CreatedBuildings with the (non-key field) PlayerId = the player in question.

Perhaps I'm approaching JSON/Lucene incorrectly and require a paradigm shift.

Thanks.

Moltar - "Do you even know how to use that?"

Space Ghost - “Moltar, I have a giant brain that is able to reduce any complex machine into a simple yes or no answer."

Dan - "Best Description of AI ever."


Another question I have on this is how the search libraries work.

I presume I'm wrong, because this seems terrible, but it seems like the search happens by the service downloading all the records with keys in a particular range, then locally extracting only the ones they want. While the game I have is not in strong need of fast response times, if a user has to download 10,000 objects to get the 1 that they want, that seems bad.

Of course, my original knowledge is based on SQL, and again, perhaps there is a different method I should be using for Lucene/CouchDB queries.

Moltar - "Do you even know how to use that?"

Space Ghost - “Moltar, I have a giant brain that is able to reduce any complex machine into a simple yes or no answer."

Dan - "Best Description of AI ever."

the client will connect directly to the database service

How will you prevent clients from cheating?

In general, you will want to keep an application/game server between clients and your data store.

What I have not figured out how to do is search a Lucene database.

Use a wrapper such as Solr, which provides an HTTP/XML interface on top.
enum Bool { True, False, FileNotFound };

How will you prevent clients from cheating?

The Service is connected via https, with encrypted key to start.

Use a wrapper such as Solr, which provides an HTTP/XML interface on top.

My project is in C#. Solr is Java based. Also, I already have all the HTTP/SSL code in place and functioning for direct object manipulation. What I lack is object searching when the field is not the key field.

Essentially, I'm hoping to find a protocol-based system instead of code. If I can see what the communications are, I can easily modify my code to support it.

Moltar - "Do you even know how to use that?"

Space Ghost - “Moltar, I have a giant brain that is able to reduce any complex machine into a simple yes or no answer."

Dan - "Best Description of AI ever."

The Service is connected via https, with encrypted key to start.

Any user who knows how to use reflection to edit your C# assemblies to make it send the data they want, can cheat?
Any client that knows how to form HTTPS requests can cheat?

My project is in C#. Solr is Java based.

Why does it matter what language a service is written in, if you just use it as-is? Solr has an HTTP interface. You POST to it, telling it to index things. You then GET (or POST) from it to search the index. Nowhere in that situation do you need to know Java.

However, from your descriptions, it sounds like you are a web or IT developer heading into game development on a catastrophe course. The architecture you describe is highly unlikely to be well suited to the real challenges of creating multiplayer online games, both from a performance point of view and an anti-cheat point of view.
enum Bool { True, False, FileNotFound };

I was hoping for more of a discussion on how to get the searching working. And had not detailed the security in my description, as that was not the focus here. However, I'm thankful for your proactive assistance in the other areas. I would rather have people post potential serious issues, than let them go. So to respond to your concerns:

Any user who knows how to use reflection to edit your C# assemblies to make it send the data they want, can cheat?

Reflection has security as well, and is incredibly easy to implement where I want it. http://msdn.microsoft.com/en-us/library/system.security.permissions.reflectionpermission.aspx

Any client that knows how to form HTTPS requests can cheat?

The prior text had also included the encrypted key. First an SSL connection is established with the service, but then that service also requires a Key & Token, which is already encrypted. It does not matter if someone can connect over SSL; they still need the login information, which is only exchanged after the SSL connection is established.

Language = C#

What I have not figured out how to do is search a Lucene database

Why does it matter what language a service is written in, if you just use it as-is? Solr has an HTTP interface. You POST to it, telling it to index things.

I was referring to the client side. As I originally posted that my code is in C#, the use of a Java Library is not very effective here. I was originally asking how to Search the database. I understand how to connect to it already, how to add things and get things based on a key or a key range.

I can read and write JSON to and from the service, and to and from objects.

I have Players, and I can get their info, but now I would like to get all CreatedBuildings with the (non-key field) PlayerId = the player in question.


You then GET (or POST) from it to search the index.

Indeed, I have already clarified that I know how to get and post based on the index, or the ID, a Key Index field.
What I am asking about is how to do SELECT * WHERE BuildingOwnerPlayerId = "[playerId]". Players can have many buildings. In SQL, this was dealt with by giving the PlayerBuilding table a field called OwnerId or PlayerId, then executing the query "SELECT * FROM PlayerBuilding WHERE OwnerId = 'playername'". I'm trying to figure out how to repeat that in JSON through Lucene, or any library that can connect to it.

Nowhere in that situation do you need to know Java.

I have been looking for a way to search non-indexed/non-key fields in a JSON Database. Particularly Cloudant.com. This is on a C# client, I was asking if anyone knew of either a Library to search with, or preferably, just the protocol that I would use to connect with cloudant/JSON/lucene, in order to search a non-Key field.
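For reference, stock CouchDB handles this kind of lookup with a map view: a design document emits the non-key field as the view key, and the client then queries the view over plain HTTP. The sketch below only builds the request URL. The host, database, design-document, and view names are all hypothetical, and whether a given hosted provider exposes views on a particular plan is an assumption to confirm with that provider.

```csharp
using System;

// Hypothetical helper. The database, design-document, and view names are assumptions.
public static class CouchViewQuery
{
    // Map function that would be stored in a design document (for example
    // _design/buildings, under views.by_player.map). It indexes each
    // document by its PlayerId field.
    public const string ByPlayerMap =
        "function (doc) { if (doc.PlayerId) { emit(doc.PlayerId, null); } }";

    // Builds: GET {baseUrl}/{db}/_design/{designDoc}/_view/{view}?key="..."&include_docs=true
    public static string BuildUrl(string baseUrl, string db, string designDoc,
                                  string view, string key)
    {
        // View keys are JSON values, so a string key is wrapped in double
        // quotes (with inner quotes and backslashes escaped), then URL-encoded.
        string jsonKey = "\"" + key.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";

        return baseUrl.TrimEnd('/')
            + "/" + Uri.EscapeDataString(db)
            + "/_design/" + Uri.EscapeDataString(designDoc)
            + "/_view/" + Uri.EscapeDataString(view)
            + "?key=" + Uri.EscapeDataString(jsonKey)
            + "&include_docs=true";
    }
}
```

Calling `CouchViewQuery.BuildUrl("https://example.cloudant.com", "game", "buildings", "by_player", "testUser")` gives a URL that an existing HTTP GET routine could issue as-is. With `include_docs=true`, the response rows carry the matching documents, so the client does not download the whole key range.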

However, from your descriptions, it sounds like you are a web or IT developer heading into game development on a catastrophe course. The architecture you describe is highly unlikely to be well suited to the real challenges of creating multiplayer online games, both from a performance point of view and an anti-cheat point of view.

I mean no disrespect, but I have over 20 years of game development experience, including teaching Design/Programming/Physics/AI/sql at multiple colleges. The Architecture, if you look at the original statements,

I'm constructing a city builder game, which does not require low network latency for most things, and the client will connect directly to the database service.

You will see that the client does not need to bother a larger-scale service. When the multiplayer interactions increase to the point where direct data updates are no longer reasonable, I will implement a service to manage those sections as need be. Cloudant is a service that automatically grows, spans servers, and maintains speed as the data source gets huge. Nearly every call to a service I would have to host would simply make calls to the data source itself and feed the nearly raw data back to the client. In this case, I can use this service directly. Putting a service in the way would require me to set up my own redundancies, add additional network layers to bounce data around, and pay more for hosting, all while still using the same data calls the client could have made.
I understand that in traditional networked gaming I would use TCP or, better, UDP with some form of packet control per delivery importance. In fact, the first thing I built was a UDP client/server that was sending this information back and forth. But after programming a few messages into it, it occurred to me that everything I was doing was just delivering the data in near-original condition. After a review of the security and data protection, I decided to go this route. If I find issues, I can always put a service in between later by swapping the messaging classes for ones that connect to a service instead.
The network latency on most calls to this service has proven to be around 100 ms or less. That will be fine for a city builder.
Now, I'm hoping that this architecture explanation is out of the way, and that we can focus on the real issue:
I would like to be able to search through the JSON documents for alternate fields. If there is a way to hook up a secondary searchable index, that will work, but I need to know how to do it. If there is a way to simply do this through a special post or get, great, I can add a method for that in my code with ease.
At present, I'm going to take a Parent links to the children approach, by adding arrays to the parent nodes to hold references to the children. In SQL we do the reverse, where the child row carries references to the parent. So I'm changing my data model to match what appears to be how JSON handles this.
What I'd like to hear, is how to search CouchDB/Lucene/JSON via a query string, when the field I'm searching is not the ID/key index. Also, any suggestions if I'm using JSON incorrectly by using a parent linking to children approach in the data.
Thanks.

Moltar - "Do you even know how to use that?"

Space Ghost - “Moltar, I have a giant brain that is able to reduce any complex machine into a simple yes or no answer."

Dan - "Best Description of AI ever."

I can't emphasize this enough: You do not show any understanding of the actual attack vectors of cheating clients. I won't say it again, though; twice should be enough. There are plenty of threads on this forum, and resources on the internet, about why talking about encryption keys or SSL certificates in the same breath as "cheat prevention" is pointless. They don't solve that problem; they solve another problem (that of a malicious third party without physical access to the client machine.)

The specific part I strongly advise you to re-consider, you actually quoted again:

the client will connect directly to the database service

Anyone who has actually shipped and operated a successful multiplayer game will tell you that you just *Don't* *Do* *That* (tm). So, the solution you're asking for may actually not be needed, because you're looking for a solution based on an architecture it sounds like you really shouldn't be using.

If you simply use the cloud service as a "save game" repository, and you don't care at all that certain players will figure out how to give themselves a bazillion zorkmids or whatever, then you *may* be OK, assuming the authorization system of your databases is water-tight. But it doesn't sound from your description like that's what you're trying to do.


So, regarding your question: Are you using a separate Lucene database on each client, with a separate data store on each client? You were listing it in the "data provider: Cloudant" section above, so I assume that it's a server-side instance.

If it's a server-side instance, your client already has an API to search a Lucene database on non-primary keys: Do a POST (or GET) to a Solr instance running on the server, where your Lucene store is (now, managed by Solr.) Given the question you have outlined, paraphrased: "How do I do secondary index searches from a client on a data store that's on a remote cloud-based host?" I still think that's a fine solution. If you don't think so, then there must be some other constraint I'm not understanding.

If, for some reason, this is still not good enough, then you can use another cloud-side data store that provides indexed secondary fields. These include Amazon SimpleDB, and Google BigTable. If you need to host the data yourself, you can choose a document store that actually supports secondary indices natively, like Riak, or HBase. Granted, they are different from Lucene, but I don't yet understand whether there is some hard requirement to use Lucene (without Solr.)
enum Bool { True, False, FileNotFound };

Ok, I think we are arguing on different points.

If I am not mistaken, one of your key points is that once a user has gained access to the data service, they have complete control of anyone's data.

And that by going through a service, if they gain direct access to that, they are limited to only their account, and limited further because the service would expect certain commands and have fail-safes to prevent paying -1000 and gaining a super structure 10 levels higher than they should have access to.

- I understand this. I see how dumb it is to trust that no one will ever somehow break the SSL/credentials. I had trusted that credentials over SSL were actually a bit more secure. Just for the risk, I'll step away from this practice.

But this leads me to the item I thought we were arguing about: SSL with credentials not actually being secure, that it can be hijacked, and apparently with ease. I don't understand how credentials sent over SSL are not secure. If I were connecting to a web service, or some other server type, I would probably use SSL with credentials as well. I would appreciate it if you could post a link to an article that shows the fault there, along with what I should be doing instead. (To add a point of clarification, this is client-to-server credentials, not user login credentials.) (Of course, in the confusion over database versus service access, perhaps you weren't touching the SSL/credentials issue at all?)

Finally, I come to Solr, which I can tell we both made mistakes on. First, Solr is not part of Lucene, but an add-on, which is downloaded separately. And more importantly in my case, Cloudant does not use it. My mistake was my comment about C# versus Java. When I first skimmed it, it was talking about downloading the Java for it, so I had presumed it was a client-based tool. Based on this new insight into Solr, it appears I would need it to get the server-side non-key searches to work, but I just don't have that option where I am.

Presuming that Solr were an option, my original question is "what is the protocol?" POST/GET is not the protocol. I would presume it would be something like GET="http://server/db/?PlayerId=testUser" to return all docs that have a field called PlayerId with the value testUser. This is the protocol information I'm missing, and I will still need to know it if I shift to a different data provider with Lucene w/Solr.
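For a Solr-backed store, the request shape is close to that guess: Solr's standard search handler takes a `q` parameter in `field:value` form, and `wt=json` asks for a JSON response. The following is a minimal sketch that builds such a URL. The host, port, and core name are assumptions for illustration, and older single-core installs omit the core segment from the path.

```csharp
using System;

// Hypothetical helper. The Solr base URL and core name are assumptions.
public static class SolrQuery
{
    // Builds: GET {solrBase}/{core}/select?q=field:"value"&wt=json
    public static string BuildSelectUrl(string solrBase, string core,
                                        string field, string value)
    {
        // Quote the value as a phrase so spaces and query operators in it
        // are matched literally instead of being parsed as query syntax.
        string phrase = "\"" + value.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
        string q = field + ":" + phrase;

        return solrBase.TrimEnd('/')
            + "/" + Uri.EscapeDataString(core)
            + "/select?q=" + Uri.EscapeDataString(q)
            + "&wt=json";
    }
}
```

For example, `SolrQuery.BuildSelectUrl("http://server:8983/solr", "buildings", "PlayerId", "testUser")` produces a GET URL whose JSON response lists the indexed documents whose PlayerId field matches testUser.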

Thanks

Moltar - "Do you even know how to use that?"

Space Ghost - “Moltar, I have a giant brain that is able to reduce any complex machine into a simple yes or no answer."

Dan - "Best Description of AI ever."

But this leads me to the item I thought we were arguing about: SSL with credentials not actually being secure, that it can be hijacked, and apparently with ease. I don't understand how credentials sent over SSL are not secure.

Correctly implemented SSL is perfectly secure against a third-party man-in-the-middle attack, if you don't use Zlib compression (!)
The problem is that a cheating player is not a man in the middle. A cheating player has full control over the machine executing your code and sending the SSL data. Thus, the cheating player will attach to your executable, and force it to send a perfectly valid SSL message that says "I have eight bazillion Zorkmids" and your system will happily oblige. Tools to do this to *any* game are readily available for download.
Or the user will extract the SSL certificate from your executable or installer, and write his or her own script that uses that certificate but sends its own packets entirely from scratch.
Because the client code is entirely under the control of the cheating player, nothing you do on the client can protect against this.

Regarding talking to Lucene from a remote client, I don't know how to do that or if that's possible. If you're locked into particular services from a particular host, perhaps you should ask that host what they suggest :-)
However, if you end up actually wanting to defend against cheating players, then you will end up putting authoritative game logic on the server instead of the client. And, at that point, the search query is all server-side, where presumably the provider already has libraries you can use.
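As a rough illustration of what "authoritative game logic on the server" can mean, the sketch below has the server own the price table and the player's limits, so a client can only ask for an action and never state its cost or outcome. All type, field, and method names here are hypothetical and the rules are deliberately simplistic.

```csharp
using System.Collections.Generic;

// Hypothetical server-side types. None of these names come from the thread.
public sealed class PlayerState
{
    public string PlayerId;
    public long Coins;
    public int MaxBuildingLevel;
}

public sealed class BuildRequestValidator
{
    // Prices live on the server; the client never states what something costs.
    private readonly Dictionary<string, long> prices;

    // Requests that no honest client could have sent are recorded here.
    public readonly List<string> HackWarnings = new List<string>();

    public BuildRequestValidator(Dictionary<string, long> prices)
    {
        this.prices = prices;
    }

    // Applies a build request for the logged-in player, or rejects it.
    public bool TryBuild(PlayerState player, string buildingType, int level)
    {
        long price;
        if (!prices.TryGetValue(buildingType, out price))
        {
            HackWarnings.Add(player.PlayerId + ": unknown building type " + buildingType);
            return false;
        }
        if (level < 1 || level > player.MaxBuildingLevel)
        {
            HackWarnings.Add(player.PlayerId + ": level " + level + " is out of bounds");
            return false;
        }
        if (player.Coins < price)
        {
            return false; // An honest client can hit this, so it is not flagged.
        }
        player.Coins -= price;
        return true;
    }
}
```

Because the request carries only a building type and a level, a tampered client has no field in which to claim a negative price or a bazillion coins. The most it can do is send a request the server refuses and logs.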

That being said, I'm still not convinced your data model is going to work very well for a large scale game. Typically, you'll want to keep all the data entities in RAM, and your "find all buildings created by X" will be a simple hash table look-up. The RAM-to-persistent-storage operation is a separate step that happens on some kind of timer, and/or after especially important events, and the storage-to-RAM operation typically happens on system start-up and/or on user login.
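The in-RAM lookup described above can be sketched as a dictionary keyed by owner. This is a minimal illustration with hypothetical type names, and it leaves out the load-on-login and timed save-to-storage steps.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory entity. Field names are assumptions.
public sealed class Building
{
    public string Id;
    public string OwnerId;
    public string Type;
}

public sealed class BuildingIndex
{
    // Secondary index held in RAM: owner id -> that owner's buildings.
    private readonly Dictionary<string, List<Building>> byOwner =
        new Dictionary<string, List<Building>>();

    private static readonly List<Building> Empty = new List<Building>();

    public void Add(Building building)
    {
        List<Building> list;
        if (!byOwner.TryGetValue(building.OwnerId, out list))
        {
            list = new List<Building>();
            byOwner[building.OwnerId] = list;
        }
        list.Add(building);
    }

    // "Find all buildings created by X" becomes a single hash lookup.
    public IReadOnlyList<Building> FindByOwner(string ownerId)
    {
        List<Building> list;
        if (byOwner.TryGetValue(ownerId, out list))
        {
            return list;
        }
        return Empty;
    }
}
```

The persistent store then only needs key-based reads and writes, since the owner-to-buildings relationship is rebuilt in memory when the data is loaded.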
enum Bool { True, False, FileNotFound };

As an update, I've switched to an encrypted TCP service for the easier two-way traffic. The service will only affect the logged-in user's data, apply logical limits to player abilities, and flag hack warnings when boundaries are crossed.

I spoke with Cloudant and they don't appear to have non-key search abilities, so I'll switch to a local DB, which is fine since I'm implementing the service anyway, and the original problem becomes moot.

Thanks for your help.

Moltar - "Do you even know how to use that?"

Space Ghost - “Moltar, I have a giant brain that is able to reduce any complex machine into a simple yes or no answer."

Dan - "Best Description of AI ever."

This topic is closed to new replies.
