What do you think of a virtual computer user program?

4 comments, last by VanillaSnake21 4 years, 7 months ago

I'm not sure which forum this kind of question belongs in, so sorry in advance if it's a bit off topic for this board. I'm not looking for gameplay or general programming advice, but rather software design advice and help selecting the right APIs/technologies to implement a design. (I did consider posting this in Game Design, but this is not a game, so...)

I have an idea for a program that I've wanted to implement for a while so feel free to critique it or let me know if it's already been implemented.

Here's the idea: I'd like to make a computer automation program that controls your mouse and keyboard, but with a twist: it would use computer vision to navigate. In essence, I'd like to make something like a virtual AI user, where the program can operate a computer by itself given a set of directions.

To give an example: I'd say "Open site gamedev, create a new post in the gameplay programming category, and title it What do you think of a..". The program would then take over the mouse, look for the browser icon on the desktop or in the Start menu, click on it, look for the search bar, click on it, type in the site address, navigate to the site, find the right forum, etc.
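To make the example more concrete, here is a rough sketch of how such a spoken command might be broken into steps once it has been parsed. Everything in it is an assumption made for illustration: the `Step` record, the `LoggingDriver` stand-in (which only records what it would do), and element names like `browser_icon` are hypothetical. A real driver would have to find each named element on screen by vision and then move the mouse.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str      # "click" or "type"
    target: str      # hypothetical name of a visual element the program was trained on
    text: str = ""   # text to type; only used by "type" steps


class LoggingDriver:
    """Stand-in driver: records actions instead of moving the real mouse."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, target: str) -> None:
        self.log.append(f"click {target}")

    def type_text(self, target: str, text: str) -> None:
        self.log.append(f"type {text!r} into {target}")


def run_plan(steps: List[Step], driver: LoggingDriver) -> None:
    """Execute each step in order by dispatching on its action."""
    for step in steps:
        if step.action == "click":
            driver.click(step.target)
        elif step.action == "type":
            driver.type_text(step.target, step.text)
        else:
            raise ValueError(f"unknown action: {step.action}")


# The first few steps of the example command, with placeholder names.
plan = [
    Step("click", "browser_icon"),
    Step("click", "search_bar"),
    Step("type", "search_bar", "SITE_ADDRESS"),  # placeholder, not a real URL
    Step("click", "create_new_post_button"),
]

driver = LoggingDriver()
run_plan(plan, driver)
for line in driver.log:
    print(line)
```

Keeping the plan separate from the driver means the same list of steps could later be run by a vision-based driver without changing the plan itself.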

Of course, I'm not looking to have some crazy AI that can do all that without being shown at least once or trained a bit. But after training I'd expect it to know the layout of this site, how the "Create New Post" button looks, how the search bar looks, etc. So, kind of like how an actual person uses a computer: instead of iterating directories and issuing browser commands, it would rely on vision and interact with the computer solely through the mouse and keyboard (like we do).

It would also be a bit different from something like AutoHotKey. For example, if the site gets updated and the "Post Now" button moves to the bottom of the screen, the program would still be able to follow commands, since it relies on visual input rather than exact positions.
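As a minimal sketch of what "relying on visual input rather than exact positions" could look like, the snippet below searches a screenshot for a small image of the button and returns wherever it finds it. This is plain template matching by sum of absolute differences, written in pure Python on tiny grayscale grids so it stays self-contained. It is an assumption about one possible approach, not a description of any existing tool, and a real program would more likely use an image library and a matcher that tolerates scaling and theme changes. The names `find_template`, `place`, and `BUTTON` are made up for the example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels (0-255), stored row by row


def find_template(screen: Image, template: Image,
                  max_diff: int = 0) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the top-left corner of the best match, or None.

    The score is the sum of absolute pixel differences; lower is better.
    A match is rejected if its score exceeds max_diff.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = sum(
                abs(screen[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (x, y), score
    if best_score is None or best_score > max_diff:
        return None
    return best_pos


def place(screen: Image, template: Image, x: int, y: int) -> Image:
    """Return a copy of screen with template drawn at (x, y)."""
    out = [row[:] for row in screen]
    for dy, row in enumerate(template):
        out[y + dy][x:x + len(row)] = row
    return out


# A made-up 3x2 "Post Now" button and an empty 8x6 screen.
BUTTON: Image = [[200, 50, 200], [50, 255, 50]]
blank: Image = [[0] * 8 for _ in range(6)]

old_layout = place(blank, BUTTON, 1, 1)   # button near the top
new_layout = place(blank, BUTTON, 4, 3)   # button moved after a site update

print(find_template(old_layout, BUTTON))  # (1, 1)
print(find_template(new_layout, BUTTON))  # (4, 3)
print(find_template(blank, BUTTON))       # None
```

A script that clicks at hard-coded coordinates would miss the button in `new_layout`, while the search finds it in both. Raising `max_diff` is the crude knob for tolerating small rendering differences.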

Do you think this is something worth writing? Why or why not? And what are some difficulties that I might encounter?

You didn't come into this world. You came out of it, like a wave from the ocean. You are not a stranger here. -Alan Watts


Worth considering, yes. Doing countless menial tasks while you take a break is super useful. Is it worth the effort? I actually can't tell. I personally don't think so, but I would suggest having a go at it without investing too much time.

I am an indie game developer who enjoys pixel art games.

Automated testing and unified functional testing is a thing. It does things like sending commands such as keyboard inputs to the local or a virtual computer. Is https://smartbear.com/product/testcomplete/overview/ the kind of thing you mean?

Hey VanillaSnake,

It might be fun to work on, and a good test of your grit as a software engineer, but it's certainly been done before. The type of software you are looking for falls under productivity applications.

On Windows, to name a couple:

  • Macro Recorder
  • RoboTask

On my Mac that I'm forced to use for work, I use Alfred quite a bit.

However, if you're looking to automate some web-crawling process where you need to actually interact with pages as opposed to just scraping them, I've had quite a bit of luck with the Selenium Framework, both for UI testing and as a general web-crawler/driver.

Edit

I didn't address the AI aspect. I suppose it could be an interesting twist on other applications that do similar things. However, even before delving into supervised/unsupervised learning and providing training examples, I'm not sure it would be helpful for workflows that are set in stone and extraordinarily deterministic.

It would be like, "I trained this AI application to do this thing" as opposed to, "I automated this task where there was no ambiguity in the steps". It depends on what you want to do. It would be cool to see! But I don't know whether your goal is practicality or simply having fun.

11 hours ago, TeaTreeTim said:

Automated testing and unified functional testing is a thing. It does things like sending commands such as keyboard inputs to the local or a virtual computer. Is https://smartbear.com/product/testcomplete/overview/ the kind of thing you mean?

Yep, this is kind of what I was looking for. After downloading a trial and playing with it for an hour, I have an idea of what such an app might feel like. It's extremely close to what I had in mind; it just takes a different approach. So I guess I'm now questioning my intention of using vision versus what this app does, which is to use direct system hooks that let it read info directly. Thanks, saved me a bunch of time.


