Good tools to start working with machine learning / data mining?

4 comments, last by warhound 6 years, 10 months ago

I'm looking for some tools to start creating some machine learning projects. I do have a little bit of experience with this topic, in that I've got a pretty good understanding of most fundamental classifiers and models. I've used WEKA a bit, and it seems good, but it wouldn't surprise me if there were newer/better tools.

TensorFlow is completely new since I last did any of this stuff. I don't really know anything about it, although I've heard that it doesn't really "do" anything that's fundamentally different from what traditional neural nets do. In any event, I'm pretty sure it's not what I'm looking for right now. I'm more interested in something I can experiment with quickly, ideally with a UI of some kind and not too much setup. I'm lazy, so by the time I got TensorFlow to even work on Windows, I'd pretty much given up. I think I'll try going back to it once I'm actually comfortable with what I'm doing.

I'm also interested in finding large, free data sets for training models. Pretty much anything will do, including text, audio, video, images, stock prices, betting markets, brain scans, etc. This and this sort of thing are extremely interesting to me, since they pretty much seem like magic, but so far I have no idea what projects I want to try myself.

Anyone else had some success with doing this?

-~-The Cow of Darkness-~-

I can't really tell whether TensorFlow is the best tool for what you plan to do, but it is actually easy to get working on Windows.

https://www.tensorflow.org/install/install_windows

Install Python 3.5

If you want GPU acceleration:

Install CUDA 8.0 from NVIDIA

Install cuDNN 5.1 from NVIDIA

pip install tensorflow

Then launch Python

You'll have to check a few dependencies, but that's it.
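
Once the steps above are done, a quick sanity check from Python can confirm that the package is visible to the interpreter you launched. This is only a sketch using the standard library; the helper name `check_package` is made up for illustration, and nothing in it is specific to TensorFlow.

```python
import importlib
import importlib.util


def check_package(name):
    """Return the package's version string, "unknown" if it exposes no
    __version__ attribute, or None if the package cannot be found."""
    # find_spec locates a top-level module without importing it.
    if importlib.util.find_spec(name) is None:
        return None
    # The import itself can still fail (for example on a missing DLL),
    # in which case the traceback usually names the broken dependency.
    module = importlib.import_module(name)
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = check_package("tensorflow")
    if version is None:
        print("tensorflow is not installed in this Python environment")
    else:
        print("tensorflow", version, "imported successfully")
```

If the import raises instead of printing, the error message is usually the fastest pointer to which dependency still needs attention.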

Now, if you want to ship TensorFlow with your game and use the C++ API, there is a link to it on this page.

Thanks. I had managed to get it working on the GPU eventually, but it just wasn't as smooth of a process as I would've hoped, and I felt like I might be able to get just as much out of something simpler.

It's probably worth pointing out that this is strictly for recreational purposes. I'm much more interested in something to play around with so I can maybe build up some kind of foundation and intuition, and then ideally see where it takes me from there. If this lends itself to a game or even any kind of programming project at all, then great, but if not, also great. On the other hand, if talking to TensorFlow or some other library through a C++ API seems like the most direct route to making it do things, then I'll dive right in.

-~-The Cow of Darkness-~-
I use scikit-learn with Python for all of my ML projects. It's really easy and straightforward to use and to get off the ground.

I've done quite a bit of text mining work. If you're interested in tweets and sentiment, this was a good dataset:

http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/

I've done a lot of random stuff in machine learning. What specifically are you interested in?
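
To give a sense of how little code a text classifier takes in scikit-learn, here is a minimal sketch. The six example sentences and their labels are invented for illustration and are not taken from the dataset linked above; a real run would load that corpus from its file instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example (1 = positive, 0 = negative).
texts = [
    "I love this game",
    "what a great day",
    "this is awesome and fun",
    "I hate this bug",
    "what a terrible day",
    "this is awful and boring",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each text into TF-IDF word weights, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Classify two unseen sentences.
predictions = model.predict(["I love this great day", "I hate this terrible bug"])
print(predictions)
```

Swapping `LogisticRegression` for another classifier only changes one line, which is a big part of why the library is quick to experiment with.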

No one expects the Spanish Inquisition!

Ooh, this is definitely the sort of thing I'm looking for. I'm not sure if I'm looking for any particular type of data set at this point other than something that'll help me maintain interest. I guess if I had to choose, I'd pick stock prices and brain scan data. Brain scans seem just inherently interesting, and stock prices because it seems like being able to predict even incrementally better than what I can do without machine learning would have some tangible value.

-~-The Cow of Darkness-~-

Hmm, well, there are tons of stock price datasets out there, since it's one of the most popular applications for ML. A Google search turned up these, which look pretty good:

https://archive.ics.uci.edu/ml/datasets/dow+jones+index

http://pages.swcp.com/stocks/

There's a ton more out there for stocks. Brain scans also seem to have sources:

http://www.oasis-brains.org

http://brain-development.org/ixi-dataset/

A note on brain scan data: you will probably have to do some image processing on those, depending on which method you use. Methods like Principal Component Analysis (PCA) and cascade classifiers are really popular. For image-based applications, definitely also have a look at OpenCV (available in many different languages, including Python and C++) in conjunction with an ML library.
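
As a rough illustration of the PCA step, the sketch below reduces a stack of images to a handful of components with scikit-learn. The "images" are random arrays standing in for real scans, and the sizes (50 images of 16x16 pixels, 10 components) are arbitrary choices for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for real scans: 50 synthetic 16x16 grayscale "images".
images = rng.random((50, 16, 16))

# PCA expects one row per sample, so flatten each image into a vector.
flat = images.reshape(len(images), -1)  # shape (50, 256)

# Keep the 10 directions that capture the most variance.
pca = PCA(n_components=10)
reduced = pca.fit_transform(flat)  # shape (50, 10)

print(reduced.shape)
print(pca.explained_variance_ratio_.sum())
```

The reduced array can then be passed to a classifier in place of the raw pixels. Real scans would first need to be loaded and resized to a common shape, which is where an image library such as OpenCV comes in.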

Generally, the more obscure the problem, the less data there is. I recently did some work on bank strength testing with machine learning techniques. The idea was that we wanted to predict if a bank would fail based on historical data, given some variables. We used a support vector machine model to classify banks as safe or unsafe. Problem is that there isn't a ton of data out there, especially for banks after the financial crisis, and especially for banks that failed. The results were pretty decent all things considered, but it's a good example of where data shortages can occur.
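
To make the data-shortage point concrete, here is a small sketch of a support vector machine trained on a heavily imbalanced two-class problem. Everything in it is an assumption for illustration: the data is synthetic, the two features are meaningless numbers, and this is not the setup used in the bank work described above.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data: 190 "safe" rows and only 10 "unsafe" rows,
# each with two made-up numeric features.
safe = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(190, 2))
unsafe = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(10, 2))
X = np.vstack([safe, unsafe])
y = np.array([0] * 190 + [1] * 10)  # 1 = unsafe

# stratify=y keeps the rare class present in both halves of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# class_weight="balanced" makes mistakes on the rare class cost more.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)

# Recall on the rare class is more telling than plain accuracy here.
recall = recall_score(y_test, model.predict(X_test))
accuracy = model.score(X_test, y_test)
print(recall, accuracy)
```

With so few positive examples, a model that always predicts "safe" would already score 95% accuracy on this data, which is why the sketch reports recall on the rare class as well.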

A lot of ML, though, is just about formulating the problem in a way that's solvable through ML techniques. Things like selecting variables, finding good data with those variables, etc. are where a lot of the work is. Then comes comparing the different methods. ML is really hit or miss though, in that it either works really well or is just a terrible idea for any given application. Still really interesting though, one way or another!
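
Comparing methods can be as simple as running each candidate through the same cross-validation. The sketch below assumes scikit-learn and uses its bundled iris dataset purely as a placeholder; the three models are arbitrary picks, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Score every model with the same 5-fold split and report mean accuracy.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Using the same folds for every model keeps the comparison fair, and a large gap between methods is often a hint about how the problem should be formulated.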

No one expects the Spanish Inquisition!

