classification with true/false classes

2 comments, last by Vorpy 16 years, 5 months ago
Hello,

I managed to pre-process the log I described in my other thread, so I now have a much simpler classification problem to solve. I'll try to explain it with simplified data.

I have the time (Day, Hour, Minute, Second) of every toggle of a light switch, e.g.:

1,10,25,52
1,11,15,30
1,22,53,04
1,23,55,04
2,22,50,06
2,23,57,10

Now I want to derive rules from these data. For the example above, a good set of rules would be:

Hour=10, Minute=25, Second=52
Hour=11, Minute=15, Second=30
Hour=22, 50<=Minute<=53
Hour=23, 55<=Minute<=57

The last two were merged because similar events happened on different days (of course, a real log will have many more days and events).

I believe a decision tree could help with this problem, but I don't know how to deal with the fact that I only know when the switch was toggled. If this is a CLASSIFICATION problem, I need two classes, "toggle" and "not toggle". Would I need to transform the example above into something like this?

1,10,25,52,true
1,11,15,30,true
1,22,53,04,true
1,23,55,04,true
2,22,50,06,true
2,23,57,10,true

and...

1,0,0,0,false
1,0,0,1,false
1,0,0,2,false
1,0,0,3,false
1,0,0,4,false
1,0,0,5,false
...

(I mean, would I need to create "not toggle" events for every other possible time?)

Thanks,
Luis Fernando
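A minimal sketch of the transformation asked about above, assuming one-second resolution and 86,400-second days. It collapses hour/minute/second into a single second-of-day value and, rather than enumerating every non-toggle second as a "false" row, samples a fixed number of them per day. The names `to_second_of_day`, `build_labelled_samples` and `negatives_per_day` are hypothetical, chosen for illustration only.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60


def to_second_of_day(hour, minute, second):
    """Collapse hour/minute/second into one time-of-day value in seconds."""
    return hour * 3600 + minute * 60 + second


def build_labelled_samples(toggles, negatives_per_day=100, seed=0):
    """Turn (day, hour, minute, second) toggle events into labelled rows
    of the form (day, second_of_day, label).

    Every logged toggle becomes a True row. Instead of listing all other
    seconds of the day as False rows (about 86,400 per day), a fixed
    number of non-toggle seconds is sampled at random for each day.
    """
    rng = random.Random(seed)
    toggled = {}
    for day, h, m, s in toggles:
        toggled.setdefault(day, set()).add(to_second_of_day(h, m, s))

    rows = []
    for day, seconds in sorted(toggled.items()):
        rows.extend((day, t, True) for t in sorted(seconds))
        candidates = [t for t in range(SECONDS_PER_DAY) if t not in seconds]
        rows.extend((day, t, False) for t in rng.sample(candidates, negatives_per_day))
    return rows


# The example log from the question (leading zeros dropped for Python).
log = [(1, 10, 25, 52), (1, 11, 15, 30), (1, 22, 53, 4),
       (1, 23, 55, 4), (2, 22, 50, 6), (2, 23, 57, 10)]
rows = build_labelled_samples(log, negatives_per_day=5)
positives = sum(1 for _, _, label in rows if label)
print(positives, "toggle rows,", len(rows) - positives, "sampled non-toggle rows")
```

How many negatives to sample per day is a judgment call, since it changes the class balance the classifier sees.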
Instead of learning when to toggle the switch, you could try learning when the light should be on and when it should be off. Each node of the decision tree can then be a boundary between light and dark, i.e. "if time < x then off, else on", with "off" and "on" as its leaves.
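A rough sketch of the on/off idea, under two labelled assumptions: the light is off at midnight each day, and every logged event flips the state. `light_state_samples` rebuilds the state at regular times of day, and `best_stump` finds the single "if time < x then A, else B" split with the fewest errors, trying both polarities. Both function names and the `step` parameter are hypothetical. A full tree would apply the same search recursively to each side of the chosen split.

```python
def light_state_samples(toggles, step=60, start_on=False):
    """Rebuild the light's on/off state from a toggle log.

    Assumes (hypothetically) that the light is in state `start_on` at
    midnight each day and that every logged event flips the state.
    Returns one (second_of_day, is_on) sample every `step` seconds per day.
    """
    by_day = {}
    for day, h, m, s in toggles:
        by_day.setdefault(day, []).append(h * 3600 + m * 60 + s)

    samples = []
    for day, times in sorted(by_day.items()):
        for t in range(0, 24 * 3600, step):
            flips = sum(1 for x in times if x <= t)  # toggles so far today
            samples.append((t, start_on ^ (flips % 2 == 1)))
    return samples


def best_stump(samples):
    """Return (x, below, errors) for the split 'if time < x then below,
    else not below' that misclassifies the fewest samples."""
    best = None
    for x in sorted({t for t, _ in samples}):
        for below in (False, True):
            errors = sum(1 for t, on in samples
                         if (t < x and on != below) or (t >= x and on == below))
            if best is None or errors < best[2]:
                best = (x, below, errors)
    return best


log = [(1, 10, 25, 52), (1, 11, 15, 30), (1, 22, 53, 4),
       (1, 23, 55, 4), (2, 22, 50, 6), (2, 23, 57, 10)]
samples = light_state_samples(log, step=600)  # one sample every 10 minutes
print(best_stump(samples))
```

The brute-force search is quadratic in the number of samples, which is fine for a sketch; sorting the samples once and sweeping the threshold would make it linear per feature.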

Or is the time of the toggle more important than the states existing before and after the toggle? If you want to learn the times the switch is toggled, you could try a clustering algorithm like k-means. The times when the switch should be toggled will correspond to clusters with a large number of members.
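A minimal sketch of the clustering suggestion: plain one-dimensional k-means (Lloyd's algorithm) over toggle times expressed as seconds since midnight. The deterministic initialisation, the choice of `k`, and the "at least two members" cut-off are assumptions for illustration. Note that time of day wraps around at midnight and this simple version ignores that, so toggles at 23:59 and 00:01 would land far apart.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on one-dimensional data (requires k <= len(values)).

    Centroids start at evenly spaced positions in the sorted data, which
    keeps this sketch deterministic.
    """
    data = sorted(values)
    centroids = [data[(i * (len(data) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


log = [(1, 10, 25, 52), (1, 11, 15, 30), (1, 22, 53, 4),
       (1, 23, 55, 4), (2, 22, 50, 6), (2, 23, 57, 10)]
times = [h * 3600 + m * 60 + s for _, h, m, s in log]
centroids, clusters = kmeans_1d(times, k=4)
for c, members in zip(centroids, clusters):
    if len(members) >= 2:  # well-populated clusters suggest recurring toggles
        c = round(c)
        print(f"{c // 3600:02d}:{c % 3600 // 60:02d}:{c % 60:02d} ({len(members)} toggles)")
```

On the example log this reports two recurring toggle times, around 22:51 and 23:56, which line up with the merged rules in the question.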

You're also turning what is essentially a one-dimensional problem into a four-dimensional one. The four inputs are just different units of the same time. Or do you expect the switch to be toggled around the 30-second mark at random times of the day, or something similar where a smaller unit matters more than a larger one?
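A small illustration of why the separate hour/minute/second inputs can mislead a learner: two times one second apart are far apart when treated as points in (hour, minute, second) space, but adjacent once collapsed into a single seconds-since-midnight value. `second_of_day` is a hypothetical helper; `math.dist` requires Python 3.8 or later.

```python
import math


def second_of_day(hour, minute, second):
    """Collapse three time units into one dimension: seconds since midnight."""
    return hour * 3600 + minute * 60 + second


a = (10, 59, 59)
b = (11, 0, 0)

# Treated as a point in 3-D (hour, minute, second) space, the gap is large (about 83).
print(math.dist(a, b))
# As a single time-of-day value, the two moments are one second apart.
print(abs(second_of_day(*b) - second_of_day(*a)))  # 1
```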
In fact, I simplified the problem. There's much more data, such as which room the user was in before, when they entered and exited it, how long they stayed in it (exit - enter), how long after entering they toggled the switch, how long before exiting they toggled it, whether someone else was in the room, etc.
The same idea can still work. I find it helpful to visualize the input space: each input is a different dimension of that space. Some dimensions have a finite domain; others are conceptually infinite, or at least very large. A decision tree is really a binary space partition of the input space, and each leaf ends up with its own values describing the output. These can be probabilities instead of a simple true/false, which is useful if part of the space is evenly divided between the two outputs, and a leaf can also record how many samples fell into its region. A decision tree that only considers one input dimension at each node is like a BSP tree with axis-aligned divisions.
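A compact sketch of that view: a small CART-style tree (Gini impurity, one feature tested per node) whose internal nodes are axis-aligned cuts and whose leaves store the label counts of the samples in their region, so a probability can be read off. The feature layout (second of day, seconds since entering the room, someone else present) and the six training rows are made up for illustration, and `build_tree` and `predict_proba` are hypothetical names, not any library's API.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Recursively partition the input space with axis-aligned cuts.

    Each internal node tests one feature ('x[f] < threshold'), so the tree
    is a binary space partition whose cells are boxes. Each leaf keeps the
    label counts of the training samples that fell into its cell.
    """
    counts = Counter(labels)
    if depth >= max_depth or len(labels) < min_samples or len(counts) == 1:
        return {"leaf": True, "counts": dict(counts)}

    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < threshold]
            right = [y for r, y in zip(rows, labels) if r[f] >= threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:  # no split separates the samples
        return {"leaf": True, "counts": dict(counts)}

    _, f, threshold = best
    lo = [i for i, r in enumerate(rows) if r[f] < threshold]
    hi = [i for i, r in enumerate(rows) if r[f] >= threshold]
    return {
        "leaf": False, "feature": f, "threshold": threshold,
        "left": build_tree([rows[i] for i in lo], [labels[i] for i in lo],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([rows[i] for i in hi], [labels[i] for i in hi],
                            depth + 1, max_depth, min_samples),
    }


def predict_proba(tree, x, label=True):
    """Walk down to the leaf whose cell contains x; return P(label) there."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] < tree["threshold"] else tree["right"]
    counts = tree["counts"]
    return counts.get(label, 0) / sum(counts.values())


# Made-up rows: (second_of_day, seconds_since_entering_room, someone_else_present)
rows = [(20000, 10, 0), (40000, 600, 0), (60000, 5, 1),
        (82800, 20, 1), (84000, 300, 0), (85800, 3000, 1)]
labels = [False, False, False, True, True, True]  # is the light on?
tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"], predict_proba(tree, (83000, 100, 0)))
```

With `max_depth=0` the whole space is a single leaf and the prediction is simply the overall fraction of "on" samples, which shows how leaves carry probabilities rather than a hard true/false.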

