Monday, April 25, 2011

Update on the 3 to-dos

1) I started working on the final code. Through www.dreamspark.com I was able to get Visual Studio Pro for free (cheers for being a student). Progress has been slow so far, as I have never coded a Windows application before, have never used C#, etc. But I am learning lots!

2) I started collecting data for Os, and finished labeling data for Xs. For Xs I also mirrored all the data I have to effectively double it, leaving me with 400 positive and currently 1600 negative examples (I might add more if I need to bootstrap more). A quick sketch of the mirroring is below.
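The mirroring itself is trivial, just a horizontal flip of each labeled crop. A minimal sketch of what I mean, using OpenCV purely for illustration (the directory layout and file naming here are made up):

```python
import glob

import cv2  # OpenCV, used here just for illustration

# Horizontally flip every labeled crop to double the dataset; the
# "labels/xs" path and the naming scheme are hypothetical.
for path in glob.glob("labels/xs/*.png"):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    mirrored = cv2.flip(img, 1)  # flipCode=1 mirrors around the vertical axis
    cv2.imwrite(path.replace(".png", "_mirrored.png"), mirrored)
```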



3) I decided to go all out with my data collection... and am using 100,000 random comparisons. Progress has been slow because a lot of the scripts I was using before don't work so well with this much data... they just crash (and not very gracefully). I was at the point of doing the machine learning when one of my sticks of RAM died (down to 8 gigs from 12), so that is also going to hinder things. At this point, though, I am really curious how much better, if at all, using so much data will work compared to only 3000 original data points. Hopefully I will know soon!
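A rough sketch of the kind of batched extraction that keeps memory bounded instead of building one giant matrix. Everything here, shapes and filenames included, is illustrative rather than my actual script:

```python
import numpy as np

K = 100_000   # random comparisons per crop
BATCH = 64    # crops per batch, to keep memory bounded

def extract_features(crops):
    """Placeholder for the real per-crop comparison extraction."""
    return np.zeros((len(crops), K), dtype=np.int8)  # values are -1/0/1

# Stand-in crops; in practice these would be loaded off disk in batches.
crops = [np.zeros((128, 64), dtype=np.uint8) for _ in range(256)]

# Append each batch's features to disk instead of holding a single
# (num_crops x 100,000) matrix in RAM all at once.
with open("features.txt", "ab") as out:
    for i in range(0, len(crops), BATCH):
        np.savetxt(out, extract_features(crops[i:i + BATCH]), fmt="%d")
```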

Sunday, April 17, 2011

This Week and Beyond's To-Do List

1) Start working on the final code. Figure out what I need, whether I'm going to do it on Windows or Linux, what libraries I will need, etc. I believe I am in a good enough position at detecting an X from a non-X that it is time to move on from that problem alone. With real-time data I can do some things when detecting an X (like averaging detections over multiple frames) that are not possible when just detecting an X in a single image.


2) Collect data for Os and finish labeling data for Xs.


3) Out of the 3000 random pixel comparisons I extract and pass in for detecting an X versus a non-X, the current ADT only uses 31. Wut.

So what this means is that I can use 10,000, or maybe even 100,000, random comparisons, have jboost crunch numbers overnight, and then just have it tell me which comparisons are the most useful. I can then extract only those useful comparisons from the random values and pass in just those, giving extremely fast, extremely accurate code (sketched below). When checking for an X there will be no need to extract all 3000 comparisons, pass them all in, and then have ONLY 31 be used.
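A sketch of the runtime side, assuming the useful comparison indices have been read out of the jboost ADT by hand. The index values and the seed below are made up; the random values must of course be the exact ones used during training:

```python
import numpy as np

# Hypothetical: the comparison indices the boosted ADT actually uses,
# read off from the jboost output (these particular values are made up).
useful = np.array([12, 87, 203, 414, 950])

# The same random (x1, y1, x2, y2) percentages generated for training;
# the fixed seed stands in for "reuse the stored training values".
all_pairs = np.random.default_rng(0).random((100_000, 4))

# Keep only the rows the classifier cares about: per image, that is a
# handful of pixel comparisons instead of 100,000.
useful_pairs = all_pairs[useful]
print(useful_pairs.shape)  # (5, 4) here; (31, 4) for the real ADT
```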

With the current 3000 random pixel comparisons the results are pretty good, but there are probably some comparisons that are better. By using 10,000 or 100,000 comparisons, these better comparisons will be picked out and give even better results.

More Progress

This last week I tried a couple of things. First, instead of comparing lots of random points, I tried comparing random lines. This did not work anywhere near as well as the random points did. The first major setback was how slow it was compared to random points. Maybe this was just because it was done in MATLAB, but comparing 100 random lines was slower than 4000 random points (and I usually don't even use 4000 points). And with 100 random lines, there was not enough data for effective detection.
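For the record, the line feature looked roughly like this: sample the depth values along a random line and compare the means of the two halves. This is only a sketch of the idea; the details of my MATLAB version differed:

```python
import numpy as np

def line_feature(img, p1, p2, samples=32):
    """Sample depth values along a line and compare the two halves.

    p1 and p2 are (x, y) as percentages in [0, 1]. Only a sketch of
    the idea, not the exact feature I tried.
    """
    h, w = img.shape
    xs = np.linspace(p1[0] * (w - 1), p2[0] * (w - 1), samples).astype(int)
    ys = np.linspace(p1[1] * (h - 1), p2[1] * (h - 1), samples).astype(int)
    values = img[ys, xs].astype(int)
    half = samples // 2
    return np.sign(values[:half].mean() - values[half:].mean())
```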

After that I went with another suggestion from class: the idea of background subtraction. Using the Kinect gives a unique feature that you don't get from RGB data, namely that you can make assumptions about what is the background and then get rid of it. So I modified my image-converting script to do background subtraction.

Example of image with background subtraction. The background is gone! (and replaced with black)
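The modification itself is simple. A minimal sketch of the idea, assuming a grayscale depth image where lighter = closer (the threshold value is arbitrary and illustrative; the real cutoff depends on the scene):

```python
import numpy as np

def subtract_background(depth_img, threshold=60):
    """Black out everything farther away than the threshold.

    depth_img: uint8 grayscale where lighter = closer. The threshold
    here is an arbitrary illustrative value.
    """
    out = depth_img.copy()
    out[out < threshold] = 0  # darker than the cutoff -> background
    return out
```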

I also collected a lot more data, labeling the negatives and positives that I believed would help my results the most. The most recent results were run with 120 positive and 600 negative labels. Some of the test images included Xs that WERE NOT in the positive training examples, and those Xs were still detected just as well as the Xs that were in the training data. Results will be in a future post.

Monday, April 11, 2011

Results!

So, results. So far the best way to classify Xs has been by:

Step 1) Generate a bunch of random numbers. I did 40,000, which allows for 10,000 comparisons in a single image (an x and y for each point, two points per comparison). These numbers are between 0 and 1 and represent a percentage; to get the pixel, you take the percentage value and multiply it by the width or height of the image/section you are looking at. (A sketch of steps 1 and 2 comes after step 4.)

Step 2) Compare the darkness of each pair of pixels, and label the result as 1, 0, or -1 depending on how they compare.

Step 3) Repeat. A lot.

Step 4) Allow jboost to do its magic.
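A sketch of steps 1 and 2 together, in numpy (the input file format jboost actually ingests is omitted here):

```python
import numpy as np

N_COMPARISONS = 10_000
rng = np.random.default_rng()

# Step 1: 40,000 random numbers in [0, 1] = 10,000 comparisons, each
# being two (x, y) points stored as percentages of width/height.
points = rng.random((N_COMPARISONS, 4))  # columns: x1, y1, x2, y2

def comparison_features(img):
    """Step 2: compare the darkness of each pair of pixels, giving
    1, 0, or -1 per comparison depending on which pixel is darker."""
    h, w = img.shape
    x1 = (points[:, 0] * (w - 1)).astype(int)
    y1 = (points[:, 1] * (h - 1)).astype(int)
    x2 = (points[:, 2] * (w - 1)).astype(int)
    y2 = (points[:, 3] * (h - 1)).astype(int)
    return np.sign(img[y1, x1].astype(int) - img[y2, x2].astype(int))
```

Step 3 is just running this over every labeled crop, and step 4 is writing the resulting 1/0/-1 vectors plus their labels into jboost's input files.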


For the results below I only used 2,000 random comparisons.

Results 1

As you can see, I was scanning regions that were too big. Even with this fairly big error, the results were surprisingly good. There are two regions it constantly detects as positives that clearly are not: the crotch of the person, and the random wall segment to the person's bottom right. For the next round of training, more negative examples of these regions will be passed in to hopefully correct this issue.


Results 2


Scanning a smaller region, for whatever reason, had worse results compared to the bigger region, but still fairly good. Also, in both result sets it was better at detecting my girlfriend's Xs than mine. Not fair.
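For context, the scanning is just a sliding window over the frame, roughly like this (region size and stride are exactly the knobs the two result sets differ on; the classifier is passed in as a callable):

```python
import numpy as np

def scan(img, region_h, region_w, stride, classify):
    """Slide a fixed-size window over the frame and collect hits.

    classify is the trained X / non-X classifier; region size and
    stride are the parameters being experimented with above.
    """
    hits = []
    h, w = img.shape
    for y in range(0, h - region_h + 1, stride):
        for x in range(0, w - region_w + 1, stride):
            if classify(img[y:y + region_h, x:x + region_w]) > 0:
                hits.append((x, y))
    return hits
```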


Next things to do:
1) More data. I have a feeling that with 200 positive and 1000 negative examples (compared to the 100 positive and 300 negative for these results) my results will be significantly better, but I will have to see. This is partially because I will focus more training on the specific regions that I see need more help.

2) Instead of just two points, try a line (top half vs. bottom half).

3) Start working on making it real time in C/C#/C++, with real-time data off of the Kinect.



I tried graphing out relevant data but couldn't get any graphs that showed anything useful.

TODO List: Completed

1) Data from the Kinect is now displayed and interpreted in a MUCH more consistent manner. The closer an object is to the Kinect, the lighter it is; the farther away, the darker.

Example:

After collecting the data from the Kinect, I converted the image to HSV colorspace, set the Value channel to the Hue data, and set Hue and Saturation to 0. The result: it makes some things visible that were invisible before.
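In OpenCV terms the conversion looks roughly like this (a sketch; my actual conversion script differed):

```python
import cv2
import numpy as np

def hue_as_value(bgr_img):
    """Copy the Hue channel into Value, zeroing Hue and Saturation."""
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    remapped = cv2.merge([np.zeros_like(h), np.zeros_like(s), h])
    return cv2.cvtColor(remapped, cv2.COLOR_HSV2BGR)
```

With Hue and Saturation zeroed, converting back gives a grayscale image whose brightness is the original Hue, which is where the previously invisible detail shows up.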







2) More data has been collected! The results in the next post were done with 100 positive and 300 negative examples. I'm slowly working my way up; I think I am going to go toward 200 positive and 1000 negative (currently at 100/415). With 100 positive examples I still don't feel like I have enough positive data.


3) To label positive examples, I cropped out the regions from the top of the fingers (whichever finger was higher) to the bottom of the wrists.

Examples:


4) Tried multiple things. I'll explain more in the next post (the results post). The best working method (so far) has been to simply compare the darkness of two random pixels, then label this relationship as 1 or -1 depending on which pixel is darker, or 0 if they are the same.

Wednesday, April 6, 2011

TODO

1) Figure out a better way to parse the data from Kinect in a more consistent fashion.

2) Get more, better data.

Microsoft uses a million images for three trees. (http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf)

Go up to 100 positive and 500 negative examples for now.

3) Better, more consistent way of labeling positive examples. Probably something like:



4) Try random lines, try Viola-Jones style, try other things!

Sunday, April 3, 2011

Machines learning what an X is

Today I worked on attempting to detect a user making an 'X' with their hands/arms, based solely on the depth data from the Kinect camera. I went through about 3 or 4 iterations to get to where I am now, and unfortunately none of them have had much success :(.

Essentially, I was feeding in positive and negative samples of an 'X', labeling them as such, and then trying to figure out a good way to classify these samples such that in the future a computer would be good at detecting them.

Examples (first 3 are positive, last 3 are negative):








At first I was just doing general histograms of the Value channel (of the image in HSV format), without much success at all, as this had no relation to where the values were located. Several iterations later I was breaking the positive and negative images down into 3x3 grids and averaging the Value channel in relation to where it was located, with some minor success...
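The 3x3 grid version was essentially this (a sketch):

```python
import numpy as np

def grid_averages(img, grid=3):
    """Average the Value channel inside each cell of a 3x3 grid, so
    the features keep some information about where values are located."""
    h, w = img.shape
    feats = []
    for row in range(grid):
        for col in range(grid):
            cell = img[row * h // grid:(row + 1) * h // grid,
                       col * w // grid:(col + 1) * w // grid]
            feats.append(cell.mean())
    return np.array(feats)
```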





Yah.... going to need to do something different.

The future....

Just a quick mock-up that I created a few days ago of what it will somewhat look like when it's done....