Herro folks. Thought I would post a simple tutorial on grabbing images from your webcam with python.
This is Windows specific, sorry. I will try and get a Linux equivalent when I get around to getting OpenCV up and going. Also, like with all my tutorials, sorry for all the \n, not my fault el oh el.
What will we be doing?
Grab an image from the webcam and display it with Pygame, using VideoCapture.
Sampling multiple images and creating a stream of what the webcam sees.
Some examples of what can be done to images and the image stream via Python Image Lib(PIL).
Learn how to use ctypes to use DLLs inside python to do facial recognition.
Required Libraries
VideoCapture
Pygame
PIL
Facial recognition DLL
Psyco (optional)
Isn't this on your blog?
Kinda. I have examples of all the above, and one in particular shows how to use the DLL. This will be a bit more extensive, and will put all the examples in one place. Also, lurkers and people sent by Google will never have seen my blog, and this adds to the overall quality of EZ.
Installing the Libs
Go to the respective sites and download the modules/libs. Some, like pygame, have a binary installer you can use, but I recommend using the
python setup.py install
method. Psyco is optional and is only used to help speed up execution, since we are working with video feedback.
Getting VideoCapture working
After VideoCapture is installed, getting images takes 3 lines of code:
import VideoCapture
webcam = VideoCapture.Device()
im = webcam.saveSnapshot('new_image.jpg')
It is as simple as that. If you have multiple cams on your computer, you might need to specify the device by passing an int to Device(). Such as Device(1), or Device(2).
To make things easier you can import Device along with the VideoCapture lib like this:
from VideoCapture import Device
and then just call Device() by itself.
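If you are not sure which number your cam lives at, you can just try a few. Here is a rough sketch of that idea. The helper name open_first_device and the FakeDevice class are made up for illustration, and it assumes the constructor raises an exception when there is no camera at that index, which you should check on your own machine.

```python
def open_first_device(device_factory, indices=(0, 1, 2)):
    """Return (index, device) for the first index that opens, else (None, None)."""
    for index in indices:
        try:
            return index, device_factory(index)
        except Exception:
            # Assumption: a missing camera makes the constructor raise.
            continue
    return None, None


class FakeDevice(object):
    """Stand-in for VideoCapture.Device so the sketch runs without a webcam."""

    def __init__(self, devnum=0):
        if devnum != 1:
            raise RuntimeError("no camera at index %d" % devnum)
        self.devnum = devnum


index, cam = open_first_device(FakeDevice)
print(index)  # 1
```

With the real library you would pass Device (from VideoCapture) in place of FakeDevice.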
You might get a black image back, and we will now address this issue if it happens to you.
Getting the image happens really fast, so your webcam might be a touch slow and not initialize in time to get the image. To get around this issue, we will introduce a new method:
from VideoCapture import Device
webcam = Device()
for i in range(1, 25):
    webcam.getImage()
webcam.saveSnapshot("new_image.jpg")
The getImage() method grabs an image from the webcam and stores it in memory. We run it through a for loop so the webcam has time to initialize. This won't be needed once we put this in a "live" view mode, because getImage() is going to be called a lot anyway.
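The warm-up trick is easy to wrap in a function. This is only a sketch: warm_up and FakeCam are names I made up, and FakeCam just pretends to be a slow webcam whose first 10 frames are black, so you can run it without any hardware.

```python
def warm_up(cam, frames=25):
    """Call cam.getImage() `frames` times and return the last frame."""
    image = None
    for _ in range(frames):
        image = cam.getImage()
    return image


class FakeCam(object):
    """Hypothetical camera whose first 10 frames come back black."""

    def __init__(self):
        self.calls = 0

    def getImage(self):
        self.calls += 1
        return "black" if self.calls <= 10 else "frame"


print(warm_up(FakeCam(), frames=3))   # black
print(warm_up(FakeCam(), frames=25))  # frame
```

With a real cam you would pass the object returned by Device() and save or display whatever warm_up() hands back.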
Now that we can get images from our cam, time to move on.
Getting the "live" feed
Next I'm going to introduce pygame. We'll be using it to display the images so we can see what's going on. Pygame is very extensive and widely popular. It is built on top of SDL and is used by many.
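import sys

import pygame
from pygame.locals import *
from VideoCapture import Device

webcam = Device()
pygame.init()
screen = pygame.display.set_mode((640, 480))
# set_caption lives on pygame.display, not on the Surface returned by set_mode
pygame.display.set_caption('Herro Webcam')

while 1:
    # Handle the window's close button so the program can exit cleanly
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
    im = webcam.getImage()  # PIL image from the webcam
    # Convert the PIL image to a pygame surface (newer Pillow calls this tobytes())
    imblit = pygame.image.frombuffer(im.tostring(), (640, 480), "RGB")
    screen.blit(imblit, (0, 0))
    pygame.display.flip()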
If you have never used pygame before, I'm going to explain a few things.
First we import everything we need. Then we initialize pygame, set the display size, and add the window title. Then we grab event data from pygame; if you couldn't tell, this is for anyone who clicks the "x" to close the window. If the event handler wasn't there, your program might hang when you try to exit. Next we grab the image and store it in memory. Here is the interesting part: pygame can only draw its own surfaces, so we have to convert the PIL image into something pygame can use. Then we blit it to the screen created by pygame. Blitting draws onto a buffer that is not visible yet, and pygame.display.flip() is what actually puts it on the screen. Remember, display.flip() is called AFTER you take care of blitting everything you want the user to see. If you blit after a display.flip() and don't flip again, it's not going to show.
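If the blit-then-flip order is still fuzzy, here is a toy model of it in plain Python. TinyDisplay is something I made up for illustration; it is not how pygame is implemented, it only mimics the rule that nothing you blit is visible until the next flip.

```python
class TinyDisplay(object):
    """Toy model of a double-buffered display (illustration only)."""

    def __init__(self):
        self.back = []     # drawn, but not visible yet
        self.visible = []  # what the user currently sees

    def blit(self, item):
        self.back.append(item)

    def flip(self):
        self.visible = list(self.back)


display = TinyDisplay()
display.blit("webcam frame")
display.flip()
display.blit("face count")  # drawn after the flip
print(display.visible)      # ['webcam frame']
display.flip()
print(display.visible)      # ['webcam frame', 'face count']
```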
Now you have a live stream of your webcam. As you can see, it's not really a stream, since we are just taking images very quickly. It appears to be live because of the "flip-book" effect. You know, you draw a smiley face on a bunch of pages of your school book, and each page is a bit different, so when you flip through it quick enough, it looks like the happy fucker is getting shot in the face.
I think you get the point, NEXT...
Let's take the red PIL
Now that we have images to work with, let us do something with them.
import sys

from VideoCapture import Device
import pygame
from pygame.locals import *
import Image

webcam = Device()
# Grab one frame just to learn the camera's resolution as (width, height)
res = webcam.getImage().size
pygame.init()
screen = pygame.display.set_mode(res)
pygame.display.set_caption("Take the Red PIL")

while 1:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
    im = webcam.getImage()
    iml = im.load()  # pixel access object, indexed as iml[x, y]
    for x in range(0, res[0]):
        for y in range(0, res[1]):
            # Tuples compare left to right, so red decides unless it is exactly 50
            if iml[x, y] > (50, 50, 50):
                iml[x, y] = (255, 0, 0)  # paint the pixel pure red
    imblit = pygame.image.frombuffer(im.tostring(), res, "RGB")
    screen.blit(imblit, (0, 0))
    pygame.display.flip()
Okay, this looks a little different from what we have been doing, so there is a bit to explain here. We'll start with im.load(). If you take a look at the PIL documentation for load(), you will see that it loads the image data. It suggests we don't really need to call it, because PIL already allocates memory for pixel data when it loads an image itself. Well, we didn't load the image via PIL, it came from VideoCapture, so we call load() to get a pixel access object we can operate on. The pixels are indexed in two dimensions, as iml[x, y]. We also introduced "size", which gives the resolution of the image as a (width, height) tuple. Hence res is used both for setting the display size and for bounding the loops that walk over the pixel data. To understand the rest you will need to know RGB values. The code compares each pixel of the image against (50, 50, 50) and writes the new value straight into im, because iml refers to im's pixel data. Then it displays with pygame using the methods mentioned above.
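One gotcha in the loop above: comparing a pixel to (50, 50, 50) with > is a plain Python tuple comparison, which goes left to right and stops at the first channel that differs. So a pixel with lots of red and no green or blue still counts as "brighter". The sketch below shows the difference; both function names are made up, and which behaviour you want is up to you.

```python
def tuple_brighter(pixel, level=(50, 50, 50)):
    """The comparison used in the loop above: lexicographic tuple order."""
    return pixel > level


def all_channels_brighter(pixel, level=50):
    """Stricter alternative: every channel must exceed the level."""
    return all(channel > level for channel in pixel)


pixel = (60, 0, 0)
print(tuple_brighter(pixel))         # True
print(all_channels_brighter(pixel))  # False
```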
As you can see, PIL can be very powerful. Use it for steganography, or for mixing your own filters and effects. Judging by the creative arts section here on EZ, PIL can teach you how those filters and effects are done in Gimp and Photoshop. Go code your own, and learn the ins and outs of image processing. Hack images beyond comprehension.
Onward!!!
Get some face time.
Okay, this is where the noobs stop. You will need prior knowledge of C/C++ and at least a basic understanding of pointers and shit.
I developed this wrapper on my own, and it is kind of a hack job. Go read up on ctypes to understand the following. First, the example provided by the creator of the DLL we're using.
#include "windows.h"
#include "loadbmp.h" // from http://gpwiki.org/index.php/LoadBMPCpp
#include "fdlib.h"
void main(int argc, char *argv[])
{
    int i, n, x[256], y[256], size[256], w, h, threshold;
    BMPImg *bi;
    unsigned char *bgrdata, *graydata;
    if (argc==1)
    {
        printf("usage: fdtest bmpfilename [threshold]\n");
        exit(0);
    }
    bi = new BMPImg();
    printf("\nloading %s\n", argv[1]);
    bi->Load(argv[1]);
    w = bi->GetWidth();
    h = bi->GetHeight();
    printf("image is %dx%d pixels\n", w, h);
    bgrdata = bi->GetImg();
    graydata = new unsigned char[w*h];
    for (i=0; i<w*h; i++)
    {
        graydata[i] = (unsigned char) ((.11*bgrdata[3*i] + .59*bgrdata[3*i+1] + .3*bgrdata[3*i+2]));
        //if (i<10) printf("%d ", graydata[i]);
    }
    threshold = argc>2 ? atoi(argv[2]) : 0;
    printf("detecting with threshold = %d\n", threshold);
    fdlib_detectfaces(graydata, w, h, threshold);
    n = fdlib_getndetections();
    if (n==1)
        printf("%d face found\n", n);
    else
        printf("%d faces found\n", n);
    for (i=0; i<n; i++)
    {
        fdlib_getdetection(i, x+i, y+i, size+i);
        printf("x:%d y:%d size:%d\n", x[i], y[i], size[i]);
    }
    delete[] graydata;
    delete bi;
}
About the only thing we'll be doing differently with Python is the graydata. We are going to use PIL for that.
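For reference, here is roughly what the graydata loop in the C example does, written in plain Python. This is a sketch, not the wrapper: gray_from_bgr is a made-up name, and I use integer math (11, 59 and 30 out of 100) so float rounding doesn't get in the way. PIL's convert("L") uses very similar weights (R*299/1000 + G*587/1000 + B*114/1000), so values can differ slightly from this.

```python
def gray_from_bgr(bgr):
    """Turn flat B,G,R values into one gray value per pixel (weights from the C example)."""
    gray = []
    for i in range(0, len(bgr) - 2, 3):
        b, g, r = bgr[i], bgr[i + 1], bgr[i + 2]
        gray.append((11 * b + 59 * g + 30 * r) // 100)
    return gray


# white, black, pure red (remember the order is B, G, R)
pixels = [255, 255, 255, 0, 0, 0, 0, 0, 255]
print(gray_from_bgr(pixels))  # [255, 0, 76]
```

Note that the BMP data in the C example is in BGR order, while images coming from PIL are RGB.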
This does come from my blog directly (almost), because I do not believe in doing things more than once, unless it needs optimization or some hacking. Feel free to do both with the following.
from VideoCapture import Device
from ctypes import *
import Image, ImageDraw, sys, pygame
from pygame.locals import *

try:
    # Psyco is optional; it only speeds things up
    from psyco import full
    full()
except ImportError:
    pass

fd = cdll.LoadLibrary("fdlib.dll")

# assignments
cam = Device()
pygame.init()
screen = pygame.display.set_mode((640, 480))
pygame.display.set_caption('Facial Recognition')
font = pygame.font.SysFont("Courier", 26)
w = c_int(640)
h = c_int(480)
threshold = c_int(0)  # raise this number for more accuracy, and possibly less detection
# Output arrays the DLL fills in: build the array type, then an instance
x = (c_int * 256)()
y = (c_int * 256)()
size = (c_int * 256)()
graydata = (c_ubyte * (640 * 480))()
fps = 25.0

while 1:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            sys.exit()
    im = cam.getImage()
    draw = ImageDraw.Draw(im)
    img = im.convert("L")  # convert to grayscale
    # Copy the gray values into the ctypes buffer the DLL expects
    for cnt, pix in enumerate(img.getdata()):
        graydata[cnt] = pix
    fd.fdlib_detectfaces(byref(graydata), w, h, threshold)
    n = fd.fdlib_getndetections()  # number of faces
    for i in range(n):
        # Results for detection i land in x[0], y[0], size[0]
        fd.fdlib_getdetection(c_int(i), x, y, size)
        bBoxTres = size[0] // 2
        # Top-left corner first, then bottom-right
        draw.rectangle([(x[0] - bBoxTres, y[0] - bBoxTres), (x[0] + bBoxTres, y[0] + bBoxTres)], outline=224)
    faceNumber = font.render('Number of Faces: ' + str(n), True, (46, 224, 1))
    imn = pygame.image.frombuffer(im.tostring(), (640, 480), "RGB")
    screen.blit(imn, (0, 0))
    screen.blit(faceNumber, (0, 0))
    pygame.display.flip()
    pygame.time.delay(int(1000 * 1.0 / fps))  # wait about 40 ms between frames
New stuff here. First we use ctypes to load the provided DLL, then create the data types we need. We grab an image and convert it to greyscale, pass the data to the DLL, and get the detections. Then we blit to the screen with some added data, and use pygame.time.delay() to cap the framerate.
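The ctypes array business trips people up, so here is a tiny standalone sketch of it. It needs no DLL and no webcam; the 4x2 "image" and the names GrayBuffer and IntArray256 are made up for illustration. The point is that multiplying a ctypes type by a number gives you an array type, and calling that type gives you an actual array.

```python
from ctypes import c_int, c_ubyte, sizeof

width, height = 4, 2  # tiny stand-in for 640x480
gray_values = [0, 64, 128, 255, 10, 20, 30, 40]

GrayBuffer = c_ubyte * (width * height)  # an array *type*
graydata = GrayBuffer(*gray_values)      # an array *instance*, filled in one go

IntArray256 = c_int * 256
x = IntArray256()                        # starts out zeroed

print(sizeof(graydata))  # 8
print(list(graydata))    # [0, 64, 128, 255, 10, 20, 30, 40]
print(len(x))            # 256
```

Filling the buffer through the constructor like this avoids a Python-level copy loop, though for a full 640x480 frame you should time it yourself before assuming it is faster.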
I'm not going to go over a whole lot on this last bit because it's not really for people without prior experience using C/C++ from Python. If you look closely at the first snippet and compare it to the wrapper, you will see what is going on. The only thing that might throw you off is the math involved with drawing the red square on the detected face. Just know that draw.rectangle() needs a top-left x,y point and a bottom-right x,y point, which are worked out from the values filled in by fdlib_getdetection(). The math will come to you when you see what's going on when you run it.
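Here is the square math on its own. This is a sketch under one assumption that I am inferring from the wrapper code rather than from any DLL documentation: the x,y you get back is the center of the face and size is the width of the square around it. face_box is a made-up helper name.

```python
def face_box(center_x, center_y, size):
    """Return (left, top, right, bottom) for a square centered on the detection."""
    half = size // 2
    return (center_x - half, center_y - half, center_x + half, center_y + half)


print(face_box(320, 240, 100))  # (270, 190, 370, 290)
```

The first two numbers are the top-left corner and the last two are the bottom-right corner, which is the order draw.rectangle() wants.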
FIN
Well, I hope I reached a wider audience than just noobs, or just non-noobs. A bit for everyone, eh?
I did this on image processing because I have prior experience with it, and it is something where you can actually see what you are doing, which is what drew me to it in the first place. And since links are almost the same color as the other text, you'll need to dig around in the tut to find more info on what has been done. Also, if the framerate is still lacking after psyco, instead of a while loop, use a for loop with a stepping iteration or equivalent. Happy hacking!
Hope you enjoyed the read and learned something.
Report any tracebacks you get and I will supply a fix when I have access to a Windows machine. Also, admins insist on the color-coded code tags, so just highlight the code and you will see it in full. I would have used regular code tags, but the color highlighting feature is nice; the colors can just be hard to read on the background. Enjoy.