Hello EZ,
I'm working on a tool that I might post here once it's done.
The point is, I'm running into some difficulty.
I need to extract written text from big blobs of data.
It's proving to be more difficult than I thought it would be.
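A common first step, similar to what the Unix `strings` utility does, is to pull runs of printable characters out of the raw bytes before applying any text heuristics. Below is a minimal Python sketch. The helper name `printable_runs`, the ASCII-only character set and the minimum run length of 6 are all assumptions chosen for illustration.

```python
import re

# Hypothetical helper: extract runs of printable ASCII (letters, digits,
# spaces and common punctuation) that are at least min_len bytes long.
# Assumption: the text of interest is plain ASCII.
def printable_runs(blob: bytes, min_len: int = 6) -> list[str]:
    pattern = re.compile(rb"[A-Za-z0-9 ,.;:'!?()-]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


print(printable_runs(b"\x00\x01Hello, world!\xff\xfeab\x00This is text.\x00"))
# prints ['Hello, world!', 'This is text.']
```

Short fragments like `ab` are dropped by the length cutoff; every surviving run can then be passed through the other filters.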
Currently I filter out strings like AaAaAa, since that kind of casing doesn't indicate normal written text.
The same goes for things like AAAaaAA.
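One way to express the casing rule is to accept only the three shapes normal words usually take (all lowercase, all uppercase, or capitalized) and flag everything else. This is a sketch under that assumption. The function name `has_odd_casing` is hypothetical, and names like "iPhone" or "McDonald" would be flagged too (false positives).

```python
def has_odd_casing(word: str) -> bool:
    """Hypothetical check: True if a word's letter casing looks unnatural.

    Assumption: natural words are all lowercase ("text"), all uppercase
    ("NASA") or capitalized ("Hello"). Anything else, such as "AaAaAa"
    or "AAAaaAA", is flagged.
    """
    letters = "".join(c for c in word if c.isalpha())
    if not letters:
        return False  # no letters, so casing says nothing
    if letters.islower() or letters.isupper():
        return False
    # Capitalized word: first letter uppercase, the rest lowercase.
    if letters[0].isupper() and letters[1:].islower():
        return False
    return True


for w in ["AaAaAa", "AAAaaAA", "Hello", "NASA", "text"]:
    print(w, has_odd_casing(w))
```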
If the percentage of vowels is below 30%, I don't consider it written text either.
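The vowel check can be computed over letters only, so digits and punctuation don't skew the ratio. A minimal sketch follows. The names `vowel_ratio` and `passes_vowel_check` are hypothetical, and it assumes English vowels `aeiou` without counting `y`. Short strings vary a lot: a word like "rhythm" scores 0, so the check is probably more reliable on longer runs than on single words.

```python
VOWELS = set("aeiouAEIOU")  # assumption: English vowels, "y" not counted


def vowel_ratio(text: str) -> float:
    """Fraction of letters in text that are vowels (0.0 if there are no letters)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def passes_vowel_check(text: str, threshold: float = 0.30) -> bool:
    # Treat the text as possible written text only if at least 30% of its letters are vowels.
    return vowel_ratio(text) >= threshold


print(vowel_ratio("banana"))                      # 0.5
print(passes_vowel_check("The quick brown fox"))  # True (5 of 16 letters)
print(passes_vowel_check("xkcdqzt"))              # False
```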
So far it's sort of working, since I'm filtering out a lot of garbage, but I need some ideas to make it more accurate.
I'm still trying a couple of other things, but I would like some input from you guys.
Anyone?
I will probably also run into performance issues later on, since it needs to be able to process vast amounts of data in a short time.
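For large inputs, one approach is to stream the data in fixed-size chunks so memory stays flat, and let a compiled regex do the scanning. The catch is that a run of text can be split across two chunks, so the trailing printable bytes of each chunk are held back and prepended to the next one. This is a sketch, not a tuned implementation. The function name `stream_runs`, the ASCII character set, the minimum length of 6 and the 1 MiB chunk size are assumptions. A single enormous printable stretch with no break would be buffered whole.

```python
import io
import re
import string
from typing import BinaryIO, Iterator

# Assumption: "printable" means ASCII letters, digits, space and some punctuation.
_CHARS = (string.ascii_letters + string.digits + " ,.;:'!?()-").encode("ascii")
PRINTABLE = frozenset(_CHARS)  # set of byte values
RUN = re.compile(b"[" + re.escape(_CHARS) + b"]{6,}")  # runs of 6+ printable bytes


def stream_runs(f: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[str]:
    """Hypothetical helper: yield printable runs from a binary stream chunk by chunk."""
    carry = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        # Find where the trailing printable run starts; it may continue in the next chunk.
        cut = len(buf)
        while cut > 0 and buf[cut - 1] in PRINTABLE:
            cut -= 1
        # Everything before `cut` ends on a non-printable byte, so those runs are complete.
        for m in RUN.finditer(buf, 0, cut):
            yield m.group().decode("ascii")
        carry = buf[cut:]
    # Flush whatever was still pending at end of input.
    for m in RUN.finditer(carry):
        yield m.group().decode("ascii")


data = b"\x00Hello, world!\x01\x02abc\x00more text here\xff"
print(list(stream_runs(io.BytesIO(data), chunk_size=4)))
# prints ['Hello, world!', 'more text here']
```

With a real file you would pass something like `open(path, "rb")` instead of `io.BytesIO`, and apply the casing and vowel filters to each yielded run.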