Author Topic: Filtering written text. (Read 799 times)

proxx · « **on:** September 15, 2013, 07:41:12 pm »

Hello EZ,

I'm working on a tool that I might post here once it's done.
The problem is that I'm running into some difficulty.
I need to extract written text from big blobs of data.

It's proving to be more difficult than I thought it would be.
Currently I filter out patterns like AaAaAa, since they wouldn't indicate normal written text.
The same goes for things like AAAaaAA.
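A minimal sketch of that mixed-case check in Python. It assumes a word counts as "normal" when it is all lowercase, all uppercase, or capitalized (first letter upper, rest lower). The function names and the 20% cutoff are made up for illustration and are not taken from the tool:

```python
import re

# Runs of ASCII letters; digits and punctuation are ignored.
_WORD = re.compile(r"[A-Za-z]+")


def has_normal_case(word: str) -> bool:
    """True for 'hello', 'NASA' or 'Hello'; False for 'AaAaAa' or 'AAAaaAA'."""
    return word.islower() or word.isupper() or word.istitle()


def odd_case_ratio(text: str) -> float:
    """Fraction of alphabetic words whose capitalization is not a normal pattern."""
    words = _WORD.findall(text)
    if not words:
        return 1.0  # no letters at all: treat the string as garbage
    odd = sum(1 for w in words if not has_normal_case(w))
    return odd / len(words)


def looks_like_text_by_case(text: str, max_odd: float = 0.2) -> bool:
    # max_odd (20%) is an arbitrary cutoff chosen for this sketch
    return odd_case_ratio(text) <= max_odd


if __name__ == "__main__":
    print(odd_case_ratio("AaAaAa AAAaaAA"))                  # 1.0
    print(looks_like_text_by_case("Hello there, this is NASA"))  # True
```

Real words like "McDonald" or "iPhone" also trip has_normal_case, which is why this version scores a ratio over the whole string instead of rejecting it on a single odd word.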

If the percentage of vowels is below 30%, I don't consider it written text either.
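The vowel check could look something like this. The post doesn't say what the 30% is measured against, so this sketch assumes it means vowels out of ASCII letters only (spaces, digits and punctuation ignored) and does not count "y" as a vowel. Both are assumptions worth tuning:

```python
VOWELS = frozenset("aeiouAEIOU")  # 'y' deliberately not counted


def vowel_ratio(text: str) -> float:
    """Vowels as a fraction of ASCII letters; 0.0 if there are no letters."""
    letters = [c for c in text if c.isascii() and c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def passes_vowel_check(text: str, threshold: float = 0.30) -> bool:
    # Reject anything with fewer than 30% vowels, as described in the post.
    return vowel_ratio(text) >= threshold


if __name__ == "__main__":
    print(vowel_ratio("hello"))                 # 0.4
    print(passes_vowel_check("xkcd qwrt zzz"))  # False
```

For example, "The quick brown fox jumps over the lazy dog" scores 11 vowels out of 35 letters, about 31%, so short strings of real English can land close to the 30% line.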

So far it's sort of working, since I'm filtering out a lot of garbage, but I need some ideas to make it more accurate.
I'm still trying a couple of other things, but I would like some input from you guys.
Anyone?

And I will probably run into performance issues later on, since it should be able to process vast amounts of data in a short time.
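On the performance side, one common approach (the same idea as the Unix strings tool) is to pull out runs of printable ASCII with a single compiled regex and read the input in fixed-size chunks, so memory stays flat regardless of how big the blob is. This is a swapped-in technique, not necessarily how the tool works. The names iter_text_runs and MIN_RUN, the 6-byte minimum run and the 1 MiB chunk size are all assumptions for illustration, and non-ASCII text (UTF-8 etc.) is not handled:

```python
import io
import re

MIN_RUN = 6  # assumed minimum length of a run worth keeping
# Every byte a run may contain: printable ASCII 0x20-0x7E plus tab.
PRINTABLE = bytes(range(0x20, 0x7F)) + b"\t"
_RUN = re.compile(rb"[\x20-\x7e\t]{%d,}" % MIN_RUN)


def iter_text_runs(stream, chunk_size=1 << 20):
    """Yield printable ASCII runs from a binary stream, reading one chunk at a time.

    Printable bytes at the end of a chunk are held back and joined with the
    next chunk, so a run that straddles a chunk boundary is not cut in two.
    """
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        # Everything before `cut` ends on a non-printable byte, so runs there are complete.
        cut = len(buf.rstrip(PRINTABLE))
        for m in _RUN.finditer(buf, 0, cut):
            yield m.group().decode("ascii")
        carry = buf[cut:]
    # Whatever is left at end of input is a single run; keep it if long enough.
    if len(carry) >= MIN_RUN:
        yield carry.decode("ascii")


if __name__ == "__main__":
    blob = io.BytesIO(b"\x00\x01Hello there, world\xff\xfeab\x00readable text")
    for run in iter_text_runs(blob, chunk_size=4):
        print(run)
```

On a real file you would pass open(path, "rb") as the stream. The heavier checks (case pattern, vowel ratio) then only run on the candidate runs instead of on every byte. One caveat: a stretch with no non-printable byte in it keeps growing in carry, so a huge plain-text file would end up buffered whole.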

EvilZone