Author Topic: Filtering written text.  (Read 646 times)


proxx
Filtering written text.
« on: September 15, 2013, 07:41:12 pm »
Hello EZ,

I'm working on a tool that I might post here once it's done.
The thing is, I'm running into some difficulty.
I need to extract written text from big blobs of data.

It's proving more difficult than I thought it would be.
Currently I filter out case patterns like AaAaAa, for example, since they wouldn't show up in normal written text.
The same goes for things like AAAaaAA.

If the percentage of vowels is below 30%, I don't consider it written text either.
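To make the two checks concrete, here is a rough Python sketch of what is described above. It is not the actual tool: the function names, the per-token granularity, and the `max_switches` limit are assumptions made for illustration. Only the 30% vowel threshold comes from the post.

```python
VOWELS = set("aeiouAEIOU")


def vowel_ratio(token):
    """Fraction of the letters in token that are vowels (0.0 if no letters)."""
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def case_switches(token):
    """Count how often adjacent letters flip between upper and lower case."""
    letters = [c for c in token if c.isalpha()]
    return sum(1 for a, b in zip(letters, letters[1:])
               if a.isupper() != b.isupper())


def looks_written(token, min_vowels=0.30, max_switches=1):
    """Heuristic only: enough vowels and at most one case flip (e.g. 'Hello').

    max_switches=1 is an assumed cutoff, not a value from the post.
    """
    return (vowel_ratio(token) >= min_vowels
            and case_switches(token) <= max_switches)
```

With these settings "Hello" and "HELLO" pass, while "AaAaAa" (five case flips), "AAAaaAA" (two flips) and "xkcdqz" (no vowels) are rejected. Real words with internal capitals, such as "iPhone", are rejected too, so the cutoff would need tuning against real data.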

So far it's sort of working, since I'm filtering out a lot of garbage, but I need some ideas to make it more accurate.
I'm still trying a couple of other things, but I would like some input from you guys.
Anyone?

I will also probably run into performance issues later on, as it should be capable of processing vast amounts of data in a short period.
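On the performance side, one possible approach (a sketch under assumptions, not something the tool already does) is to read the blob in fixed-size chunks and pull out letter runs with a precompiled regex, so memory use stays flat regardless of input size. The names `iter_candidates` and `WORD_RE`, the 1 MiB chunk size, and the minimum run length of 3 are all made up for this example, and it only handles ASCII letters.

```python
import io
import re

# Runs of at least 3 ASCII letters; the minimum length is an assumed cutoff.
WORD_RE = re.compile(rb"[A-Za-z]{3,}")


def iter_candidates(stream, chunk_size=1 << 20):
    """Yield letter runs from a binary stream without loading it all at once."""
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        # Hold back a trailing letter run: it may continue in the next chunk.
        cut = len(data)
        while cut > 0 and data[cut - 1:cut].isalpha():
            cut -= 1
        tail = data[cut:]
        for match in WORD_RE.finditer(data[:cut]):
            yield match.group().decode("ascii")
    # Whatever is left at end of input is a complete run (or empty).
    for match in WORD_RE.finditer(tail):
        yield match.group().decode("ascii")


if __name__ == "__main__":
    blob = io.BytesIO(b"\x00\x01hello\xffworld12ab")
    print(list(iter_candidates(blob, chunk_size=4)))  # ['hello', 'world']
```

The candidates coming out of the generator can then be passed through whatever per-token filters are in use, one at a time, without building a big intermediate list.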
« Last Edit: September 15, 2013, 07:42:44 pm by proxx »