Author Topic: [Python] EvilZone Recent Posts Fetcher  (Read 1762 times)

Offline Matriplex

  • Knight
  • **
  • Posts: 323
  • Cookies: 66
  • Java
    • View Profile
[Python] EvilZone Recent Posts Fetcher
« on: December 23, 2013, 10:14:57 pm »
I got a little bored this afternoon and decided to learn a bit of regex. Then I decided to see if I could fetch the most recent posts section from good ol' ez using said method. I guess it worked.

It prints the topic, who posted, and in which section, then the URL on the line below. If you are running *nix, you should be able to right-click the URL and open it in your browser.

Example:

Code:
Re: How to find the owner name and details of a mobile number? by boriswc (General discussion)
http://evilzone.org/general-discussion/how-to-find-the-owner-name-and-details-of-a-mobile-number/msg73088/

Re: Best free proxy? by boriswc (Hacking and Security)
http://evilzone.org/hacking-and-security/best-free-proxy/msg73087/

Feel free to remove the colors if they hurt your eyes. Just take out the red.format() and blue.format() calls and print the strings directly.
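
If you'd rather not edit every print line, one option is to switch the colors off in one place. This is only a minimal sketch in Python 3, not part of the script below: the `colorize` helper and its `enabled` flag are made-up names, and it assumes that skipping the escape codes when the output is not a terminal is the behavior you want.

```python
import sys

RED = "\033[01;31m{0}\033[00m"
BLUE = "\033[1;36m{0}\033[00m"

def colorize(template, text, enabled=None):
    """Wrap text in an ANSI color template, or return it unchanged.

    enabled=None means "decide automatically": color only when stdout is a terminal.
    """
    if enabled is None:
        enabled = sys.stdout.isatty()
    return template.format(text) if enabled else text

print(colorize(BLUE, "Re: Best free proxy? by boriswc (Hacking and Security)"))
print(colorize(RED, "http://evilzone.org/hacking-and-security/best-free-proxy/msg73087/"))
```

Passing `enabled=False` forces plain output, which is handy when piping the result into another program.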

Code: (Python)
import urllib2
import re

# ANSI escape sequences for colored terminal output
red = "\033[01;31m{0}\033[00m"
blue = "\033[1;36m{0}\033[00m"

url = 'http://www.evilzone.org/'
sock = urllib2.urlopen(url)
ch = sock.read()
sock.close()

# Offset of the "recent posts" block; the search below starts from here
x = ch.find('<dl id="ic_recentposts" class="middletext">')

# One match per entry: <topic link> by <poster link> (<board link>)
patingr = re.compile(r'<strong><a.+?>.+?</a></strong> by <a.+?>.+?</a> \(<a.+?>.+?</a>\)')

posts = patingr.findall(ch, x)

print

for s in posts:
  # Strip the tags to get "topic by poster (board)"
  print blue.format(re.sub(r'<[^>]*>', '', s))
  links = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', s)
  for l in links:
    # Keep only the topic link, then drop the session ID and "?topicseen"
    if "profile" not in l and ";PHPSESSID" in l:
      print red.format(''.join((l.split(';', 1)[0]).split('?topicseen', 1)))
      print
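
The last step of the script, trimming the session ID and the `?topicseen` marker off each link, can be tried on its own. A rough sketch in Python 3: the sample URL and the `clean_link` name are made up for illustration, and the real board links may be shaped differently.

```python
def clean_link(link):
    """Drop everything from the first ';' onwards (the session ID), then remove '?topicseen'."""
    without_session = link.split(';', 1)[0]
    return ''.join(without_session.split('?topicseen', 1))

# Hypothetical link in the shape the script filters for
sample = "http://evilzone.org/general-discussion/some-topic/msg73088/?topicseen;PHPSESSID=abc123"
print(clean_link(sample))
```

A link without either marker passes through unchanged, so the helper is safe to call on every URL.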

Also, I am relatively new to regex so keep that in mind. I know using .+? is probably not the most efficient way, but it gets the job done. However I would very much like to know if you guys would do it a different way.
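
For what it's worth, one common alternative to `.+?` is a negated character class such as `[^>]*` or `[^<]+`, which cannot run past the end of a tag, combined with capture groups so the tags don't have to be stripped afterwards. A minimal sketch in Python 3, assuming markup shaped like the pattern above expects; the HTML fragment is invented for the example, not copied from the site.

```python
import re

# Made-up fragment in the shape the script's pattern expects
html = (
    '<dt><strong><a href="http://evilzone.org/hacking-and-security/best-free-proxy/msg73087/">'
    'Re: Best free proxy?</a></strong> by <a href="http://evilzone.org/profile/boriswc/">boriswc</a> '
    '(<a href="http://evilzone.org/hacking-and-security/">Hacking and Security</a>)</dt>'
)

# [^>]* stays inside one tag; [^<]+ stays inside one text node
entry = re.compile(
    r'<strong><a[^>]*href="([^"]+)"[^>]*>([^<]+)</a></strong>'
    r' by <a[^>]*>([^<]+)</a>'
    r' \(<a[^>]*>([^<]+)</a>\)'
)

for link, topic, poster, board in entry.findall(html):
    print("{0} by {1} ({2})".format(topic, poster, board))
    print(link)
```

With groups in the pattern, `findall` returns one tuple per entry, so the topic link comes out directly and the second URL regex is no longer needed.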

Enjoy
« Last Edit: December 26, 2013, 04:02:27 pm by RedBullAddicted »
\x64\x6F\x75\x65\x76\x65\x6E\x00

Offline proxx

  • Avatarception
  • Global Moderator
  • Titan
  • *
  • Posts: 2803
  • Cookies: 256
  • ФФФ
    • View Profile
Re: [Source] EvilZone Recent Posts Fetcher
« Reply #1 on: December 23, 2013, 11:09:13 pm »
I did something similar a while ago because I wanted to feed conky with the latest topics etc.
Since a new version of the board is coming out and an API will probably be implemented, I didn't bother doing anything else with the code.
But I guess the mechanism is pretty much the same.
« Last Edit: December 23, 2013, 11:09:39 pm by proxx »
Wtf where you thinking with that signature? - Phage.
This was another little experiment *evillaughter - Proxx.
Evilception... - Phage

Offline Matriplex

Re: [Source] EvilZone Recent Posts Fetcher
« Reply #2 on: December 23, 2013, 11:11:19 pm »
Quote from: proxx
Since a new version of the board is coming out and an API will probably be implemented, I didn't bother doing anything else with the code.

Hm, I forgot about the new version. I guess I'll just update the code when it does come out.
And an API? Why didn't I hear about this?? My day has been made.
« Last Edit: December 23, 2013, 11:11:30 pm by Matriplex »

Offline proxx

Re: [Source] EvilZone Recent Posts Fetcher
« Reply #3 on: December 24, 2013, 07:04:08 am »
Quote:
My best guess would be that this is Python? It doesn't actually state that anywhere though?

It's pretty obvious, isn't it?

Offline Kulverstukas

  • Administrator
  • Zeus
  • *
  • Posts: 6627
  • Cookies: 542
  • Fascist dictator
    • View Profile
    • My blog
Re: [Source] EvilZone Recent Posts Fetcher
« Reply #4 on: December 24, 2013, 10:01:34 am »
This is weird. I don't see where it logs in?
Also wouldn't it be easier to parse RSS feeds? :)

Offline proxx

Re: [Source] EvilZone Recent Posts Fetcher
« Reply #5 on: December 24, 2013, 10:15:41 am »
Quote from: Kulverstukas
This is weird. I don't see where it logs in?
Also wouldn't it be easier to parse RSS feeds? :)

Recent posts and unread posts are two different things; recent posts don't require a login.
Parsing the RSS feed is indeed much easier than parsing HTML. Man, I hate that.
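
To illustrate why the feed is less painful, here is a rough sketch in Python 3 that reads titles and links out of an RSS 2.0 document with the standard library's `xml.etree.ElementTree`. The feed below is a hand-written sample; the board's real feed URL and exact fields are not shown in this thread, so treat them as assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 sample standing in for the board's feed
feed = """<rss version="2.0">
  <channel>
    <title>Recent posts</title>
    <item>
      <title>Re: Best free proxy?</title>
      <link>http://evilzone.org/hacking-and-security/best-free-proxy/msg73087/</link>
      <category>Hacking and Security</category>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(feed)

# Each <item> already separates the fields, so no tag stripping is needed
items = [
    (item.findtext("title"), item.findtext("link"), item.findtext("category"))
    for item in root.iter("item")
]

for title, link, category in items:
    print("{0} ({1})".format(title, category))
    print(link)
```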

Offline proxx

Re: [Source] EvilZone Recent Posts Fetcher
« Reply #6 on: December 24, 2013, 04:12:00 pm »
Quote:
Well, I'm not a Python coder. It's just obviously not PHP or Perl. That doesn't leave many options as far as popular scripting languages go... but there are potentially other niche languages I wouldn't know as well... so...

Well, it's one of the few languages that use forced indentation instead of brackets and stuff.

Offline Matriplex

Re: [Source] EvilZone Recent Posts Fetcher
« Reply #7 on: December 24, 2013, 05:44:07 pm »
Quote from: Kulverstukas
This is weird. I don't see where it logs in?
Also wouldn't it be easier to parse RSS feeds? :)

I wanted to see if I could parse the HTML using regex, just for a learning experience :)
I suppose it would be easier to parse RSS feeds, but as I said, learning experience.

Quote:
Well, I'm not a Python coder. It's just obviously not PHP or Perl. That doesn't leave many options as far as popular scripting languages go... but there are potentially other niche languages I wouldn't know as well... so...

No worries, not everybody recognizes a language at first glance. Next time, look for calls or statements that look language-specific, e.g. import urllib2, and search for that one line on Google. You should be able to find out pretty fast :)
« Last Edit: December 24, 2013, 05:46:01 pm by Matriplex »

Offline Kulverstukas

Re: [Source] EvilZone Recent Posts Fetcher
« Reply #8 on: December 24, 2013, 07:10:14 pm »
Quote from: Matriplex
I wanted to see if I could parse the HTML using regex, just for a learning experience :)

Read this if you haven't already: http://stackoverflow.com/a/1732454/1552152
Also try this next time: http://www.crummy.com/software/BeautifulSoup/

For extracting tags and very simple HTML parsing, regex is fine, but if you want to do complex stuff, consider a proper HTML parsing lib.
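
As a rough idea of what the parser route looks like, here is a sketch in Python 3 using the standard library's `html.parser` rather than BeautifulSoup, purely so it runs without installing anything. The HTML fragment and the `LinkCollector` class are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None

# Made-up fragment resembling one recent-posts entry
html = (
    '<dl id="ic_recentposts"><dt><strong>'
    '<a href="http://evilzone.org/hacking-and-security/best-free-proxy/msg73087/">Re: Best free proxy?</a>'
    '</strong> by <a href="http://evilzone.org/profile/boriswc/">boriswc</a></dt></dl>'
)

parser = LinkCollector()
parser.feed(html)
parser.close()

for href, text in parser.links:
    print(text, "->", href)
```

The parser tracks where each tag opens and closes, so attribute order or extra whitespace in the markup does not break the extraction the way it can with a hand-written pattern.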