EvilZone

Programming and Scripting => Projects and Discussion => : Huntondoom November 01, 2011, 09:45:39 PM

: Post Grabber
: Huntondoom November 01, 2011, 09:45:39 PM
For a project I'm working on right now I need to make a post grabber: it needs to get the contents of a post (with username etc.) in HTML.

So far I have this:
: (C#)
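string[] Posts = new string[0];
string Start = "<div class=\"windowbg\">".ToLower();

// scan the page for every opening <div class="windowbg"> (case-insensitive)
for (int S = 0; S < page.Length - Start.Length; ++S)
{
    string part = page.Substring(S, Start.Length).ToLower();
    if (part == Start)
    {
        // A = current tag depth, counted from the opening div
        int A = 0;
        for (int E = S; E < page.Length - 2; ++E)
        {
            part = page.Substring(E, 2);
            // "<x" (anything that is not a closing tag) goes one level deeper
            if (part.StartsWith("<") && !part.EndsWith("/")) { ++A; }
            // "</" closes one level
            if (part == "</") { --A; }
            // back at depth 0: treat this closing tag as the end of the post
            if (part == "</" && A == 0)
            {
                Posts = AddToArray(page.Substring(S, E - S), Posts);
                S = E;
                break;
            }
        }
    }
}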
But it doesn't always give me the post, and sometimes it doesn't give me enough of it.
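One likely cause: the depth counter treats every `<` that is not followed by `/` as an opening tag, but void elements such as `<br>`, `<img>` and `<hr>` (and comments like `<!-- -->`) never get a matching `</`, so `A` drifts and no longer reaches 0 at the post's own closing tag. Below is a minimal sketch of a fix, assuming well-formed markup and that the text `<div` never appears inside comments, scripts or attribute values: it counts only `<div` and `</div>`. The class and method names are hypothetical, and `List<string>` stands in for the `AddToArray` helper, which isn't shown in the thread.

```csharp
using System;
using System.Collections.Generic;

static class DivGrabber
{
    // Returns the outer HTML of every <div class="windowbg">...</div>
    // (closing </div> included), matching nested <div>s by depth.
    // Assumes well-formed markup and no "<div" inside comments, scripts or attributes.
    public static List<string> GrabPosts(string page)
    {
        const string start = "<div class=\"windowbg\">";
        const string openTag = "<div";
        const string closeTag = "</div>";
        const StringComparison cmp = StringComparison.OrdinalIgnoreCase;
        var posts = new List<string>();

        int s = page.IndexOf(start, cmp);
        while (s >= 0)
        {
            int depth = 0, pos = s, end = -1;
            while (end < 0)
            {
                int nextOpen = page.IndexOf(openTag, pos, cmp);
                int nextClose = page.IndexOf(closeTag, pos, cmp);
                if (nextClose < 0) break;              // unbalanced markup: give up
                if (nextOpen >= 0 && nextOpen < nextClose)
                {
                    depth++;                           // another <div opens first
                    pos = nextOpen + openTag.Length;
                }
                else
                {
                    depth--;                           // a </div> closes one level
                    pos = nextClose + closeTag.Length;
                    if (depth == 0) end = pos;         // matching close of the post div
                }
            }
            if (end < 0) break;
            posts.Add(page.Substring(s, end - s));
            s = page.IndexOf(start, end, cmp);         // continue after this post
        }
        return posts;
    }
}
```

Only `<div` is counted because `<br>` and friends never close; a real HTML parser would be more robust, but this keeps to the framework's plain string methods.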
: Re: Post Grabber
: Kulverstukas November 01, 2011, 10:25:52 PM
Consider using Regex. Much more elegant and probably faster.
And is that Java?
: Re: Post Grabber
: Huntondoom November 01, 2011, 11:25:39 PM
Visual C#, which would be the Microsoft .NET version of Java.
: Re: Post Grabber
: Huntondoom November 02, 2011, 03:01:33 PM
But I don't want to depend on Regex, because it's a surprise and I want it to work on everyone's computer (everyone with Windows) without them having to upgrade their .NET version.
: Re: Post Grabber
: ande November 02, 2011, 03:13:32 PM
Isn't Regex part of .NET Framework 2.0?
: Re: Post Grabber
: Huntondoom November 02, 2011, 03:40:08 PM
Don't know, I never use Regex.
: Re: Post Grabber
: Kulverstukas November 02, 2011, 03:45:01 PM
Well, learn it. I did, and now I couldn't live without it!
: Re: Post Grabber
: ande November 02, 2011, 03:47:33 PM
Okay, found out. In fact, it's part of the 4.0 framework... which sucks.
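For reference, `System.Text.RegularExpressions.Regex` has shipped with the .NET Framework since its 1.x releases, so it does not require 4.0. A minimal sketch of the Regex approach suggested above, assuming the same start marker (`<div class="windowbg">`) and `<hr` end marker that the later version of the loop in this thread uses; the class and method names are hypothetical. A single pattern like this cannot track nested `<div>`s, which is why it stops at a fixed end marker.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexGrabber
{
    // From <div class="windowbg"> lazily up to, but not including, the next "<hr".
    // Singleline lets "." cross line breaks; IgnoreCase mirrors the ToLower() calls.
    static readonly Regex PostPattern = new Regex(
        "<div class=\"windowbg\">.*?(?=<hr)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> GrabPosts(string page)
    {
        var posts = new List<string>();
        foreach (Match m in PostPattern.Matches(page))
            posts.Add(m.Value);
        return posts;
    }
}
```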
: Re: Post Grabber
: Huntondoom November 02, 2011, 07:29:02 PM
I made this so far; it now returns almost every post:
: (C#)
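string[] Posts = new string[0];
string Start = "<div class=\"windowbg\">".ToLower();
// each post's content ends at the next horizontal rule
string End = "<hr".ToLower();

// find every opening <div class="windowbg"> (case-insensitive)
for (int S = 0; S < page.Length - Start.Length; ++S)
{
    string part = page.Substring(S, Start.Length).ToLower();
    if (part == Start)
    {
        // scan forward to the first "<hr" after the opening div
        for (int E = S; E < page.Length - End.Length; ++E)
        {
            part = page.Substring(E, End.Length).ToLower();
            if (part == End)
            {
                // store everything from the div up to (not including) "<hr"
                Posts = AddToArray(page.Substring(S, E - S), Posts);
                S = E;   // resume the outer search after this post
                break;
            }
        }
    }
}
return Posts;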
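The loop above builds a new lowercased substring at every character position. A sketch of the same start/end-marker idea using `String.IndexOf(string, int, StringComparison)` (available since .NET 2.0) searches without those per-position allocations; it assumes the same `<div class="windowbg">` and `<hr` markers, uses `List<string>` in place of the unshown `AddToArray` helper, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class MarkerGrabber
{
    // Same boundaries as the loop above: from <div class="windowbg"> up to,
    // but not including, the next "<hr". Case-insensitive like the ToLower() calls.
    public static List<string> GrabPosts(string page)
    {
        const string start = "<div class=\"windowbg\">";
        const string end = "<hr";
        const StringComparison cmp = StringComparison.OrdinalIgnoreCase;
        var posts = new List<string>();

        int s = page.IndexOf(start, cmp);
        while (s >= 0)
        {
            int e = page.IndexOf(end, s + start.Length, cmp);
            if (e < 0) break;                    // no closing marker: stop
            posts.Add(page.Substring(s, e - s));
            s = page.IndexOf(start, e, cmp);     // continue after this post
        }
        return posts;
    }
}
```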
: Re: Post Grabber
: Kulverstukas November 02, 2011, 07:41:03 PM
Kinda fucked up to read :D
When I was writing a client for an online dictionary, I didn't know regex yet. If you want, I can show you the extracting part. It's in Delphi, but you may still get the idea of the algorithm.
: Re: Post Grabber
: Huntondoom November 02, 2011, 08:34:46 PM
Sure, but Regex is .NET 4.0, so I'm not going to use it :S
: Re: Post Grabber
: Kulverstukas November 02, 2011, 09:16:10 PM
I said it doesn't use Regex.

Here is the code from a few years ago :D

I used this to get the explanation for a word.
:
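// needs: uses SysUtils, StrUtils;  (AnsiPos, AnsiContainsText)
// "pavyzdys" = "example": returns the <p class='pavyzdys'>...</p> block, or ''
function pavyzdys(yKur : ansistring) : ansistring;
 var Pavyzdys : Record
                 Start,Endd : integer;
                end;
     m : integer;
begin
//====
 result := '';
//====
   if AnsiContainsText(yKur,'<p class=''pavyzdys''>') then
    begin
     Pavyzdys.Start := AnsiPos('<p class=''pavyzdys''>',yKur);
     m := 0;
      // grow the copied chunk until it contains '</p>' (or reaches the end of yKur);
      // AnsiPos takes the substring to look for first, then the string to search
      repeat
       m := m + 1
      until (AnsiPos('</p>',Copy(yKur,Pavyzdys.Start,m)) <> 0)
            or (Pavyzdys.Start + m > Length(yKur));
//====
     result := Copy(yKur,Pavyzdys.Start,m);
    end;
end;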
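The Delphi routine finds the opening `<p class='pavyzdys'>` and then grows a copy one character at a time until it contains `</p>`. Searching for `</p>` directly from the start position gives the same slice in one pass. A C# sketch of that idea for the post grabber, assuming the same tag strings; the class name is hypothetical, and the method keeps the original's name.

```csharp
using System;

static class ExampleGrabber
{
    const string Open = "<p class='pavyzdys'>";
    const string Close = "</p>";

    // Returns the first <p class='pavyzdys'>...</p> block, tags included,
    // or "" when the opening tag or its closing </p> is missing.
    public static string Pavyzdys(string page)
    {
        int start = page.IndexOf(Open, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return "";
        int end = page.IndexOf(Close, start + Open.Length, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return "";
        return page.Substring(start, end + Close.Length - start);
    }
}
```

For a post grabber the same pair of `IndexOf` calls works with the post's own start and end markers in place of the `<p>` tags.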