Author Topic: [VB.NET] Extracting specific lines from a website's XML file (using regex)

Offline uNk

  • Knight
  • **
  • Posts: 197
  • Cookies: 9
The other day IFailStuff was asking me about this, so I made a simple VB.NET app to help him extract some specific stuff from an XML file.

On the form, you will need a RichTextBox (resize it and make it big).

Code: [Select]
Imports System.Net, System.Text.RegularExpressions
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim wc As New WebClient 'web client used to download the file
        Dim gets As String = wc.DownloadString("http://site.com/sitemap.xml") 'read the XML file from the site, edit this to your own
        Dim stuff As New System.Text.StringBuilder 'collects one result per line
        'get the text in the <loc></loc> tags, you can modify this to your own
        '.*? is lazy, so each <loc>...</loc> pair is matched on its own even when several share one line
        For Each fail As Match In Regex.Matches(gets, "<loc>.*?</loc>")
            stuff.AppendLine(fail.Value.Replace("<loc>", "").Replace("</loc>", ""))
        Next
        RichTextBox1.Text = stuff.ToString() 'put the results in the RichTextBox
        'RichTextBox1.SaveFile("C:\failstuff.txt", RichTextBoxStreamType.PlainText) 'optional: save the results as plain text (writing to C:\ needs admin permission on Win7/Vista)
    End Sub
End Class

In that case, IFS needed to extract URLs from the "loc" tags, but you can modify this and make it extract whatever you want.
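
To show what "modify this" could look like, here is a rough sketch that pulls the text out of any simple tag, not just "loc". The ExtractTagContents helper, the TagExtractor module and the sample data are all made up for illustration and are not part of the app above. It assumes plain tags without attributes; for anything more complicated a real XML parser would probably be the safer choice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TagExtractor
    ' Hypothetical helper (not from the original app): returns the text found
    ' between every <tagName>...</tagName> pair in the given string.
    Function ExtractTagContents(ByVal xml As String, ByVal tagName As String) As List(Of String)
        Dim results As New List(Of String)
        ' Regex.Escape guards against regex special characters in the tag name.
        Dim safeTag As String = Regex.Escape(tagName)
        ' (.*?) lazily captures the text between the tags;
        ' RegexOptions.Singleline lets "." match line breaks as well.
        Dim pattern As String = "<" & safeTag & ">(.*?)</" & safeTag & ">"
        For Each m As Match In Regex.Matches(xml, pattern, RegexOptions.Singleline)
            ' Groups(1) is the captured part, without the surrounding tags.
            results.Add(m.Groups(1).Value.Trim())
        Next
        Return results
    End Function

    Sub Main()
        ' Made-up sample data standing in for a downloaded sitemap.
        Dim sample As String = "<url><loc>http://site.com/a</loc><lastmod>2011-04-25</lastmod></url>" & _
                               "<url><loc>http://site.com/b</loc><lastmod>2011-04-26</lastmod></url>"
        ' Print every <lastmod> value, one per line.
        For Each value As String In ExtractTagContents(sample, "lastmod")
            Console.WriteLine(value)
        Next
    End Sub
End Module
```

Using a capture group means the tags never end up in the result, so the two Replace calls are not needed in this version.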

Offline Huntondoom

  • Baron
  • ****
  • Posts: 856
  • Cookies: 17
  • Visual C# programmer
Can you explain the "As Match" part?
I have never seen it before.
As long as you are connected to the internet, you'll have no privacy

Advanced Internet Search
Clean Up!

Offline ande

  • Owner
  • Titan
  • *
  • Posts: 2664
  • Cookies: 256
"Match" is a class that goes with Regex (both live in System.Text.RegularExpressions), and in a For Each loop you can say "for each match in matches (...)". A For Each loop is simply a way of getting the values out of an array or collection of anything, one at a time. It's the same as saying:

Code: [Select]
Dim MyStringArray() As String = "a b c d".Split(" "c) 'split on the space character
For Each InnerString As String In MyStringArray
    'InnerString will be a, then b, then c, then d
Next

So, loosely speaking, you can think of the "fail" variable of type Match as a string and the matches as a String() array. Strictly, each Match is an object, and its Value property holds the matched text.
« Last Edit: April 25, 2011, 07:04:09 pm by ande »
if($statement) { unless(!$statement) { // Very sure } }
https://evilzone.org/?hack=true

Offline Huntondoom

So it gets anything that matches out of an array or a String?
And by using the * between the <Stuff> </Stuff> tags you could extract anything from it?

Offline ande

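Nah, not quite. The "Regex.Matches(gets, ...)" call is used inline in the loop header. It returns a MatchCollection, which is a collection of Match objects found in the string. That is why you can say "for each match in matches" (where matches is simply that collection). As for the pattern, ".*" means "any run of characters", so whatever sits between the two tags gets matched; change the tag names to pull out something else. The Regex class is a pattern-match-and-return class, very neat.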
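
If it helps to see the collection on its own, here is a small sketch that stores the result of Regex.Matches in a variable before looping over it. The MatchDemo module, the sample string and the variable names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchDemo
    Sub Main()
        ' Made-up input, just for illustration.
        Dim text As String = "<loc>one</loc><loc>two</loc>"
        ' Regex.Matches returns a MatchCollection holding one Match per hit.
        Dim found As MatchCollection = Regex.Matches(text, "<loc>(.*?)</loc>")
        ' Count tells you how many matches were found.
        Console.WriteLine(found.Count)
        For Each m As Match In found
            ' Index is where the match starts in the input, Value is the whole
            ' matched text, and Groups(1) is the part inside the parentheses.
            Console.WriteLine(m.Index & ": " & m.Value & " -> " & m.Groups(1).Value)
        Next
    End Sub
End Module
```

So a Match is a bit more than a plain string: besides the text it also knows where it was found and what each capture group contained.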
« Last Edit: April 25, 2011, 07:28:38 pm by ande »

Offline uNk

Ande explained it in a fairly advanced way. Put simply, you use a "For Each" loop when the code produces more than one result. I used the "fail" variable to refer to the current match inside the loop body. The syntax "As Match In Regex.Matches" declares that variable as a Match and fills it with each result in turn, so it can be used here:

(fail.Value.Replace("<loc>", "").Replace("</loc>", "")).

EDIT:

Fixed my typing errors, a lot of stuff going on in my head right now.
« Last Edit: April 25, 2011, 08:02:18 pm by uNk »

Offline Huntondoom

Thank you all! +1