mn8 Language Reference

HTMLPage


Description

HTMLPage is the concept returned by all functions (FROM) whose main purpose is document retrieval, when the retrieved document is an HTML page. The content of the HTML page is available as a Stream stored in the content element of the concept, so the concept does not alter the source of the HTML page. One or more pieces of the page can be extracted with regular expressions through the select method provided by the String concept. Once the desired HTML pieces are extracted, they can be transformed into a Series of Tags through the Tag/getTags method. The content of the page can also be returned directly as a Series of tags through the getTags method.

Usage

        $page = HTMLPage.create("http://www.nolimits.ro")
        PRINT $page@url
            http://www.nolimits.ro

        PRINT $page/getLinks
            http://www.nolimits.ro/corpinfo_en.shtml
            http://www.nolimits.ro/documents/index.shtml
            http://www.nolimits.ro/product_en.shtml
            http://www.nolimits.ro/sl_en.shtml
            http://www.nolimits.ro/news_en.shtml
            http://www.nolimits.ro/contact_en.shtml
            http://www.nolimits.ro/search_en.shtml
            http://www.nolimits.ro/forum/
            http://www.nolimits.ro/help_en.shtml
            http://www.nolimits.ro/ro/
            http://www.nolimits.ro/
    

This example shows how to log in to a web page using forms and cookies. It assumes the required form has already been completed; if not, see the examples in HTMLForm.

        $url = "http://192.168.1.22/oursite/"

        # Store the HTMLPage retrieved from the URL in $page.
        # This step is needed only to obtain and store the cookies.
        # It does not log you in!
        $page from $url + "index.php"

        # Saving cookies
        $cookies = $page.getCookies
        each $i in $cookies do [
          $i.storeCookie 
        ]

        # creating a simplex expression
        $expr = Simplex.create( $url + "*" )

        # collecting the stored forms using the simplex
        # We need to use only the first form
        $form = HTMLForm.getStoredForms($expr)/1

        # applying the form (with the POST method) performs the login
        HTMLPage.create( $form )

        # finally, we can get the information from the URL; we are now logged in
        $page from $url + "index.php"

    

Version: 0.1
Authors: Remus Pereni (http://neuro.nolimits.ro)
Inherits: Concept

Attribute List

 @contentType TYPEOF String LABEL "contentType"
 @lastModified TYPEOF Integer LABEL "lastModified"
 @length TYPEOF Integer LABEL "length"
 @title TYPEOF integer LABEL "title"
 @url TYPEOF String LABEL "url"

Element List

 content TYPEOF String LABEL "content"

Constructor List

create (String $url)
create (HTMLForm $form)

Method List

String getContent
Series getCookies
Series getForms
Map getHeaders
Series getLinks
Series getTags
Series getTagsWithText
Series getURIs
Integer getResponseCode
String getResponseMessage
static setFollowRedirects (Logical $bol)
static Logical isFollowRedirects

Methods inherited from: Concept
cloneConcept, extendsConcept, fromXML, getAllInheritedConcepts, getConceptAttribute, getConceptAttributeField, getConceptAttributeFields, getConceptAttributes, getConceptConstructors, getConceptElement, getConceptElementField, getConceptElementFields, getConceptElements, getConceptLabel, getConceptMethod, getConceptMethods, getConceptOperators, getConceptType, getConceptsAtPath, getErrorHandler, getInheritedConcepts, hasConceptAttribute, hasConceptElement, hasConceptMethod, hasPath, isHidden, loadContent, setConceptLabel, setErrorHandler, setHidden, setShowEmpty, showEmpty, toTXT, toXML

Detailed Attribute Info

@contentType

Label:contentType
Type:String
Is Static:false
Is Hidden:false
Show Empty:true

Contains the content type of this page, which is the value of the Content-Type header field.


@lastModified

Label:lastModified
Type:Integer
Is Static:false
Is Hidden:false
Show Empty:true

An Integer representing the time the file was last modified, measured in milliseconds since the epoch (00:00:00 GMT, January 1, 1970).
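
Because the value is expressed in milliseconds rather than seconds, it has to be scaled before it can be read as a calendar date. The following is a minimal Python sketch of that conversion, not mn8 code; the function name `last_modified_to_datetime` is made up for illustration, and the sketch assumes the value is a plain millisecond count relative to the epoch given above.

```python
from datetime import datetime, timedelta, timezone

# The epoch named in the attribute description: 00:00:00 GMT, January 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_modified_to_datetime(millis):
    """Convert milliseconds since the epoch into a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    # One billion seconds after the epoch, expressed in milliseconds.
    print(last_modified_to_datetime(1_000_000_000_000).isoformat())
```

Using a `timedelta` instead of dividing by 1000 keeps the millisecond part of the value intact.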


@length

Label:length
Type:Integer
Is Static:false
Is Hidden:false
Show Empty:true

Contains the length of this page.


@title

Label:title
Type:integer
Is Static:false
Is Hidden:false
Show Empty:true

Contains the Title of this document, extracted between the <title>... </title> tags of the HTML page.


@url

Label:url
Type:String
Is Static:false
Is Hidden:false
Show Empty:true

Contains the URL of the document represented by this HTMLPage concept.


Detailed Element Info

content

Label:content
Type:String
Is Static:false
Is Hidden:false
Is Multi:false
Show Empty:true

Contains a stream with the actual page content. This is the primary way of keeping the content of the page, because regular expressions can be applied to the stream to extract the relevant pieces of the page, and the selected pieces can then be passed to a Tag constructor for further processing.
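
The idea of keeping the raw content and cutting pieces out of it with regular expressions can be illustrated outside mn8. The Python sketch below is only an analogy: the helper name `select_pieces` and the sample page are hypothetical, and it uses Python's `re` module rather than the select method of the mn8 String concept.

```python
import re


def select_pieces(content, pattern):
    """Return every piece of the raw page content that matches the pattern."""
    # DOTALL lets a piece span several lines; IGNORECASE matches <LI> as well as <li>.
    flags = re.DOTALL | re.IGNORECASE
    return [match.group(0) for match in re.finditer(pattern, content, flags)]


if __name__ == "__main__":
    page = "<html><body><h1>News</h1><ul><li>One</li><li>Two</li></ul></body></html>"
    # The non-greedy .*? stops at the first closing tag, so each item is a separate piece.
    for piece in select_pieces(page, r"<li>.*?</li>"):
        print(piece)
```

The original content string is never modified, and each selected piece is a self-contained fragment that could then be handed to a tag parser.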


Detailed Constructor Info

create (String $url)
Parameters:
$url :The URL from which this page will be constructed.
Exceptions:
badURL :
(Error)
If the URL is not valid.
httpOperationFailed :
(Error)
If the HTML page handler or the page content cannot be obtained.

Constructor which produces an HTMLPage concept from the URL given as parameter. The concept produced in this way will have no headers.

create (HTMLForm $form)
Parameters:
$form :The HTMLForm from which this page will be constructed.
Exceptions:
badURL :
(Error)
If the URL isn't valid.
httpOperationFailed :
(Error)
If the HTML page handler or the page content cannot be obtained.

Constructor which produces an HTMLPage concept from the HTMLForm given as parameter. The concept produced in this way will have no headers.


Detailed Method Info

getContent
Returns: String
Exceptions:
httpOperationFailed :
(Error)
If the HTML page content cannot be obtained.

Returns a String containing the content of this page.

getCookies
Returns: Series

Returns a Series of Cookie concepts, one for each cookie directive found in the HTTP header of this page.
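
To show what a cookie directive in an HTTP header looks like and what can be read from it, here is a small Python sketch based on the standard `http.cookies` module. It is an illustration only, not the mn8 implementation: the function name `cookies_from_headers`, the dictionary layout, and the sample header values are assumptions made for this example.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(set_cookie_values):
    """Parse the values of Set-Cookie header lines into simple dictionaries."""
    cookies = []
    for value in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(value)  # one Set-Cookie value, e.g. "name=value; Path=/"
        for name, morsel in parsed.items():
            cookies.append({
                "name": name,
                "value": morsel.value,
                "path": morsel["path"],  # empty string when no Path attribute is given
            })
    return cookies


if __name__ == "__main__":
    headers = ["PHPSESSID=abc123; Path=/oursite/", "lang=en"]
    for cookie in cookies_from_headers(headers):
        print(cookie["name"], cookie["value"], cookie["path"])
```

Each directive yields one entry, which mirrors the one-concept-per-directive shape of the returned Series.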

getForms
Returns: Series

Returns a Series containing HTMLForm concepts of all the HTML forms found in this particular page.

getHeaders
Returns: Map

Returns a Map containing all the key/value pairs of the HTTP header, as returned by the server which served this page.

getLinks
Returns: Series

Returns a Series of Elements containing all the links found in the page. Relative links are returned too, but in their absolute form (with the URL of the page prepended).
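
The conversion of relative links into absolute ones can be sketched in Python as follows. This is not how mn8 implements getLinks: the names `LinkCollector` and `get_links` are hypothetical, only `<a href>` attributes are considered, and the sketch uses `urllib.parse.urljoin`, which applies the standard URL resolution rules and may differ in edge cases from simply prepending the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Absolute links pass through unchanged; relative ones are resolved.
                self.links.append(urljoin(self.base_url, value))


def get_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    html = '<a href="news_en.shtml">News</a> <a href="http://www.nolimits.ro/forum/">Forum</a>'
    for link in get_links("http://www.nolimits.ro/", html):
        print(link)
```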

getTags
Returns: Series

The method returns a Series containing the Tags and Strings that result from processing the content of this HTML page.

The rules are:
* all tags opened by the < symbol and closed by the > symbol are transformed into a Tag type of concept.
* all strings outside tags are transformed into the String type of concept.
* the CR LF, CR, and LF symbols cause the parser to create a new String type concept.
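
The three rules can be sketched as a small tokenizer. The Python code below is an illustration under stated assumptions, not the mn8 parser: the function name `get_tags` and the `("Tag", text)` / `("String", text)` tuples are stand-ins for the real concepts, and the sketch assumes that the line-break characters themselves are dropped and that empty strings are not emitted.

```python
import re

# A tag runs from "<" to the next ">"; a line break is CR LF, CR, or LF.
# CR LF is listed first so that it is consumed as a single break.
TOKEN = re.compile(r"<[^>]*>|\r\n|\r|\n")


def get_tags(content):
    """Split page content into ("Tag", text) and ("String", text) items."""
    result = []
    position = 0
    for match in TOKEN.finditer(content):
        # Text between the previous token and this one is a String.
        text = content[position:match.start()]
        if text:
            result.append(("String", text))
        if match.group(0).startswith("<"):
            result.append(("Tag", match.group(0)))
        # A line break adds nothing itself; it only ends the current String.
        position = match.end()
    tail = content[position:]
    if tail:
        result.append(("String", tail))
    return result


if __name__ == "__main__":
    for item in get_tags("<p>Hello\r\nworld</p>"):
        print(item)
```

In this sketch the text "Hello" and "world" end up as two separate String items because of the CR LF between them.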

getTagsWithText
Returns: Series

Returns a Series containing the tags with text that result from processing the content of this HTMLPage.

getURIs
Returns: Series

Returns a Series with all URIs contained by this HTMLPage.

getResponseCode
Returns: Integer

Returns the HTTP response code from the header of this HTMLPage.

getResponseMessage
Returns: String

Returns the HTTP response message from the header of this HTMLPage.
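
For readers unfamiliar with how a response code relates to a response message, the Python sketch below looks up the standard reason phrase for a code using the `http.HTTPStatus` enumeration. It is an illustration only: the function name `standard_message` is hypothetical, and a real server may send a different message than the standard phrase, so this is not necessarily what getResponseMessage returns.

```python
from http import HTTPStatus


def standard_message(code):
    """Return the standard reason phrase for an HTTP status code, or "" if unknown."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # The code is not a registered HTTP status.
        return ""


if __name__ == "__main__":
    for code in (200, 302, 404):
        print(code, standard_message(code))
```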

static setFollowRedirects (Logical $bol)
Parameters:
$bol :Boolean expression.

Sets whether HTTP redirects (responses with a 3xx response code) should be followed automatically by the connection.

static isFollowRedirects
Returns: Logical

Returns a Logical indicating whether HTTP redirects (3xx) are followed automatically.
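
The effect of a static follow-redirects switch can be sketched without any network access. The Python code below is a simulation, not the mn8 implementation: the class `Fetcher`, the function `fetch`, the canned `RESPONSES` table, and the `example.test` URLs are all hypothetical, and the default value of `True` is an assumption, since the reference does not state the default.

```python
class Fetcher:
    # Class-level flag shared by all requests, comparable to a static setting.
    # The default of True is an assumption made for this sketch.
    follow_redirects = True

    @classmethod
    def set_follow_redirects(cls, flag):
        cls.follow_redirects = bool(flag)

    @classmethod
    def is_follow_redirects(cls):
        return cls.follow_redirects


# Canned responses standing in for a server: URL -> (status code, Location header).
RESPONSES = {
    "http://example.test/old": (301, "http://example.test/new"),
    "http://example.test/new": (200, None),
}


def fetch(url, max_hops=5):
    """Return (final URL, status code), following 3xx responses if enabled."""
    for _ in range(max_hops):
        code, location = RESPONSES[url]
        is_redirect = 300 <= code < 400 and location is not None
        if not (is_redirect and Fetcher.is_follow_redirects()):
            return url, code
        url = location  # follow the redirect and try again
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    print(fetch("http://example.test/old"))
    Fetcher.set_follow_redirects(False)
    print(fetch("http://example.test/old"))
```

With the flag enabled the caller sees the final page; with it disabled the caller receives the 3xx response itself and can inspect it.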
