From 787d03e48471ba62cd830379428f04d996f0b74b Mon Sep 17 00:00:00 2001
From: polo
Date: Thu, 17 Feb 2022 18:13:00 +0100
Subject: model update

---
 lib/htmlawed/htmLawed_TESTCASE.txt | 910 ++++++++++++++++++------------------
 1 file changed, 455 insertions(+), 455 deletions(-)

diff --git a/lib/htmlawed/htmLawed_TESTCASE.txt b/lib/htmlawed/htmLawed_TESTCASE.txt
index 24b00e7..2e64421 100755
--- a/lib/htmlawed/htmLawed_TESTCASE.txt
+++ b/lib/htmlawed/htmLawed_TESTCASE.txt
@@ -1,455 +1,455 @@
-/* -htmLawed_TESTCASE.txt, 24 September 2019 -To test htmLawed -Copyright Santosh Patnaik -Dual licensed with LGPL 3 and GPL 2+ -A PHP Labware internal utility - www.bioinformatics.org/phplabware/internal_utilities/htmLawed -*/ - -This file has UTF-8-encoded text with both correct and incorrect/malformed HTML/XHTML code snippets to test htmLawed (test cases/samples). The entire text may also be used as a unit. - -************************************************ -when viewing this file in a web browser, set the -character encoding to Unicode/UTF-8 -************************************************ - ---------------------- start -------------------- - -Try different $config and $spec values. Some text even when filtered in will not be displayed in a rendered web-page
- -
Attributes
- -Xml:lang:, ,
-Standard, predefined value, or empty attribute: , ,
-Required: , image
-Quote & space variation: a, a, a
-Invalid: a
-Duplicated: a
-Deprecated: a,

-Casing:
-Custom: image
-Data-*: a
-Admin-restricted?: - -
Attribute values
- -Duplicate ID value:, ,
-(try 'my_' for prefix)
-Double-quotes in value:, ,
-(try filter for CSS expression)
-CSS expression:

-Other: ,
-(try 'maxlen', 'maxval', etc., for 'input' in '$spec') - -
Blockquotes
- -
abc

-
abc
def

-
abc
def

-
abc
def
ghi

-abc
def
ghi
-
QQQ
x

-
x
QQQ

-
x
QQQ
x

-
x
QQQ

x


-
-(try with blockquote parent) - -
CDATA sections
- -Special characters inside: ]]>, 3.5, & 4 > 4 ]]>
-Normal: , CDATA follows:
-Malformed: , < ![CDATA check ]]>, , < ![CDATA check ] ]>
-Invalid: >CDATA in tag content,
text not allowed
- -
Complex-1: deprecated elements
- -
-The PHP software script used for this web-page webpage is htmLawedTest.php, from PHP Labware. -
- -
Complex-2: deprecated attributes
- -aa -
-
-image - - - - - -
-
-

Section

-

Para

-
  1. First item
-
-
-
  1. First item
-
-
- -
Complex-3: embed, object, area
- -
- -
- - -

navigate the site: 1 | 3 | 4

- -
- -value - - - - - - - - -
Complex-4: nested and other tables
- -
Cell
Cell
Cell
Cell Cell Cell
Cell
Cell Cell Cell

-PCDATA wrong: Well
Hello

-Missing tr:
Well

- -
Complex-5: pseudo, disallowed or non-HTML tags
- -(Try different 'keep_bad' values) -<*> Pseudotags <*> -Non-HTML tag xml -

-Disallowed tag p -

- - -
Elements
- -Unbalanced: check
-Non-XHTML:

-Malformed: < a href="">, , , , < /a>, < a href="">, a, a,
-Invalid: a
-Empty: a, a, atext
-Content invalid: 12
-Content invalid?:

(try setting 'form' as parent)
-Casing:
-Check for tidy:



hi
- -
Entities
- -Special: & 3 < 2 & 5>4 and j >i >a & ia
-Padding: B B f f  
-Malformed: & #x27;, &x27;, ' &TILDE;, &tilde
-Invalid: , �, , �, ￿, &bad;
-Discouraged characters: , „, ﷠, 􏿾
-Context: '>', <?
-Casing: ', ', &TILDE;, ˜ -
-(also check named-to-numeric and hexdec-to-decimal, and vice versa, conversions) - -
Format
- -Valid but ill-formatted: text -text - text text
p r e
- text text

-text none text -text none t e x t -
text none t e x t - -text none t e x t - -
-
p r e  
-
-				pre
-		
-
-
Cell
Cell
Cell
CellCellCell
Cell
CellCellCell
-(try to compact or beautify) - -
Forms
- -(note nesting of 'form', missing required attributes, etc.)
-
- -
pl
- h - -

-


-
B:C:

-(try each of these lines separately)
-
what
-what -(try with container as div and as form)
-c a b - -
HTML comments (also CDATA)
- -Script inside:
-Special characters inside: , , , c
-Normal: , , comment:,
text not allowed

-Malformed: , < ![CDATA check ]]>, < ![CDATA check ] ]>
-Invalid:
>comment in tag content, - -
HTML5
- -figure and figcaption:
picture
Caption for the awesome picture
-article:

A

B

C

E

F

G

-meter:

Heat 150.

-datalist: - -
Ins-Del
- -(depending on context, these elements can be of either block or inline type)
-

block


-

d


-

d

d

d
- -
Lists
- -Invalid character data:
  • (item
  • )

-Definition list:
a
bad
first one
b
second

-Definition list, close-tags omitted:
a
bad
first one
b
second

-Definition lists, nested:
-
T1
-
D1
-
T2
-
D2
t1
d1
t2
d2
-
T3
-
D3
-
T4
-
D4
t1
d1
-

-Definition lists, nested, close-tags omitted:
-
T1 -
D1
-
T2
-
D2
t1
d1
t2
d2
-
T3 -
D3 -
T4 -
D4
t1
d1
-

-Nested:
    -
  • l1
  • -
  • l2
    1. lo1
    2. lo2
  • -
  • l3
  • -
  • l4
    1. lo3
    2. lo4
      1. lo5
  • -

-Nested, directly:
    -
  • l1
  • -
      l2
    -
  • l3
  • -

-Nested, close-tags omitted:
    -
  • l1
  • -
  • l2
    1. lo1
    2. lo2
    -
  • l3 -
  • l4
    1. lo3
    2. lo4
      1. lo5
    -

-Complex: -
  1. -
    -
-Menu:
  • - -
  • -
    - -
    Microdata
    - -
    -I am X but people call me Y. -Find me at -
    - -
    Microsoft Word
    - -Proprietary tag:

     


    -XML declaration:
    -XML-invalid character code-point (may not replicate):

    “Where is he?” asked both Mary – the one so lovely – and Jane.

    - -
    Nesting
    - -Block or inline a:

    text

    hi

    - -
    Non-English text-1
    - -Inscrieţi-vă acum la a Zecea Conferinţă Internaţională
    -გთხოვთ ახლავე გაიაროთ რეგისტრაცია
    -večjezično računalništvo
    -อ.อ่าง
    -Зарегистрируйтесь сейчас -на Десятую Международную Конференцию по
    -(this file should have utf-8 encoding; some characters may not be displayed because of missing fonts, etc.) - -
    Non-English text-2: entities
    - -用统一码
    -გთხოვთ
    -Inscreva-se agora para a Décima Conferência Internacional Sobre O Unicode, realizada entre os dias 10 e 12 de março de 1997 em Mainz -na Alemanha. - -
    Ruby
    - -(need compatible browser)
    - - - - - - - - - さい - とう - のぶ - - - - W3C Associate Chairman - -
    - - WWW - (World Wide Web) -
    - - A - (aaa) - - - -
    Tables
    - -Omitted closing tags: -- - -
    h1c1h1c2 -
    r1c1r1c2 -
    r2c1r2c2 -

    -Nested, omitted closing tags: -- - -
    h1c1h1c2 -
    r1c1r1c2 -- - -
    h1c1h1c2 -
    r1c1r1c2 -
    r2c1r2c2 -
    -
    r2c1r2c2 -

    - -
    Tag transformation
    -Font element with malicious code:


    -Font element intended as 'inline' element:

    hi


    -Font element intended as 'block' element:
    hi

    -Font element intended as 'block' element:
    hi
    QQQ

    - -
    Tidy
    -White-space handling: abc def ghi abc def ghi - -
    URLs
    - -Relative and absolute: , , , , , ,
    -(try base URL value of 'http://a.com/b/')
    -CSS URLs:
    ,
    ,
    ,
    ,

    -Double URLs: b
    -Anti-spam: (try regex for 'http://a.com', etc.) , , , , , , ,
    -Soft-hyphen: ídis­c - -
    XSS
    - -<img onmouseover=confirm(1)// -'';!--"=&{()}
    -
    -
    -
    -
    -test - -

    -

    -

    -
    -
    -

    -test
    -Bad IE7: x
    -Opera: link -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: xxx
    -Bad IE7: x
    -Bad IE7: x
    -Bad IE7: x
    -Bad IE7: x
    -Bad IE7: exp/*x
    -Bad IE7: hi
    -Bad IE7: hi
    -Bad IE7: test
    -Bad IE7: hi
    -Bad IE7: hi
    - -
    Other
    - -3 < 4
    -3 > 4
    - > 3
    -<._.> hi!
    -<<< ALERT >>>
    - some stuff
    -
    -
    -
    -if(13age){say 'teen'}
    -age >51 and a smoking history of >51 pack-years was
    -age > 51 and a smoking history of >51 pack-years was
    -age <51 and a smoking history of <51 pack-years was
    -age < 51 and a smoking history of < 51 pack-years was
    -age >51 and a smoking history of >51 pack-years
    -age > 51 and a smoking history of >51 pack-years
    -age <51 and a smoking history of <51 pack-years
    -age < 51 and a smoking history of < 51 pack-years
    +/* +htmLawed_TESTCASE.txt, 24 September 2019 +To test htmLawed +Copyright Santosh Patnaik +Dual licensed with LGPL 3 and GPL 2+ +A PHP Labware internal utility - www.bioinformatics.org/phplabware/internal_utilities/htmLawed +*/ + +This file has UTF-8-encoded text with both correct and incorrect/malformed HTML/XHTML code snippets to test htmLawed (test cases/samples). The entire text may also be used as a unit. + +************************************************ +when viewing this file in a web browser, set the +character encoding to Unicode/UTF-8 +************************************************ + +--------------------- start -------------------- + +Try different $config and $spec values. Some text even when filtered in will not be displayed in a rendered web-page
    + +
    Attributes
    + +Xml:lang:, ,
    +Standard, predefined value, or empty attribute: , ,
    +Required: , image
    +Quote & space variation: a, a, a
    +Invalid: a
    +Duplicated: a
    +Deprecated: a,

    +Casing:
    +Custom: image
    +Data-*: a
    +Admin-restricted?: + +
    Attribute values
    + +Duplicate ID value:, ,
    +(try 'my_' for prefix)
    +Double-quotes in value:, ,
    +(try filter for CSS expression)
    +CSS expression:

    +Other: ,
    +(try 'maxlen', 'maxval', etc., for 'input' in '$spec') + +
    Blockquotes
    + +
    abc

    +
    abc
    def

    +
    abc
    def

    +
    abc
    def
    ghi

    +abc
    def
    ghi
    +
    QQQ
    x

    +
    x
    QQQ

    +
    x
    QQQ
    x

    +
    x
    QQQ

    x


    +
    +(try with blockquote parent) + +
    CDATA sections
    + +Special characters inside: ]]>, 3.5, & 4 > 4 ]]>
    +Normal: , CDATA follows:
    +Malformed: , < ![CDATA check ]]>, , < ![CDATA check ] ]>
    +Invalid: >CDATA in tag content,
    text not allowed
    + +
    Complex-1: deprecated elements
    + +
    +The PHP software script used for this web-page webpage is htmLawedTest.php, from PHP Labware. +
    + +
    Complex-2: deprecated attributes
    + +aa +
    +
    +image + + + + + +
    +
    +

    Section

    +

    Para

    +
    1. First item
    +
    +
    +
    1. First item
    +
    +
    + +
    Complex-3: embed, object, area
    + +
    + +
    + + +

    navigate the site: 1 | 3 | 4

    + +
    + +value + + + + + + + + +
    Complex-4: nested and other tables
    + +
    Cell
    Cell
    Cell
    Cell Cell Cell
    Cell
    Cell Cell Cell

    +PCDATA wrong: Well
    Hello

    +Missing tr:
    Well

    + +
    Complex-5: pseudo, disallowed or non-HTML tags
    + +(Try different 'keep_bad' values) +<*> Pseudotags <*> +Non-HTML tag xml +

    +Disallowed tag p +

    +
      Bad
    • OK
    + +
    Elements
    + +Unbalanced: check
    +Non-XHTML:

    +Malformed: < a href="">, , , , < /a>, < a href="">, a, a,
    +Invalid: a
    +Empty: a, a, atext
    +Content invalid: 12
    +Content invalid?:

    (try setting 'form' as parent)
    +Casing:
    +Check for tidy:



    hi
    + +
    Entities
    + +Special: & 3 < 2 & 5>4 and j >i >a & ia
    +Padding: B B f f  
    +Malformed: & #x27;, &x27;, ' &TILDE;, &tilde
    +Invalid: , �, , �, ￿, &bad;
    +Discouraged characters: , „, ﷠, 􏿾
    +Context: '>', <?
    +Casing: ', ', &TILDE;, ˜ +
    +(also check named-to-numeric and hexdec-to-decimal, and vice versa, conversions) + +
    Format
    + +Valid but ill-formatted: text +text + text text
    p r e
    + text text

    +text none text +text none t e x t +
    text none t e x t + +text none t e x t + +
    +
    p r e  
    +
    +				pre
    +		
    +
    +
    Cell
    Cell
    Cell
    CellCellCell
    Cell
    CellCellCell
    +(try to compact or beautify) + +
    Forms
    + +(note nesting of 'form', missing required attributes, etc.)
    +
    + +
    pl
    + h + +

    +


    +
    B:C:

    +(try each of these lines separately)
    +
    what
    +what +(try with container as div and as form)
    +c a b + +
    HTML comments (also CDATA)
    + +Script inside:
    +Special characters inside: , , , c
    +Normal: , , comment:,
    text not allowed

    +Malformed: , < ![CDATA check ]]>, < ![CDATA check ] ]>
    +Invalid:
    >comment in tag content, + +
    HTML5
    + +figure and figcaption:
    picture
    Caption for the awesome picture
    +article:

    A

    B

    C

    E

    F

    G

    +meter:

    Heat 150.

    +datalist: + +
    Ins-Del
    + +(depending on context, these elements can be of either block or inline type)
    +

    block


    +

    d


    +

    d

    d

    d
    + +
    Lists
    + +Invalid character data:
    • (item
    • )

    +Definition list:
    a
    bad
    first one
    b
    second

    +Definition list, close-tags omitted:
    a
    bad
    first one
    b
    second

    +Definition lists, nested:
    +
    T1
    +
    D1
    +
    T2
    +
    D2
    t1
    d1
    t2
    d2
    +
    T3
    +
    D3
    +
    T4
    +
    D4
    t1
    d1
    +

    +Definition lists, nested, close-tags omitted:
    +
    T1 +
    D1
    +
    T2
    +
    D2
    t1
    d1
    t2
    d2
    +
    T3 +
    D3 +
    T4 +
    D4
    t1
    d1
    +

    +Nested:
      +
    • l1
    • +
    • l2
      1. lo1
      2. lo2
    • +
    • l3
    • +
    • l4
      1. lo3
      2. lo4
        1. lo5
    • +

    +Nested, directly:
      +
    • l1
    • +
        l2
      +
    • l3
    • +

    +Nested, close-tags omitted:
      +
    • l1
    • +
    • l2
      1. lo1
      2. lo2
      +
    • l3 +
    • l4
      1. lo3
      2. lo4
        1. lo5
      +

    +Complex: +
    1. +
      +
    +Menu:
  • + +
  • +
    + +
    Microdata
    + +
    +I am X but people call me Y. +Find me at +
    + +
    Microsoft Word
    + +Proprietary tag:

     


    +XML declaration:
    +XML-invalid character code-point (may not replicate):

    “Where is he?” asked both Mary – the one so lovely – and Jane.

    + +
    Nesting
    + +Block or inline a:

    text

    hi

    + +
    Non-English text-1
    + +Inscrieţi-vă acum la a Zecea Conferinţă Internaţională
    +გთხოვთ ახლავე გაიაროთ რეგისტრაცია
    +večjezično računalništvo
    +อ.อ่าง
    +Зарегистрируйтесь сейчас +на Десятую Международную Конференцию по
    +(this file should have utf-8 encoding; some characters may not be displayed because of missing fonts, etc.) + +
    Non-English text-2: entities
    + +用统一码
    +გთხოვთ
    +Inscreva-se agora para a Décima Conferência Internacional Sobre O Unicode, realizada entre os dias 10 e 12 de março de 1997 em Mainz +na Alemanha. + +
    Ruby
    + +(need compatible browser)
    + + + + + + + + + さい + とう + のぶ + + + + W3C Associate Chairman + +
    + + WWW + (World Wide Web) +
    + + A + (aaa) + + + +
    Tables
    + +Omitted closing tags: ++ + +
    h1c1h1c2 +
    r1c1r1c2 +
    r2c1r2c2 +

    +Nested, omitted closing tags: ++ + +
    h1c1h1c2 +
    r1c1r1c2 ++ + +
    h1c1h1c2 +
    r1c1r1c2 +
    r2c1r2c2 +
    +
    r2c1r2c2 +

    + +
    Tag transformation
    +Font element with malicious code:


    +Font element intended as 'inline' element:

    hi


    +Font element intended as 'block' element:
    hi

    +Font element intended as 'block' element:
    hi
    QQQ

    + +
    Tidy
    +White-space handling: abc def ghi abc def ghi + +
    URLs
    + +Relative and absolute: , , , , , ,
    +(try base URL value of 'http://a.com/b/')
    +CSS URLs:
    ,
    ,
    ,
    ,

    +Double URLs: b
    +Anti-spam: (try regex for 'http://a.com', etc.) , , , , , , ,
    +Soft-hyphen: ídis­c + +
    XSS
    + +<img onmouseover=confirm(1)// +'';!--"=&{()}
    +
    +
    +
    +
    +test + +

    +

    +

    +
    +
    +

    +test
    +Bad IE7: x
    +Opera: link +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: xxx
    +Bad IE7: x
    +Bad IE7: x
    +Bad IE7: x
    +Bad IE7: x
    +Bad IE7: exp/*x
    +Bad IE7: hi
    +Bad IE7: hi
    +Bad IE7: test
    +Bad IE7: hi
    +Bad IE7: hi
    + +
    Other
    + +3 < 4
    +3 > 4
    + > 3
    +<._.> hi!
    +<<< ALERT >>>
    + some stuff
    +
    +
    +
    +if(13age){say 'teen'}
    +age >51 and a smoking history of >51 pack-years was
    +age > 51 and a smoking history of >51 pack-years was
    +age <51 and a smoking history of <51 pack-years was
    +age < 51 and a smoking history of < 51 pack-years was
    +age >51 and a smoking history of >51 pack-years
    +age > 51 and a smoking history of >51 pack-years
    +age <51 and a smoking history of <51 pack-years
    +age < 51 and a smoking history of < 51 pack-years