htex

simple incorrect html parser
git clone git://git.relim.de/htex.git

htex.1


.TH HTEX "1" "August 2023" "User Commands"
.SH NAME
htex \- \fI\,ex\/\fRtract \fI\,ht\/\fRml
.SH SYNOPSIS
.B htex
[\-o \fI\,OUTPUT_TYPE\/\fR] [\-e] [\-l \fI\,NUM\/\fR] \fI\,PATTERN\/\fR [\fI\,FILE\/\fR]
.SH DESCRIPTION
.PP
Reads text from stdin or from
.I FILE
and parses it as html.
htex then filters the parsed html according to
.I PATTERN
and writes the result to stdout.
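
The
.I PATTERN
has the following format: <tag_name>[<attr_key>=<attr_value>]
Parts of the pattern may be left out: \fBspan\fR consists of a tag name only,
\fBa[href]\fR omits the attribute value, and \fB[=someattrvalue]\fR omits
both the tag name and the attribute key (see EXAMPLES).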

There are two shortcuts available: .<class_name> means [class=<class_name>],
and #<id_name> means [id=<id_name>].
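For example, \fB.o\-headline\fR means the same as \fB[class=o\-headline]\fR, and
\fB#main\fR means the same as \fB[id=main]\fR (the id name \fImain\fR is only an illustration).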

By default, the outerHTML of each matching tag is written to stdout.
.TP
\fB\,\-o\/\fR, \fB\,\-\-output\/\fR \fI\,OUTPUT_TYPE\/\fR
Select which part of each matching html tag is printed to stdout.
Possible values: \fB\,outerhtml\/\fR, \fB\,innerhtml\/\fR, \fB\,innertext\/\fR or \fB\,attr_value\/\fR.

\fB\,innertext\/\fR differs from what a browser would consider innerText.
See section \fB\,INNER_TEXT\/\fR.
.TP
\fB\,\-e\/\fR, \fB\,\-\-except\/\fR
Print everything except the outerHTML of the matching html tags.
.TP
\fB\,\-l\/\fR, \fB\,\-\-limit\/\fR \fI\,NUM\/\fR
Find at most \fI\,NUM\/\fR html tags.
.SH INNER_TEXT
Coming soon.
.SH EXAMPLES
.sp
.RS 4
cat test.html | htex \-o innerhtml ".o\-headline"

htex span test.html | htex \-e div

htex \-o innertext "input[name=blub]" test.html

htex \-o attr_value 'a[href]' test.html

htex "[=someattrvalue]" test.html
.RE
.SH NOTES
This parser was written partly by following the HTML specification at whatwg.org
and partly by reasoning and testing. It is not fully HTML5 compliant.