.TH HTEX "1" "August 2023" "User Commands"
.SH NAME
htex \- \fI\,ex\/\fRtract \fI\,ht\/\fRml
.SH SYNOPSIS
.B htex
[-o \fI\,OUTPUT_TYPE\/\fR] [-e] [-l \fI\,NUM\/\fR] \fI\,PATTERN\/\fR [\fI\,FILE\/\fR]
.SH DESCRIPTION
.PP
Reads text from stdin or
.I FILE
and parses it as HTML. htex
filters the parsed HTML based on
.I PATTERN
and writes the result to stdout.

The
.I PATTERN
has the following format: <tag_name>[<attr_key>=<attr_value>]

There are two shortcuts available: .<class_name> means [class=<class_name>] and #<id_name>
means [id=<id_name>]

By default the outerHTML is written to stdout.
.TP
\fB\,-o\/\fR, \fB\,--output\/\fR \fI\,OUTPUT_TYPE\/\fR
Specify which part of an HTML tag is printed to stdout.
Possible values: \fB\,outerhtml\/\fR, \fB\,innerhtml\/\fR, \fB\,innertext\/\fR or \fB\,attr_value\/\fR.

\fB\,innertext\/\fR is different from what a browser would consider innerText.
See section \fB\,INNER_TEXT\/\fR.
.TP
\fB\,-e\/\fR, \fB\,--except\/\fR
Print everything except the outerHTML of the matched HTML tags.
.TP
\fB\,-l\/\fR, \fB\,--limit\/\fR \fI\,NUM\/\fR
Find at most \fI\,NUM\/\fR HTML tags.
.SH INNER_TEXT
Coming soon.
.SH EXAMPLES
.sp
.RS 4
cat test.html | htex -o innerhtml ".o-headline"

htex span test.html | htex -e div

htex -o innertext "input[name=blub]" test.html

htex -o attr_value 'a[href]' test.html

htex "[=someattrvalue]" test.html
.RE
.SH NOTES
This parser was written partly by reading the HTML spec at whatwg.org
and partly by reasoning and testing. It is not fully HTML5 compliant.