htex

simple incorrect html parser
git clone git://git.relim.de/htex.git

commit b1f55963f829e894123a128ee1a5d593e0cb30a8
parent 3be05c0cb6825138a7a7fdaa28466b90713d56e3
Author: Robin <kroekerrobin@gmail.com>
Date:   Sun, 13 Aug 2023 21:58:20 +0200

Update documentation

Diffstat:
M htex.1 | 53 +++++++++++++++++++++++++----------------------------
1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/htex.1 b/htex.1
@@ -1,45 +1,42 @@
-.TH HTEX "1" "August 2022" "User Commands"
+.TH HTEX "1" "August 2023" "User Commands"
 .SH NAME
 htex \- \fI\,ex\/\fRtract \fI\,ht\/\fRml
 .SH SYNOPSIS
 .B htex
--a \fI\,attribute_name\/\fR [-e] [-i] -|\fI\,filename\/\fR
+[PATTERN] [-e] [-i] [-t] [FILE]
 .SH DESCRIPTION
 .PP
 Receives text from stdin or a file
-and interprets it as html. You provide
-an \fI\,attribute_name\/\fR via the
-.B -a
-option and htex will write the html tag
-found by \fI\,attribute_name\/\fR to stdout.
-Pass the
-.B -i
-option to only output the content (innerHTML) of the
-html tag.
-.TP
-\fB\,-a\/\fR, \fB\,--attribute\/\fR \fI\,attribute_name\/\fR
-Filter html by the attribute name of a html tag.
-If \fI\,attribute_name\/\fR starts with a dot (.) then
-the following characters will be taken as the class name
-of a tag. If \fI\,attribute_name\/\fR starts with a hashtag (#)
-the following characters will be taken as the id name of a tag.
-If \fI\,attribute_name\/\fR starts neither with a dot nor a hashtag it
-will be taken as a tag name.
+and interprets it as html. htex will
+filter the html based on the
+.I PATTERN
+and write the result to stdout.
+
+The
+.I PATTERN
+has the following format: <tag_name>[<attr_key>=<attr_value>]
+
+There are two shortcuts: .<class_name> means [class=<class_name>] and #<id_name>
+means [id=<id_name>]
+
+By default the outerHTML will be written to stdout.
 .TP
 \fB\,-i\/\fR, \fB\,--innerhtml\/\fR
-Instead of returning the html tag only return
-the content (innerHTML) of the tag. Cannot be used together with the -e option.
+Return the innerHTML instead of outerHTML.
+.TP
+\fB\,-t\/\fR, \fB\,--innertext\/\fR
+Return the innerText instead of outerHTML. Warning: innerText is different from
+what a browser sees as innerText.
 .TP
 \fB\,-e\/\fR, \fB\,--except\/\fR
-Output everything except the html tag specified in -a.
-Cannot be used together with the -i option.
+FUTURE.
 .SH EXAMPLES
 .sp
 .RS 4
-cat test.html | htex -i -a ".o-headline" -
+cat test.html | htex -i ".o-headline"
 
-htex -a span test.html
+htex span test.html
 
-htex --innerhtml --attribute "#container" test.html
+htex --innertext "input[name=blub]" test.html
 
-htex -e -a ".unnecessary-class" test.html
+htex -t "[=someattrvalue]" test.html
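
The PATTERN format documented in this commit, <tag_name>[<attr_key>=<attr_value>] with the . and # shortcuts, can be illustrated with a small parser. This is only a sketch in Python, not htex's own code: the names Pattern, expand_shortcut and parse_pattern are hypothetical, and the handling of edge cases (for example a tag name placed in front of a shortcut) is an assumption that the man page does not confirm.

```python
import re
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    """Hypothetical container for the three parts of a PATTERN."""
    tag: Optional[str]
    attr_key: Optional[str]
    attr_value: Optional[str]


# <tag_name>[<attr_key>=<attr_value>]; every part may be empty.
_BRACKET_FORM = re.compile(r"^([^\[\].#=]*)(?:\[([^\[\]=]*)=([^\[\]]*)\])?$")


def expand_shortcut(pattern: str) -> str:
    """Rewrite .<class_name> and #<id_name> into the bracket form."""
    if "[" in pattern:
        return pattern  # already in bracket form, leave untouched
    for marker, key in ((".", "class"), ("#", "id")):
        tag, sep, name = pattern.partition(marker)
        if sep:
            # Assumption: an optional tag name may precede the shortcut.
            return f"{tag}[{key}={name}]"
    return pattern


def parse_pattern(pattern: str) -> Pattern:
    """Split a PATTERN into tag name, attribute key and attribute value."""
    match = _BRACKET_FORM.match(expand_shortcut(pattern))
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    tag, key, value = match.groups()
    # Empty or missing parts are reported as None.
    return Pattern(tag or None, key or None, value or None)


if __name__ == "__main__":
    # The patterns used in the EXAMPLES section of the man page.
    for text in (".o-headline", "span", "input[name=blub]", "[=someattrvalue]"):
        print(text, "->", parse_pattern(text))
```

Under these assumptions the four patterns from the EXAMPLES section parse into a class match, a bare tag match, a tag plus attribute match, and a match on the attribute value alone.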