Error getting the string between two patterns

I want to get a string between two patterns. The pattern is the first environment `<p> </p>` in an html file.

<p>Sorcery, R (1) </p>
<p class="ctext"><b>As an additional cost to cast Goblin Grenade, sacrifice a Goblin.<br><br>Goblin Grenade deals 5 damage to target creature or player.</b></p>
<p><i>Don't underestimate the aerodynamic qualities of the common goblin.</i></p>
<p>Illus. Kev Walker</p>

That environment is the first of the file so I discard everything matched until the `<p>` and I want to delete everything after the `</p>`.

name="goblin grenade"
wget -O- | grep -oP '<p>\K[^<]+'

I don't know why it doesn't work properly. I get

Sorcery, Illus. Kev Walker

Don't parse HTML with regex; use a proper HTML parser instead.

### theory :

According to compiler theory, HTML can't be parsed with regular expressions, because they are equivalent to finite state machines. Since HTML is hierarchically nested, you need a pushdown automaton, for example a parser generated from an LALR grammar with a tool like YACC.
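To make this concrete for the command in the question, here is a minimal sketch of what that `grep` does with the sample HTML. It assumes GNU grep built with PCRE support (`-P`); the variable names `html`, `all_matches` and `first_match` are only illustrative. The pattern requires a literal `<p>` followed by at least one character that is not `<`, and `-o` prints every match, so the second paragraph (which has a `class` attribute) and the third (which starts with `<i>`) are skipped while the last one is still printed.

```shell
# Sample HTML copied from the question, one element per line.
html=$(cat <<'EOF'
<p>Sorcery, R (1) </p>
<p class="ctext"><b>As an additional cost to cast Goblin Grenade, sacrifice a Goblin.<br><br>Goblin Grenade deals 5 damage to target creature or player.</b></p>
<p><i>Don't underestimate the aerodynamic qualities of the common goblin.</i></p>
<p>Illus. Kev Walker</p>
EOF
)

# -o prints every match in the input, not only the first one.
# \K drops the already matched "<p>" from the printed result.
all_matches=$(printf '%s\n' "$html" | grep -oP '<p>\K[^<]+')

# Keeping only the first match gives the wanted text for this input,
# but it still breaks as soon as the first <p> has attributes or nested tags.
first_match=$(printf '%s\n' "$html" | grep -oP '<p>\K[^<]+' | head -n 1)

printf 'all matches:\n%s\n' "$all_matches"
printf 'first match:\n%s\n' "$first_match"
```

Piping through `head -n 1` is only a workaround for this one page layout, which is why a real parser is the safer choice.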

### realLife©®™ everyday tool :

Instead, you should use the right tool for the job.

...and it's a job for xmllint :

by _string matching_ :


string="Sorcery"
xmllint --html --xpath "//p[contains(text(), '$string')]/text()" file_or_URL


by the Nth `<p>` node where N is 1 here :


xmllint --html --xpath "(//p)[1]/text()" file_or_URL
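As an end-to-end illustration, the sketch below runs xmllint against the sample HTML from the question. It assumes `xmllint` (from libxml2) is installed; the temporary file and the variable names `sample` and `first_p` are hypothetical stand-ins for the real page. It uses `(//p)[1]`, which selects the first `<p>` of the whole document, whereas `//p[1]` selects every `<p>` that is the first `<p>` child of its parent (the two agree for this sample). `normalize-space()` is used here to trim the trailing space inside the element.

```shell
# Write the sample HTML from the question to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>Sorcery, R (1) </p>
<p class="ctext"><b>As an additional cost to cast Goblin Grenade, sacrifice a Goblin.<br><br>Goblin Grenade deals 5 damage to target creature or player.</b></p>
<p><i>Don't underestimate the aerodynamic qualities of the common goblin.</i></p>
<p>Illus. Kev Walker</p>
EOF

# (//p)[1] is the first <p> in document order; normalize-space() returns
# its text with leading and trailing whitespace removed.
# Parser warnings about the missing <html>/<body> wrapper, if any, go to stderr.
first_p=$(xmllint --html --xpath 'normalize-space((//p)[1])' "$sample" 2>/dev/null)

printf '%s\n' "$first_p"
rm -f "$sample"
```

Unlike the regex, this keeps working if the first paragraph gains attributes, because the parser matches elements rather than literal text.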


