Error getting the string between two patterns (prophetes.ai)

I want to get the string between two patterns. The pattern is the first `<p>` element in an HTML file. The relevant part of the page reads:

> Sorcery, R (1)
>
> As an additional cost to cast Goblin Grenade, sacrifice a Goblin. Goblin Grenade deals 5 damage to target creature or player.
>
> Don't underestimate the aerodynamic qualities of the common goblin.
>
> Illus. Kev Walker

That `<p>` element is the first one in the file, so I discard everything matched up to the `<p>` with `\K`, and I want to delete everything after the `</p>`:

```
name="goblin grenade"
wget -O- | grep -oP '<p>\K[^<]+'
```

I don't know why it doesn't work properly. I get:

```
Sorcery,
Illus. Kev Walker
```

Don't parse HTML with regex; use a proper HTML parser instead.
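To see why the `grep` in the question stops early, here is a minimal sketch on a hypothetical one-line sample. The nested `<i>R</i>` is invented for illustration (the real page markup isn't shown), and the question's PCRE pattern is rewritten as a plain BRE so it does not need `grep -P`; it only assumes a `grep` that supports `-o` (GNU or BSD):

```sh
# Hypothetical sample: the <i>R</i> tag is an assumption, not the real page markup.
html='<p>Sorcery, <i>R</i> (1)</p>'

# Roughly the question's grep -oP '<p>\K[^<]+' in BRE ([^<][^<]* means [^<]+):
# match "<p>" plus the following run of non-"<" characters, then strip the "<p>".
first_match=$(printf '%s\n' "$html" | grep -o '<p>[^<][^<]*' | sed 's/^<p>//')

# The character class cannot step over the nested <i>, so the match ends there.
printf '[%s]\n' "$first_match"   # prints "[Sorcery, ]"
```

Everything after the first nested tag is lost, which is consistent with the truncated `Sorcery,` reported in the question.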

### Theory

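According to compiler theory, HTML can't be parsed with regular expressions, because they are equivalent to finite-state machines. A finite-state machine has no memory of how deeply it is nested, so it can't pair each closing tag with its opening tag. Because HTML is hierarchical, you need a pushdown automaton (a state machine plus a stack), for example an LALR parser generated from a grammar by a tool like YACC.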
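The stack is the key ingredient. As a toy illustration only (not an HTML parser), the hypothetical POSIX sh function `tags_balanced` below checks that opening and closing tags nest correctly by pushing each opening tag name and popping it on the matching closing tag. It assumes no comments, no void elements such as `<br>`, and no `>` inside attribute values:

```sh
# Hypothetical helper: returns 0 if every closing tag matches the most recent
# unclosed opening tag, 1 otherwise. Plain POSIX sh, so the variables are global.
tags_balanced() {
  rest=$1
  stack=""                         # space-separated tag names, top of stack first
  while :; do
    case $rest in
      *"<"*) ;;                    # another tag ahead: keep scanning
      *) break ;;
    esac
    rest=${rest#*<}                # drop text up to and including the next '<'
    tag=${rest%%>*}                # tag body up to the next '>'
    rest=${rest#*>}                # continue after that '>'
    tag=${tag%% *}                 # keep the tag name, drop any attributes
    case $tag in
      /*)                          # closing tag: must match the top of the stack
        [ -n "$stack" ] || return 1
        [ "${stack%% *}" = "${tag#/}" ] || return 1
        stack=${stack#* }          # pop
        ;;
      *)
        stack="$tag $stack"        # opening tag: push
        ;;
    esac
  done
  [ -z "$stack" ]                  # balanced only if nothing is left open
}

if tags_balanced '<p>Sorcery, <i>R</i> (1)</p>'; then echo "balanced"; fi
if ! tags_balanced '<p>Sorcery, <i>R</p></i>'; then echo "not balanced"; fi
```

The stack can grow without bound, which is exactly what a finite-state machine lacks: with a fixed number of states it can only track nesting up to a fixed depth.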

### realLife©®™ everyday tool

Instead, use the right tool for the job.

...and this is a job for `xmllint`:

By _string matching_:

```
string="Sorcery"
xmllint --html --xpath "//p[contains(text(), '$string')]/text()" file_or_URL
```
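A minimal runnable check of the string-matching query, assuming `xmllint` from libxml2 is installed and using a small hypothetical sample file in place of the real page:

```sh
# Hypothetical sample file standing in for the downloaded card page.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<p>Sorcery, R (1)</p>
<p>Illus. Kev Walker</p>
</body></html>
EOF

string="Sorcery"
# Text of every <p> whose first text node contains $string.
result=$(xmllint --html --xpath "//p[contains(text(), '$string')]/text()" "$sample")
printf '%s\n' "$result"   # prints "Sorcery, R (1)"

rm -f "$sample"
```

In XPath 1.0, `contains(text(), ...)` converts the node-set to the string value of its first node, so only the first text node of each `<p>` is tested.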

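By the Nth `<p>` node, where N is 1 here. The parentheses apply the index to all `<p>` elements in document order; without them, `//p[1]` would match every `<p>` that is the first `<p>` child of its parent:

```
xmllint --html --xpath "(//p)[1]/text()" file_or_URL
```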
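A sketch to check the positional query on a hypothetical sample file (assuming `xmllint` from libxml2 is installed; the markup is invented for illustration). It compares `count(//p[1])` with `count((//p)[1])`, and also uses `string()`, which returns the whole text of the element including text inside nested tags such as `<i>`, whereas `/text()` returns only the element's own text nodes:

```sh
# Hypothetical sample: two <p> elements in different parents, one with a nested <i>.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<div><p>Sorcery, <i>R</i> (1)</p></div>
<div><p>Illus. Kev Walker</p></div>
</body></html>
EOF

# //p[1]: every <p> that is the first <p> child of its parent (both of them here).
all_first=$(xmllint --html --xpath "count(//p[1])" "$sample")
# (//p)[1]: only the first <p> in document order.
doc_first=$(xmllint --html --xpath "count((//p)[1])" "$sample")
echo "$all_first $doc_first"   # prints "2 1"

# string() concatenates all descendant text, including the nested <i>R</i>.
first=$(xmllint --html --xpath "string((//p)[1])" "$sample")
echo "$first"                  # prints "Sorcery, R (1)"

rm -f "$sample"
```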

Check <