regex - How to cut an HTML tag and its content from a very large multiline text file using perl, sed or awk?

- February 15, 2013

I want to transform the following text (removing every <math>.*?</math>) using sed, awk or perl:

{| |- | colspan="2"| : <math> [\underbrace{\color{red}4,2}_{4 > 2},5,1,7] \rightarrow [2,\underbrace{\color{olivegreen}4,5}_{4 < 5},1,7] \rightarrow [2,4,\underbrace{\color{red}5,1}_{5 > 1},7] \rightarrow [2,4,1,\underbrace{\color{olivegreen}5,7}_{5 < 7}] </math> |- | : <math> [\underbrace{\color{olivegreen}2,4}_{2 < 4},1,5,{\color{blue}7}] \rightarrow [2,\underbrace{\color{red}4,1}_{4 > 1},5,{\color{blue}7}] \rightarrow [2,1,\underbrace{\color{olivegreen}4,5}_{4 < 5},{\color{blue}7}] </math> : <math> [\underbrace{\color{red}2,1}_{2 > 1},4,{\color{blue}5},{\color{blue}7}] \rightarrow [1,\underbrace{\color{olivegreen}2,4}_{2 < 4},{\color{blue}5},{\color{blue}7}] </math> : <math> [\underbrace{\color{olivegreen}1,2}_{1 < 2},{\color{blue}4},{\color{blue}5},{\color{blue}7}] </math> |}

into this text (please forgive any mistakes in my hand-made version; the intent is simply to remove every <math>.*?</math>):

{| |- | colspan="2"| :  |- | :  :  :  |}

I have read 20 pages and tested 10 scripts without results. The best one is:

cat dirt-math.txt | awk '/<math>/{cut=1; print;}/<\/math>/{cut=0}!cut'

However, it does not work correctly: it leaves the <math> and </math> lines in the output. That is not bad, but I do not know awk well enough to improve it further.

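If the data is nicely formatted as in your example (each <math> and its </math> on separate lines), your solution is close. Your version prints the opening line in full (the print in the first action), and it also prints the closing line because cut is reset to 0 before !cut is tested. I modified it slightly.

In awk: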

sub(/<math>.*/, "") { print; cut = 1 }
/<\/math>/          { cut = 0; next }
!cut
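The modified awk above relies on one tag per line: sub() deletes everything from <math> to the end of the line (including a </math> on the same line), so an inline <math>...</math> leaves cut set and the following lines are swallowed until the next </math>. A sketch of a streaming variant without that assumption, in portable awk (only index() and substr(), no gawk-only features), wrapped in a hypothetical shell function strip_math_awk:

```shell
#!/bin/sh
# Hypothetical helper: strip every <math>...</math> span from stdin or the
# named files, streaming line by line so memory use stays small.
strip_math_awk() {
    awk '
    {
        line = $0
        out = ""
        started_inside = inmath    # was this line inside a block on entry?
        while (line != "") {
            if (!inmath) {
                i = index(line, "<math>")
                if (i == 0) {              # no opening tag: keep the rest
                    out = out line
                    line = ""
                } else {                   # keep text before <math>, skip the tag
                    out = out substr(line, 1, i - 1)
                    line = substr(line, i + 6)     # length("<math>") == 6
                    inmath = 1
                }
            } else {
                j = index(line, "</math>")
                if (j == 0) {              # block continues on the next line
                    line = ""
                } else {                   # resume after the closing tag
                    line = substr(line, j + 7)     # length("</math>") == 7
                    inmath = 0
                }
            }
        }
        # Print lines that still contain text, keep blank lines found outside
        # a block, and drop lines that held nothing but <math> content.
        if (out != "" || ($0 == "" && !started_inside)) print out
    }' "$@"
}
```

Usage is the same (strip_math_awk dirt-math.txt > clean.txt). Unlike a whole-file regex, line breaks are kept: text before a multi-line <math> and text after its </math> stay on their own lines.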
