regex - How to cut an HTML tag and its content from a very large multiline text file using perl, sed or awk?
I want to transform this text (remove <math>.*?</math>) with sed, awk or perl:
{| |- | colspan="2"| : <math> [\underbrace{\color{red}4,2}_{4 > 2},5,1,7] \rightarrow [2,\underbrace{\color{olivegreen}4,5}_{4 < 5},1,7] \rightarrow [2,4,\underbrace{\color{red}5,1}_{5 > 1},7] \rightarrow [2,4,1,\underbrace{\color{olivegreen}5,7}_{5 < 7}] </math> |- | : <math> [\underbrace{\color{olivegreen}2,4}_{2 < 4},1,5,{\color{blue}7}] \rightarrow [2,\underbrace{\color{red}4,1}_{4 > 1},5,{\color{blue}7}] \rightarrow [2,1,\underbrace{\color{olivegreen}4,5}_{4 < 5},{\color{blue}7}] </math> : <math> [\underbrace{\color{red}2,1}_{2 > 1},4,{\color{blue}5},{\color{blue}7}] \rightarrow [1,\underbrace{\color{olivegreen}2,4}_{2 < 4},{\color{blue}5},{\color{blue}7}] </math> : <math> [\underbrace{\color{olivegreen}1,2}_{1 < 2},{\color{blue}4},{\color{blue}5},{\color{blue}7}] </math> |}
into this text (please forgive me if I removed too much; only <math>.*?</math> should be removed):
{| |- | colspan="2"| : |- | : : : |}
I read 20 pages and tested 10 scripts without results. The best one so far is:
cat dirt-math.txt | awk '/<math>/{cut=1; print;}/<\/math>/{cut=0}!cut'
However, it does not work correctly, since it leaves the <math> and </math> tags behind.
It is not bad, but I do not know awk well enough to improve it further.
If your data is as nicely formatted as in your example, your solution is close. I modified it slightly in awk:
awk 'sub(/<math>.*/, "") {print; cut=1} /<\/math>/ {cut=0; next} !cut' dirt-math.txt