regexide.com/index.html

491 lines
117 KiB
HTML
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html><html><head>
<title>Regexide</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.9/dist/katex.min.css">
<style>
code[class*=language-],pre[class*=language-]{color:#333;background:0 0;font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;text-align:left;white-space:pre;word-spacing:normal;word-break:normal;word-wrap:normal;line-height:1.4;-moz-tab-size:8;-o-tab-size:8;tab-size:8;-webkit-hyphens:none;-moz-hyphens:none;-ms-hyphens:none;hyphens:none}pre[class*=language-]{padding:.8em;overflow:auto;border-radius:3px;background:#f5f5f5}:not(pre)>code[class*=language-]{padding:.1em;border-radius:.3em;white-space:normal;background:#f5f5f5}.token.blockquote,.token.comment{color:#969896}.token.cdata{color:#183691}.token.doctype,.token.macro.property,.token.punctuation,.token.variable{color:#333}.token.builtin,.token.important,.token.keyword,.token.operator,.token.rule{color:#a71d5d}.token.attr-value,.token.regex,.token.string,.token.url{color:#183691}.token.atrule,.token.boolean,.token.code,.token.command,.token.constant,.token.entity,.token.number,.token.property,.token.symbol{color:#0086b3}.token.prolog,.token.selector,.token.tag{color:#63a35c}.token.attr-name,.token.class,.token.class-name,.token.function,.token.id,.token.namespace,.token.pseudo-class,.token.pseudo-element,.token.url-reference .token.variable{color:#795da3}.token.entity{cursor:help}.token.title,.token.title .token.punctuation{font-weight:700;color:#1d3e81}.token.list{color:#ed6a43}.token.inserted{background-color:#eaffea;color:#55a532}.token.deleted{background-color:#ffecec;color:#bd2c00}.token.bold{font-weight:700}.token.italic{font-style:italic}.language-json .token.property{color:#183691}.language-markup .token.tag .token.punctuation{color:#333}.language-css .token.function,code.language-css{color:#0086b3}.language-yaml .token.atrule{color:#63a35c}code.language-yaml{color:#183691}.language-ruby .token.function{color:#333}.language-markdown .token.url{color:#795da3}.language-makefile .token.symbol{color:#795da3}.language-makefile .token.variable{color:#183691}.language-makefile .token.builtin{color:#0086b3}.language-bash .token.keyword{color:#0086b3}pre[data-line]{position:relative;padding:1em 0 1em 3em}pre[data-line] .line-highlight-wrapper{position:absolute;top:0;left:0;background-color:transparent;display:block;width:100%}pre[data-line] .line-highlight{position:absolute;left:0;right:0;padding:inherit 0;margin-top:1em;background:hsla(24,20%,50%,.08);background:linear-gradient(to right,hsla(24,20%,50%,.1) 70%,hsla(24,20%,50%,0));pointer-events:none;line-height:inherit;white-space:pre}pre[data-line] .line-highlight:before,pre[data-line] .line-highlight[data-end]:after{content:attr(data-start);position:absolute;top:.4em;left:.6em;min-width:1em;padding:0 .5em;background-color:hsla(24,20%,50%,.4);color:#f4f1ef;font:bold 65%/1.5 sans-serif;text-align:center;vertical-align:.3em;border-radius:999px;text-shadow:none;box-shadow:0 1px #fff}pre[data-line] .line-highlight[data-end]:after{content:attr(data-end);top:auto;bottom:.4em}html body{font-family:'Helvetica Neue',Helvetica,'Segoe UI',Arial,freesans,sans-serif;font-size:16px;line-height:1.6;color:#333;background-color:#fff;overflow:initial;box-sizing:border-box;word-wrap:break-word}html body>:first-child{margin-top:0}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{line-height:1.2;margin-top:1em;margin-bottom:16px;color:#000}html body h1{font-size:2.25em;font-weight:300;padding-bottom:.3em}html body h2{font-size:1.75em;font-weight:400;padding-bottom:.3em}html body h3{font-size:1.5em;font-weight:500}html body h4{font-size:1.25em;font-weight:600}html body h5{font-size:1.1em;font-weight:600}html body h6{font-size:1em;font-weight:600}html body h1,html body h2,html body h3,html body h4,html body h5{font-weight:600}html body h5{font-size:1em}html body h6{color:#5c5c5c}html body strong{color:#000}html body del{color:#5c5c5c}html body a:not([href]){color:inherit;text-decoration:none}html body a{color:#08c;text-decoration:none}html body a:hover{color:#00a3f5;text-decoration:none}html body img{max-width:100%}html body>p{margin-top:0;margin-bottom:16px;word-wrap:break-word}html body>ol,html body>ul{margin-bottom:16px}html body ol,html body ul{padding-left:2em}html body ol.no-list,html body ul.no-list{padding:0;list-style-type:none}html body ol ol,html body ol ul,html body ul ol,html body ul ul{margin-top:0;margin-bottom:0}html body li{margin-bottom:0}html body li.task-list-item{list-style:none}html body li>p{margin-top:0;margin-bottom:0}html body .task-list-item-checkbox{margin:0 .2em .25em -1.8em;vertical-align:middle}html body .task-list-item-checkbox:hover{cursor:pointer}html body blockquote{margin:16px 0;font-size:inherit;padding:0 15px;color:#5c5c5c;background-color:#f0f0f0;border-left:4px solid #d6d6d6}html body blockquote>:first-child{margin-top:0}html body blockquote>:last-child{margin-bottom:0}html body hr{height:4px;margin:32px 0;background-color:#d6d6d6;border:0 none}html body table{margin:10px 0 15px 0;border-collapse:collapse;border-spacing:0;display:block;width:100%;overflow:auto;word-break:normal;word-break:keep-all}html body table th{font-weight:700;color:#000}html body table td,html body table th{border:1px solid #d6d6d6;padding:6px 13px}html body dl{padding:0}html body dl dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:700}html body dl dd{padding:0 16px;margin-bottom:16px}html body code{font-family:Menlo,Monaco,Consolas,'Courier New',monospace;font-size:.85em;color:#000;background-color:#f0f0f0;border-radius:3px;padding:.2em 0}html body code::after,html body code::before{letter-spacing:-.2em;content:'\00a0'}html body pre>code{padding:0;margin:0;word-break:normal;white-space:pre;background:0 0;border:0}html body .highlight{margin-bottom:16px}html body .highlight pre,html body pre{padding:1em;overflow:auto;line-height:1.45;border:#d6d6d6;border-radius:3px}html body .highlight pre{margin-bottom:0;word-break:normal}html body pre code,html body pre tt{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;word-wrap:normal;background-color:transparent;border:0}html body pre code:after,html body pre code:before,html body pre tt:after,html body pre tt:before{content:normal}html body blockquote,html body dl,html body ol,html body p,html body pre,html body ul{margin-top:0;margin-bottom:16px}html body kbd{color:#000;border:1px solid #d6d6d6;border-bottom:2px solid #c7c7c7;padding:2px 4px;background-color:#f0f0f0;border-radius:3px}@media print{html body{background-color:#fff}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{color:#000;page-break-after:avoid}html body blockquote{color:#5c5c5c}html body pre{page-break-inside:avoid}html body table{display:table}html body img{display:block;max-width:100%;max-height:100%}html body code,html body pre{word-wrap:break-word;white-space:pre}}.markdown-preview{width:100%;height:100%;box-sizing:border-box}.markdown-preview ul{list-style:disc}.markdown-preview ul ul{list-style:circle}.markdown-preview ul ul ul{list-style:square}.markdown-preview ol{list-style:decimal}.markdown-preview ol ol,.markdown-preview ul ol{list-style-type:lower-roman}.markdown-preview ol ol ol,.markdown-preview ol ul ol,.markdown-preview ul ol ol,.markdown-preview ul ul ol{list-style-type:lower-alpha}.markdown-preview .newpage,.markdown-preview .pagebreak{page-break-before:always}.markdown-preview pre.line-numbers{position:relative;padding-left:3.8em;counter-reset:linenumber}.markdown-preview pre.line-numbers>code{position:relative}.markdown-preview pre.line-numbers .line-numbers-rows{position:absolute;pointer-events:none;top:1em;font-size:100%;left:0;width:3em;letter-spacing:-1px;border-right:1px solid #999;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}.markdown-preview pre.line-numbers .line-numbers-rows>span{pointer-events:none;display:block;counter-increment:linenumber}.markdown-preview pre.line-numbers .line-numbers-rows>span:before{content:counter(linenumber);color:#999;display:block;padding-right:.8em;text-align:right}.markdown-preview .mathjax-exps .MathJax_Display{text-align:center!important}.markdown-preview:not([data-for=preview]) .code-chunk .code-chunk-btn-group{display:none}.markdown-preview:not([data-for=preview]) .code-chunk .status{display:none}.markdown-preview:not([data-for=preview]) .code-chunk .output-div{margin-bottom:16px}.markdown-preview .md-toc{padding:0}.markdown-preview .md-toc .md-toc-link-wrapper .md-toc-link{display:inline;padding:.25rem 0}.markdown-preview .md-toc .md-toc-link-wrapper .md-toc-link div,.markdown-preview .md-toc .md-toc-link-wrapper .md-toc-link p{display:inline}.markdown-preview .md-toc .md-toc-link-wrapper.highlighted .md-toc-link{font-weight:800}.scrollbar-style::-webkit-scrollbar{width:8px}.scrollbar-style::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}.scrollbar-style::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,.66);border:4px solid rgba(150,150,150,.66);background-clip:content-box}html body[for=html-export]:not([data-presentation-mode]){position:relative;width:100%;height:100%;top:0;left:0;margin:0;padding:0;overflow:auto}html body[for=html-export]:not([data-presentation-mode]) .markdown-preview{position:relative;top:0;min-height:100vh}@media screen and (min-width:914px){html body[for=html-export]:not([data-presentation-mode]) .markdown-preview{padding:2em calc(50% - 457px + 2em)}}@media screen and (max-width:914px){html body[for=html-export]:not([data-presentation-mode]) .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for=html-export]:not([data-presentation-mode]) .markdown-preview{font-size:14px!important;padding:1em}}@media print{html body[for=html-export]:not([data-presentation-mode]) #sidebar-toc-btn{display:none}}html body[for=html-export]:not([data-presentation-mode]) #sidebar-toc-btn{position:fixed;bottom:8px;left:8px;font-size:28px;cursor:pointer;color:inherit;z-index:99;width:32px;text-align:center;opacity:.4}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] #sidebar-toc-btn{opacity:1}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc{position:fixed;top:0;left:0;width:300px;height:100%;padding:32px 0 48px 0;font-size:14px;box-shadow:0 0 4px rgba(150,150,150,.33);box-sizing:border-box;overflow:auto;background-color:inherit}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar{width:8px}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,.66);border:4px solid rgba(150,150,150,.66);background-clip:content-box}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc a{text-decoration:none}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc .md-toc{padding:0 16px}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc .md-toc .md-toc-link-wrapper .md-toc-link{display:inline;padding:.25rem 0}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc .md-toc .md-toc-link-wrapper .md-toc-link div,html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc .md-toc .md-toc-link-wrapper .md-toc-link p{display:inline}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc .md-toc .md-toc-link-wrapper.highlighted .md-toc-link{font-weight:800}html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{left:300px;width:calc(100% - 300px);padding:2em calc(50% - 457px - 300px / 2);margin:0;box-sizing:border-box}@media screen and (max-width:1274px){html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for=html-export]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{width:100%}}html body[for=html-export]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .markdown-preview{left:50%;transform:translateX(-50%)}html body[for=html-export]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .md-sidebar-toc{display:none}@font-face{font-family:"Material Icons";font-style:normal;font-weight:400;src:local("Material Icons"),local("MaterialIcons-Regular"),url("data:application/x-font-woff;charset=utf-8;base64,d09GRgABAAAAAAfIAAsAAAAADDAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAARAAAAFZW7kosY21hcAAAAYAAAADTAAACjtP6ytBnbHlmAAACVAAAAxgAAAQ4zRtvlGhlYWQAAAVsAAAALwAAADYRwZsnaGhlYQAABZwAAAAcAAAAJAeKAzxobXR4AAAFuAAAABIAAAA8OGQAAGxvY2EAAAXMAAAAIAAAACAG5AfwbWF4cAAABewAAAAfAAAAIAEfAERuYW1lAAAGDAAAAVcAAAKFkAhoC3Bvc3QAAAdkAAAAYgAAAK2vz7wkeJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2BkPsQ4gYGVgYOpk+kMAwNDP4RmfM1gxMjBwMDEwMrMgBUEpLmmMDgwVLy4xKzzX4chhrmK4QpQmBEkBwAZygyweJzFkr0NwjAQhZ+TEP6CRUfHBEwRUWaQTICyQbpMwRCskA5RUIONxG0RnnNpKAIV4qzPku/8c353ACYAYrIjCWCuMAh2ptf0/hiL3p/gyPUWa3osqlt0L1zu9r71z8dGrJRykFoauXQd932Lj5vhG+MjxGeYI8MKETObMpslf5EyP8tg+vHun5r539PvlvXzaVhRFVQDTPEWKVQR90KhnnC5Ek67vUKN4VuFasM/ldARj43CCkCsEjpJSoVVgRyU0GVSK6wUpFFCx8lFgX0BiXpRPQB4nE2TTWjcRhTH3xttpDhxN7uxPlp3u/FK7moRPixafRijNosxSw/LUsIwNcaEHPZggo/FmEKMCKWU4kNOOftQSlhE8alnH0Ix9BqWnHooPRrTQ0+mnu2bXTu2pPdGM9LM/6c3fwECTM4gBBMYQNqxzLrZAjqYSlqu2TAHZQA0/DQJH6FtzqGDnvbt4Ggwvzw/nL8EfH8kW0fsuRqhgWXZnY7M1picaUL7Du5BHeDzMIl83dAt016wH1qmvtSMo5R6YRJHTR//FXsff/nj/tc/5K9P5d+nP22+fFK5u7v3K39SW3y+OtDKO3L85vD09PD9z5X17a2N1g4tqk01RlqX7gyoEmnsWQtVr4rtZMmukEaFBZxzefkCn11cyKMLZgshRwgTYNoLNXCBz2ja7HvZG7hDpPSNfoo5vs0knK/9hb+rNpu+8kHPgk/Ao4kK3tWtTpSEtvkA9c+wE6UaUdwieNkaHg55tBEtRiEPw1s0+FtrtTcc9two2lhMknV7PZF/cs6+uUFTmpTGbEx7sQCPSLOttHS3GRltqp7SNzVSKzl6aWnZT/CX5k6/v9N3Hh8fHBwffJVjhrC6OgH5dkIt/tPsq+d/PD5Qz7G7efzq1THFjdZVPe/N6ulQ3JnDWSE5junsFsVIiFwL/htf1S5gJ3BfOcUxfHKLnzqpFpyfZ9cX+/5WB6a+Y0pHpzkNrYNVDwMsikK+y7WuLCRg/oFHkA8VT3rDg5ZnU6ktzzINymV0m74Xd5pfIGXyFeVEQSShkzqG7TBBa2OxVRKitLXv7h3uuftXnXq7lz2tZ/WnWa9dx9dCjDhHzmuVQATlmljr9dZErUydSo2Hbi/b1vXtrOeGCk2/8s3ZlO8+ueJT8BVlw5pGw2oYccdSiHHqx0RlabHqdNR9jAETl6PreJcPBnnfpTLnOQ8C3OV8AmQGzouV1iZdeb5SSIoVc8W8/kcDtksUH5FrU6/aqBqNWcMEzxG4DAQ14qRQhi9mWU0rzepKezbjfgCwQKxVYq5ajRgpRqy45CqwkJydcEkbTkvRz8P5/2ZpDTN4nGNgZGBgAOKb6v+/xvPbfGXgZmEAgeuB2kkI+v8bFgbmKiCXg4EJJAoAPyAKhQB4nGNgZGBg1vmvwxDDwgACQJKRARXwAwAzZQHQeJxjYQCCFAYGFgbSMQAcWACdAAAAAAAAAAwALgBgAIQAmADSAQgBIgE8AVABoAHeAfwCHHicY2BkYGDgZ7BgYGMAASYg5gJCBob/YD4DAA/hAWQAeJxlkbtuwkAURMc88gApQomUJoq0TdIQzEOpUDokKCNR0BuzBiO/tF6QSJcPyHflE9Klyyekz2CuG8cr7547M3d9JQO4xjccnJ57vid2cMHqxDWc40G4Tv1JuEF+Fm6ijRfhM+oz4Ra6eBVu4wZvvMFpXLIa40PYQQefwjVc4Uu4Tv1HuEH+FW7i1mkKn6Hj3Am3sHC6wm08Ou8tpSZGe1av1PKggjSxPd8zJtSGTuinyVGa6/Uu8kxZludCmzxMEzV0B6U004k25W35fj2yNlCBSWM1paujKFWZSbfat+7G2mzc7weiu34aczzFNYGBhgfLfcV6iQP3ACkSaj349AxXSN9IT0j16JepOb01doiKbNWt1ovippz6sVYYwsXgX2rGVFIkq7Pl2PNrI6qW6eOshj0xaSq9mpNEZIWs8LZUfOouNkVXxp/d5woqebeYIf4D2J1ywQB4nG3LOw6AIBAE0B384B+PAkgEa+QwNnYmHt+EpXSal5lkSBBnoP8oCFSo0aCFRIceA0ZMmLFAYSW88rmvtMUjG3RiQ9HvpfusM6zWNmtc5H/iPewha50tOt5PS/QBx2IeSwAA") format("woff")}.admonition{box-shadow:0 2px 2px 0 rgba(0,0,0,.14),0 1px 5px 0 rgba(0,0,0,.12),0 3px 1px -2px rgba(0,0,0,.2);position:relative;margin:1.5625em 0;padding:0 1.2rem;border-left:.4rem solid rgba(68,138,255,.8);border-radius:.2rem;background-color:rgba(255,255,255,.05);overflow:auto}.admonition>p{margin-top:.8rem}.admonition>.admonition-title{margin:0 -1.2rem;padding:.8rem 1.2rem .8rem 3.6rem;border-bottom:1px solid rgba(68,138,255,.2);background-color:rgba(68,138,255,.1);font-weight:700}.admonition>.admonition-title:before{position:absolute;left:1.2rem;font-size:1.5rem;color:rgba(68,138,255,.8);content:"\E3C9"}.admonition>.admonition-title:before{font-family:Material Icons;font-style:normal;font-variant:normal;font-weight:400;line-height:2rem;text-transform:none;white-space:nowrap;speak:none;word-wrap:normal;direction:ltr}.admonition.abstract,.admonition.summary,.admonition.tldr{border-left-color:rgba(0,176,255,.8)}.admonition.abstract>.admonition-title,.admonition.summary>.admonition-title,.admonition.tldr>.admonition-title{background-color:rgba(0,176,255,.1);border-bottom-color:rgba(0,176,255,.2)}.admonition.abstract>.admonition-title:before,.admonition.summary>.admonition-title:before,.admonition.tldr>.admonition-title:before{color:#00b0ff;content:"\E8D2"}.admonition.hint,.admonition.tip{border-left-color:rgba(0,191,165,.8)}.admonition.hint>.admonition-title,.admonition.tip>.admonition-title{background-color:rgba(0,191,165,.1);border-bottom-color:rgba(0,191,165,.2)}.admonition.hint>.admonition-title:before,.admonition.tip>.admonition-title:before{color:#00bfa5;content:"\E80E"}.admonition.info,.admonition.todo{border-left-color:rgba(0,184,212,.8)}.admonition.info>.admonition-title,.admonition.todo>.admonition-title{background-color:rgba(0,184,212,.1);border-bottom-color:rgba(0,184,212,.2)}.admonition.info>.admonition-title:before,.admonition.todo>.admonition-title:before{color:#00b8d4;content:"\E88E"}.admonition.check,.admonition.done,.admonition.success{border-left-color:rgba(0,200,83,.8)}.admonition.check>.admonition-title,.admonition.done>.admonition-title,.admonition.success>.admonition-title{background-color:rgba(0,200,83,.1);border-bottom-color:rgba(0,200,83,.2)}.admonition.check>.admonition-title:before,.admonition.done>.admonition-title:before,.admonition.success>.admonition-title:before{color:#00c853;content:"\E876"}.admonition.faq,.admonition.help,.admonition.question{border-left-color:rgba(100,221,23,.8)}.admonition.faq>.admonition-title,.admonition.help>.admonition-title,.admonition.question>.admonition-title{background-color:rgba(100,221,23,.1);border-bottom-color:rgba(100,221,23,.2)}.admonition.faq>.admonition-title:before,.admonition.help>.admonition-title:before,.admonition.question>.admonition-title:before{color:#64dd17;content:"\E887"}.admonition.attention,.admonition.caution,.admonition.warning{border-left-color:rgba(255,145,0,.8)}.admonition.attention>.admonition-title,.admonition.caution>.admonition-title,.admonition.warning>.admonition-title{background-color:rgba(255,145,0,.1);border-bottom-color:rgba(255,145,0,.2)}.admonition.attention>.admonition-title:before{color:#ff9100;content:"\E417"}.admonition.caution>.admonition-title:before,.admonition.warning>.admonition-title:before{color:#ff9100;content:"\E002"}.admonition.fail,.admonition.failure,.admonition.missing{border-left-color:rgba(255,82,82,.8)}.admonition.fail>.admonition-title,.admonition.failure>.admonition-title,.admonition.missing>.admonition-title{background-color:rgba(255,82,82,.1);border-bottom-color:rgba(255,82,82,.2)}.admonition.fail>.admonition-title:before,.admonition.failure>.admonition-title:before,.admonition.missing>.admonition-title:before{color:#ff5252;content:"\E14C"}.admonition.bug,.admonition.danger,.admonition.error{border-left-color:rgba(255,23,68,.8)}.admonition.bug>.admonition-title,.admonition.danger>.admonition-title,.admonition.error>.admonition-title{background-color:rgba(255,23,68,.1);border-bottom-color:rgba(255,23,68,.2)}.admonition.danger>.admonition-title:before{color:#ff1744;content:"\E3E7"}.admonition.error>.admonition-title:before{color:#ff1744;content:"\E14C"}.admonition.bug>.admonition-title:before{color:#ff1744;content:"\E868"}.admonition.example,.admonition.snippet{border-left-color:rgba(0,184,212,.8)}.admonition.example>.admonition-title,.admonition.snippet>.admonition-title{background-color:rgba(0,184,212,.1);border-bottom-color:rgba(0,184,212,.2)}.admonition.example>.admonition-title:before,.admonition.snippet>.admonition-title:before{color:#00b8d4;content:"\E242"}.admonition.cite,.admonition.quote{border-left-color:rgba(158,158,158,.8)}.admonition.cite>.admonition-title,.admonition.quote>.admonition-title{background-color:rgba(158,158,158,.1);border-bottom-color:rgba(158,158,158,.2)}.admonition.cite>.admonition-title:before,.admonition.quote>.admonition-title:before{color:#9e9e9e;content:"\E244"}
/* Please visit the URL below for more information: */
/* https://shd101wyy.github.io/markdown-preview-enhanced/#/customize-css */
</style>
<!-- The content below will be included at the end of the <head> element. --><script type="text/javascript">
document.addEventListener("DOMContentLoaded", function () {
// your code here
});
</script></head><body for="html-export">
<div class="crossnote markdown-preview ">
<h1><img src="./regexide.png" style="height: 4rem; display: inline-block" alt="Regexide Logo"></h1>
<p>This story begins with a simple question:</p>
<div class="admonition question">
<p class="admonition-title">How do I remove XML comments in JavaScript?</p>
</div>
<p>The Internet hivemind converged on one general approach: <b>regular expressions</b>.</p>
<p>The most frequently recommended answer is:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code>str <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--[\s\S]*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">g</span></span><span class="token punctuation">,</span> <span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// bad, do not use</span>
<span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span>
</code></pre><p><strong>There are known flaws with this family of regular expressions.</strong></p>
<p>This discussion focuses on "Regexide", the act of identifying and replacing flawed regular expressions with other techniques that better reflect the intended effect.</p>
<div class="md-toc">
<div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:18px">
<a href="#why-xml-comments-matter" class="md-toc-link">
<p>Why XML Comments matter</p>
</a></div><details style="padding:0;;padding-left:0px;" open="">
<summary class="md-toc-link-wrapper">
<a href="#how-the-regular-expression-works" class="md-toc-link"><p>How the regular expression works</p>
</a>
</summary>
<div>
<div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#usage-in-open-source-projects" class="md-toc-link">
<p>Usage in Open Source Projects</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#a-rare-consensus" class="md-toc-link">
<p>A rare consensus</p>
</a></div>
</div>
</details>
<details style="padding:0;;padding-left:0px;" open="">
<summary class="md-toc-link-wrapper">
<a href="#the-internet-failed-us" class="md-toc-link"><p>The Internet Failed Us</p>
</a>
</summary>
<div>
<div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#why-the-regular-expression-is-slow" class="md-toc-link">
<p>Why the regular expression is slow</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#vulnerability" class="md-toc-link">
<p>Vulnerability</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#a-side-note-about-rust" class="md-toc-link">
<p>A side note about Rust</p>
</a></div>
</div>
</details>
<details style="padding:0;;padding-left:0px;" open="">
<summary class="md-toc-link-wrapper">
<a href="#workarounds" class="md-toc-link"><p>Workarounds</p>
</a>
</summary>
<div>
<div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#use-a-different-engine" class="md-toc-link">
<p>Use a Different Engine</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#exogenous-constraints" class="md-toc-link">
<p>Exogenous Constraints</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#remove-the-regular-expression" class="md-toc-link">
<p>Remove the Regular Expression</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#validate-data" class="md-toc-link">
<p>Validate Data</p>
</a></div><div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:42px">
<a href="#limit-to-trusted-data" class="md-toc-link">
<p>Limit to Trusted Data</p>
</a></div>
</div>
</details>
<div class="md-toc-link-wrapper" style="padding:0;;display:list-item;list-style:square;margin-left:18px">
<a href="#special-thanks" class="md-toc-link">
<p>Special Thanks</p>
</a></div>
</div>
<h2 id="why-xml-comments-matter">Why XML Comments matter </h2>
<p>XML is a popular format for storing and sharing data. It was explicitly designed for people and programs to read and write data.<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup> From spreadsheets to save states, most modern software and games parse and write XML.</p>
<p>XML comments are special notes that parsers should not treat as data. XML comments start with <code>&lt;!--</code> and end with <code>--&gt;</code>.</p>
<p>Technically XML comments must not contain the string <code>--</code> within the comment body. Many programs and people write invalid XML comments, so parsers will typically allow for nested <code>--</code>.</p>
<p>The following XML comment is technically invalid but accepted by many parsers:</p>
<pre data-role="codeBlock" data-info="xml" class="language-xml xml"><code><span class="token tag"><span class="token tag"><span class="token punctuation">&lt;</span>!--</span> <span class="token attr-name">I</span> <span class="token attr-name">used</span> <span class="token attr-name">to</span> <span class="token attr-name">be</span> <span class="token attr-name">a</span> <span class="token attr-name">programmer</span> <span class="token attr-name">like</span> <span class="token attr-name">you,</span>
<span class="token attr-name">then</span> <span class="token attr-name">I</span> <span class="token attr-name">took</span> <span class="token attr-name">an</span> <span class="token attr-name">&lt;!--</span> <span class="token attr-name">in</span> <span class="token attr-name">the</span> <span class="token attr-name">Kleene</span> <span class="token attr-name">--</span><span class="token punctuation">&gt;</span></span>
</code></pre><p>(Kleene wrote a seminal paper<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> on regular expressions.)</p>
<h2 id="how-the-regular-expression-works">How the regular expression works </h2>
<p>The regular expression body <code>/&lt;!--[\s\S]*?--&gt;/</code> has three parts:</p>
<p>A) <code>&lt;!--</code> matches the four literal characters<br>
B) <code>[\s\S]*?</code> matches any number of characters<br>
C) <code>--&gt;</code> matches the three literal characters</p>
<p>In (B), <code>[\s\S]</code> matches any character. The <code>*?</code> is a "non-greedy quantifier" that instructs the regular expression engine to take the shortest match.</p>
<details><summary><b>Example of greedy and non-greedy matches</b> (click to show)</summary>
<p>Consider the following string:</p>
<pre class="language-text">&lt;!-- &lt;!-- &lt;!-- --&gt; --&gt; --&gt;
</pre>
<p>The "greedy" <code>/&lt;!--[\s\S]*--&gt;/</code> will match from the first <code>&lt;!--</code> to the last <code>--&gt;</code>:</p>
<pre class="language-text"><span style="background-color: #FFFF00">&lt;!-- &lt;!-- &lt;!-- --&gt; --&gt; --&gt;</span> /&lt;!--[\s\S]*--&gt;/
</pre>
<p>The non-greedy <code>/&lt;!--[\s\S]*?--&gt;/</code> will match from the first <code>&lt;!--</code> to the first <code>--&gt;</code>:</p>
<pre class="language-text"><span style="background-color: #FFFF00">&lt;!-- &lt;!-- &lt;!-- --&gt;</span> --&gt; --&gt; /&lt;!--[\s\S]*?--&gt;/
</pre>
</details>
<p>The modern variant of the regular expression uses the <code>/s</code> flag:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code>str <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--.*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">gs</span></span><span class="token punctuation">,</span> <span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// even worse</span>
<span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span><span class="token operator">^</span>
</code></pre><p>The <code>/s</code> flag modifies the <code>.</code> character class to include line terminators.</p>
<h3 id="usage-in-open-source-projects">Usage in Open Source Projects </h3>
<p>Many popular open source projects use problematic regular expressions.</p>
<p><a href="https://github.com/mozilla/nunjucks/blob/ea0d6d5396d39d9eed1b864febb36fbeca908f23/nunjucks/src/filters.js#L491">Nunjucks</a> used this regular expression within in the <code>striptags</code> filter expression:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code> <span class="token keyword keyword-let">let</span> tags <span class="token operator">=</span> <span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;\/?([a-z][a-z0-9]*)\b[^&gt;]*&gt;|&lt;!--[\s\S]*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">gi</span></span><span class="token punctuation">;</span>
</code></pre><p><a href="https://github.com/prettier/prettier/blob/45ad4668ebc133621c7f94e678ce399cab318068/scripts/lint-changelog.js#L51">PrettierJS</a> used this regular expression in the build sequence:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword keyword-const">const</span> templateComments <span class="token operator">=</span> template<span class="token punctuation">.</span><span class="token method function property-access">match</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--.*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">gs</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><p><a href="https://github.com/rollup/rollup/blob/18372035f167ec104280e1e91ef795e4f7033f1e/scripts/release-helpers.js#L76">RollupJS</a> used this regular expression in the build sequence:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword keyword-const">const</span> bodyWithoutComments <span class="token operator">=</span> data<span class="token punctuation">.</span><span class="token property-access">body</span><span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--[\S\s]*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">g</span></span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><p><a href="https://github.com/SheetJS/sheetjs/blob/master/xlsx.mjs#L18117">SheetJS</a> used this regular expression in parsing:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code>str <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--([\s\S]*?)--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">mg</span></span><span class="token punctuation">,</span><span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><p><a href="https://github.com/vitejs/vite/blob/9fc5d9cb3a1b9df067e00959faa9da43ae03f776/packages/vite/src/node/optimizer/scan.ts#L259">ViteJS</a> used the nascent <code>s</code> flag to ensure <code>.</code> matches newline characters:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword module keyword-export">export</span> <span class="token keyword keyword-const">const</span> commentRE <span class="token operator">=</span> <span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--.*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">gs</span></span>
<span class="token comment">// Avoid matching the content of the comment</span>
raw <span class="token operator">=</span> raw<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span>commentRE<span class="token punctuation">,</span> <span class="token string">'&lt;!----&gt;'</span><span class="token punctuation">)</span>
</code></pre><p><a href="https://github.com/vuejs/vue/blob/v2.2.3/dist/vue.esm.js#L7404">VueJS 2</a> used regular expressions in processing:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code>text <span class="token operator">=</span> text
<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--([\s\S]*?)--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">g</span></span><span class="token punctuation">,</span> <span class="token string">'$1'</span><span class="token punctuation">)</span>
<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!\[CDATA\[([\s\S]*?)]]&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">g</span></span><span class="token punctuation">,</span> <span class="token string">'$1'</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><p><a href="https://github.com/WordPress/WordPress/blob/master/wp-admin/js/word-count.js#L73">WordPress</a> used regular expressions in the word count calculator:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code> <span class="token literal-property property">HTMLcommentRegExp</span><span class="token operator">:</span> <span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--[\s\S]*?--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">g</span></span><span class="token punctuation">,</span>
</code></pre><p><a href="https://github.com/element-plus/element-plus/blob/4ac4750158fa634aa9da186111bce86c2898fda2/internal/build/src/tasks/helper.ts#L60">Element Plus</a> used a similar regular expression to match blocks starting with <code>&lt;del&gt;</code> and ending with <code>&lt;/del&gt;</code>:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code> <span class="token keyword keyword-const">const</span> str <span class="token operator">=</span> <span class="token function">removeTag</span><span class="token punctuation">(</span>value<span class="token punctuation">)</span>
<span class="token punctuation">.</span><span class="token method function property-access">replaceAll</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;del&gt;.*&lt;\/del&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">g</span></span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">)</span>
<span class="token comment">// ---------^^^^^^^^^^^^^^^^^^ -- start &lt;del&gt; end &lt;/del&gt;</span>
</code></pre><h3 id="a-rare-consensus">A rare consensus </h3>
<p>Most resources recommend this approach.</p>
<p><strong>Books</strong> recommend this approach. "Regular Expressions Cookbook"<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup> section 9.9 explicitly recommends <code>/&lt;!--[\s\S]*?--&gt;/</code> for matching XML comments.</p>
<p><strong>StackOverflow Answers</strong> recommend this regular expression and variants such as <code>/&lt;!--[\s\S\n]*?--&gt;/</code> (which are, for all practical purposes, equivalent).</p>
<p><strong>ChatGPT4</strong> has recommended the previous regular expression. It also generated code for a complete unrelated tag.</p>
<p><strong>Bing AI</strong> proposed unrelated command line tools for JavaScript.</p>
<details><summary><b>ChatGPT4 and Bing AI Screenshots</b> (click to show)</summary>
<p><em>ChatGPT4 Incorrect interpretation</em></p>
<p><img src="gpt4.png" alt="ChatGPT incorrect interpretation"></p>
<p><em>ChatGPT4 Correct interpretation, solution uses vulnerable regular expression</em></p>
<p><img src="chatgpt.png" alt="ChatGPT correct interpretation"></p>
<p><em>Bing AI Correct Interpretation, solution uses vulnerable regular expression</em></p>
<p><img src="bing.png" alt="Bing AI correct interpretation"></p>
</details>
<h2 id="the-internet-failed-us">The Internet Failed Us </h2>
<p>There are deep performance issues with the regular expression. To see this, consider a string that repeats the header part <code>&lt;!--</code> many times. In general, this type of string can be generated in JavaScript using <code>String#repeat</code>:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword keyword-var">var</span> string_repeated_65536_times <span class="token operator">=</span> <span class="token string">"&lt;!--"</span><span class="token punctuation">.</span><span class="token method function property-access">repeat</span><span class="token punctuation">(</span><span class="token number">65536</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><p>The <code>replace</code> operation is surprisingly slow. Try the following snippet in a new browser window or NodeJS terminal:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token comment">// this loop doubles each time</span>
<span class="token keyword control-flow keyword-for">for</span><span class="token punctuation">(</span><span class="token keyword keyword-var">var</span> n <span class="token operator">=</span> <span class="token number">64</span><span class="token punctuation">;</span> n <span class="token operator">&lt;</span> <span class="token number">1000000</span><span class="token punctuation">;</span> n<span class="token operator">*=</span><span class="token number">2</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword keyword-var">var</span> s <span class="token operator">=</span> <span class="token string">"&lt;!--"</span><span class="token punctuation">.</span><span class="token method function property-access">repeat</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// generate repeated string</span>
<span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">time</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">;</span>
s<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--([\s\S]*?)--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">mg</span></span><span class="token punctuation">,</span><span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// replace</span>
<span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">timeEnd</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code></pre><p>Results are from local tests on a 2019 Intel i9 MacBook Pro. The following chart displays runtime in seconds (vertical axis) as a function of repetitions (horizontal axis). The quadratic trend line closely fits the data.</p>
<img src="./data/js.png" style="max-height:200px" alt="javascript performance test - quadratic complexity">
<p><a href="./data/js.csv">Download the raw data as a CSV</a></p>
<p>When the number of repetitions doubled, the runtime roughly quadrupled. This is a "quadratic" relationship.</p>
<h3 id="why-the-regular-expression-is-slow">Why the regular expression is slow </h3>
<p>The regular expression matches a string that starts with <code>&lt;!--</code> and ends with <code>--&gt;</code>. Consider a function that repeatedly looks for the <code>&lt;!--</code> string and tries to find the first <code>--&gt;</code> that appears afterwards. Computer scientists classify this algorithm as "Backtracking"<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>:</p>
<pre data-role="codeBlock" data-info="js {.line-numbers}" class="language-javascript js line-numbers"><code><span class="token keyword keyword-function">function</span> <span class="token function">match_all_regex_comments</span><span class="token punctuation">(</span><span class="token parameter">str</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword keyword-const">const</span> results <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token comment">/* look for the first instance of &lt;!-- */</span>
<span class="token keyword keyword-let">let</span> start_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token string">"&lt;!--"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* this loop runs while start_index is valid */</span>
<span class="token keyword control-flow keyword-while">while</span><span class="token punctuation">(</span>start_index <span class="token operator">&gt;</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">/* look for the first instance of --&gt; after &lt;!-- */</span>
<span class="token keyword keyword-let">let</span> end_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token string">"--&gt;"</span><span class="token punctuation">,</span> start_index <span class="token operator">+</span> <span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* if --&gt; is found, then we have a match! */</span>
<span class="token keyword control-flow keyword-if">if</span><span class="token punctuation">(</span>end_index <span class="token operator">&gt;</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">/* add to array */</span>
results<span class="token punctuation">.</span><span class="token method function property-access">push</span><span class="token punctuation">(</span>str<span class="token punctuation">.</span><span class="token method function property-access">slice</span><span class="token punctuation">(</span>start_index<span class="token punctuation">,</span> end_index <span class="token operator">+</span> <span class="token number">3</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* start scanning from the end of the `--&gt;` */</span>
start_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token string">"&lt;!--"</span><span class="token punctuation">,</span> end_index <span class="token operator">+</span> <span class="token number">3</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token keyword control-flow keyword-else">else</span> <span class="token punctuation">{</span>
<span class="token comment">/* jump to the next potential starting point */</span>
start_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token string">"&lt;!--"</span><span class="token punctuation">,</span> start_index <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token comment">/* return the final list */</span>
<span class="token keyword control-flow keyword-return">return</span> results<span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></pre><div class="admonition info">
<p class="admonition-title">Optimization</p>
<p>The keen-eyed reader will notice that the loop can be terminated once the search for <code>--&gt;</code> fails (line 25 should be <code>break;</code>).</p>
<p><strong>Engines designed for JavaScript regular expressions do not currently perform this optimization.</strong></p>
<p>It can be shown that the runtime complexity of the modified algorithm is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Θ</mi><mo stretchy="false">(</mo><mi>L</mi><mo>+</mo><mi>M</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\Theta(L+M)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">Θ</span><span class="mopen">(</span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mclose">)</span></span></span></span> where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi></mrow><annotation encoding="application/x-tex">L</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal">L</span></span></span></span> is the string length and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>M</mi></mrow><annotation encoding="application/x-tex">M</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span></span></span></span> is the number of matches</p>
</div>
<p>If <code>--&gt;</code> is not in the string, the scan <code>str.indexOf("--&gt;", start_index + 4)</code> will look at every character in the string starting from <code>start_index + 4</code>. In the worst case, with repeated <code>&lt;!--</code>, the scan will start from index <code>4</code>, then index <code>8</code>, then index <code>12</code>, etc.</p>
<p>The following diagram shows the first three scans when running the function against the string formed by repeating <code>&lt;!--</code> 5 times. The <code>&lt;!--</code> matches are highlighted in yellow and the scans for the <code>--&gt;</code> are highlighted in blue.</p>
<pre><span style="background-color: #FFFF00">&lt;!--</span><span style="background-color: rgb(0,0,255,0.125)">&lt;!--&lt;!--&lt;!--&lt;!--</span>
^^^^ (first match of &lt;!-- 0 - 3)
............ (scan for --&gt; from index 4 to end) L - 4 characters
&lt;!--<span style="background-color: #FFFF00">&lt;!--</span><span style="background-color: rgb(0,0,255,0.125)">&lt;!--&lt;!--&lt;!--</span>
^^^^ (second match of &lt;!-- 4 - 7)
........ (scan for --&gt; from index 8 to end) L - 8 characters
&lt;!--&lt;!--<span style="background-color: #FFFF00">&lt;!--</span><span style="background-color: rgb(0,0,255,0.125)">&lt;!--&lt;!--</span>
^^^^ (third match of &lt;!-- 8 - 11)
.... (scan for --&gt; from index 12 to end) L - 12 characters
</pre>
<p>For <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi></mrow><annotation encoding="application/x-tex">N</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span></span></span></span> repetitions of <code>&lt;!--</code>, the total string length is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mo>=</mo><mn>4</mn><mo></mo><mi>N</mi></mrow><annotation encoding="application/x-tex">L = 4 * N</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span></span></span></span>. There will be <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi></mrow><annotation encoding="application/x-tex">N</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span></span></span></span> matches.</p>
<details><summary><b>Mathematical Analysis</b> (click to show)</summary>
<p>The first scan will start on character 4 (end of the first match) and inspect <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mo></mo><mn>4</mn></mrow><annotation encoding="application/x-tex">L-4</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span></span></span></span> characters (to the end of the string).</p>
<p>The second scan will start on character <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mo></mo><mn>4</mn><mo>=</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">2 * 4 = 8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">8</span></span></span></span> and inspect <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mo></mo><mn>8</mn></mrow><annotation encoding="application/x-tex">L-8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">8</span></span></span></span> characters.</p>
<p>In general, the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span></span></span></span>-th scan will start on character <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>4</mn><mo></mo><mi>K</mi></mrow><annotation encoding="application/x-tex">4 * K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span></span></span></span> and inspect <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mo></mo><mn>4</mn><mo></mo><mi>K</mi></mrow><annotation encoding="application/x-tex">L - 4*K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span></span></span></span> characters.</p>
<p>The total number of characters scanned when looking for the end tag (line 11 in the code) is:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.16em" columnalign="right left" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi>S</mi><mi>c</mi><mi>a</mi><mi>n</mi><mi>n</mi><mi>e</mi><mi>d</mi></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>=</mo><mo stretchy="false">(</mo><mi>L</mi><mo></mo><mn>4</mn><mo stretchy="false">)</mo><mo>+</mo><mo stretchy="false">(</mo><mi>L</mi><mo></mo><mn>4</mn><mo></mo><mn>2</mn><mo stretchy="false">)</mo><mo>+</mo><mo></mo><mo>+</mo><mo stretchy="false">(</mo><mi>L</mi><mo></mo><mo stretchy="false">(</mo><mi>N</mi><mo></mo><mn>1</mn><mo stretchy="false">)</mo><mo></mo><mn>4</mn><mo stretchy="false">)</mo><mo>+</mo><mo stretchy="false">(</mo><mi>L</mi><mo></mo><mi>N</mi><mo></mo><mn>4</mn><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>=</mo><mi>N</mi><mo></mo><mi>L</mi><mo></mo><mn>4</mn><mo></mo><mn>1</mn><mo></mo><mn>4</mn><mo></mo><mn>2</mn><mo></mo><mo></mo><mo></mo><mn>4</mn><mo></mo><mo stretchy="false">(</mo><mi>N</mi><mo></mo><mn>1</mn><mo stretchy="false">)</mo><mo></mo><mn>4</mn><mo></mo><mi>N</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>=</mo><mi>N</mi><mo></mo><mi>L</mi><mo></mo><mn>4</mn><mo></mo><mstyle scriptlevel="0" displaystyle="true"><munderover><mo></mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>N</mi></munderover><mi>i</mi></mstyle></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>=</mo><mn>4</mn><mo></mo><msup><mi>N</mi><mn>2</mn></msup><mo></mo><mn>4</mn><mo></mo><mstyle displaystyle="true" scriptlevel="0"><mfrac><mrow><mi>N</mi><mo></mo><mo stretchy="false">(</mo><mi>N</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo></mrow><mn>2</mn></mfrac></mstyle></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>=</mo><mn>4</mn><mo></mo><msup><mi>N</mi><mn>2</mn></msup><mo></mo><mn>2</mn><mo></mo><msup><mi>N</mi><mn>2</mn></msup><mo></mo><mn>2</mn><mo></mo><mi>N</mi><mo>=</mo><mstyle displaystyle="true" scriptlevel="0"><mfrac><msup><mi>L</mi><mn>2</mn></msup><mn>8</mn></mfrac></mstyle><mo></mo><mstyle displaystyle="true" scriptlevel="0"><mfrac><mi>L</mi><mn>2</mn></mfrac></mstyle></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{array}{rl}
Scanned &amp;= (L-4) + (L-4\cdot 2) + \ldots + (L-(N-1)\cdot 4) + (L - N\cdot 4) \\
&amp;= N\cdot L - 4 \cdot 1 - 4 \cdot 2 - \ldots - 4 \cdot (N-1) - 4 \cdot N \\
&amp;= N\cdot L - 4 \cdot \displaystyle\sum_{i=1}^{N} i \\
&amp;= 4\cdot N^2 - 4 \cdot \dfrac{N \cdot (N+1)}{2} \\
&amp;= 4 \cdot N^2 - 2 \cdot N^2 - 2 \cdot N = \dfrac{L^2}{8} - \dfrac{L}{2}
\end{array}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:9.7961em;vertical-align:-4.6481em;"></span><span class="mord"><span class="mtable"><span class="arraycolsep" style="width:0.5em;"></span><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:5.1481em;"><span style="top:-8.1364em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.05764em;">S</span><span class="mord mathnormal">c</span><span class="mord mathnormal">ann</span><span class="mord mathnormal">e</span><span class="mord mathnormal">d</span></span></span><span style="top:-6.9364em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"></span></span><span style="top:-4.7481em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"></span></span><span style="top:-2.0434em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"></span></span><span style="top:0.1337em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:4.6481em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:0.5em;"></span><span class="arraycolsep" style="width:0.5em;"></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:5.1481em;"><span style="top:-8.1364em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mopen">(</span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mopen">(</span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">2</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="minner"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mopen">(</span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">1</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mopen">(</span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mclose">)</span></span></span><span style="top:-6.9364em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="minner"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">1</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span></span></span><span style="top:-4.7481em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.8283em;"><span style="top:-1.8723em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.05em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op"></span></span></span><span style="top:-4.3em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.10903em;">N</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">i</span></span></span><span style="top:-2.0434em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.427em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">1</span><span class="mclose">)</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:0.1337em;"><span class="pstrut" style="height:3.8283em;"></span><span class="mord"><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.4911em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">8</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin"></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3603em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal">L</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:4.6481em;"><span></span></span></span></span></span><span class="arraycolsep" style="width:0.5em;"></span></span></span></span></span></span></span></p>
</details>
<p>In the worst case, the number of characters scanned is roughly proportional to the square of the length of the string. In "Big-O Notation", the complexity is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><msup><mi>L</mi><mn>2</mn></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(L^2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0641em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">O</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>. This is colloquially described as a "quadratic blowup".</p>
<h3 id="vulnerability">Vulnerability </h3>
<p>This is generally considered a vulnerability since relatively small data can cause browsers or servers to freeze for extended periods of time.</p>
<p>The official category for this weakness is "CWE-1333"<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup> "Inefficient Regular Expression Complexity".</p>
<p>Some resources use the phrase "Catastrophic backtracking" to describe the issue.</p>
<h3 id="a-side-note-about-rust">A side note about Rust </h3>
<p>Everyone writes high-performance code in Rust, right?</p>
<p>Rust does not have built-in support for regular expressions. Third-party libraries fill the gap.</p>
<p>The Rust <code>regress</code><sup class="footnote-ref"><a href="#fn6" id="fnref6">[6]</a></sup> crate is designed for JavaScript regular expressions. It represents a true apples-to-apples comparison with JavaScript.</p>
<pre data-role="codeBlock" data-info="rust" class="language-rust rust"><code> <span class="token keyword keyword-let">let</span> re <span class="token operator">=</span> <span class="token namespace">regress<span class="token punctuation">::</span></span><span class="token class-name">Regex</span><span class="token punctuation">::</span><span class="token function">new</span><span class="token punctuation">(</span><span class="token string">r"&lt;!--([\s\S]*?)--&gt;"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">unwrap</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> <span class="token keyword keyword-str">str</span> <span class="token operator">=</span> <span class="token string">"&lt;!--&lt;!--&lt;!--"</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> _match <span class="token operator">=</span> re<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span><span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><details><summary><b>Complete Example</b> (click to show)</summary>
<pre data-role="codeBlock" data-info="rust {.line-numbers}" class="language-rust rust line-numbers"><code><span class="token keyword keyword-fn">fn</span> <span class="token function-definition function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword keyword-let">let</span> re <span class="token operator">=</span> <span class="token namespace">regress<span class="token punctuation">::</span></span><span class="token class-name">Regex</span><span class="token punctuation">::</span><span class="token function">new</span><span class="token punctuation">(</span><span class="token string">r"&lt;!--([\s\S]*?)--&gt;"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">unwrap</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* construct string by repeating with itself */</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> <span class="token keyword keyword-str">str</span> <span class="token operator">=</span> <span class="token string">"&lt;!--"</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> _str <span class="token operator">=</span> <span class="token macro property">format!</span><span class="token punctuation">(</span><span class="token string">"{}{}"</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> rept<span class="token punctuation">:</span> <span class="token keyword keyword-u64">u64</span> <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">;</span>
<span class="token keyword keyword-for">for</span> _i <span class="token keyword keyword-in">in</span> <span class="token number">1</span><span class="token punctuation">..</span><span class="token number">8</span> <span class="token punctuation">{</span>
_str <span class="token operator">=</span> <span class="token macro property">format!</span><span class="token punctuation">(</span><span class="token string">"{}{}"</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-str">str</span> <span class="token operator">=</span> _str<span class="token punctuation">.</span><span class="token function">as_str</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
rept <span class="token operator">*=</span> <span class="token number">2</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token keyword keyword-for">for</span> _j <span class="token keyword keyword-in">in</span> <span class="token number">1</span><span class="token punctuation">..</span><span class="token number">11</span> <span class="token punctuation">{</span>
<span class="token comment">/* test regular expression against string */</span>
<span class="token keyword keyword-let">let</span> start_time <span class="token operator">=</span> <span class="token namespace">std<span class="token punctuation">::</span>time<span class="token punctuation">::</span></span><span class="token class-name">Instant</span><span class="token punctuation">::</span><span class="token function">now</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> _caps <span class="token operator">=</span> re<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span><span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> elapsed_time <span class="token operator">=</span> start_time<span class="token punctuation">.</span><span class="token function">elapsed</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token macro property">println!</span><span class="token punctuation">(</span><span class="token string">"{}: {:?}"</span><span class="token punctuation">,</span> rept<span class="token punctuation">,</span> elapsed_time<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* double string length by repeating with itself */</span>
_str <span class="token operator">=</span> <span class="token macro property">format!</span><span class="token punctuation">(</span><span class="token string">"{}{}"</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-str">str</span> <span class="token operator">=</span> _str<span class="token punctuation">.</span><span class="token function">as_str</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
rept <span class="token operator">*=</span> <span class="token number">2</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></pre></details>
<p>Results are from local tests on a 2019 Intel i9 MacBook Pro. <code>regress</code> shows the same quadratic behavior as other JavaScript regular expression engines.</p>
<img src="./data/regress.png" style="max-height:200px" alt="rust regress performance test - quadratic complexity">
<p><a href="./data/regress.csv">Download the raw data as a CSV</a></p>
<h2 id="workarounds">Workarounds </h2>
<p>There are a few general approaches to address the issue.</p>
<h3 id="use-a-different-engine">Use a Different Engine </h3>
<p>By limiting the supported featureset, other regular expression engines have stricter performance guarantees.</p>
<h4 id="nodejs">NodeJS </h4>
<p>The <code>re2</code><sup class="footnote-ref"><a href="#fn7" id="fnref7">[7]</a></sup> C++ engine sacrifices backreference and lookaround support for performance. There are bindings for many server-side programming languages.</p>
<p>The <code>re2</code><sup class="footnote-ref"><a href="#fn8" id="fnref8">[8]</a></sup> NodeJS package is a native binding to the C++ engine and can be used in server-side environments. With modern versions of NodeJS, normal regular expressions can be wrapped with <code>RE2</code>:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword keyword-var">var</span> out <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token keyword keyword-new">new</span> <span class="token class-name">RE2</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--([\s\S]*?)--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">mg</span></span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// replace</span>
</code></pre><details><summary><b>Complete Example</b> (click to show)</summary>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword keyword-var">var</span> <span class="token constant">RE2</span> <span class="token operator">=</span> <span class="token function">require</span><span class="token punctuation">(</span><span class="token string">"re2"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">// this loop doubles each time</span>
<span class="token keyword control-flow keyword-for">for</span><span class="token punctuation">(</span><span class="token keyword keyword-var">var</span> n <span class="token operator">=</span> <span class="token number">64</span><span class="token punctuation">;</span> n <span class="token operator">&lt;</span> <span class="token number">100000000</span><span class="token punctuation">;</span> n<span class="token operator">*=</span><span class="token number">2</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword keyword-var">var</span> s <span class="token operator">=</span> <span class="token string">"&lt;!--"</span><span class="token punctuation">.</span><span class="token method function property-access">repeat</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// generate repeated string</span>
<span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">time</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">;</span>
s<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token keyword keyword-new">new</span> <span class="token class-name">RE2</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--([\s\S]*?)--&gt;</span><span class="token regex-delimiter">/</span><span class="token regex-flags">mg</span></span><span class="token punctuation">)</span><span class="token punctuation">,</span><span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// replace</span>
<span class="token console class-name">console</span><span class="token punctuation">.</span><span class="token method function property-access">timeEnd</span><span class="token punctuation">(</span>n<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code></pre></details>
<p>The <code>re2</code> implementation uses algorithms whose performance scales linearly with the size of the input.</p>
<img src="./data/re2.png" style="max-height:200px" alt="nodejs re2 performance test - linear complexity">
<p><a href="./data/re2.csv">Download the raw data as a CSV</a></p>
<h4 id="rust">Rust </h4>
<p>The Rust <code>regex</code><sup class="footnote-ref"><a href="#fn9" id="fnref9">[9]</a></sup> crate sacrifices support for performance. It is the same tradeoff made by the <code>re2</code> engine.</p>
<p>Since it does not use lookaround or backreferences, the original regular expression is compatible with the <code>regex</code> crate:</p>
<pre data-role="codeBlock" data-info="rust" class="language-rust rust"><code> <span class="token keyword keyword-let">let</span> re <span class="token operator">=</span> <span class="token namespace">regex<span class="token punctuation">::</span></span><span class="token class-name">Regex</span><span class="token punctuation">::</span><span class="token function">new</span><span class="token punctuation">(</span><span class="token string">r"&lt;!--([\s\S]*?)--&gt;"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">unwrap</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> <span class="token keyword keyword-str">str</span> <span class="token operator">=</span> <span class="token string">"&lt;!--&lt;!--&lt;!--"</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> _match <span class="token operator">=</span> re<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span><span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><details><summary><b>Complete Example</b> (click to show)</summary>
<pre data-role="codeBlock" data-info="rust" class="language-rust rust"><code><span class="token keyword keyword-fn">fn</span> <span class="token function-definition function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword keyword-let">let</span> re <span class="token operator">=</span> <span class="token namespace">regex<span class="token punctuation">::</span></span><span class="token class-name">Regex</span><span class="token punctuation">::</span><span class="token function">new</span><span class="token punctuation">(</span><span class="token string">r"&lt;!--([\s\S]*?)--&gt;"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">unwrap</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* construct string by repeating with itself */</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> <span class="token keyword keyword-str">str</span> <span class="token operator">=</span> <span class="token string">"&lt;!--"</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> _str <span class="token operator">=</span> <span class="token macro property">format!</span><span class="token punctuation">(</span><span class="token string">"{}{}"</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> <span class="token keyword keyword-mut">mut</span> rept<span class="token punctuation">:</span> <span class="token keyword keyword-u64">u64</span> <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">;</span>
<span class="token keyword keyword-for">for</span> _i <span class="token keyword keyword-in">in</span> <span class="token number">1</span><span class="token punctuation">..</span><span class="token number">25</span> <span class="token punctuation">{</span>
_str <span class="token operator">=</span> <span class="token macro property">format!</span><span class="token punctuation">(</span><span class="token string">"{}{}"</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-str">str</span> <span class="token operator">=</span> _str<span class="token punctuation">.</span><span class="token function">as_str</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
rept <span class="token operator">*=</span> <span class="token number">2</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token keyword keyword-for">for</span> _j <span class="token keyword keyword-in">in</span> <span class="token number">1</span><span class="token punctuation">..</span><span class="token number">11</span> <span class="token punctuation">{</span>
<span class="token comment">/* find all matches */</span>
<span class="token keyword keyword-let">let</span> start_time <span class="token operator">=</span> <span class="token namespace">std<span class="token punctuation">::</span>time<span class="token punctuation">::</span></span><span class="token class-name">Instant</span><span class="token punctuation">::</span><span class="token function">now</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> _caps <span class="token operator">=</span> re<span class="token punctuation">.</span><span class="token function">captures</span><span class="token punctuation">(</span><span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-let">let</span> elapsed_time <span class="token operator">=</span> start_time<span class="token punctuation">.</span><span class="token function">elapsed</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token macro property">println!</span><span class="token punctuation">(</span><span class="token string">"{}: {:?}"</span><span class="token punctuation">,</span> rept<span class="token punctuation">,</span> elapsed_time<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* double string length by repeating with itself */</span>
_str <span class="token operator">=</span> <span class="token macro property">format!</span><span class="token punctuation">(</span><span class="token string">"{}{}"</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">,</span> <span class="token keyword keyword-str">str</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword keyword-str">str</span> <span class="token operator">=</span> _str<span class="token punctuation">.</span><span class="token function">as_str</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
rept <span class="token operator">*=</span> <span class="token number">2</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code></pre></details>
<p>The Rust <code>regex</code> implementation uses algorithms whose performance scales linearly with the size of the input.</p>
<img src="./data/regex.png" style="max-height:200px" alt="rust regex performance test - linear complexity">
<p><a href="./data/regex.csv">Download the raw data as a CSV</a></p>
<h3 id="exogenous-constraints">Exogenous Constraints </h3>
<p>Most problems have additional constraints. Addressing the constraints allow for more precise regular expressions with better performance.</p>
<p>For the problem of matching comments, various specifications impose limitations.</p>
<h4 id="xml-comments">XML Comments </h4>
<p>The XML 1.0 specification<sup class="footnote-ref"><a href="#fn10" id="fnref10">[10]</a></sup> disallows <code>--</code> within comments.</p>
<pre data-role="codeBlock" data-info="" class="language-text"><code>(this comment is not valid in XML 1.0)
&lt;!-- - .... .. ... / -.-. --- -- -- . -. - / .. ... / .. -. ...- .- .-.. .. -.. --&gt;
</code></pre><p><a href="https://github.com/prettier/prettier/blob/ff83d55d05e92ceef10ec0cb1c0272ab894a00a0/src/language-markdown/mdx.js#L28">PrettierJS</a> uses a regular expression in the MDX parser that enforces the XML constraint:</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code><span class="token keyword keyword-const">const</span> <span class="token constant">COMMENT_REGEX</span> <span class="token operator">=</span> <span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!----&gt;|&lt;!---?[^&gt;-](?:-?[^-])*--&gt;</span><span class="token regex-delimiter">/</span></span><span class="token punctuation">;</span>
</code></pre><p>Commonly-used regular expression engines can optimize for this pattern and avoid backtracking.</p>
<div class="admonition info">
<p class="admonition-title">Spreadsheet Engines</p>
<p>The XML parser in Excel powering the <a href="https://docs.sheetjs.com/docs/miscellany/formats/#excel-2007-xml-xlsxxlsm">Excel Workbook (XLSX) format</a> expects proper XML comments with no <code>--</code> in the comment body.</p>
<p>The XML parser in Excel powering the <a href="https://docs.sheetjs.com/docs/miscellany/formats#excel-2003-2004-spreadsheetml">Excel 2003-2004 (SpreadsheetML) format</a> allows <code>--</code> in the comment body.</p>
</div>
<h4 id="html-comments">HTML Comments </h4>
<p>The HTML5 standard<sup class="footnote-ref"><a href="#fn11" id="fnref11">[11]</a></sup> permits <code>--</code> but forbids <code>&lt;!--</code> within comment text. For example, the following comment is not valid according to the standard:</p>
<pre class="language-text">&lt;!-- I used to be a programmer like you, then I took an <span style="text-decoration-line:underline;text-decoration-color:red;text-decoration-style:wavy;">&lt;!--</span> in the Kleene --&gt;
</pre>
<p><a href="https://github.com/yt-dlp/yt-dlp/blob/95e82347b398d8bb160767cdd975edecd62cbabd/yt_dlp/extractor/common.py#L1709">yt-dlp</a> uses a regular expression with a negative lookahead to ensure <code>&lt;!--</code> does not appear in the body:</p>
<pre data-role="codeBlock" data-info="python" class="language-python python"><code> html <span class="token operator">=</span> re<span class="token punctuation">.</span>sub<span class="token punctuation">(</span><span class="token string">r'&lt;!--(?:(?!&lt;!--).)*--&gt;'</span><span class="token punctuation">,</span> <span class="token string">''</span><span class="token punctuation">,</span> html<span class="token punctuation">)</span>
</code></pre><p>This expression allows <code>--</code> but disallows <code>&lt;!--</code> in the comment body. In practice, it will match comments starting from the innermost <code>&lt;!--</code>. Using the previous example:</p>
<pre class="language-text">&lt;!-- I used to be a programmer like you, then I took an <span style="background-color: #FFFF00">&lt;!-- in the Kleene --&gt;</span>
</pre>
<div class="admonition info">
<p class="admonition-title">Web Browsers</p>
<p>Web browsers generally allow <code>&lt;!--</code> in comments. Text between the first <code>&lt;!--</code> and the first <code>--&gt;</code> are treated as a comment. For example, consider the following HTML:</p>
<pre data-role="codeBlock" data-info="html" class="language-html html"><code><span class="token tag"><span class="token tag"><span class="token punctuation">&lt;</span>pre</span><span class="token punctuation">&gt;</span></span><span class="token tag"><span class="token tag"><span class="token punctuation">&lt;</span>!--</span> <span class="token attr-name">this</span> <span class="token attr-name">is</span> <span class="token attr-name">a</span> <span class="token attr-name">nested</span> <span class="token attr-name">comment</span> <span class="token attr-name">&lt;!--</span> <span class="token attr-name">--</span><span class="token punctuation">&gt;</span></span> --&gt; more text<span class="token tag"><span class="token tag"><span class="token punctuation">&lt;/</span>pre</span><span class="token punctuation">&gt;</span></span>
| |^^^^^^^^^^^^^^ --- content
| this is interpreted as a comment |
</code></pre><p>This exact HTML code is added below:</p>
<pre><!-- this is a nested comment <!-- --> --&gt; more text</pre>
<p>Chromium and other browsers will display <code>--&gt; more text</code></p>
</div>
<h3 id="remove-the-regular-expression">Remove the Regular Expression </h3>
<p>Regular expression operations can be reimplemented using standard string operations.</p>
<p>For example, the replacement</p>
<pre data-role="codeBlock" data-info="js" class="language-javascript js"><code>str <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">replace</span><span class="token punctuation">(</span><span class="token regex"><span class="token regex-delimiter">/</span><span class="token regex-source language-regex">&lt;!--([\s\S]*?)--&gt;</span><span class="token regex-delimiter">/</span></span><span class="token punctuation">,</span> <span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre><p>can be rewritten with a loop. The core idea is to collect non-commented fragments:</p>
<pre data-role="codeBlock" data-info="js {.line-numbers}" class="language-javascript js line-numbers"><code><span class="token keyword keyword-function">function</span> <span class="token function">remove_xml_comments</span><span class="token punctuation">(</span><span class="token parameter">str</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword keyword-const">const</span> <span class="token constant">START</span> <span class="token operator">=</span> <span class="token string">"&lt;!--"</span><span class="token punctuation">,</span> <span class="token constant">END</span> <span class="token operator">=</span> <span class="token string">"--&gt;"</span><span class="token punctuation">;</span>
<span class="token keyword keyword-const">const</span> results <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token comment">/* this index tracks the last analyzed character */</span>
<span class="token keyword keyword-let">let</span> last_index <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>
<span class="token comment">/* look for the first instance of &lt;!-- */</span>
<span class="token keyword keyword-let">let</span> start_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token constant">START</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* this loop runs while start_index is valid */</span>
<span class="token keyword control-flow keyword-while">while</span><span class="token punctuation">(</span>start_index <span class="token operator">&gt;</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">/* add the fragment that precedes the comment */</span>
results<span class="token punctuation">.</span><span class="token method function property-access">push</span><span class="token punctuation">(</span>str<span class="token punctuation">.</span><span class="token method function property-access">slice</span><span class="token punctuation">(</span>last_index<span class="token punctuation">,</span> start_index<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
last_index <span class="token operator">=</span> start_index<span class="token punctuation">;</span>
<span class="token comment">/* look for the first instance of --&gt; after &lt;!-- */</span>
<span class="token keyword keyword-let">let</span> end_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token constant">END</span><span class="token punctuation">,</span> start_index <span class="token operator">+</span> <span class="token constant">START</span><span class="token punctuation">.</span><span class="token property-access">length</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* if --&gt; is found, then we have a match! */</span>
<span class="token keyword control-flow keyword-if">if</span><span class="token punctuation">(</span>end_index <span class="token operator">&gt;</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token comment">/* skip the comment */</span>
last_index <span class="token operator">=</span> end_index <span class="token operator">+</span> <span class="token constant">END</span><span class="token punctuation">.</span><span class="token property-access">length</span><span class="token punctuation">;</span>
<span class="token comment">/* search for next comment open tag */</span>
start_index <span class="token operator">=</span> str<span class="token punctuation">.</span><span class="token method function property-access">indexOf</span><span class="token punctuation">(</span><span class="token constant">START</span><span class="token punctuation">,</span> last_index<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token comment">/* if there is no end comment tag, stop processing */</span>
<span class="token keyword control-flow keyword-else">else</span> <span class="token keyword control-flow keyword-break">break</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token comment">/* add remaining part of string */</span>
results<span class="token punctuation">.</span><span class="token method function property-access">push</span><span class="token punctuation">(</span>str<span class="token punctuation">.</span><span class="token method function property-access">slice</span><span class="token punctuation">(</span>last_index<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">/* concatenate the fragments */</span>
<span class="token keyword control-flow keyword-return">return</span> results<span class="token punctuation">.</span><span class="token method function property-access">join</span><span class="token punctuation">(</span><span class="token string">""</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code><span aria-hidden="true" class="line-numbers-rows"><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span><span></span></span></pre><h3 id="validate-data">Validate Data </h3>
<p>In the places where ViteJS used the vulnerable regular expression, the text was validated using a separate HTML parser.</p>
<p>It is still strongly recommended to replace the regular expression.</p>
<h3 id="limit-to-trusted-data">Limit to Trusted Data </h3>
<p>PrettierJS and RollupJS use the vulnerable regular expression in internal scripts. The expressions are not used or added in websites. The data sources are trusted and malformed data can be corrected manually.</p>
<h2 id="special-thanks">Special Thanks </h2>
<p>Special thanks to <a href="https://asadbek.dev/">Asadbek</a>, <a href="http://francoatmega.com/">Jardel</a>, and members of the <a href="https://sheetjs.com">SheetJS team</a> for early feedback.</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>See <a href="https://www.w3.org/TR/REC-xml/#sec-origin-goals">"Origin and Goals"</a> in the Extensible Markup Language (XML) 1.0 specification. <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>The theoretical underpinnings of modern regular expressions were established in the working paper <a href="https://www.rand.org/content/dam/rand/pubs/research_memoranda/2008/RM704.pdf">"Representation of Events in Nerve Nets and Finite Automata"</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p>See <a href="https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch09s10.html">"9.9 Remove XML-Style Comments"</a> on the official site for the book. <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn4" class="footnote-item"><p>See <a href="https://en.wikipedia.org/wiki/Backtracking">the Wikipedia article for "Backtracking"</a> for more details and resources. <a href="#fnref4" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn5" class="footnote-item"><p>See <a href="https://cwe.mitre.org/data/definitions/1333.html">the definition in the "CWE List"</a> for more details and resources. <a href="#fnref5" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn6" class="footnote-item"><p>See <a href="https://crates.io/crates/regress">the listing for <code>regress</code> crate</a> for more details. <a href="#fnref6" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn7" class="footnote-item"><p>See <a href="https://github.com/google/re2">the <code>google/re2</code> project on GitHub</a> for more details. <a href="#fnref7" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn8" class="footnote-item"><p>See <a href="https://www.npmjs.com/package/re2">the listing for the <code>re2</code> NodeJS package</a> for more details. <a href="#fnref8" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn9" class="footnote-item"><p>See <a href="https://crates.io/crates/regex">the listing for <code>regex</code> crate</a> for more details. <a href="#fnref9" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn10" class="footnote-item"><p>See <a href="https://www.w3.org/TR/REC-xml/#sec-comments">"Comments"</a> in the XML 1.0 specification. <a href="#fnref10" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn11" class="footnote-item"><p>See <a href="https://html.spec.whatwg.org/multipage/syntax.html#comments">"Comments"</a> in the WHATWG HTML Living Standard. <a href="#fnref11" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</div>
</body></html>