Markdig Extension – Bad Header Handler

First off, if you haven’t seen Markdig yet, you’re missing out! It has to be the most extensible Markdown processor I’ve ever seen, and it is still incredibly fast. It’s slim enough to confidently use on small .NET Clients like Xamarin, and supports custom output as well (not just HTML).

Because of it’s flexibility and componentization, we are able to customize it without sacrificing performance using their “Extension” framework. The extension we are talking about here is one that ideally would never exist, but solves the problem of malformed Markdown headers. How often do you see wrong headers with the missing space after the “#” in places like Github and WordPress?

Where

#My Header

Should be

# My Header

Well if you’re using Markdig and run into this issue, simply slap this extension into your processing pipeline and worry no more! It even works with a mix of good and bad headers.

Install

You can find it on NuGet or Clone it yourself from Github:

Usage

Add it to your pipeline that you use to parse:

var pipelineBuilder = new MarkdownPipelineBuilder();
pipelineBuilder = MarkdownExtensions.Use<BadHeadersExtension>(pipelineBuilder);
var pipeline = pipelineBuilder.Build();
var html = Markdown.ToHtml(BAD_HEADER_MARKDOWN, _pipeline);

Check out the unit tests in the source code to view a working example.

Source

The gist is a HeadingBlockParser and the Extension itself.

BadHeadingBlockParser.cs

<br />    /// <summary>
    /// Bad heading block parser. Does the same thing as the header parser, but doesn't require a space.
    /// Using a private class to ensure all markdown logic is contained within this service.
    /// </summary>
    public class BadHeadingBlockParser : HeadingBlockParser
    {
        /// <summary>
        /// The head char.
        /// </summary>
        private readonly char _headChar;

        /// <summary>
        /// Initializes a new instance of the <see cref="T:Markdig.BadHeaders.BadHeadingBlockParser"/> class.
        /// </summary>
        /// <param name="headChar">Head char.</param>
        public BadHeadingBlockParser(char headChar)
        {
            _headChar = headChar;
        }

        /// <summary>
        /// Overrides the TryOpen for the heading block parser to ignore the need for spaces
        /// </summary>
        /// <returns>The open.</returns>
        /// <param name="processor">Processor.</param>
        public override BlockState TryOpen(BlockProcessor processor)
        {
            // If we are in a CodeIndent, early exit
            if (processor.IsCodeIndent)
            {
                return BlockState.None;
            }

            // 4.2 ATX headings
            // An ATX heading consists of a string of characters, parsed as inline content, 
            // between an opening sequence of 1–6 unescaped # characters and an optional 
            // closing sequence of any number of unescaped # characters. The opening sequence 
            // of # characters must be followed by a space or by the end of line. The optional
            // closing sequence of #s must be preceded by a space and may be followed by spaces
            // only. The opening # character may be indented 0-3 spaces. The raw contents of 
            // the heading are stripped of leading and trailing spaces before being parsed as 
            // inline content. The heading level is equal to the number of # characters in the 
            // opening sequence.

            // We are not doing this ^^ we don't have the spaces... so we need to handle that adjusted logic here
            var column = processor.Column;
            var line = processor.Line;
            var sourcePosition = line.Start;
            var c = line.CurrentChar;
            var matchingChar = c;

            int leadingCount = 0;

            // get how many of the headChar we have and limit to 6 (h6 is the last handled header)
            while (c == _headChar && leadingCount <= 6)
            {
                if (c != matchingChar)
                {
                    break;
                }
                c = line.NextChar();
                leadingCount++;
            }

            // A space is NOT required after leading #
            if (leadingCount > 0 && leadingCount <= 6)
            {
                // Move to the content
                var headingBlock = new HeadingBlock(this)
                {
                    HeaderChar = matchingChar,
                    Level = leadingCount,
                    Column = column,
                    Span = { Start = sourcePosition }
                };
                processor.NewBlocks.Push(headingBlock);
                processor.GoToColumn(column + leadingCount); // no +1 - skip the space

                // Gives a chance to parse attributes
                if (TryParseAttributes != null)
                {
                    TryParseAttributes(processor, ref processor.Line, headingBlock);
                }

                // The optional closing sequence of #s must not be preceded by a space and may be followed by spaces only.
                int endState = 0;
                int countClosingTags = 0;
                for (int i = processor.Line.End; i >= processor.Line.Start; i--)  // Go up to Start in order to match the no space after the first ###
                {
                    c = processor.Line.Text[i];
                    if (endState == 0)
                    {
                        if (c.IsSpaceOrTab())
                        {
                            continue;
                        }
                        endState = 1;
                    }
                    if (endState == 1)
                    {
                        if (c == matchingChar)
                        {
                            countClosingTags++;
                            continue;
                        }

                        if (countClosingTags > 0)
                        {
                            if (c.IsSpaceOrTab())
                            {
                                processor.Line.End = i - 1;
                            }
                            break;
                        }
                        else
                        {
                            break;
                        }
                    }
                }

                // Setup the source end position of this element
                headingBlock.Span.End = processor.Line.End;

                // We expect a single line, so don't continue
                return BlockState.Break;
            }

            // Else we don't have an header
            return BlockState.None;
        }
    }

 
Then we use the Parser in the Extension:

BadHeadersExtension.cs

<br />    /// <summary>
    /// Markdig markdown extension for handling bad markdown for titles
    /// </summary>
    public class BadHeadersExtension : IMarkdownExtension
    {
        /// <summary>
        /// Sets up the extension to use the badheading block parser
        /// </summary>
        /// <returns>The setup.</returns>
        /// <param name="pipeline">Pipeline.</param>
        public void Setup(MarkdownPipelineBuilder pipeline)
        {
            if (!pipeline.BlockParsers.Contains<BadHeadingBlockParser>())
            {
                // Insert the parser before any other parsers and use '#' as the character identifier
                pipeline.BlockParsers.Insert(0, new BadHeadingBlockParser('#'));
            }
        }

        /// <summary>
        /// Not needed
        /// </summary>
        /// <returns>The setup.</returns>
        /// <param name="pipeline">Pipeline.</param>
        /// <param name="renderer">Renderer.</param>
        public void Setup(MarkdownPipeline pipeline, IMarkdownRenderer renderer)
        {
            // not needed
        }
    }

If you like what you see, don’t forget to follow me on twitter @Suave_Pirate, check out my GitHub, and subscribe to my blog to learn more mobile developer tips and tricks!

Interested in sponsoring developer content? Message @Suave_Pirate on twitter for details.

2 thoughts on “Markdig Extension – Bad Header Handler”

Leave a comment