Why Hashtags Can't Have Spaces

Hashtags can't have spaces because the program that finds hashtags in a text uses the hash sign (#) to tell where a hashtag starts and the space character ( ) to tell where it ends. Since a space means the hashtag has ended, there can't be a space inside the hashtag.

The space character isn't the only hashtag-ending character. Hashtags also end at a newline character, which means a hashtag can't start on one line and continue on the next if you insert a manual line break by pressing the Enter key.
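
As a rough illustration of that rule, here is a minimal JavaScript sketch. The function name findHashtagEnd and the sample text are made up for this example, and it assumes that only a space or a newline can end a hashtag.

```javascript
// Hypothetical helper: returns the index where the hashtag that starts at
// `start` ends, assuming only a space or a newline can end it.
function findHashtagEnd(text, start) {
    let i = start + 1; // skip the hash sign itself
    while (i < text.length && text[i] !== ' ' && text[i] !== '\n') {
        i++;
    }
    return i;
}

// The line break cuts the hashtag short, just like a space would.
const sample = '#two\nlines are not one hashtag';
const end = findHashtagEnd(sample, 0);
console.log(sample.substring(0, end)); // prints "#two"
```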

Algorithm

The simplest way to implement a hashtag parser is to iterate over every character of the text until a hash sign is found, then switch modes and search for a space. The JavaScript would be something like this:

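// sample input for illustration; any string works
const textBody = 'Some text with a #hashtag in it';

// every hashtag found is collected here
const hashtags = [];

for (let i = 0; i < textBody.length; i++) {
    const c = textBody.charAt(i);
    if (c === '#') {
        // hashtag starts here
        const startIndex = i;

        // advance until a space is found or the text ends
        for (i++; i < textBody.length; i++) {
            const c2 = textBody.charAt(i);
            if (c2 === ' ') {
                break;
            }
        }

        // hashtag ends here
        const endIndex = i;

        // includes the hash sign, but not the space
        const hashtag = textBody.substring(startIndex, endIndex);
        hashtags.push(hashtag);
    }
}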

I haven't tested this code, but in principle it should work.

The way it works is that we have a numeric indexing variable i, which goes from 0 to N - 1, where N is the number of characters in the text.

For each index, we get the character at position i. In JavaScript this is done with the method charAt.

First we check if the character is a hash sign. If no hash sign is found, the loop continues until the end of the text body without doing anything.

If a hash sign is found, we keep track of which index contained the hash sign and we start iterating again, this time looking for a space character. Once we find a space (or we run out of characters to iterate through), we remember which index we landed at.

The slice of text between the starting index and the ending index contains our hashtag text. Most programming languages provide an easy way to slice a text into parts with indexes like these. In JavaScript, this is the substring method.
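
To make the slicing step concrete, here is a small sketch with a made-up sample text. It uses indexOf only as a shortcut to get the two indexes; the loop described above finds the same positions one character at a time.

```javascript
const textBody = 'I like #cats a lot';

// indexOf is used here only to find the two positions quickly.
const startIndex = textBody.indexOf('#');            // 7
const endIndex = textBody.indexOf(' ', startIndex);  // 12

// substring includes the character at startIndex and stops
// right before the character at endIndex.
const hashtag = textBody.substring(startIndex, endIndex);
console.log(hashtag); // prints "#cats"
```

Because the ending index is exclusive, the space that ended the hashtag is left out of the result without any extra work.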

In practice hashtags don't end only when a space is found. A newline character ('\n'), a tab character ('\t'), and various others could mark the end of the hashtag. Note that we also don't account for what would happen if the hashtag text contained two hash signs, e.g. #hash#hash#hash. Should this finish the current hashtag and start a new one? Or are hash signs inside hashtags valid? Or do we ignore the previous starting point and start over? Or do we ignore the whole thing? Different algorithms may deal with corner cases differently.
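
As one possible way to settle those questions, the sketch below picks a single set of answers: any whitespace character ends a hashtag, and a hash sign inside a hashtag finishes the current one and starts a new one. The function name extractHashtags and these rules are assumptions made for this example, not how any particular website does it.

```javascript
// Hypothetical helper: collects every hashtag in `text`.
// Assumptions made for this sketch (not rules of any real platform):
//  - any whitespace character (space, tab, newline, ...) ends a hashtag
//  - a hash sign inside a hashtag ends it and starts a new one
//  - a hash sign with nothing after it is ignored
function extractHashtags(text) {
    const hashtags = [];
    let start = -1; // index of the current hash sign, or -1 if none

    // Loop one position past the end so the last hashtag gets closed too.
    for (let i = 0; i <= text.length; i++) {
        const c = i < text.length ? text[i] : ' ';
        const isSpace = /\s/.test(c);

        if (start !== -1 && (isSpace || c === '#')) {
            // Keep the hashtag only if it has text after the hash sign.
            if (i - start > 1) {
                hashtags.push(text.substring(start, i));
            }
            start = -1;
        }
        if (c === '#') {
            start = i;
        }
    }
    return hashtags;
}

console.log(extractHashtags('#hash#hash#hash and\t#tab\n#line'));
// logs five hashtags: #hash, #hash, #hash, #tab, #line
```

Choosing a different answer, such as allowing hash signs inside a hashtag, would only require changing the condition that closes the current hashtag.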

Written by Noel Santos.

About the Author

I'm a self-taught Brazilian programmer who graduated in IT from a FATEC. In a world of increasingly complex and essential computers, I decided to use my technical expertise in hardware, desktop applications, and web technologies to create an informative resource to make PCs easier to understand.
