Basic Form:

The main chunk of symbols for Roman numerals ends at C = 100. Beyond that, we use the following 3 characters: ( | ). Normally these are written C, I, and Ↄ (backward C, Unicode U+2183, where 2183 is hexadecimal), but we don't use those here, both to avoid confusion with C = 100 and I = 1 and to keep things in ASCII.

1000 = (|)

10000 = ((|))

100000 = (((|)))

500 = |)

5000 = |))

50000 = |)))

500000 = |))))

444444 = (((|))) |)))) ((|)) |))) (|) |)) C |) X L I V

666666 = |)))) (((|))) |))) ((|)) |)) (|) |) C L X V I

In general, the symbol for 10^N (N >= 3) has a balanced set of N-2 pairs of parentheses around a vertical bar. The symbol for 5*10^N (N >= 2) has a vertical bar followed by N-1 parentheses on the right. The design of 5*10^N is that it is visually the right "half" of 10^(N+1), reflecting its numerical relationship.
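The counting rule above can be sketched in Python. The names power_symbol and five_symbol are made up for this illustration, and the sketch assumes N >= 3 for powers of 10 and N >= 2 for 5 times a power of 10, since smaller values use the ordinary letters.

```python
def power_symbol(n):
    """ASCII symbol for 10**n: N-2 balanced pairs of parentheses around a bar."""
    if n < 3:
        raise ValueError("10**n with n < 3 is written I, X or C")
    return "(" * (n - 2) + "|" + ")" * (n - 2)


def five_symbol(n):
    """ASCII symbol for 5 * 10**n: a bar followed by N-1 closing parentheses."""
    if n < 2:
        raise ValueError("5 * 10**n with n < 2 is written V or L")
    return "|" + ")" * (n - 1)


for n in range(3, 7):
    print(10**n, "=", power_symbol(n), "   ", 5 * 10**n, "=", five_symbol(n))
```

In this sketch the "half" relationship is literal: five_symbol(n) is exactly the right-hand end of power_symbol(n + 1).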

Incidentally, V is the top half of X, and L kind of looks like the bottom half of C.

It's a bit strange how a larger number gets a shorter symbol, and this size difference becomes more pronounced for larger numbers:

1000000 = ((((|))))

5000000 = |)))))
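Putting the Basic Form examples together, here is a rough sketch of a converter from an integer to Basic Form. The names symbol and to_basic are hypothetical. The handling of the digit 9 (one before ten, as in IX) is an assumption borrowed from ordinary Roman numerals, since none of the examples above contains a 9.

```python
LETTERS = {1: "I", 5: "V", 10: "X", 50: "L", 100: "C"}


def symbol(value):
    """Basic Form symbol for a value of the form 10**n or 5 * 10**n."""
    if value in LETTERS:
        return LETTERS[value]
    text = str(value)
    if text[0] not in "15" or text.strip("0") != text[0]:
        raise ValueError(f"not 10**n or 5 * 10**n: {value}")
    n = len(text) - 1  # exponent of the power of 10
    if text[0] == "1":
        return "(" * (n - 2) + "|" + ")" * (n - 2)
    return "|" + ")" * (n - 1)


def to_basic(number):
    """Write a positive integer in Basic Form, symbols separated by spaces."""
    tokens = []
    digits = str(number)
    for position, char in enumerate(digits):
        exponent = len(digits) - 1 - position
        digit = int(char)
        one, five, ten = 10**exponent, 5 * 10**exponent, 10 ** (exponent + 1)
        if digit == 9:  # assumption: subtractive, like IX
            tokens += [symbol(one), symbol(ten)]
        elif digit == 4:  # subtractive, like IV
            tokens += [symbol(one), symbol(five)]
        else:
            if digit >= 5:
                tokens.append(symbol(five))
                digit -= 5
            tokens += [symbol(one)] * digit
    return " ".join(tokens)


print(to_basic(444444))
print(to_basic(666666))
print(len(to_basic(1000000)), len(to_basic(5000000)))  # symbol lengths
```

The last line prints the lengths of the symbols for 1000000 and 5000000, which makes the size difference mentioned above easy to see.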

Sophistication 1: Instead of |), write D. Instead of (|), write M. One can see how the symbols look somewhat similar. For 1000, the similarity is closer in lower case: (|) and m.

Perhaps 5000 = |)) can be written D), 10000 = ((|)) can be written (M), and so forth.
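As a rough sketch, Sophistication 1 is a substitution on individual symbols. The function name sophistication_1 and the extended flag are invented for this example. The extended mode covers the tentative D) and (M) spellings, which the text above only proposes with a "perhaps", so treat that part as speculative.

```python
def sophistication_1(tokens, extended=False):
    """Replace |) with D and (|) with M in a list of Basic Form symbols.

    With extended=True, also rewrite larger symbols, e.g. |)) as D)
    and ((|)) as (M). That extension is speculative.
    """
    result = []
    for token in tokens:
        if token == "|)":
            result.append("D")
        elif token == "(|)":
            result.append("M")
        elif extended and token.startswith("|)"):
            result.append("D" + token[2:])
        elif extended and "(|)" in token:
            result.append(token.replace("(|)", "M"))
        else:
            result.append(token)
    return result


print(" ".join(sophistication_1(["(|)", "|)"])))
print(" ".join(sophistication_1(["|))", "((|))"], extended=True)))
```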

There is ambiguity with (|): it could be the basic form for 1000, or, if the ( is read as a C, it could be C |) = CD = 400. We probably need to require the use of D, and not |), where there could be ambiguity. What are those situations?

Sophistication 2: Read the symbols of a number in Basic Form from left to right and consider the following for each symbol: if the symbol to the left of the current symbol was a power of 10, and the current symbol is 5 times a smaller power of 10, then omit the vertical bar in the current symbol (the symbol for 5 times the smaller power of 10). The restriction that the power of 10 be smaller means that this sophistication is not used when the power of 10 to the left indicates subtraction.

1500 = (|) |) becomes (|) )

6500 = |)) (|) |) becomes |)) (|) )

15000 = ((|)) |)) becomes ((|)) ))

10500 = ((|)) |) becomes ((|)) )

25000 = ((|)) ((|)) |)) becomes ((|)) ((|)) ))

20500 = ((|)) ((|)) |) becomes ((|)) ((|)) )

150000 = (((|))) |))) becomes (((|))) )))

105000 = (((|))) |)) becomes (((|))) ))

100500 = (((|))) |) becomes (((|))) )

4000 = (|) |)) remains unchanged because this is subtraction. (However, if it were allowed, would (|) )) be unambiguous?)
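The rule can be sketched as a function on the list of values of the symbols in a Basic Form number. All names here (symbol, sophistication_2 and the helpers) are hypothetical. The sketch assumes each value is 10^N or 5*10^N, and that only symbols that actually start with a vertical bar are affected, so L and V are left alone.

```python
LETTERS = {1: "I", 5: "V", 10: "X", 50: "L", 100: "C"}


def symbol(value):
    """Basic Form symbol for a value of the form 10**n or 5 * 10**n."""
    if value in LETTERS:
        return LETTERS[value]
    text = str(value)
    n = len(text) - 1
    if text[0] == "1":
        return "(" * (n - 2) + "|" + ")" * (n - 2)
    return "|" + ")" * (n - 1)


def is_power_of_ten(value):
    return str(value).strip("0") == "1"


def is_five_times_power(value):
    return str(value).strip("0") == "5"


def sophistication_2(values):
    """Drop the bar of a 5*10**k symbol that follows a larger power of 10."""
    tokens = []
    for index, value in enumerate(values):
        token = symbol(value)
        previous = values[index - 1] if index > 0 else None
        if (
            previous is not None
            and is_power_of_ten(previous)
            and is_five_times_power(value)
            and value < previous  # the power of 10 on the left is larger
            and token.startswith("|")
        ):
            token = token[1:]
        tokens.append(token)
    return " ".join(tokens)


print(sophistication_2([1000, 500]))         # 1500
print(sophistication_2([5000, 1000, 500]))   # 6500
print(sophistication_2([1000, 5000]))        # 4000, subtraction
```

Because the test looks at the symbol immediately to the left in the original Basic Form, 4000 is left unchanged: the power of 10 on its left is not larger than 5000.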

The source for all this information was an unsourced section in a Wikipedia article.

Note well that the symbol to the left must be a power of 10, not 5 times a power of 10:

5500 = |)) |) does not become |)) ) because that is ambiguous with

50000 = |))) once spaces are removed.

It might seem possible that if there are two or more consecutive 5 digits after a power of 10, both of them could have their vertical bars omitted. However, this would result in ambiguity:

105500 = (((|))) |)) |) becomes (((|))) )) |), not (((|))) )) ), because the latter is ambiguous with

150000 = (((|))) |))) becoming (((|))) )))
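Both collisions can be checked mechanically by deleting the spaces and comparing strings. This is only a small illustration of the two cases above, not a general ambiguity test; the helper name squash is made up.

```python
def squash(text):
    """Remove the spaces that separate symbols."""
    return text.replace(" ", "")


# (disallowed spelling, intended number, legal spelling it collides with, its number)
cases = [
    ("|)) )", 5500, "|)))", 50000),
    ("(((|))) )) )", 105500, "(((|))) )))", 150000),
]

for bad, bad_number, good, good_number in cases:
    collide = squash(bad) == squash(good)
    print(bad_number, "vs", good_number, "collide:", collide)

# The allowed spelling of 105500 keeps the second bar and stays distinct.
print(squash("(((|))) )) |)") != squash("(((|))) )))"))
```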

Note that writing 1500 as (|)) with the space removed is acceptable, but it is tricky to parse. It is tricky in a way similar to the ambiguity mentioned in Sophistication 1. We consider two possible interpretations, only one of them legal:

(|) ) = (|) |) = 1500.

C |)) = illegal subtraction, because C may not subtract from |)) = 5000. (Only M may subtract from 5000.)

Only after eliminating the illegal interpretation are you left with one valid parse.
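The legality test used to rule out the second interpretation can be sketched as below. The function name may_subtract is invented. Allowing a power of 10 to subtract from 10 times itself (as in IX or CM) is an assumption taken from ordinary Roman numerals; the text above only states the 5-times case.

```python
def is_power_of_ten(value):
    return value >= 1 and str(value).strip("0") == "1"


def may_subtract(small, large):
    """True if `small` may be written to the left of `large` as a subtraction.

    Assumption: a power of 10 may subtract only from 5 times itself
    or 10 times itself.
    """
    return is_power_of_ten(small) and large in (5 * small, 10 * small)


print(may_subtract(100, 5000))   # C before |))
print(may_subtract(1000, 5000))  # M before |))
print(may_subtract(100, 500))    # C before |)
```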

444444 = (((|))) |)))) ((|)) |))) (|) |)) C |) X L I V (unchanged from Basic Form)

666666 =

|)))) (((|))) |))) ((|)) |)) (|) |) C L X V I becomes

|)))) (((|))) ))) ((|)) )) (|) ) C L X V I

888888 = |)))) (((|))) (((|))) (((|))) ))) ((|)) ((|)) ((|)) )) (|) (|) (|) ) C C C L X X X V I I I

Sophistication 1 can combine with sophistication 2.

1500

= (|) |) basic form

= M D with sophistication 1

= (|) ) with sophistication 2

= M ) with sophistication 1 and 2

= (|) D is bizarre but possible

Were all 5 possibilities seen when writing years between 1500 and 1899?
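The spellings of 1500 can be enumerated with a short sketch. It assumes the spelling of each symbol is chosen independently, which the discussion above does not actually claim; the option lists and the function name are made up for this example. Under that assumption the product has six entries: the five listed above plus M |).

```python
from itertools import product

# Assumed spellings of each of the two symbols of 1500.
THOUSAND = ["(|)", "M"]
FIVE_HUNDRED_AFTER_POWER = ["|)", "D", ")"]


def spellings_of_1500():
    """All combinations of one spelling of 1000 followed by one of 500."""
    pairs = product(THOUSAND, FIVE_HUNDRED_AFTER_POWER)
    return [" ".join(pair) for pair in pairs]


for spelling in spellings_of_1500():
    print(spelling)
```

Whether M |) should count as a sixth possibility, or be ruled out as too odd, is a judgment call.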

Assuming we use M, the next year which could use backward C is 2500 = M M D = M M ).

The next year which must use backward C is 4000 = M |)), perhaps also writable as M D).

Open problems:

Create a parser. Where (else) is the grammar ambiguous?

For a given number, how many different ways are there to write it?
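As a starting point for the first problem, here is a minimal sketch of a reader for the easy case only: symbols separated by spaces. It does not resolve any of the ambiguities that appear when spaces are removed, it does not check that the input is well formed, and it does not handle mixed spellings such as D) or (M). The names token_value and parse are hypothetical.

```python
LETTERS = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def token_value(token):
    """Value of one space-separated symbol, including bar-less forms like ))."""
    if token in LETTERS:
        return LETTERS[token]
    k = token.count(")")
    if k >= 1 and token == "(" * k + "|" + ")" * k:
        return 10 ** (k + 2)
    if k >= 1 and token in ("|" + ")" * k, ")" * k):
        return 5 * 10 ** (k + 1)
    raise ValueError(f"unrecognized symbol: {token!r}")


def parse(text):
    """Sum the symbols, subtracting any symbol smaller than its right neighbor."""
    values = [token_value(token) for token in text.split()]
    total = 0
    for index, value in enumerate(values):
        if index + 1 < len(values) and value < values[index + 1]:
            total -= value
        else:
            total += value
    return total


print(parse("(((|))) |)))) ((|)) |))) (|) |)) C |) X L I V"))
print(parse("|)))) (((|))) ))) ((|)) )) (|) ) C L X V I"))
print(parse("M M )"))
```

A real parser would also need to reject illegal subtractions and to split input that has no spaces, which is where the open questions begin.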

