Artificial intelligent assistant

What is the probability that a sequence of $M$ random letters contains a sub-sequence of $K$ letters? The alphabet is supposed to have $n$ characters (usually $n=26$, though in my case $n=256$ or $256^2$ or $256^4$). **Example** : Here $n=2$ and the alphabet consists of two letters A and B. The prespecified sub-sequence is ABB (so $K=3$). The sequence contains $M=4$ letters. The $M^n$ possibilities for the sequence are: AAAA, AAAB, AABA, A **ABB** , ABAA, ABAB, **ABB** A, **ABB** B, BAAA, BAAB, BABA, B **ABB** , BBAA, BBAB, BBBA, and BBBB. The sub-sequence ABB appears (highlighted in bold) 4 times out of 16, and in each of these four cases, it appears only once within the sequence. So the answer is 1/4 in this case. Note that for the sub-sequence AAA, the answer would be 3/16. In my case, $M$ is large (say, $10^7$) and $K$ is small (say $K=300$.) Both the sequence and sub-sequence are random in my case.

If you just need an _upperbound_ , that is much easier.

Let $X =$ the number of times the sequence appears. There are $M-K+1$ possible starting possitions, so by linearity of expectation, $E[X] = (M - K + 1) n^{-K} < M n^{-K}.$

Now $E[X] = \sum_{k \ge 0} k P(X = k) = \sum_{k \ge 1} k P(X = k) \ge \sum_{k \ge 1} P(X=k) = P(X \ge 1) =$ prob that the $K$-sequence appears at least once. So this prob is upperbounded by $E[X] < M n^{-K}$.

I am saying exactly the same thing as all the various comments, except that the above also formally proves it is an upperbound.

xcX3v84RxoQ-4GxG32940ukFUIEgYdPy 1699edfd995072ea8f8dd3788c3824e8