Quiz: N-gram

Short story for this exercise:

The Story of An Hour - Kate Chopin.txt

Just above this quiz, there is a file that contains a short story.

As preprocessing, convert the text to lowercase and remove all special characters and numbers. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, also remove tokens shorter than 2 characters. We perform this step because words like “sister’s” would otherwise produce meaningless tokens such as “s”. (Note that this also removes words such as the personal pronoun “I”, which could be undesirable in practice, but is acceptable for this exercise.)
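The quiz does not prescribe a particular tokenizer, so the steps above could be implemented in several ways; one minimal sketch (splitting on runs of non-letter characters) is:

```python
import re

# Stopword list taken verbatim from the quiz instructions.
STOPWORDS = {"a", "an", "and", "as", "at", "for", "from", "in", "into",
             "of", "on", "or", "the", "to"}

def preprocess(text):
    """Lowercase, strip non-letter characters, tokenize, and filter."""
    text = text.lower()
    # Replace anything that is not a letter with a space, then split.
    tokens = re.sub(r"[^a-z]+", " ", text).split()
    # Drop stopwords and tokens shorter than 2 characters.
    return [t for t in tokens if t not in STOPWORDS and len(t) >= 2]
```

For example, `preprocess("Her sister's husband, in 1894!")` yields `["her", "sister", "husband"]`: the apostrophe splits off an “s” token that is then dropped, and “in” is removed as a stopword.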

Using a word bi-gram language model, what is the probability of the phrase ‘her husband’?
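The quiz does not specify smoothing or sentence boundaries; a minimal sketch, assuming an unsmoothed maximum-likelihood bigram model over the single token stream, where the phrase probability is P(w1) · P(w2 | w1):

```python
from collections import Counter

def bigram_phrase_probability(tokens, w1, w2):
    """P(w1 w2) = P(w1) * P(w2 | w1), estimated by maximum likelihood
    without smoothing over one continuous token stream."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_w1 = unigrams[w1] / len(tokens)              # unigram probability of w1
    p_w2_given_w1 = bigrams[(w1, w2)] / unigrams[w1]  # conditional probability
    return p_w1 * p_w2_given_w1
```

On the toy stream `["her", "husband", "her", "sister"]`, P(her) = 2/4 and P(husband | her) = 1/2, so the phrase probability is 0.25. Note that for a two-word phrase the product conveniently reduces to count(w1 w2) / len(tokens).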

Remove the leading zeros from your answer and take the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, take only 105 from it). Divide that integer by 157 and submit the remainder.
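The answer transformation can be sketched as follows (a sketch only: it formats the probability to a fixed number of decimals, so probabilities smaller than about 1e-10 would need more decimal places):

```python
def submit_value(prob, divisor):
    """Drop '0.' and leading zeros from the probability, keep the three
    most significant digits as an integer, and reduce modulo the divisor."""
    digits = f"{prob:.12f}".replace("0.", "", 1).lstrip("0")
    return int(digits[:3]) % divisor
```

With the quiz's own example, 0.00010587 becomes the digit string "10587…", the three most significant digits are 105, and 105 mod 157 = 105.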

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘it was’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 61 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘her sister’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 73 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘had been’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 137 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘brently mallard’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 101 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘he had’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 59 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘she did?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 127 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘did not’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 79 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘its significance’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 127 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘she would’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 73 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘would have’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 73 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘no one’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 173 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘open window’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 197 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘her body’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 107 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘that were’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 193 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘some one’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 127 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘one was’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 61 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘faintly countless’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 113 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘patches blue’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 79 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘blue sky’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 71 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘that had’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 67 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘with her’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 59 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘she was’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 167 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘there was’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 67 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘her eyes’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 131 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘her she’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 59 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘but she’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 191 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘when she’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 79 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘free free’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 89 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘keen bright’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 179 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘joy that’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 167 and submit the remainder.

Just above of this quiz, there is a file that contains a short story.

As preprocessing, you should remove all capitalization, special characters, and numbers from the text. In addition, remove the following common words (called “stopwords”): [a, an, and, as, at, for, from, in, into, of, on, or, the, to]

After tokenization, remove also tokens with length < 2. We perform this step because if e.g. you have in the text words like “sister’s”, tokenization will result in meaningless tokens like “s” (Note that this might lead to omitting words such as the personal pronoun “I” which could be undesirable in practice, but for this exercise it is okay).

Using a word bi-gram language model, what is the probability of the phrase ‘she saw’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer with 127 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘that would’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 139 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘there would’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 179 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘would be’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 59 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘be no’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 137 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘during those’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 101 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘she had’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 113 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘door with’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 67 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘open door’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 89 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘that life’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 61 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘life might’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 107 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘might be’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 167 and submit the remainder.

Using a word bi-gram language model, what is the probability of the phrase ‘be long’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 191 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘she did not’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 127 and submit the remainder.
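For the tri-gram questions the chain rule simply gains one factor: P(w1 w2 w3) = P(w1) · P(w2 | w1) · P(w3 | w1, w2). A minimal unsmoothed sketch, assuming `tokens` is the preprocessed token list from the bi-gram case:

```python
from collections import Counter

def trigram_probability(tokens, w1, w2, w3):
    # Unsmoothed MLE: P(w1 w2 w3) = P(w1) * P(w2 | w1) * P(w3 | w1, w2)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    if unigrams[w1] == 0 or bigrams[(w1, w2)] == 0:
        return 0.0
    return ((unigrams[w1] / len(tokens))
            * (bigrams[(w1, w2)] / unigrams[w1])
            * (trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]))
```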

Using a word tri-gram language model, what is the probability of the phrase ‘with paralyzed inability’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 113 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘some one was’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 97 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘patches blue sky’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 89 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘was dull stare’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 67 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘her she was’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 53 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘there would be’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 131 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘would be no’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 197 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘that life might’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 79 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘life might be’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 59 and submit the remainder.

Using a word tri-gram language model, what is the probability of the phrase ‘might be long’?

Remove the leading zeroes from your answer, and consider the three most significant digits of the result as an integer (e.g. if your answer is 0.00010587, consider only 105 from it). Divide that integer by 83 and submit the remainder.
