I went through the same steps to extract the sentences from the Wikipedia data sets as I did with the biomedical data sets.
- After correcting some mistakes in the extraction process, I evaluated the model trained on the Wikipedia Training set against the Wikipedia Evaluation set. My result was an “accuracy” measurement of 0.80, which means that our model correctly labeled 80% of the sentences from the evaluation set as certain or uncertain.
- Out of curiosity, I also ran the Wikipedia Evaluation set through the model trained on the Biomedical Training set from the previous bag-of-words experiment. Surprisingly, my result was an “accuracy” measurement of 0.78, meaning this model still correctly labeled 78% of the evaluation sentences as certain or uncertain.
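The train-and-evaluate workflow above can be sketched roughly as follows. This is a minimal illustration with toy stand-in sentences and a standard scikit-learn pipeline (`CountVectorizer` plus a naive Bayes classifier); the actual data sets, feature choices, and classifier used in the experiments are not reproduced here and may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Toy stand-ins for the extracted training sentences and their labels.
train_sentences = [
    "The results may suggest a possible link.",
    "It is possible that the effect varies.",
    "The protein binds to the receptor.",
    "The city was founded in 1850.",
]
train_labels = ["uncertain", "uncertain", "certain", "certain"]

# Toy stand-ins for the held-out evaluation sentences.
eval_sentences = [
    "This might indicate a correlation.",
    "The bridge was completed in 1932.",
]
eval_labels = ["uncertain", "certain"]

# Bag-of-words features: each sentence becomes a vector of word counts
# over the vocabulary learned from the training sentences.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_sentences)
X_eval = vectorizer.transform(eval_sentences)

# Fit the classifier on the training set, then score the evaluation set.
model = MultinomialNB()
model.fit(X_train, train_labels)
accuracy = accuracy_score(eval_labels, model.predict(X_eval))
print(f"accuracy: {accuracy:.2f}")
```

Swapping in a different training set, as in the cross-domain test above, only means fitting `vectorizer` and `model` on those sentences instead while keeping the same evaluation set.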