This week, we worked mostly on getting our research submitted to the Stetson Showcase. We had to submit an abstract and finish the application.
This week I continued to work on making a GUI in Python. I have made a GUI before, but never in Python. I am using Tkinter, which is moderately simple to use but not so simple to learn; even so, I should be able to pick it up rather quickly. Now it is just a matter of taking what I've learned about Tkinter and turning it into a usable GUI.
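To give a sense of what working with Tkinter looks like, here is a minimal sketch of the kind of basic window I have been practicing with (just an illustration, not our actual GUI):

import tkinter as tk  # the module is named Tkinter in Python 2

# a bare-bones window: one label and one button that updates it
root = tk.Tk()
root.title("Tkinter practice")

label = tk.Label(root, text="Hello, Tkinter!")
label.pack(padx=20, pady=10)

def on_click():
    label.config(text="Button clicked")

button = tk.Button(root, text="Click me", command=on_click)
button.pack(pady=10)

root.mainloop()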
Next week I will continue to learn about Tkinter and hopefully make a lot of progress on the GUI.
This week I worked on improving the graph and learning how to make a GUI using Python. I think I am going to use Tkinter, but I have no experience using it. At the beginning of this research project, we already had a GUI in mind. Here is the GUI we originally proposed:
You can see we have a place for the user to input a document, and our program will check that document for connections with other documents.
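A rough Tkinter mock-up of that proposed layout might look something like the sketch below; the widget layout and the check_connections callback are hypothetical placeholders, not our final code:

import tkinter as tk
from tkinter import filedialog

# hypothetical mock-up of the proposed GUI: the user picks a document,
# then clicks a button that would kick off the connection check
root = tk.Tk()
root.title("Document Connections")

path_var = tk.StringVar()

def browse():
    # open a file chooser and remember the selected document's path
    path_var.set(filedialog.askopenfilename())

def check_connections():
    # placeholder: the real program would compare the chosen document
    # against the rest of the corpus here
    print("Checking connections for:", path_var.get())

tk.Entry(root, textvariable=path_var, width=50).pack(padx=10, pady=5)
tk.Button(root, text="Browse...", command=browse).pack(pady=5)
tk.Button(root, text="Check for connections", command=check_connections).pack(pady=5)

root.mainloop()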
Next week I will continue to learn about ways to make a GUI in Python.
Here is the final code used to extract all of the certain (non-hedged) sentences from the articles.
It resulted in nearly 96,000 .csv documents of certain sentences ready for testing.
article-parsing.py

# coding: utf-8
import re
import os
import csv
import pandas as pd
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# training data for hedge detection: one sentence per row with a hedge label
name = ['sent', 'hedge']
train = pd.read_csv("../../HedgeDetection/bag-of-words/train/TRAIN_biomedical_fullarticles.csv",
                    names=name, delimiter="\t", quoting=3)

# bag-of-words features (unigrams and bigrams) over the training sentences
vectorizer = CountVectorizer(analyzer='word', stop_words="english",
                             ngram_range=(1, 2), lowercase=True)
train_data_feature = vectorizer.fit_transform(train['sent'])

# train a random forest on the labeled sentences
forest = RandomForestClassifier(n_estimators=100)
forest = forest.fit(train_data_feature, train["hedge"])

# title.csv lists the filename of every article to process
title = pd.read_csv('title.csv', names=['article'])
i = 0
while i < title['article'].shape[0]:
    with open('fulltext/' + title['article'][i], 'r') as f:
        if os.stat('fulltext/' + title['article'][i]).st_size != 0:
            text = f.read()
            # collapse whitespace, then split the article into sentences
            sentences = re.sub(r'\s+', ' ', text)
            article_tokenize_list = sent_tokenize(sentences)
            # vectorize the article's sentences with the training vocabulary
            article_data_feature = vectorizer.transform(article_tokenize_list)
            article_data_feature = article_data_feature.toarray()
            # the original snippet was truncated here; the rest reconstructs the
            # step described above: classify each sentence and save the certain
            # ones to a per-article CSV (the label value for "certain" is assumed)
            result = forest.predict(article_data_feature)
            with open('certain/' + title['article'][i] + '.csv', 'w') as out:
                writer = csv.writer(out)
                for sent, label in zip(article_tokenize_list, result):
                    if label == 'certain':
                        writer.writerow([sent])
    i += 1