This week, the model needed to be retrained on the dataset to account for updates.
I finally took the processed TSV file and ran it, though I first had to fix a few more bugs in my Python script so that it completely cleans up the data I receive.
Below is the final script I came up with.
```python
import numpy as np
import pandas as pd

# Read the tab-separated file of certain sentences. The two columns are labelled
# "Sentence" and "Value". (The second column holds '1' values, which stand for certainty.)
frame = pd.read_csv('TRAIN_wiki_certain.csv', sep='\t', names=['Sentence', 'Value'])

# Since we no longer need the "Value" column, drop it.
frame = frame.drop(columns=['Value'])

# Transform the dataset into the format needed to test this data: a Value column of 0s,
# the sentence, the following sentence to compare against, and an id.
frame.insert(loc=0, column='Value', value=0)
frame['Compare'] = frame['Sentence'].shift(-1)  # the last row gets NaN (no next sentence)
# One random id per row. Use len(frame), not frame.shape: a shape tuple would make
# NumPy return a 2-D array, which cannot be assigned to a single column.
# Note: random ids are not guaranteed to be unique.
frame['id'] = np.random.randint(50, 405000, size=len(frame))

# Finally, write the new dataset out as a TSV file (no index, no header row).
frame.to_csv('test_final.tsv', sep='\t', index=False, header=False)
```
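To see what this transformation produces, here is a minimal sketch that applies the same steps to a tiny, hypothetical three-sentence DataFrame instead of reading `TRAIN_wiki_certain.csv`. It shows the resulting column order and that `shift(-1)` leaves the last row's `Compare` value empty, which is worth keeping in mind if whatever reads `test_final.tsv` expects every field to be filled. It generates ids with `size=len(frame)` so there is exactly one id per row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input standing in for TRAIN_wiki_certain.csv.
frame = pd.DataFrame({
    "Sentence": ["The sky is blue.", "Water boils at 100 C.", "Paris is in France."],
    "Value": [1, 1, 1],
})

# Same steps as the script: drop Value, re-add it as 0s, pair each sentence with the next.
frame = frame.drop(columns=["Value"])
frame.insert(loc=0, column="Value", value=0)
frame["Compare"] = frame["Sentence"].shift(-1)
frame["id"] = np.random.randint(50, 405000, size=len(frame))  # one id per row

print(frame.columns.tolist())  # ['Value', 'Sentence', 'Compare', 'id']
print(frame["Compare"].isna().sum())  # 1 (the last row has no following sentence)
```

When this frame is written with `to_csv`, the missing `Compare` value becomes an empty field rather than a missing column, so the last row still has four tab-separated fields.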
Now the dataset should be ready to run, but I am still getting an “IndexError: list index out of range” error. Next week, I’ll keep working on debugging it.
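One possible first step when debugging an `IndexError` like this is to check whether every row of the output TSV has the expected number of fields. The sketch below assumes (this is not confirmed by the run above) that the program consuming `test_final.tsv` splits each line on tabs and indexes the fields by position, so a row with too few fields would raise exactly this error; for example, a sentence containing a tab or newline can throw off a naive tab-splitting reader. The helper name `find_malformed_rows` is hypothetical, and the default of four fields comes from the script's output columns (Value, Sentence, Compare, id).

```python
import csv
import io


def find_malformed_rows(lines, expected_fields=4):
    """Return (line_number, field_count) for every row without expected_fields columns."""
    bad = []
    # QUOTE_NONE treats quote characters literally, like a reader that splits purely on tabs.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if len(row) != expected_fields:
            bad.append((line_no, len(row)))
    return bad


# Hypothetical sample: row 2 is missing its trailing id field.
sample = io.StringIO("0\tA.\tB.\t101\n0\tB.\t\n0\tC.\t\t303\n")
print(find_malformed_rows(sample))  # [(2, 3)]

# On the real file (not run here):
# with open("test_final.tsv", newline="", encoding="utf-8") as f:
#     print(find_malformed_rows(f))
```

Blank lines come back from `csv.reader` as empty rows, so they are reported with a field count of 0, which is also useful to catch.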