Week 11: 11/13 – 11/17

This week, training had to be rerun on the dataset to account for updates.

I finally took the manipulated TSV file and ran it. I also had a few more bugs to fix in my Python script so that it completely fixes the data I receive.

Below is the final script I came up with.

import pandas as pd
import numpy as np

# Load the file of certain sentences that I received. Despite the .csv extension, it is tab-separated, hence sep="\t". The two columns are labelled "Sentence" and "Value". (The second column of '1' values stands for the certainty.)

frame = pd.read_csv('TRAIN_wiki_certain.csv', sep="\t", names=["Sentence", "Value"])


# Since we no longer need the "Value" column, we will just drop it.
frame = frame.drop(columns=['Value'])


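# Reshape the data into the four-column format used for testing: Value, Sentence, Compare, id.
# "Value" is a constant 0 placed in the first column.
frame.insert(loc=0, column='Value', value=0)
# "Compare" pairs each sentence with the one that follows it. The last row has no successor, so its Compare value is NaN.
frame['Compare'] = frame['Sentence'].shift(-1)
# Random integer ids in the range 50 to 404999. These are not guaranteed to be unique.
frame['id'] = np.random.randint(50, 405000, frame.shape[0])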


# Finally, we save the new dataset as a TSV file, without the index or a header row.
frame.to_csv("test_final.tsv", sep='\t', index=False, header=False)
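
To see what the reshaping steps in the script produce, here is a minimal sketch on a made-up three-sentence frame. The sentences are hypothetical stand-ins, not rows from TRAIN_wiki_certain.csv, and the ids will differ on every run because they are random.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "Sentence" column of the real file.
toy = pd.DataFrame({"Sentence": ["A is B.", "C is D.", "E is F."]})

# Same steps as the script: constant label, next-sentence pairing, random id.
toy.insert(loc=0, column="Value", value=0)
toy["Compare"] = toy["Sentence"].shift(-1)
toy["id"] = np.random.randint(50, 405000, toy.shape[0])

# Print the rows exactly as they would be written to the TSV file.
# NaN is written as an empty string, so the last row has an empty third field.
print(toy.to_csv(sep="\t", index=False, header=False))
```

The last row is the one to watch: it keeps its four tab-separated fields, but the comparison sentence is empty.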

 

Now we should be ready to run the dataset, and it should work, but I am still getting an “IndexError: list index out of range” error. Next week, I’ll keep working to debug it.
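
A possible first check, sketched below, assumes the error comes from code that splits each line of the TSV on tabs and then indexes into the result. That cause is not confirmed; it is only a common source of this error. The helper name find_suspect_rows and the expected count of four fields are assumptions for illustration, based on the four columns the script writes.

```python
def find_suspect_rows(path, expected_fields=4):
    """Return (line_number, fields) for rows a naive tab-split reader may choke on.

    A row is flagged if it does not split into exactly `expected_fields`
    tab-separated fields, or if any of its fields is empty.
    """
    suspects = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != expected_fields or any(f == "" for f in fields):
                suspects.append((line_number, fields))
    return suspects
```

Calling find_suspect_rows("test_final.tsv") would list the rows to inspect first. Sentences that contain a tab or a line break, and the final row with its empty comparison field, are the likeliest candidates under this assumption.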