This week I started working on transforming a CSV file of sentences into the data format:
label sentence#1 sentence#2 other_info
I am using pandas to import the CSV file and load it into a data frame:
import pandas as pd
frame = pd.read_csv('Bag_of_Words_model.csv', names=["Sentences"])
I then insert another column at the beginning to hold the label. It is filled with zeroes for now, since the Sentence Match Decoder will have to decide this label.
frame.insert(loc=0, column='Values', value=0)
I want to be able to compare every sentence with every other sentence in an article. As a first step, I duplicate the sentences column and shift the copy up by one row, so that each sentence is paired with the sentence that follows it.
frame['Compare'] = frame['Sentences']
frame.Compare = frame.Compare.shift(-1)
This then gives me a data frame that looks like so:
This is close to what we need the input data to look like.
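One thing worth noting is that shifting by one row only pairs each sentence with its immediate successor, and the last row ends up with a missing value. The following is a minimal sketch, not the final implementation, of how all sentence pairs could be generated instead. It assumes a small made-up list of sentences in place of the CSV file, uses `itertools.combinations` (my choice, not something the steps above rely on), and leaves out the other_info field.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample sentences standing in for the CSV contents.
sentences = ["The cat sat.", "The dog barked.", "It rained."]

# Adjacent pairing, following the shift(-1) steps described above.
adjacent = pd.DataFrame({"Sentences": sentences})
adjacent.insert(loc=0, column="Values", value=0)
adjacent["Compare"] = adjacent["Sentences"].shift(-1)
# The last sentence has no successor, so its Compare value is NaN; drop that row.
adjacent = adjacent.dropna(subset=["Compare"])

# Every unordered pair of sentences: n * (n - 1) / 2 rows for n sentences.
all_pairs = pd.DataFrame(
    list(combinations(sentences, 2)), columns=["Sentences", "Compare"]
)
all_pairs.insert(loc=0, column="Values", value=0)

# Tab-separated text in label, sentence#1, sentence#2 order (other_info omitted).
tsv_text = all_pairs.to_csv(sep="\t", header=False, index=False)
print(tsv_text)
```

With three sentences, the adjacent approach gives two rows while the all-pairs approach gives three, and the gap grows quickly for longer articles.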