Monthly Archives: September 2017

Week Four: 9/25 – 9/29

(Finally) Running BiMPM source code

Our server is back up! This week, I was finally able to start running the BiMPM source code and learning how to train models on datasets with it.

To start off, I downloaded the Quora Question Pairs dataset that was also used in the paper.

I had to use the code below to run it with python and tensorflow:

Boost_DIR=/opt/boost-1.63.0-gcc4.9/ \
OpenCV_DIR=/opt/opencv2.4-gcc4.9/ \
PKG_CONFIG_PATH=/opt/libzip-1.2.0-gcc4.9/lib/pkgconfig \
PATH=/opt/openexr-2.2.0-bin/bin:/opt/protobuf-3.2.0-gcc4.9/bin/:/opt/python2.7-gcc4.9/bin:$PATH \
PYTHONPATH=/opt/dlib-19.3-gcc4.9/:/opt/opencv2.4-gcc4.9/lib:/opt/opencv2.4-gcc4.9/lib/python2.7/site-packages/ \
LD_LIBRARY_PATH=/opt/openexr-2.2.0-bin/lib:/opt/python2.7-gcc4.9/lib/:/opt/boost-1.63.0-gcc4.9/lib/:/opt/opencv2.4-gcc4.9/lib/:/opt/protobuf-3.2.0-gcc4.9/lib:/opt/glog-gcc4.9-bin/lib:/opt/gflags-gcc4.9-bin/lib:/opt/snappy-gcc4.9-bin/lib/:/opt/cuda/lib64/:/opt/libzip-1.2.0-gcc4.9/lib/ \
python src/

It seems, however, that more configuration is needed to make this command work on its own.

I used the command below (in addition to the one above) to train my model:

python BiMPM/src/ --train_path train.tsv --dev_path dev.tsv --test_path test.tsv --word_vec_path wordvec.txt --suffix sample --fix_word_vec --model_dir models --MP_dim 20

I had issues importing rnn_cell, a module that has been moved between TensorFlow releases.
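One way to diagnose this kind of import error is to probe the places the module has lived across versions. This is a hedged sketch: the candidate paths below reflect my understanding of where rnn_cell has been located in different TensorFlow releases, and should be verified against the installed version.

```python
import importlib

def locate_rnn_cell():
    """Try the import paths rnn_cell has lived at across TF versions."""
    candidates = [
        "tensorflow.python.ops.rnn_cell",  # older (pre-1.0) layout
        "tensorflow.contrib.rnn",          # TF 1.x contrib alias
        "tensorflow.nn.rnn_cell",          # TF 1.x public path
    ]
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ImportError:
            continue
    return None  # TensorFlow missing, or the module moved again
```

If this returns None everywhere, the fix is usually to match the TensorFlow version the BiMPM code was written against rather than to patch the imports.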

I still have to figure out how to adjust the other arguments to get better performance on this dataset, such as the configuration Wang used below:

{
  "dropout_rate": 0.1,
  "suffix": "quora",
  "NER_dim": 20,
  "highway_layer_num": 1,
  "with_match_highway": true,
  "optimize_type": "adam",
  "with_highway": true,
  "max_epochs": 10,
  "with_aggregation_highway": true,
  "with_filter_layer": false,
  "lex_decompsition_dim": -1,
  "aggregation_layer_num": 1,
  "max_char_per_word": 10,
  "wo_maxpool_match": false,
  "context_layer_num": 1,
  "wo_full_match": false,
  "lambda_l2": 0.0,
  "fix_word_vec": true,
  "wo_left_match": false,
  "with_NER": false,
  "aggregation_lstm_dim": 300,
  "context_lstm_dim": 100,
  "POS_dim": 20,
  "with_lex_decomposition": false,
  "learning_rate": 0.001,
  "with_POS": false,
  "wo_right_match": false,
  "MP_dim": 10,
  "max_sent_length": 100,
  "batch_size": 60,
  "wo_max_attentive_match": false,
  "wo_char": false,
  "wo_attentive_match": false,
  "char_emb_dim": 20,
  "char_lstm_dim": 100,
  "word_level_MP_dim": -1,
  "base_dir": "./quora"
}
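To connect a saved configuration like this to the command line, I sketched a small helper that turns the key/value pairs into flags. The convention it assumes (bare flags for true booleans, dropped flags for false ones) is my guess at how such parsers typically behave, not necessarily what the BiMPM argument parser expects:

```python
import json

def config_to_flags(config):
    """Turn a config dict into a --flag value argument list (assumed convention)."""
    flags = []
    for key, value in sorted(config.items()):
        if isinstance(value, bool):
            if value:  # assume store_true style: present means enabled
                flags.append("--" + key)
        else:
            flags.extend(["--" + key, str(value)])
    return flags

sample = json.loads('{"MP_dim": 10, "fix_word_vec": true, "with_POS": false}')
print(config_to_flags(sample))  # ['--MP_dim', '10', '--fix_word_vec']
```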

Furthermore, I also tried testing the model using the arguments below (again, in addition to the first code snippet above):

python BiMPM/src/ --in_path test.tsv --word_vec_path wordvec.txt --mode prediction --model_prefix models/SentenceMatch.sample --out_path test.prediction

Although I still have some work and research to do to make this fully functional, we are getting there!

Week 3

My task: To research and determine an ideal way to detect and extract hedges from a document (continued…)

This week, the majority of my research was on conditional random fields, or CRFs.

CRFs are defined as undirected graphical models. However, they turned out to be more complicated than I anticipated: with my limited knowledge of advanced mathematics and probability, I could not completely wrap my head around the complex formulas associated with CRFs. So, instead of trying to teach myself all of the mathematics needed to understand a single formula for a conditional probability distribution, I focused my efforts this week on understanding the overall purpose of CRFs and how they can be applied to the task at hand.

Let’s say you have a series of images of dogs, and your goal is to label each image based on what the dogs are doing in that image (i.e. walking, running, barking, eating, etc…). You can do this in one of two ways.

The first method: Discount the logical order of the images and classify the images based on larger, easily identifiable contexts.

  • For example, you notice that all of the very bright and vibrant images seem to be taken during the afternoon and illustrate dogs playing with toys outside. You also notice that unclear and dark images seem to be taken at night and illustrate dogs sleeping. In both respective cases, those similar images would be grouped and labeled based on that similarity.

This method is a great and very efficient way to label your images and the contexts that they illustrate. The problem, however, is that this method can result in a lot of information loss.

  • For example, you have an image that is a close-up on a dog’s foot. From that image alone, there is no way to tell whether that dog is walking, running, eating, or even barking. So, that image would end up being discarded by the classifier because it cannot be classified.

In this case, it would be best to use:

The second method: Take into account the logical order of the images and classify the images based on their near and surrounding contexts.

  • Let’s say you still have the image that is a close-up of a dog’s foot. Only this time, you also have the images that came before and after the close-up of the dog’s foot. You may notice that the previous image and/or following image illustrates a dog running. Then, the probability of that dog running in the unidentifiable picture becomes very likely.

Though this method may take longer and is less efficient than the first, it gives more accurate results. Such is the basis of CRFs: sequence labeling for accuracy.

CRFs are often used in a natural-language processing task known as part-of-speech tagging: labeling each word or phrase in a sentence or document with its respective part of speech (i.e. noun, verb, pronoun, preposition, adverb, adjective, etc.) based on context. With that said, I can imagine they can also be used to label words, phrases, and sentences as hedges or non-hedges based on context.
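To make the "neighbors disambiguate" idea from the dog-image example concrete, here is a toy Viterbi decoder over hand-picked scores. A real CRF learns its emission and transition scores from data; everything below (the states, scores, and image descriptions) is invented purely for illustration.

```python
def viterbi(observations, states, emission, transition):
    """Return the highest-scoring label sequence for the observations."""
    # best[s] = (score of the best path ending in state s, that path)
    best = {s: (emission[s].get(observations[0], -5.0), [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # extend the best previous path, scoring the transition + emission
            score, prev = max(
                (best[p][0] + transition[p][s] + emission[s].get(obs, -5.0), p)
                for p in states
            )
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    return max(best.values())[1]

states = ["running", "sleeping"]
# emission scores: how well a label explains a single image on its own
emission = {
    "running": {"blurry_legs": 2.0, "paw_closeup": 0.0, "open_field": 1.5},
    "sleeping": {"dark_room": 2.0, "paw_closeup": 0.0},
}
# transition scores: consecutive images tend to show the same activity
transition = {
    "running": {"running": 1.0, "sleeping": -1.0},
    "sleeping": {"running": -1.0, "sleeping": 1.0},
}
# the ambiguous paw close-up gets labeled "running" because of its neighbors
print(viterbi(["open_field", "paw_closeup", "blurry_legs"],
              states, emission, transition))
```

On its own, the paw close-up scores identically under both labels; the transition scores from its neighbors are what tip it toward "running", which is exactly the second labeling method described above.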

Week Three: 9/18 – 9/22

Additional BiMPM research and scikit-learn

For the beginning of this week, the server that we needed to run the BiMPM source code for natural language sentence matching was still down (compounded by the lingering effects of Hurricane Irma), so I mainly focused on doing more research on the topic.

We wanted to use Wang, Hamza, and Florian’s work for this research project because their “matching-aggregation” framework differs from previous ones in that their model matches two sentences, P and Q, in both directions (P -> Q and P <- Q), and in each direction it matches the sentences from multiple perspectives. From the experiments they ran on the “Quora Question Pairs” dataset and their evaluations of the model on paraphrase identification, natural language inference, and answer selection, their results showed that the model achieves state-of-the-art performance on all of these tasks. We could also see that eliminating any of their matching strategies hurt performance significantly.
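As a rough sketch of the multi-perspective matching idea: each perspective has its own weight vector that re-weights the dimensions of the two sentence vectors before taking their cosine similarity. The vectors and weights below are made up for illustration; in the actual model the weights are learned parameters and the full architecture uses several distinct matching strategies.

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def multi_perspective_match(h1, h2, perspectives):
    """One cosine score per perspective; each perspective re-weights dims."""
    return [
        cosine([w * a for w, a in zip(wk, h1)],
               [w * b for w, b in zip(wk, h2)])
        for wk in perspectives
    ]

h_p = [0.2, 0.9, 0.1]            # a context vector from sentence P (invented)
h_q = [0.3, 0.8, 0.0]            # a context vector from sentence Q (invented)
perspectives = [[1.0, 1.0, 1.0], # perspective 1: plain cosine
                [0.0, 1.0, 0.0]] # perspective 2: attends only to dimension 2
scores = multi_perspective_match(h_p, h_q, perspectives)
```

Each perspective can emphasize different dimensions of the representations, so the matching layer produces a vector of similarities rather than a single number.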

For the rest of the week, I was also able to begin going through scikit-learn tutorials in order to learn Machine Learning in Python.

Week 2

Unfortunately, I was unable to submit a blog post due to heavy traveling and the power outages caused by Hurricane Irma.

Instead, I will simply include the findings of this week and next week in the “Week 3” blog post.