GitHub link to the code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py
The code above is an example of text classification using either an RNN or a bag-of-words model.
I trained the model on the Ubuntu conversation corpus. After training, I modified the code to run predictions with the trained model (the changes are described on another page).
I added the prediction input as follows:
x_test = pandas.DataFrame(data=["What's the apt-get equivalent to rpm -qi?"])
y_test = pandas.Series(data=[])
When I then ran the code, it failed with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py", line 168, in fit_transform
    self.fit(raw_documents)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py", line 150, in fit
    for tokens in self._tokenizer(raw_documents):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py", line 51, in tokenizer
    yield TOKENIZER_RE.findall(value)
TypeError: buffer size mismatch
After trying different things for a while, I found that the small change below worked: building x_test as a pandas.Series instead of a pandas.DataFrame.
x_test = pandas.Series(data=["What's the apt-get equivalent to rpm -qi?"])
y_test = pandas.Series(data=[])
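
A likely reason for the difference, going by the traceback: the tokenizer loops over raw_documents and calls TOKENIZER_RE.findall(value) on each item. Iterating over a pandas.DataFrame yields its column labels (here the integer 0), not the text in its cells, while iterating over a pandas.Series yields the stored strings. The sketch below reproduces that behaviour outside TensorFlow. TOKEN_RE and tokenize are hypothetical stand-ins for the library's tokenizer, not its actual code, and the exact error message depends on the Python and pandas versions.

```python
import re

import pandas

# Hypothetical stand-in for the library's TOKENIZER_RE; the real pattern may differ.
TOKEN_RE = re.compile(r"\w+")

question = "What's the apt-get equivalent to rpm -qi?"
as_frame = pandas.DataFrame(data=[question])
as_series = pandas.Series(data=[question])

# Iterating a DataFrame yields its column labels, not its cell values.
frame_items = list(as_frame)
# Iterating a Series yields the stored values, i.e. the question text.
series_items = list(as_series)
# Selecting the single column of the DataFrame also gives a Series of strings.
column_items = list(as_frame[0])


def tokenize(documents):
    """Assumed shape of the tokenizer loop seen in the traceback."""
    for value in documents:
        yield TOKEN_RE.findall(value)


# Works: every item is a string.
series_tokens = list(tokenize(as_series))

# Fails: the only item is the integer column label, which a regex cannot scan.
try:
    list(tokenize(as_frame))
    frame_error = None
except TypeError as exc:
    frame_error = exc

print(frame_items)
print(series_items)
print(repr(frame_error))
```

If the data already lives in a DataFrame, selecting the text column (as_frame[0] above) should give the same iterable of strings as constructing the Series directly.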