Hello All,
I have begun working on text mining and NLP using Python and NLTK, and I hope to do my final graduation project on this topic.
Along the way I have come across a few terms whose theory I am reading up on.
From Wikipedia:
What is Information Retrieval ?
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text indexing.
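As a rough illustration of what full-text indexing means, here is a minimal sketch of an inverted index in plain Python. The toy documents and the function names (`tokenize`, `build_index`, `search`) are made up for this example and are not part of NLTK or any other library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


# Hypothetical toy collection, invented for this example.
docs = {
    1: "Information retrieval finds relevant documents.",
    2: "Text mining derives patterns from text.",
    3: "Information extraction pulls structured facts from text.",
}
index = build_index(docs)
print(search(index, "information"))    # [1, 3]
print(search(index, "text patterns"))  # [2]
```

A real system would also rank the matching documents by relevance; this sketch only answers whether a document contains all the query words.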
What is Information Extraction ?
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video could be seen as information extraction.
What is Text Mining ?
Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
An interesting Question that might have crossed your mind
What is the difference between Information Extraction and Text Mining?
Why I chose Python ?
Read Here :
http://okfnlabs.org/blog/2013/11/11/python-nlp.html
and Here :
http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html
FAQ
Q: It seems interesting. How to get started ?
Ans. See the following :
Free book by NLTK developers :
http://www.nltk.org/book/
NLTK Library :
http://www.nltk.org/install.html
Q: Are there any Video tutorials on NLP ?
Ans.
https://www.coursera.org/course/nlp [Recommended]
https://www.coursera.org/course/nlangp [Go through this one after you have finished the above]
Q: What are the other helpful Libraries needed ?
Ans.
NumPy : Numerical computation, built around efficient n-dimensional arrays
SciPy : Scientific computing
Matplotlib : Plotting, e.g. for displaying graphs of models, distributions, and probabilistic data models
SciTools (optional) : A collection of scientific tools built on top of the above
scikit-learn : A machine learning (ML) library for Python
Q: Hey Psycho_Coder, is there a way I can help you ?
Ans. Yes, you can help me. Every one of you has a social life in which you chat with your friends and others. If you want to help, give me a log of those chats. Make sure it was not written specially for me; it should be natural. I won't distribute your data or your information. If your chat logs contain personal or intimate information, you are not required to give me those. I need to train my system on this kind of commonly used, everyday text. I have already got some corpora, but I need more. Not all of you may see why I am asking for this, and I will explain it in more detail later, but for students like me such data is very valuable.
Make sure your chat logs contain natural-sounding words and sentences, including any abbreviations you use. For example :-
"Hey, What are you doing ?"
People normally write the above as :-
"Hey, watcha doin ?" or "hiya, wat you doin ?" or "Hi, Wat r U doing ?" etc.
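To show why such variants matter, here is a minimal sketch of how a system might map chat abbreviations back to standard words. The `SLANG` table and the `normalize` function are hypothetical, written only for this example; in practice such a table would have to be compiled or learned from real chat logs like the ones requested here.

```python
import re

# Hypothetical lookup table, invented for this example; a real one would be
# compiled or learned from collected chat logs.
SLANG = {
    "wat": "what",
    "r": "are",
    "u": "you",
    "doin": "doing",
    "watcha": "what are you",
    "hiya": "hi",
}


def normalize(message):
    """Lowercase a chat message and expand known abbreviations word by word."""
    words = re.findall(r"[a-z']+", message.lower())
    return " ".join(SLANG.get(word, word) for word in words)


print(normalize("Hey, watcha doin ?"))   # hey what are you doing
print(normalize("Hi, Wat r U doing ?"))  # hi what are you doing
```

A plain word-for-word lookup like this cannot restore words that were dropped entirely (for instance the missing "are" in "wat you doin"), which is one reason a larger and more varied set of chat logs is useful.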
My Project Details : - I am working on an Interactive Compiler where a user can write his instructions in English, as general expressions, and get the output for what he has asked for.
For example the user may ask :-
"Evaluate the following, x^2 + 3*x + 5, when x is 6"
the above can be stated as follows as well :-
"Evaluate x^2 + 3*x + 5, when x=6"
or
"Solve x^2 + 3*x + 5, for x = 6"
All three statements mean the same thing and should give the same result; my system has to understand each of them and then do the processing.
These are just simple algebra examples, and I have more in reserve.
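As a very rough sketch of the idea (not the actual project code), the three phrasings above could be handled with one regular expression plus a small, restricted arithmetic evaluator. The `COMMAND` pattern and the `run` function are hypothetical names for this example, and the pattern covers only these three phrasings; a real system would need NLP techniques rather than a hand-written pattern.

```python
import ast
import operator
import re

# Arithmetic operators the evaluator accepts; anything else is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

# Hypothetical pattern that covers only the three example phrasings.
COMMAND = re.compile(
    r"^\s*(?:evaluate|solve)\s+(?:the following\s*,\s*)?"
    r"(?P<expr>.+?)\s*,\s*(?:when|for)\s+"
    r"(?P<var>[a-z])\s*(?:=|is)\s*(?P<value>-?\d+(?:\.\d+)?)\s*$",
    re.IGNORECASE,
)


def _eval(node, env):
    """Recursively evaluate a whitelisted arithmetic syntax tree."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name) and node.id in env:
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left = _eval(node.left, env)
        right = _eval(node.right, env)
        return BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError("unsupported expression")


def run(command):
    """Parse an English-like command and return the value of its expression."""
    match = COMMAND.match(command)
    if match is None:
        raise ValueError("command not understood")
    # The user writes ^ for powers; Python spells that operator **.
    expr = match.group("expr").replace("^", "**")
    text = match.group("value")
    value = float(text) if "." in text else int(text)
    tree = ast.parse(expr, mode="eval")
    return _eval(tree, {match.group("var"): value})


print(run("Evaluate the following, x^2 + 3*x + 5, when x is 6"))  # 59
print(run("Evaluate x^2 + 3*x + 5, when x=6"))                    # 59
print(run("Solve x^2 + 3*x + 5, for x = 6"))                      # 59
```

The expression is walked as a syntax tree instead of being passed to `eval`, so only plain arithmetic on the named variable is allowed.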
The thing is, we often have to remember the instructions that a language provides. What if we give the machine an instruction and it takes care of the rest? I want to reduce the instructions we have to remember to a minimum and let the machine decide what is best and how to process it.
I hope I get some help with the chat logs I asked for. That is very important for me. Since it's a large community, I can get lots of data. Thank you for taking the time to read the thread.
Thank you,
Sincerely,
Psycho_Coder.