NLTK is the most popular natural-language processing library for Python beginners, and for good reason: it's lightweight and easy to learn. It also falls short of modern NLP approaches.
The better choice is spaCy, an excellent NLP library that can compete with state-of-the-art NLP tools. It also supports word vectors out of the box, which are crucial for most modern NLP.
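To make the word-vector point concrete, here is a minimal sketch of the idea in plain Python. The three-dimensional vectors and the `most_similar` helper are made up for illustration; they are not spaCy's vectors or spaCy's API. The sketch only shows how representing words as vectors lets you measure similarity with cosine distance.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real word vectors are learned from data and have hundreds of dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors=TOY_VECTORS):
    """Return the other word whose vector points in the closest direction."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    # With these toy values, "cat" ends up closer to "dog" than to "car".
    print(most_similar("cat"))
```

In spaCy itself, the comparable entry points are `token.vector` and the `similarity` method on tokens, spans and docs. Note that the small pipelines such as `en_core_web_sm` do not ship with real word vectors, so similarity results are more meaningful with a larger pipeline such as `en_core_web_md`.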
Here's a taste of spaCy, direct from the spaCy documentation.
    # pip install spacy
    # python -m spacy download en_core_web_sm
    import spacy

    # Load English tokenizer, tagger, parser, NER and word vectors
    nlp = spacy.load("en_core_web_sm")

    # Process whole documents
    text = ("When Sebastian Thrun started working on self-driving cars at "
            "Google in 2007, few people outside of the company took him "
            "seriously. “I can tell you very senior CEOs of major American "
            "car companies would shake my hand and turn away because I wasn’t "
            "worth talking to,” said Thrun, in an interview with Recode earlier "
            "this week.")
    doc = nlp(text)

    # Analyze syntax
    print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
    print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

    # Find named entities, phrases and concepts
    for entity in doc.ents:
        print(entity.text, entity.label_)
Open source lovers rejoice: spaCy is licensed under the MIT license!

July 09, 2019