<?xml version="1.0"?>
<!DOCTYPE article SYSTEM "C:\nlm\converter\journal-publishing-dtd-2.0\journalpublishing.dtd">
<article>
<front>
<journal-meta>
<journal-id journal-id-type="publisher">IJDSBDA</journal-id>
<journal-title>International Journal of Data Science and Big Data Analytics</journal-title>
<issn pub-type="epub">2710-2599</issn>
<publisher>
<publisher-name>SvedbergOpen</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="other">ijdsbda-1-2-003</article-id>
<doi-group>
<article-doi><ext-link ext-link-type="uri" xmlns:xlink="https://doi.org/" xlink:href="10.51483/IJDSBDA.1.2.2021.23-30">10.51483/IJDSBDA.1.2.2021.23-30</ext-link></article-doi>
</doi-group>
<article-categories>
<subj-group>
<subject>Research Paper</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Developing and testing a tool to classify sentiment analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Acharya</surname><given-names>Sameer Kumar</given-names></name>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
<xref ref-type="corresp" rid="cor001"><sup>&#x002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff001"><sup>1</sup><deptname>Data Science Department, NMIMS University</deptname>, <instcity>Mumbai</instcity>, <instcountry>India</instcountry>. E-mail: <email>sameeracharya.nmims@gmail.com</email></aff>
<author-notes>
<corresp id="cor001"><sup>&#x002A;</sup>Corresponding author: Sameer Kumar Acharya, <deptname>Data Science Department, NMIMS University</deptname>, <instcity>Mumbai</instcity>, <instcountry>India</instcountry>. E-mail: <email>sameeracharya.nmims@gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>05</month>
<year>2021</year>
</pub-date>
<volume>1</volume>
<issue>2</issue>
<fpage>23</fpage>
<lpage>30</lpage>
<abstract>
<title>Abstract</title>
<p>Data generation has grown explosively in recent years. This availability of data has led to a paradigm shift in the e-commerce sector: data is no longer a by-product of business activities but an asset to a company, providing the insights required to satisfy customers&#x2019; needs. This paper provides an overview of sentiment analysis of product reviews using different algorithms and examines their efficiency in distinguishing positive from negative reviews, based on N-gram and bigram features built with a Count-Vectorizer and a Term Frequency-Inverse Document Frequency (TFIDF) matrix. Different classification models are employed to check the prediction accuracy on unlabeled text. Based on this classification, a tool has been developed that predicts the sentiment polarity of incoming reviews.</p>
</abstract>
<kwd-group>
<title>Keywords</title>
<kwd>Text mining</kwd>
<kwd>sentiments</kwd>
<kwd>K-Nearest Neighbor (KNN)</kwd>
<kwd>Random forest</kwd>
<kwd>Multinomial Na&#x00EF;ve Bayes</kwd>
<kwd>TFIDF</kwd>
<kwd>Count-Vectorizer</kwd>
</kwd-group>
<counts>
<ref-count count="23"/>
<page-count count="8"/>
</counts>
</article-meta>
</front>
<back>
<ref-list>
<title>References</title>
<ref id="bib001"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Aizerman</surname><given-names>A.B.</given-names></name></person-group> (<year>1964</year>). <article-title>Theoretical foundations of the potential function method in pattern recognition learning</article-title>. <source>Automation and Remote Control</source>. <fpage>821</fpage>&#x2013;<lpage>837</lpage>.</citation></ref>
<ref id="bib002"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ben</surname></name><name><surname>Schafer</surname><given-names>J.</given-names></name></person-group>, and <collab>J. K.</collab> (<year>1999</year>). <article-title>Recommender Systems in E-Commerce</article-title>. <source>GroupLens Research Project, MN</source> <fpage>55455</fpage>.</citation></ref>
<ref id="bib003"><citation citation-type="web"><person-group person-group-type="author"><name><surname>Berman</surname><given-names>M.</given-names></name></person-group> (<year>2017</year>, <month>July</month> <day>5</day>). <source>Sentiment Analysis: Overview, Applications and Benefits</source>. Retrieved <date-in-citation content-type="access-date">July 5, 2017</date-in-citation>, from growthaccelerationpartners: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:href="https://www.growthaccelerationpartners.com/blog/sentiment-analysis/">https://www.growthaccelerationpartners.com/blog/sentiment-analysis/</ext-link></citation></ref>
<ref id="bib004"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Caropreso</surname><given-names>M.F.</given-names></name></person-group> (<year>2001</year>). <article-title>A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization</article-title>. <source>Semantic Scholar</source>. <fpage>385</fpage>.</citation></ref>
<ref id="bib005"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Cliff</surname><given-names>G.</given-names></name></person-group> (<year>2011</year>). <source>Semantic Analysis: An introduction</source>. <publisher-name>Oxford University Press</publisher-name>. <publisher-loc>New York</publisher-loc>. p. <fpage>17</fpage>.</citation></ref>
<ref id="bib006"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ho</surname><given-names>T.K.</given-names></name></person-group> (<year>1998</year>). <article-title>The random subspace method for constructing decision forests</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>. <pub-id pub-id-type="doi">10.1109/34.709601</pub-id>, <fpage>832</fpage>&#x2013;<lpage>844</lpage>.</citation></ref>
<ref id="bib007"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gerard Salton</surname><given-names>M. J.</given-names></name></person-group> (<year>1986</year>). <source>Introduction to Modern Information Retrieval</source>. <publisher-name>McGraw-Hill, Inc</publisher-name>. <publisher-loc>New York, NY, USA</publisher-loc>.</citation></ref>
<ref id="bib008"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Guo</surname><given-names>G.</given-names></name><name><surname>Wang</surname><given-names>H.</given-names></name><name><surname>Bell</surname><given-names>D.</given-names></name><name><surname>Bi</surname><given-names>Y.</given-names></name><name><surname>Greer</surname><given-names>K.</given-names></name></person-group> (<year>2004</year>). <article-title>KNN Model-Based Approach in Classification</article-title>. <source>ResearchGate</source>.</citation></ref>
<ref id="bib009"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ho</surname><given-names>T.K.</given-names></name></person-group> (<year>1995</year>). <article-title>Random decision forests</article-title>. <source>IEEE Computer Society</source>. <fpage>278</fpage>.</citation></ref>
<ref id="bib0010"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Ho</surname><given-names>T.B.</given-names></name></person-group> (<year>2000</year>). <article-title>Non-hierarchical document clustering based on a tolerance rough set model</article-title>. <source>International Journal of Intelligent Systems</source>. <fpage>199</fpage>&#x2013;<lpage>212</lpage>.</citation></ref>
<ref id="bib0011"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ho</surname><given-names>Tu Bao</given-names></name><name><surname>Funakoshi</surname><given-names>Kaname.</given-names></name></person-group> (<year>1998</year>). <article-title>Information retrieval using rough sets</article-title>. <source>Journal of the Japanese Society for Artificial Intelligence</source>. <volume>13</volume>(<issue>3</issue>), <fpage>424</fpage>&#x2013;<lpage>433</lpage>.</citation></ref>
<ref id="bib0012"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Joshi</surname><given-names>S.M.</given-names></name></person-group> (<month>October</month> <day>14-18</day>, <year>2013</year>). <article-title>Sentiment aggregation using concept net ontology</article-title>. <source>IJCNLP, Sixth International Joint Conference on Natural Language Processing</source>. <fpage>570</fpage>&#x2013;<lpage>578</lpage>.</citation></ref>
<ref id="bib0013"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Li</surname><given-names>Y. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Text document clustering based on frequent word meaning sequences</article-title>. <source>Data &#x0026; Knowledge Engineering</source>. <fpage>381</fpage>&#x2013;<lpage>404</lpage>.</citation></ref>
<ref id="bib0014"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Lillian</surname><given-names>B.P.</given-names></name></person-group> (<month>02</month>, <year>2002</year>). <article-title>Sentiment classification using machine learning techniques</article-title>. <source>EMNLP</source>.</citation></ref>
<ref id="bib0015"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Papka</surname><given-names>R.</given-names></name><name><surname>Allan</surname><given-names>J.</given-names></name></person-group> (<year>1998</year>). <article-title>Document classification using multiword features</article-title>. In <source>Proceedings of the seventh international conference on information and knowledge management</source>. <fpage>124</fpage>&#x2013;<lpage>131</lpage>.</citation></ref>
<ref id="bib0016"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Rish</surname><given-names>I.</given-names></name></person-group> (<year>2001</year>, <month>January</month>). <article-title>An Empirical Study of the Na&#x00EF;ve Bayes Classifier</article-title>. <source>ResearchGate</source>. <fpage>46</fpage>.</citation></ref>
<ref id="bib0017"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Salton</surname><given-names>G.</given-names></name></person-group> (<year>1973</year>). <article-title>On the specification of term values in automatic indexing</article-title>. <source>Journal of Documentation</source>. <fpage>351</fpage>&#x2013;<lpage>372</lpage>.</citation></ref>
<ref id="bib0018"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Salton</surname><given-names>G.</given-names></name></person-group> (<year>1989</year>). <source>The transformation, analysis, and retrieval of information by computer</source>. <publisher-name>Addison-Wesley Longman Publishing Co., Inc</publisher-name>. <publisher-loc>Boston, MA, USA</publisher-loc>.</citation></ref>
<ref id="bib0019"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Sparck Jones</surname><given-names>K.</given-names></name></person-group> (<year>2004</year>). <article-title>IDF term weighting and IR research lessons</article-title>. <source>Journal of Documentation</source>. <fpage>521</fpage>&#x2013;<lpage>523</lpage>.</citation></ref>
<ref id="bib0020"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Sparck Jones</surname><given-names>K.</given-names></name></person-group> (<year>1972</year>). <article-title>A statistical interpretation of term specificity and its application in retrieval</article-title>. <source>Journal of Documentation</source>. <fpage>11</fpage>&#x2013;<lpage>21</lpage>.</citation></ref>
<ref id="bib0021"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Tan</surname><given-names>S.</given-names></name></person-group> (<year>2005</year>). <chapter-title>Neighbor-weighted K-nearest neighbor for unbalanced text corpus</chapter-title>, <source>Expert Systems with Applications</source>. <fpage>667</fpage>&#x2013;<lpage>671</lpage>. <publisher-name>ACM Digital Library</publisher-name>.</citation></ref>
<ref id="bib0022"><citation citation-type="other"><person-group person-group-type="author"><name><surname>Tomas Mikolov</surname><given-names>K.C.</given-names></name></person-group> (<year>2013</year>). <article-title>Efficient Estimation of Word Representations in</article-title>. <source>arXiv, 1301.3781v3</source>.</citation></ref>
<ref id="bib0023"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yong</surname><given-names>Z.</given-names></name><name><surname>Youwen</surname><given-names>L.</given-names></name><name><surname>Shixiong</surname><given-names>X.</given-names></name></person-group> (<year>2009</year>). <article-title>An Improved KNN Text Classification Algorithm Based on Clustering</article-title>. <source>Journal of Computers</source>. <volume>4</volume>(<issue>3</issue>), <fpage>230</fpage>&#x2013;<lpage>237</lpage>.</citation></ref>
</ref-list>
</back>
</article>