The sheer quantity of data now being recorded has created a need for new data-processing frameworks. With the rapid growth of the web, the amount of information gathered or generated in many domains poses difficulties for researchers because of both its volume and its complexity. Analysis techniques that extract useful information for decision making are therefore receiving growing attention, as practitioners look for scalable alternatives to conventional methods. In this paper, we propose a scalable design and implementation of a Parallel Particle Swarm Optimization approach based on the Apache Spark framework. The fundamental idea of the technique is to find the optimal centroid for each target label using particle swarm optimization and then assign each unlabeled data point to its nearest centroid. Two variants of the approach, Proposed Algorithm-F1 and Proposed Algorithm-F2, are presented; they are based on different fitness functions and can be efficiently parallelized with the Apache Spark framework in the future.
Keywords: Particle Swarm Optimization, Big Data, Classification, Centroid, Apache Spark System.
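
As an illustration of the idea summarized above (one PSO-optimized centroid per target label, followed by nearest-centroid assignment), the following is a minimal single-process sketch in Python. It is not the paper's implementation and does not use Apache Spark. The fitness function (mean Euclidean distance from a candidate centroid to the training points of its class) is an assumption made for illustration and is not claimed to be F1 or F2; the function names `fitness`, `pso_centroid`, `train`, and `predict`, as well as the swarm parameters `w`, `c1`, and `c2`, are likewise hypothetical choices.

```python
import math
import random


def fitness(centroid, points):
    # ASSUMPTION: mean Euclidean distance to the class's training points.
    # This stands in for the paper's fitness functions, which are not given here.
    return sum(math.dist(centroid, p) for p in points) / len(points)


def pso_centroid(points, n_particles=20, n_iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a centroid that minimizes `fitness` over one class's points."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialize particles uniformly inside the bounding box of the class.
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, points) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i], points)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest


def train(labeled):
    """Group (point, label) pairs by label and optimize one centroid per label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: pso_centroid(pts) for y, pts in by_label.items()}


def predict(centroids, x):
    """Assign an unlabeled point to the label of its nearest centroid."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))
```

In this sketch each label's centroid search is independent of the others, and each fitness evaluation is a sum over data points; these are the kinds of steps that a distributed framework could spread across workers, although how the paper maps them onto Spark is not shown here.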