Friday, 25 March 2016











1 : Install the rJava package in RStudio, then the rhdfs package, and load rJava:

        library("rJava", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.2")

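2 : Before loading rhdfs, start Hadoop and point RStudio at the Hadoop installation:

        # Run in a terminal (this is a shell command, not R):
        start-all.sh
        # Run in RStudio:
        Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
        Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
        library("rhdfs", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.2")
        hdfs.init()   # rhdfs has to be initialised before any hdfs.* call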








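3 : Then import the data from HDFS with the following commands:

      reader = hdfs.line.reader("/data/diabetes1")
      diabetes = reader$read()   # character vector, one element per line of the file
      typeof(diabetes)
      # Parse the lines into a data frame (assumes a comma-separated file with a header row)
      diabetes = read.csv(text = diabetes, header = TRUE)
      diabetes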





4 : Now build the decision tree using the rpart package:

      names(diabetes) <- gsub("\\.","",names(diabetes))   # strip dots from the column names
      str(diabetes)
      attributes(diabetes)
      library(rpart)


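      # Option A: assign each row to train (1) or test (2) at random, roughly 70/30
      set.seed(564)
      flags = sample(2,nrow(diabetes), replace = TRUE, prob =c(0.7,0.3))
      trainset = diabetes[which(flags==1),]
      testset = diabetes[which(flags==2),]
      str(trainset)
      str(testset)


      # Option B: draw exactly 70% of the row indices for training.
      # Running this after Option A overwrites trainset and testset.
      index = sample(1:nrow(diabetes), nrow(diabetes)*0.7, replace=FALSE)
      trainset = diabetes[index,]
      testset = diabetes[-index,]
      str(trainset)
      str(testset)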
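
The index-based split above comes down to shuffling the row indices with a fixed seed and cutting the list at 70%. Here is a small Java sketch of the same idea, in case you want it outside R. The class name SplitSketch and the method split are made up for this illustration, and Java's Random will not pick the same rows as R's set.seed(564).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: SplitSketch and split() are hypothetical names, not part of the project.
final class SplitSketch {

    /** Returns two lists of row indices: element 0 = training rows, element 1 = test rows. */
    static List<List<Integer>> split(int nRows, double trainFraction, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < nRows; i++) {
            indices.add(i);
        }
        // Shuffle with a fixed seed so the same split comes back on every run.
        Collections.shuffle(indices, new Random(seed));

        // Everything before the cut is training data, the rest is test data.
        int cut = (int) Math.floor(nRows * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, cut)));
        parts.add(new ArrayList<>(indices.subList(cut, nRows)));
        return parts;
    }

    public static void main(String[] args) {
        // 10 rows is a made-up size, just to show the call.
        List<List<Integer>> parts = split(10, 0.7, 564L);
        System.out.println("train rows: " + parts.get(0));
        System.out.println("test rows:  " + parts.get(1));
    }
}
```

Because the indices are shuffled without replacement, no row can land in both sets, which is the same guarantee the R version gets from diabetes[-index,].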


      ?rpart   # opens the rpart help page
      dtree = rpart(Classvariable ~ Number_of_times_pregnant
                    +Plasma_glucose_concentration
                    +Diastolic_blood_pressure
                    +Triceps_skin_fold_thickness
                    +Hour_serum_insulin
                    +Body_mass_index
                    +Diabetes_pedigree_function
                    +Ageyears ,
                    data=trainset,
                    method="class",   # fit a classification tree
                    control=rpart.control(minsplit = 10))
      str(dtree)
      dtree
      plot(dtree)
      text(dtree)









5 : The mining results can now be accessed by the decision maker.


 
  
              

         
     Tools and software used in this project are listed below

ARX ANONYMIZATION

ARX is a comprehensive open source software for anonymizing sensitive personal data. It has been designed from the ground up to provide high scalability, ease of use and a tight integration of the many different aspects relevant to data anonymization. Its highlights include: 

  • Risk-based anonymization using super-population models, strict-average risk and k-map
  • Syntactic privacy models, such as k-anonymity, ℓ-diversity, t-closeness, δ-disclosure privacy and δ-presence
  • Semantic privacy models, such as (ɛ, δ)-differential privacy
  • Data transformation with generalization, suppression, microaggregation and top/bottom-coding as well as global and local recoding
  • Methods for analyzing data utility
  • Methods for analyzing re-identification risks
The software is able to handle very large datasets on commodity hardware and features an intuitive cross-platform graphical user interface.




DOWNLOAD LINK : ARX

R STUDIO 


RStudio is a free and open-source integrated development environment (IDE) for R. R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.



DOWNLOAD LINK FOR R : R
DOWNLOAD LINK FOR R STUDIO : R Studio
 

ECLIPSE


Eclipse provides IDEs and platforms for nearly every language and architecture. It is best known for its Java IDE, and it also offers C/C++, JavaScript and PHP IDEs built on extensible platforms for creating desktop, web and cloud IDEs. These platforms deliver an extensive collection of add-on tools available to software developers.




DOWNLOAD LINK : ECLIPSE

APACHE HADOOP

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.



DOWNLOAD LINK : HADOOP

SELENIUM

Selenium is a portable software testing framework for web applications. Selenium provides a record/playback tool for authoring tests without learning a test scripting language (Selenium IDE).
                  
The Selenium Server is needed in order to run either Selenium RC style scripts or Remote Selenium WebDriver ones. The 2.x server is a drop-in replacement for the old Selenium RC server and is designed to be backwards compatible with your existing infrastructure.


DOWNLOAD LINK : SELENIUM JAR 

WIRESHARK

Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communication protocol development, and education.
It lets you see what is happening on your network at a microscopic level, and it is the de facto (and often de jure) standard across many industries and educational institutions.
Wireshark development thrives thanks to the contributions of networking experts across the globe. It is the continuation of a project that started in 1998.



DOWNLOAD LINK : WIRESHARK
 
VIRTUAL BOX

VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use. Not only is VirtualBox an extremely feature-rich, high-performance product for enterprise customers, it is also the only professional solution that is freely available as Open Source Software.



DOWNLOAD LINK : VIRTUAL BOX

XAMPP

XAMPP is a completely free, easy to install Apache distribution containing MariaDB, PHP, and Perl. The XAMPP open source package has been set up to be incredibly easy to install and to use.



DOWNLOAD LINK : XAMPP


Thursday, 24 March 2016







                    HADOOP  SETUP CODE



HDFS-SITE.XML

<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>
</configuration>



CORE-SITE.XML

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>

 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
 </property>
</configuration>

MAPRED-SITE.XML

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. 
  </description>
 </property>
</configuration>

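To double-check what a *-site.xml file actually sets, the property names and values can be pulled out with a few lines of plain Java. This is only a sketch using the JDK's built-in XML parser; the class name SiteXmlSketch and the method readProperties are made up here and are not part of Hadoop. The XML in main is typed in by hand from the core-site.xml above, where a real check would read the file from the Hadoop configuration directory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: SiteXmlSketch and readProperties() are hypothetical names.
final class SiteXmlSketch {

    /** Collects every <property> element as name -> value, in document order. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse site XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hadoop.tmp.dir</name><value>/app/hadoop/tmp</value></property>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:54310</value></property>"
                + "</configuration>";
        // Print each setting on its own line.
        readProperties(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```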




IMPORTING DATA FROM HDFS


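reader = hdfs.line.reader("/data/diabetes")
diabetes = reader$read()   # character vector, one element per line of the file
typeof(diabetes)
# Parse the lines into a data frame (assumes a comma-separated file with a header row)
diabetes = read.csv(text = diabetes, header = TRUE)
diabetes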

               DECISION TREE USING R


names(diabetes) <- gsub("\\.","",names(diabetes))
str(diabetes)
attributes(diabetes)
library(rpart)


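# Option A: assign each row to train (1) or test (2) at random, roughly 70/30
set.seed(564)
flags = sample(2,nrow(diabetes), replace = TRUE, prob =c(0.7,0.3))
trainset = diabetes[which(flags==1),]
testset = diabetes[which(flags==2),]
str(trainset)
str(testset)


# Option B: draw exactly 70% of the row indices for training.
# Running this after Option A overwrites trainset and testset.
index = sample(1:nrow(diabetes), nrow(diabetes)*0.7, replace=FALSE)
trainset = diabetes[index,]
testset = diabetes[-index,]
str(trainset)
str(testset)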


?rpart   # opens the rpart help page
dtree = rpart(Classvariable ~ Number_of_times_pregnant
              +Plasma_glucose_concentration
              +Diastolic_blood_pressure
              +Triceps_skin_fold_thickness
              +Hour_serum_insulin
              +Body_mass_index
              +Diabetes_pedigree_function
              +Ageyears ,
              data=trainset,
              method="class",   # fit a classification tree so predict(type="class") works
              control=rpart.control(minsplit = 10))
str(dtree)
dtree
plot(dtree)
text(dtree)
attributes(dtree)
dtree$variable.importance

# Score new records from a CSV file with the fitted tree
name = read.csv("new.csv",header=TRUE)
predict(dtree, name, type=c("class"))


predictedDT = predict(dtree, testset, type=c("class"))
predictedDT
table(predictedDT, testset$Classvariable)
mean(predictedDT!=testset$Classvariable)
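
The last two R lines above do two things: table() counts how often each predicted class meets each actual class, and mean(predicted != actual) gives the share of test rows the tree got wrong. A minimal Java sketch of the same arithmetic is below. The class name ErrorRateSketch, its method names and the labels in main are all made up for illustration; they are not output from the diabetes data.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: ErrorRateSketch and its methods are hypothetical names.
final class ErrorRateSketch {

    /** Counts each (predicted, actual) pair, similar in spirit to R's table(predicted, actual). */
    static Map<String, Integer> confusionCounts(String[] predicted, String[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("predicted and actual must have the same length");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < predicted.length; i++) {
            String cell = "predicted=" + predicted[i] + ", actual=" + actual[i];
            counts.merge(cell, 1, Integer::sum);
        }
        return counts;
    }

    /** Share of rows where the prediction differs from the true label. */
    static double misclassificationRate(String[] predicted, String[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int wrong = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (!predicted[i].equals(actual[i])) {
                wrong++;
            }
        }
        return (double) wrong / predicted.length;
    }

    public static void main(String[] args) {
        // Made-up labels, only to show the calls.
        String[] predicted = {"1", "0", "1", "0"};
        String[] actual    = {"1", "0", "0", "0"};
        confusionCounts(predicted, actual)
                .forEach((cell, n) -> System.out.println(cell + " : " + n));
        System.out.println("error rate: " + misclassificationRate(predicted, actual));
    }
}
```

Accuracy is simply one minus this error rate, so either number can be reported to the decision maker.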



name = read.csv("read.csv",header=1)
predict(dtree, name, type=c("class"))

                             TESTING


package com.selenium.one;

import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import com.relevantcodes.extentreports.ExtentReports;
import com.relevantcodes.extentreports.LogStatus;

public class Examples {
   
   
    @Test
    public void verifyTitle() {
        // Set up the ExtentReports HTML report for this test.
        ExtentReports extent = ExtentReports.get(Examples.class);
        extent.init("/home/akhil/check.html", true);
        extent.startTest("verify page");

        // Launch Firefox; the steps below fill in and submit the registration form.
        WebDriver driver = new FirefoxDriver();
       
        driver.get("http://localhost/registration/input.html");
        driver.manage().window().maximize();
        extent.log(LogStatus.INFO,"browser is running");
       
        // Fieldset 1 has a single input.
        driver.findElement(By.xpath("html/body/form/fieldset[1]/p/input")).sendKeys("lokesh@gmail.com");
        extent.log(LogStatus.INFO,"filled fieldset 1, field 1");
        // Fieldset 2 has ten inputs.
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[1]/input")).sendKeys("lokesh");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 1");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[2]/input")).sendKeys("kuncham");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 2");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[3]/input")).sendKeys("rvr");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 3");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[4]/input")).sendKeys("21");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 4");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[5]/input")).sendKeys("amarvathi");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 5");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[6]/input")).sendKeys("guntur");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 6");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[7]/input")).sendKeys("andhra pradesh");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 7");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[8]/input")).sendKeys("522007");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 8");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[9]/input")).sendKeys("87654");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 9");
        driver.findElement(By.xpath("html/body/form/fieldset[2]/p[10]/input")).sendKeys("12345");
        extent.log(LogStatus.INFO,"filled fieldset 2, field 10");
        // Fieldset 3 has seven inputs.
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[1]/input")).sendKeys("22");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 1");
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[2]/input")).sendKeys("24");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 2");
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[3]/input")).sendKeys("54");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 3");
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[4]/input")).sendKeys("44");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 4");
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[5]/input")).sendKeys("66");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 5");
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[6]/input")).sendKeys("89");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 6");
        driver.findElement(By.xpath("html/body/form/fieldset[3]/p[7]/input")).sendKeys("56");
        extent.log(LogStatus.INFO,"filled fieldset 3, field 7");
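        // Submit the form and go back to the input page.
        driver.findElement(By.xpath("html/body/form/fieldset[3]/input")).click();
        extent.log(LogStatus.PASS, "submitted successfully");
        driver.navigate().back();

        // Close the browser so repeated runs do not leave Firefox windows open.
        driver.quit();
    }
}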
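
The test above repeats the same findElement/sendKeys pair for every field, and only the fieldset number, the paragraph index and the value change. One way to cut the repetition is to generate the XPath and value pairs in a loop. The sketch below shows only that part and does not need Selenium on the classpath. The class name FormDataSketch and the method fieldsetInputs are made up for illustration, and the helper covers only the p[1]..p[n] pattern used in fieldsets 2 and 3 (fieldset 1 uses a plain p without an index).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: FormDataSketch and fieldsetInputs() are hypothetical names.
final class FormDataSketch {

    /** Builds XPath -> value pairs for p[1]..p[n] inside the given fieldset, in form order. */
    static Map<String, String> fieldsetInputs(int fieldset, String... values) {
        Map<String, String> inputs = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            // XPath positions start at 1, Java arrays at 0.
            String xpath = "html/body/form/fieldset[" + fieldset + "]/p[" + (i + 1) + "]/input";
            inputs.put(xpath, values[i]);
        }
        return inputs;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = fieldsetInputs(3, "22", "24", "54", "44", "66", "89", "56");
        // In the real test each pair would go to
        // driver.findElement(By.xpath(xpath)).sendKeys(value); here it is only printed.
        inputs.forEach((xpath, value) -> System.out.println(xpath + " <- " + value));
    }
}
```

Keeping the values in one place also makes it easier to run the same test with a second set of registration data.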