How to Read a File in Java (Text, Binary, PDF, CSV, etc.)

Are you looking for an article that answers the question "how to read a file in Java"? Reading files is one of the most common tasks a developer or data scientist performs, often at least once a week. So why not learn it the easy way?

How to read a file in Java (Text, Binary, PDF, CSV, etc.) –

Text, binary, CSV and PDF are common file formats. If you are a Java developer or data scientist, this article walks through reading each of them.

1. How to read a text file in Java:


import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class TextFileReadingExample {
    public static void main(String[] args) {

        // Name of the file to read, ideally with its absolute path
        String fileName = "fileNameWithAbsolutePath.txt";

        // FileReader opens the file and BufferedReader lets us read it line by line.
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            System.out.println("System is unable to open the file: " + fileName);
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println("System is unable to read the file: " + fileName);
            e.printStackTrace();
        }
    }
}

Description –

All you need are two standard classes for file handling:

  1. FileReader
  2. BufferedReader

Both of them belong to the java.io package, so you can import them directly and use them. Little further explanation is needed, apart from the logic in the while loop. There, the call

bufferedReader.readLine()

returns one complete line as a String, and returns null once no lines are left in the buffer. Please refer to the comments in the code example for more information.
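To see the readLine() behaviour in isolation, here is a minimal sketch. It assumes an in-memory StringReader in place of FileReader so that it runs without any file on disk; the class name ReadLineSketch and the helper readAllLines are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ReadLineSketch {

    // Hypothetical helper: collects every line that readLine() returns.
    static List<String> readAllLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        // StringReader stands in for FileReader so the sketch needs no file on disk.
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine() strips the line terminator (\n, \r or \r\n)
            // and returns null once the input is exhausted.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readAllLines("first line\nsecond line\r\nthird line")) {
            System.out.println(line);
        }
    }
}
```

Running main prints the three lines without their line terminators. Swapping the StringReader for a FileReader gives the same loop over a real file.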

 

2. How to create a text file in Java:


import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class TextFileWriting {
    public static void main(String[] args) {
        // Name of the file to write, ideally with its absolute path
        String fileName = "FileName.txt";

        // FileWriter creates (or overwrites) the file and BufferedWriter buffers the output.
        // try-with-resources flushes and closes the writer even if a write fails.
        try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(fileName))) {
            // Add text to the first line of the file
            bufferedWriter.write("My First Code for Text File Creation in java ");
            bufferedWriter.write(" It is so easy ");

            // Use this method when you need to start a new line
            bufferedWriter.newLine();

            bufferedWriter.write("Started from the next Line");
            bufferedWriter.write(" Appended Text in second Line ");
        } catch (IOException ex) {
            System.out.println("System is unable to create the file '" + fileName + "'");
            // Or we could just do this:
            // ex.printStackTrace();
        }
    }
}

Description –

There is only a slight difference between reading and writing a file in Java. In place of the FileReader and BufferedReader classes, we use the FileWriter and BufferedWriter classes from the same java.io package. Most of it should be clear by now. In case of any doubt, please leave a comment in the comment box.
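The symmetry between the two pairs of classes can be sketched as a round trip: write a few lines with FileWriter and BufferedWriter, then read them back with FileReader and BufferedReader. This is an illustration under stated assumptions, not part of the original examples: it uses a temporary file, and the names WriteThenReadSketch and roundTrip are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WriteThenReadSketch {

    // Hypothetical helper: writes the given lines to a temporary file, then reads them back.
    static List<String> roundTrip(List<String> linesToWrite) throws IOException {
        // Assumption: a temp file is used so the sketch leaves nothing behind.
        File tempFile = Files.createTempFile("write-then-read", ".txt").toFile();
        try {
            // Writing side: FileWriter + BufferedWriter
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(tempFile))) {
                for (String line : linesToWrite) {
                    writer.write(line);
                    writer.newLine(); // platform-specific line separator
                }
            }
            // Reading side: FileReader + BufferedReader
            List<String> linesRead = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(new FileReader(tempFile))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    linesRead.add(line);
                }
            }
            return linesRead;
        } finally {
            Files.deleteIfExists(tempFile.toPath());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> written = Arrays.asList("first line", "second line");
        System.out.println(roundTrip(written).equals(written)); // prints "true"
    }
}
```

Both sides rely on the platform default character set here, which is fine for the ASCII text in this sketch but worth making explicit for real data.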

How to read a binary file in Java –

package com.practice.check.concept;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadBinaryFile {

    public static void main(String[] args) {

        // Name of the file to open, ideally with its absolute path. It can be a binary file as well.
        String fileName = "FileName.txt";

        // Buffer that holds one chunk of the file at a time
        byte[] buffer = new byte[1000];

        // try-with-resources closes the stream even if reading fails
        try (FileInputStream inputStream = new FileInputStream(fileName)) {

            int bytesRead;

            // Read the file in chunks of up to buffer.length bytes;
            // read() returns -1 once the end of the stream is reached
            while ((bytesRead = inputStream.read(buffer)) != -1) {

                // Convert only the bytes actually read into a String (platform default charset)
                System.out.println(new String(buffer, 0, bytesRead));
            }
        } catch (FileNotFoundException ex) {
            System.out.println("File cannot be opened: '" + fileName + "'");
        } catch (IOException ex) {
            System.out.println("Unable to read file: '" + fileName + "'");
        }
    }
}

Description –

The above code works for binary files and for plain text files in the platform's default encoding. As mentioned in the code comment, you need to give the file name with its absolute path. If you have already walked through the text file examples, this code is easy to follow. There are only a few differences:

  1. In place of a FileReader object, you create a FileInputStream object. It gives you a stream of bytes.
  2. Once the file is opened as a stream, define your buffer size.
  3. Read the stream chunk by chunk, one buffer at a time, until it is exhausted.
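The three steps can be sketched on their own as follows. To keep the sketch runnable without a file, it assumes a ByteArrayInputStream in place of FileInputStream; the names ChunkedReadSketch and readFully are hypothetical. The point it illustrates is that read(buffer) returns how many bytes it actually placed in the buffer, and only that many should be used, because the last chunk is usually shorter than the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ChunkedReadSketch {

    // Hypothetical helper: drains a stream chunk by chunk and returns all of its bytes.
    static byte[] readFully(InputStream in, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            // With an empty buffer read() returns 0 and the loop would never end.
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int bytesRead;
        // read() fills at most buffer.length bytes and returns -1 at end of stream.
        while ((bytesRead = in.read(buffer)) != -1) {
            // Copy only the bytes filled by this call; the rest of the
            // buffer may still hold data from a previous chunk.
            collected.write(buffer, 0, bytesRead);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // A 4-byte buffer forces three reads: 4 + 4 + 2 bytes.
        byte[] copy = readFully(new ByteArrayInputStream(data), 4);
        System.out.println(copy.length); // prints 10
    }
}
```

Passing a FileInputStream instead of the ByteArrayInputStream applies the same loop to a file.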

Note –

How to read a CSV file in Java –

CSV is a file format that stores comma-separated values. The main challenge is reading it when a value itself contains a comma while the comma is also the separator.

For example, suppose a row of a CSV file contains:

"2,3","25,", ...

Here both values, (2,3) and (25,), contain a comma, and the separator is a comma as well. How do we solve this parsing problem?

There are two approaches. The first is to write the parsing logic yourself on top of Java's built-in libraries, for example by ignoring every comma that falls between a pair of quotation marks. That rule alone is not the complete logic, and there are several ways to implement it.

The other approach is to use a third-party library, which is what we do here. It handles such cases for us automatically.
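For comparison, the first approach (ignoring commas that sit between quotation marks) could look roughly like the sketch below. It is a simplified illustration, not a full CSV parser: it assumes straight double quotes, one record per line, and no whitespace between a separator and an opening quote. The names QuotedCsvSplitSketch and splitLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedCsvSplitSketch {

    // Hypothetical helper: splits one CSV line on commas that are outside quotes.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (insideQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // A doubled quote inside a quoted field is a literal quote.
                    current.append('"');
                    i++;
                } else {
                    // Opening or closing quote: toggle the state, do not keep the quote.
                    insideQuotes = !insideQuotes;
                }
            } else if (c == ',' && !insideQuotes) {
                // A comma outside quotes ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                // Any other character, including a comma inside quotes, belongs to the field.
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitLine("\"2,3\",\"25,\",7")) {
            System.out.println(field);
        }
    }
}
```

For the row "2,3","25,",7 this yields three fields: 2,3 then 25, then 7. Cases such as quoted fields that span several lines are not handled here, which is one reason a library is the more practical choice.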

  1. Create a Maven project and add the opencsv dependency to it.
    <dependency>
        <groupId>com.opencsv</groupId>
        <artifactId>opencsv</artifactId>
        <version>4.0</version>
    </dependency>

  2. Here is the complete code –

import java.io.FileReader;
import java.io.IOException;

import com.opencsv.CSVReader;

public class CsvFileReaderExample {

    public static void main(String[] args) {

        // File name of the CSV with its absolute path
        String csvFileToRead = "C:\\Users\\DSL\\Documents\\Folder\\SAMPLE.csv";

        // try-with-resources closes the reader when we are done
        try (CSVReader reader = new CSVReader(new FileReader(csvFileToRead))) {
            String[] row;

            // Iterate over each row
            while ((row = reader.readNext()) != null) {
                // To access a single element of the row use row[index];
                // the sample file is assumed to have at least three columns
                System.out.println(row[0] + row[1] + row[2]);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

How to read a PDF file in Java –

PDF stands for Portable Document Format and is a form of unstructured data. Working with it is an important skill for Java developers and Java data scientists, for a simple reason: most reports, bank statements and similar documents come as PDFs.

There are many external APIs for dealing with PDF in Java. We are going to use PDFBox.

If you want to know more about Java PDF libraries, see the article "5 Best Java PDF Libraries : Must Read for every Data Scientist".

Before we jump into the Java code, we need the Maven dependency for the PDFBox library. Copy it into pom.xml between the tags

<dependencies> ____your dependency ____  </dependencies>

Here is the Maven dependency for the PDFBox library:

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.9</version>
</dependency>

Here is the complete code –

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadingPDF {

    public static void main(String[] args) throws IOException {

        // Create a File object and pass it to PDDocument.load
        File fileobj = new File("C:/Folder/sample.pdf");

        // try-with-resources closes the document even if extraction fails
        try (PDDocument pdfDocument = PDDocument.load(fileobj)) {

            // Create the PDFTextStripper object
            PDFTextStripper pdfStripper = new PDFTextStripper();

            // Extract the text from the PDF
            String textPdf = pdfStripper.getText(pdfDocument);
            System.out.println(textPdf);
        }
    }
}

How to transform your career from Java developer to data scientist?

Python, R and Julia are the most popular languages for data science, but Java is also powerful and capable of handling data science work. Performance can differ between them. There is a detailed article on the career transition from Java developer to data scientist.

Conclusion –

In this article we explored how to read text, binary, CSV and PDF files in Java, and how to write a text file. In data science, most of the time the data comes in CSV format, but this article gives a walkthrough for all four file types. If you want to go deeper into the basic operations for any particular file format, subscribe to Data Science Learner and you will be notified when an article on it is published. Till then, keep reading Data Science Learner.

Thanks

Data Science Learner Team