Here is the procedure for reading and parsing a JSON file from an ADLS Gen2 storage account using Java (with the azure-storage-file-datalake and azure-identity libraries):

  1. Create a DataLakeServiceClient instance that points to your ADLS Gen2 storage account.

    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
           .endpoint("https://<storage account name>.dfs.core.windows.net")
           .credential(new DefaultAzureCredentialBuilder().build())
           .buildClient();

  2. Get a DataLakeFileSystemClient for the file system (container) that holds the file.

    DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient("<file system name>");

  3. Use a DataLakeFileClient to download the JSON file from your storage account into an in-memory stream.

    DataLakeFileClient fileClient = fileSystemClient.getFileClient("<file path>");
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    fileClient.read(outputStream);

  4. Convert the downloaded bytes into a String, specifying the character set explicitly.

    String json = new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
    
  5. Use a JSON parser library, such as Jackson or Gson, to parse the JSON string into Java objects, as shown in the sketch below.
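
For example, a minimal sketch with Jackson, assuming a hypothetical Person POJO whose fields match the keys in your JSON file (the class name and fields below are placeholders, not part of any SDK):

    // Hypothetical POJO; field names must match the JSON keys.
    public class Person {
        public String name;
        public int age;
    }

    // Uses com.fasterxml.jackson.databind.ObjectMapper; readValue throws a
    // checked JsonProcessingException that the caller must handle.
    ObjectMapper objectMapper = new ObjectMapper();
    Person person = objectMapper.readValue(json, Person.class);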

Here is the procedure for reading and parsing a JSON file from an ADLS Gen2 storage account using Java with Spark:

  1. Create a JavaSparkContext instance, configuring access to the storage account through the Hadoop ABFS connector (a storage account key is used here; OAuth can be configured as well).

    SparkConf conf = new SparkConf()
           .set("spark.hadoop.fs.azure.account.key.<storage account name>.dfs.core.windows.net", "<storage account key>")
           .setMaster("local[*]")
           .setAppName("<your app name>");
    JavaSparkContext sparkContext = new JavaSparkContext(conf);
    
  2. Use the JavaSparkContext to create an RDD of JSON strings from your ADLS Gen2 storage account. Gen2 paths use the abfss:// scheme.

    JavaRDD<String> jsonRDD = sparkContext.textFile("abfss://<file system name>@<storage account name>.dfs.core.windows.net/<file path>");
    
  3. Use a JSON parsing library, such as Jackson or Gson, to parse the JSON strings into Java objects. Note that textFile reads the file line by line, so this assumes one JSON document per line (the JSON Lines format).

    ObjectMapper objectMapper = new ObjectMapper();
    JavaRDD<MyObject> myObjectRDD = jsonRDD.map(json -> objectMapper.readValue(json, MyObject.class));
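
For completeness, a minimal sketch of the MyObject class assumed above; the name and fields are placeholders that must match your JSON keys:

    // Hypothetical POJO; implements Serializable so Spark can ship parsed
    // records between the driver and executors.
    public class MyObject implements java.io.Serializable {
        public String id;
        public String name;
    }

If you are using Spark SQL, the built-in reader (sparkSession.read().json("<path>")) can also load JSON Lines files directly into a Dataset<Row>, without hand-written Jackson code.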