Here is the procedure for reading and parsing a JSON file from an ADLS Gen2 storage account using Java:
Create a DataLakeServiceClient instance that points to your ADLS Gen2 storage account (the classes below come from the azure-storage-file-datalake and azure-identity libraries).
DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
.endpoint("https://<storage account name>.dfs.core.windows.net")
.credential(new DefaultAzureCredentialBuilder().build())
.buildClient();
Create a DataLakeFileSystemClient instance that will be used to access the file system (container).
DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient("<file system name>");
Use the fileSystemClient to get a DataLakeFileClient for the JSON file and download its contents into an in-memory output stream.
DataLakeFileClient fileClient = fileSystemClient.getFileClient("<file path>");
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
fileClient.read(outputStream);
Convert the downloaded bytes into a String.
String json = new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
Use a JSON parser library, such as Jackson or GSON, to parse the JSON string into Java objects.
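For example, a minimal Jackson sketch (assuming MyObject is a plain Java class whose fields match the JSON; ObjectMapper comes from com.fasterxml.jackson.databind):
ObjectMapper objectMapper = new ObjectMapper();
MyObject myObject = objectMapper.readValue(json, MyObject.class);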
Here is the procedure for reading and parsing a JSON file from an ADLS Gen2 storage account using Java with Spark:
Create a JavaSparkContext instance, configuring OAuth client-credential access to the ADLS Gen2 account through the ABFS (fs.azure) Hadoop properties.
SparkConf conf = new SparkConf()
.set("spark.hadoop.fs.azure.account.auth.type", "OAuth")
.set("spark.hadoop.fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
.set("spark.hadoop.fs.azure.account.oauth2.client.id", "<client id>")
.set("spark.hadoop.fs.azure.account.oauth2.client.secret", "<client secret>")
.set("spark.hadoop.fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<tenant id>/oauth2/token")
.setMaster("local[*]")
.setAppName("<your app name>");
JavaSparkContext sparkContext = new JavaSparkContext(conf);
Use the JavaSparkContext to create an RDD of JSON strings from your ADLS Gen2 storage account. Note that textFile reads the input line by line, so this approach expects one JSON document per line (JSON Lines format).
JavaRDD<String> jsonRDD = sparkContext.textFile("abfss://<file system name>@<storage account name>.dfs.core.windows.net/<file path>");
Use a JSON parsing library, such as Jackson or GSON, to parse the JSON strings into Java objects. Here MyObject is a plain Java class whose fields match the JSON records.
ObjectMapper objectMapper = new ObjectMapper();
JavaRDD<MyObject> myObjectRDD = jsonRDD.map(json -> objectMapper.readValue(json, MyObject.class));
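As a quick sanity check you can pull a few parsed records back to the driver (a minimal usage sketch; take() materializes data on the driver, so keep the sample small):
List<MyObject> sample = myObjectRDD.take(5);
sample.forEach(System.out::println);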