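To convert a CSV file to Parquet format in C#, you need a library that actually writes Parquet. The Apache Arrow C# package (Apache.Arrow) does not do this: as far as I know it has no CSV reader, and its ArrowStreamWriter produces the Arrow IPC stream format, not Parquet. One option is the Parquet.Net NuGet package. Here's a sketch that assumes Parquet.Net 4.x, a comma-separated file with a header row and no quoted fields, and every column stored as a string:

using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

// Top-level await needs C# 9 or later; otherwise put this in an async method.

// Input CSV file path
var inputFile = "input.csv";
// Output Parquet file path
var outputFile = "output.parquet";

// Read the CSV: the first line is the header, the remaining lines are rows.
// A plain Split(',') does not handle quoted fields, and this assumes
// every row has as many fields as the header.
var lines = File.ReadAllLines(inputFile).Where(l => l.Length > 0).ToArray();
var header = lines[0].Split(',');
var rows = lines.Skip(1).Select(l => l.Split(',')).ToArray();

// Declare every column as a string field
var schema = new ParquetSchema(header.Select(name => (Field)new DataField<string>(name)).ToArray());

// Write all rows as a single row group, one column at a time
using (var fileStream = new FileStream(outputFile, FileMode.Create))
{
    using (var writer = await ParquetWriter.CreateAsync(schema, fileStream))
    {
        using (var groupWriter = writer.CreateRowGroup())
        {
            for (var i = 0; i < header.Length; i++)
            {
                var values = rows.Select(r => r[i]).ToArray();
                await groupWriter.WriteColumnAsync(new DataColumn(schema.DataFields[i], values));
            }
        }
    }
}

In this example, we first read the CSV file with File.ReadAllLines, treat the first non-empty line as the header, and split each remaining line on commas to get the rows.

We then build a ParquetSchema with one string DataField per header name and create a ParquetWriter over the output file stream. Parquet is a columnar format, so instead of writing row by row we open a row group and, for each column, gather that column's values from every row and write them as a single DataColumn.

It's worth noting that typed columns (for example DataField<int>) need their values parsed into an array of that type before writing. If you want an Arrow file rather than a Parquet file, that is the case ArrowStreamWriter from the Apache.Arrow package covers; the two formats are not interchangeable.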
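
One detail to watch if you parse the CSV yourself: fields can be quoted, and a quoted field may contain commas. Here is a minimal sketch of a field splitter, assuming RFC 4180-style quoting (double quotes around a field, a doubled quote as an escaped quote) and no line breaks inside fields. The CsvLine class and its Split method are hypothetical names, not part of any library:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV record into fields, honoring double-quoted fields.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote is an escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false; // closing quote
                }
                else
                {
                    current.Append(c); // commas inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

For example, CsvLine.Split("1,\"Portland, OR\",\"say \"\"hi\"\"\"") returns three fields: 1, then Portland, OR, then say "hi". For real-world files a dedicated CSV parsing library is usually the safer choice.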