The classes in reader take a path to a file on disk, read that file and then parse the contents. For example:
    public final class KeyValueReader {
        /**
         * Generic method to read key value pairs from the bagit files, like bagit.txt or bag-info.txt
         *
         * @param file the file to read
         * @param splitRegex how to split the key from the value
         * @param charset the encoding of the file
         *
         * @return a list of key value pairs
         */
        public static List<SimpleImmutableEntry<String, String>> readKeyValuesFromFile(final Path file, final String splitRegex, final Charset charset) throws IOException, InvalidBagMetadataException {
            final List<SimpleImmutableEntry<String, String>> keyValues = new ArrayList<>();
            try (final BufferedReader reader = Files.newBufferedReader(file, charset)) {
                ...
            }
            return keyValues;
        }
    }

For the Wellcome storage service (https://github.com/wellcometrust/storage-service), we aren’t keeping bags on the local disk, but in S3. If we want to read a file, we make a GetObject call to the S3 SDK, which returns an InputStream.
We could download the bag files to disk and read them from there, but that seems a bit icky – would you be open to some pull requests that allow parsing files even if they aren’t local files? Something like:
    public final class KeyValueReader {
        public static List<SimpleImmutableEntry<String, String>> readKeyValuesFromReader(
                final BufferedReader reader,
                final String splitRegex) throws IOException, InvalidBagMetadataException {
            final List<SimpleImmutableEntry<String, String>> keyValues = new ArrayList<>();
            ...
            return keyValues;
        }

        public static List<SimpleImmutableEntry<String, String>> readKeyValuesFromFile(
                final Path file,
                final String splitRegex,
                final Charset charset) throws IOException, InvalidBagMetadataException {
            try (final BufferedReader reader = Files.newBufferedReader(file, charset)) {
                return readKeyValuesFromReader(reader, splitRegex);
            }
        }
    }

So the existing API is preserved and simply calls into the new method, which takes any BufferedReader. The new method needs no charset parameter, because the caller picks the encoding when constructing the reader. We can then call it directly rather than round-tripping to the filesystem first.
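To make the calling side concrete, here is a rough, self-contained sketch of how we might use the reader-based method with a stream. Everything in it is an assumption for illustration: `ReaderSketch` and `readKeyValuesFromStream` are hypothetical names, the parsing loop is a simplified stand-in rather than the library's real logic, and a `ByteArrayInputStream` stands in for the stream an S3 GetObject call would give us. It also throws a plain `IOException` on malformed lines so that it compiles without the library's `InvalidBagMetadataException`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;

final class ReaderSketch {
    // Simplified stand-in for the proposed readKeyValuesFromReader.
    // The real parsing rules live in the library; this only splits each
    // non-blank line once on splitRegex and trims both halves.
    static List<SimpleImmutableEntry<String, String>> readKeyValuesFromReader(
            final BufferedReader reader,
            final String splitRegex) throws IOException {
        final List<SimpleImmutableEntry<String, String>> keyValues = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            final String[] parts = line.split(splitRegex, 2);
            if (parts.length != 2) {
                throw new IOException("Line is not a key value pair: " + line);
            }
            keyValues.add(new SimpleImmutableEntry<>(parts[0].trim(), parts[1].trim()));
        }
        return keyValues;
    }

    // Hypothetical convenience wrapper: decode any InputStream with the
    // given charset and hand the resulting reader to the method above.
    static List<SimpleImmutableEntry<String, String>> readKeyValuesFromStream(
            final InputStream in,
            final String splitRegex,
            final Charset charset) throws IOException {
        try (final BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, charset))) {
            return readKeyValuesFromReader(reader, splitRegex);
        }
    }

    public static void main(final String[] args) throws IOException {
        // In-memory bytes standing in for the body of a remote bagit.txt.
        final byte[] body = "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
                .getBytes(StandardCharsets.UTF_8);
        final InputStream in = new ByteArrayInputStream(body);
        for (final SimpleImmutableEntry<String, String> entry
                : readKeyValuesFromStream(in, ":", StandardCharsets.UTF_8)) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Closing the reader in the try-with-resources block also closes the underlying stream, which is probably what we want for an S3 response body, though the library might prefer to leave stream ownership with the caller.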
Thoughts?