I did a heap of aws s3 sync ./Documents s3://myBucket/Documents
commands on a MacBook to upload files to S3.
When I went to download the S3 files to Windows, I found that there were many characters that the MacBook and S3 would happily accept but Windows wouldn't allow. Some examples I can think of are:
- A path on the Mac where a folder name has a trailing space:
/Path To/File with trailing space /filename here.jpeg
- A folder with trailing dots:
/Path To/Folder with trailing dots.../filename here2.jpeg
- macOS will happily allow double quotes ", asterisks *, greater-than > and less-than < symbols, the pipe character | and question marks ? in a file or folder name.
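The rules in the list above can be sketched as a small shell check. This is only an illustration under assumptions: is_windows_safe is a made-up helper name, the sample paths are invented, and the character set is the one the scripts in this post filter on (including colon and backslash). It does not cover every Windows restriction, such as reserved device names.

```shell
#!/bin/bash
# is_windows_safe is a hypothetical helper, not part of the AWS CLI.
# It returns 0 when a path has none of the problems listed above, 1 otherwise.
is_windows_safe() {
    case $1 in
        # reserved characters: < > : " \ | ? *
        *[\<\>:\"\\\|\?\*]*) return 1 ;;
        # a folder or file name that ends in a space or a dot
        *" /"* | *"./"* | *" " | *.) return 1 ;;
    esac
    return 0
}

# Try it on a few invented paths.
for path in 'Path To/ok.jpeg' 'Path To/Folder /a.jpeg' 'Path To/Dots.../b.jpeg' 'Path To/what?.jpeg'; do
    if is_windows_safe "$path"; then
        echo "ok:     $path"
    else
        echo "unsafe: $path"
    fi
done
```

Only the first sample path passes; the other three each trip one of the rules.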
To remedy this after the files had already been uploaded, I did the following.
First, get a list of your S3 objects:
#!/bin/bash
# awscommands.sh
# usage: ./awscommands.sh docs   (writes the full key list to "docs" and the matches to "docsout")
# get a list of objects in your bucket, one key per line
aws s3api list-objects --bucket myBucket --prefix Documents --query 'Contents[].[Key]' --output text > "$1"
# pick out the keys that contain illegal characters and write them to a second file
grep -e '*' -e '|' -e '<' -e '>' -e '\\' -e '?' -e '"' -e ':' "$1" > "${1}out"
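The grep filter above only looks for reserved characters, so a key whose only problem is a leading or trailing space in a folder name never reaches the rename step. Here is a minimal sketch of a broader filter, assuming the keys arrive one per line on standard input. find_unsafe_keys is a made-up name, and trailing dots are deliberately left out because the rename script below does not strip them.

```shell
#!/bin/bash
# find_unsafe_keys is a hypothetical helper: it reads keys on stdin, one per line,
# and prints the ones the rename script would change.
find_unsafe_keys() {
    # 1st pattern: any reserved character  * | < > \ ? " :
    # 2nd pattern: a space right before a slash (trailing space in a folder name)
    # 3rd pattern: a space right after a slash (leading space in a folder or file name)
    grep -e '[*|<>\\?":]' -e ' /' -e '/ '
}

# Example with invented keys; only the last three are printed.
printf '%s\n' 'Documents/ok.jpeg' 'Documents/a*b.jpeg' 'Documents/Folder /c.jpeg' 'Documents/ lead/d.jpeg' |
    find_unsafe_keys
```

It could stand in for the grep line in awscommands.sh, for example as find_unsafe_keys < "$1" > "${1}out".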
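Then loop through the file names and run aws s3 mv against each of them by running the following script with php fixfilenames.php. It reads docsout, so it assumes the shell script above was run with docs as its argument.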
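<?php
# fixfilenames.php
// read the offending keys, one per line, dropping only the line endings
$lines = file('docsout', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $key) {
    $s3 = 's3://myBucket/';
    $fixed = filterFilename($key);
    if ($fixed === $key) {
        continue; // nothing to rename
    }
    $arg1 = escapeshellarg($s3 . $key);
    $arg2 = escapeshellarg($s3 . $fixed);
    $cmd = 'aws s3 mv '; // append --dryrun here to preview the moves first
    $fullCommand = $cmd . $arg1 . ' ' . $arg2;
    echo $fullCommand . "\n";
    $ret = shell_exec($fullCommand);
    echo $ret . "\n";
}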
/**
 * filterFilename
 * @param string $name the file name string
 * @return string
 */
function filterFilename($name)
{
    // remove illegal file system characters https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
    $name = str_replace(array_merge(
        array_map('chr', range(0, 31)),
        array('<', '>', ':', '"', '\\', '|', '?', '*')
    ), '', $name);
    // optionally limit the filename length to 255 bytes http://serverfault.com/a/9548/44086
    // $ext = pathinfo($name, PATHINFO_EXTENSION);
    // $name = mb_strcut(pathinfo($name, PATHINFO_FILENAME), 0, 255 - ($ext ? strlen($ext) + 1 : 0), mb_detect_encoding($name)) . ($ext ? '.' . $ext : '');
    // mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )
    $name = preg_replace('/\s+\//', '/', $name); // strip trailing whitespace from folder names
    $name = preg_replace('/\/\s+/', '/', $name); // strip leading whitespace from folder and file names
    return $name;
}
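For anyone without PHP to hand, roughly the same cleanup can be sketched in shell. This is an illustration under assumptions, not a drop-in replacement: sanitize_key and print_moves are made-up names, and unlike the PHP function above this version also strips trailing dots and spaces from every path component, which goes beyond what the original script does. It prints the planned renames rather than calling aws.

```shell
#!/bin/bash
# sanitize_key is a hypothetical helper that prints a Windows-friendly version of one key.
sanitize_key() {
    printf '%s' "$1" |
        # delete control characters and the reserved characters < > : " \ | ? *
        tr -d '\000-\037<>:"\\|?*' |
        # strip spaces/dots before each slash, spaces after each slash,
        # and spaces/dots at the very end of the key
        sed -e 's#[ .]*/#/#g' -e 's#/ *#/#g' -e 's#[ .]*$##'
}

# print_moves reads keys on stdin and prints "old -> new" for every key that would change.
print_moves() {
    while IFS= read -r key; do
        new=$(sanitize_key "$key")
        [ "$key" = "$new" ] || printf '%s -> %s\n' "$key" "$new"
    done
}

echo "$(sanitize_key 'Documents/Dots.../what?.jpeg')"   # prints Documents/Dots/what.jpeg
```

Running print_moves < docsout gives a list to review before any aws s3 mv is issued. Note that two different keys can collapse to the same cleaned name, so it is worth checking the output for duplicates first.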