Unverified commit d0a8d9aa, authored by Anton Hibl, committed by GitHub

add filters for irrelevant files in ISISDATA (#5109)



* adding filters for old files in ISISDATA area

* adding script changes and fixing PR

* used exclude instead of exclude-from

* cleaned up files and script

* added test to see if filtering on rclone ISISDATA worked

* modified tests

* updating imports in tests

* added complete test for filters, list of filter files can be changed

* updated test to check args rather than files

* removed absolute paths

* updated test to use exclude list passed in

* Updated filter args

I added the following changes to create_rclone_arguments:

- I added a default filter flag --filter with the value of exclude_string

- I added a check for any additional --include and --exclude flags that may have
  been passed in by the user.

- I added logic to merge any additional --include and --exclude flags with the
  default --filter flag. The include and exclude patterns are concatenated with
  the default exclude_string separated by comma.

- I removed the f"--exclude={exclude_string}" from extra_args list, since the
  default exclude_string is now included in the filter flag

- I added the filter flag to the extra_args list using
  extra_args.extend(filter_args)

Together, these changes ensure that any --include and --exclude flags passed
in by the user are taken into account when building extra_args, and that they
are merged with the default filter flag, which is the recommended way as per
the rclone docs.
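
As a rough illustration of the merge described above, here is a minimal sketch.
The helper name `merge_filter_args` and the `--flag=value` argument form are
assumptions for illustration, not the actual `create_rclone_arguments` code. The
comma-joined value follows the description in this message and has not been
checked against rclone's filter syntax.

```python
def merge_filter_args(exclude_string, user_args):
    """Hypothetical helper: fold user --include/--exclude values into one --filter value."""
    patterns = [exclude_string]  # the default exclude patterns come first
    remaining = []               # every other user argument passes through untouched
    for arg in user_args:
        if arg.startswith("--include=") or arg.startswith("--exclude="):
            # Keep only the pattern part after the first "="
            patterns.append(arg.split("=", 1)[1])
        else:
            remaining.append(arg)
    # Patterns are concatenated with commas, as described in this message
    filter_args = [f"--filter={','.join(patterns)}"]
    extra_args = list(remaining)
    extra_args.extend(filter_args)
    return extra_args
```

Called with a default of `*.old` and user arguments `["--exclude=*.tmp", "-P"]`,
this sketch yields a single `--filter` entry after the pass-through arguments.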

* accidentally deleted a line

* Added more logic for when the user provides their own filters

along with a few other changes:

   + I added a check for a --filter flag provided by the user. If one is
    present, it is used and the default filter flag is ignored; otherwise the
    default filter flag is used. This honors the user's intent when they supply
    their own filter.

   + I added a check for any additional --include and --exclude flags passed in
    by the user and merge them with the filter flag. This is to take into
    account any specific include/exclude patterns that the user wants to apply,
    and merge them with the default filter flag.

   + I added a "+" at the end of the filter string if the user has specified an
    --include flag. This simulates the behavior of --include where it includes
    any patterns specified and excludes everything else.

   + I also added a check for filter_args_provided and if provided, it will use
    this flag, else it will use the default filter flag and merge any additional
    include/exclude flags to it.
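
The precedence rule in this list (a user-supplied filter wins over the default)
could look roughly like the sketch below. `choose_filter_args` is a hypothetical
name, and the `- *.old` default is a placeholder pattern rather than the real
ISISDATA exclude list.

```python
def choose_filter_args(user_args, default_filter):
    """Hypothetical helper: add the default --filter only if the user gave none."""
    filter_args_provided = any(
        arg == "--filter" or arg.startswith("--filter=") for arg in user_args
    )
    if filter_args_provided:
        # Honor the user's filter and skip the default entirely
        return list(user_args)
    # No user filter: fall back to the default filter flag
    return list(user_args) + [f"--filter={default_filter}"]
```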

There is also a test in the pytest file to check if the filter logic works as
expected, run using `pytest`.

* fixed tests and adjusted with new filters

* Added several tests to pytests

**The first test that was added is `test_rclone_with_auth`**
    This test is designed to check the behavior of the `rclone` function when it
    is called with an `auth` parameter. This test checks that the `rclone`
    function properly passes the "auth" parameter to the underlying subprocess
    call.

 **The second test that was added is `test_create_rclone_args_with_no_kwargs`**
    This test is designed to check the behavior of the `create_rclone_args`
    function when it is called with no keyword arguments. This test checks that
    the `create_rclone_args` function properly handles the case when it is
    called with no keyword arguments and returns the correct list of arguments
    to be passed to the underlying subprocess call.

 **The third test that was added is `test_file_filtering_with_hidden_files`**
    This test is designed to check the behavior of the `file_filtering` function
    when it is called with hidden files in the specified directory. This test
    checks that the `file_filtering` function properly filters out hidden files
    and only returns the non-hidden files in the specified directory.

I have tested to confirm these all run and pass on my machine.

* added more tests to pytest

**added `test_rclone` test**
    This test mocks the `subprocess.Popen` function and checks that the output
    of the rclone function matches the expected output when the function is
    invoked with the arguments `lsf`, `test`, `["-l", "-R", "--format", "p",
    "--files-only"]`, `True`, and `True`.

**added `test_rclone_unknown_exception` test**
    This test mocks the `subprocess.Popen` function and checks that the `rclone`
    function raises an exception when an unknown exception is encountered. This
    test uses a mocked class that raises an exception when it is initialized.

I have tested and confirmed these to work on my system.
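
For readers unfamiliar with the mocking approach, here is a minimal,
self-contained sketch of both ideas. `run_command` is a hypothetical stand-in
for the project's `rclone` wrapper, the file name is made up, and the exception
case uses `side_effect` on the patch instead of the mocked class described above.

```python
import subprocess
from unittest import mock


def run_command(args):
    """Hypothetical stand-in for the project's rclone wrapper."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return {"out": out.decode(), "err": err.decode(), "returncode": proc.returncode}


def check_output_with_mocked_popen():
    # Fake process object whose communicate() returns canned bytes
    fake_proc = mock.Mock()
    fake_proc.communicate.return_value = (b"file1.bsp\n", b"")
    fake_proc.returncode = 0
    with mock.patch("subprocess.Popen", return_value=fake_proc) as popen:
        result = run_command(["rclone", "lsf", "test"])
    popen.assert_called_once()  # no real process was started
    return result


def check_exception_propagates():
    # Popen itself raises when called, simulating an unexpected failure
    with mock.patch("subprocess.Popen", side_effect=RuntimeError("boom")):
        try:
            run_command(["rclone", "lsf", "test"])
        except RuntimeError as exc:
            return str(exc)
    return None
```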

* fixed filter args

    Fixed how filters are passed in. The patterns still aren't working; I
    believe I need to review them and ensure they adhere to the filter syntax
    described in the rclone documentation.

* updated filter list, it starts search from {mission_name}/kernels/

* finally fixed filters and tests

* added findfeaturesSegment script

* added stuff

* fixed parsing issues

* cleaning up

* more cleaning up

* added new regex filter

* re-implementing kelvin's changes.

    accidentally re-based and merged over them.

* comma typo

* fixed capitalization.

* fixed (i.e. shortened) the regex pattern.

    shortened it to spk/spk_psp_rec* as the paths it searches in mission areas
    are actually sub folders of the mission folder. I believe this method saves
    time and thus it is important to preserve behavior here.
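
    To illustrate why the shorter pattern is enough when matching starts below
    the mission's kernels folder, here is a small sketch. The paths and the
    mission folder name are hypothetical, and Python's `fnmatch` only
    approximates rclone's glob rules, so treat this as an illustration rather
    than a check of rclone's behavior.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PATTERN = "spk/spk_psp_rec*"  # pattern quoted in this commit message


def matches_in_kernels(path, kernels_root):
    """Match PATTERN against the path relative to {mission_name}/kernels/."""
    relative = PurePosixPath(path).relative_to(kernels_root)
    return fnmatchcase(relative.as_posix(), PATTERN)


# Hypothetical example paths under an assumed mission folder
hit = matches_in_kernels("psp/kernels/spk/spk_psp_rec_example.bsp", "psp/kernels")
miss = matches_in_kernels("psp/kernels/ck/spk_psp_rec_example.bc", "psp/kernels")
```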

* Add dry run flag

* Roll back change to dry run

---------

Co-authored-by: Kelvin Rodriguez <krodriguez@usgs.gov>
Co-authored-by: Austin Sanders <arsanders@usgs.gov>
parent addd1302