Applying File Changes with fix and GADTs

For a description of prune-juice, see previous post on Pruning unused Haskell dependencies.

Prune Juice 0.7 has been released!

Since releasing prune-juice, I have received a number of requests asking to apply the unused dependencies directly to the cabal files. It ended up being a lot harder than I expected to implement, and I’m proud to say that it’s finally supported!

This post goes into the challenges I encountered implementing incremental file changes, and how I solved them with custom parsers, fix, and GADTs.

Simple Solution to Apply Changes

The first thing I wanted to do was figure out a simple solution overwrite the cabal file. The easiest way to do that was to use the built-in Cabal library, iterate over the targets with unused dependencies, and overwrite the file with the changes. That was simple enough by folding over each target and stripping out the dependencies:

stripDependencies :: Set T.DependencyName -> [Dependency] -> [Dependency]
stripDependencies dependencies = filter (\x -> not (Set.member (T.mkDependencyName next) dependencies))

However, while many import formats are supported for Cabal, only one output format is supported. Standard indentation, multiple default-extensions per line, and adding version ranges for build-depends entries are a few examples of Cabal’s pretty-printing behavior. If I were to use this strategy in the wild, for almost every user, running prune-juice would make changes to the cabal file other than the pruned dependencies.

Using the Relevant Bits

I set out to parse the cabal file into its relevant parts - targets, their dependencies, and everything else - and then attempt to modify each section in-place so as to minimize the changes to the file.

Parsing

For reference, I used megaparsec with the type synonym type Parser = Parsec Void String

Here’s the AST:

-- |An indented section.
data NestedSection
  = BuildDependsNestedSection Int [String]
  | OtherNestedSection Int [String]
  deriving (Eq, Ord, Show)

-- |A top-level section.
data Section
  = TargetSection T.CompilableType (Maybe T.CompilableName) [NestedSection]
  | OtherSection [String]
  deriving (Eq, Ord, Show)

And the parser for the top-level library section:

indentedLines :: Int -> Parser [String]
indentedLines numSpaces =
  -- parses lines, including empty lines, until the indentation is less than `numSpaces`

nestedSection :: Parser NestedSection
nestedSection = do
  numSpaces <- length <$> some (char ' ')
  let buildDepends = do
        void $ string "build-depends:"
        BuildDependsNestedSection numSpaces <$> indentedLines numSpaces
      ...
  buildDepends <|> ...

nestedSections :: Parser [NestedSection]
nestedSections = some nestedSection

section :: Parser Section
section =
  let lib = do
        void $ string "library"
        hspace
        void eol
        TargetSection T.CompilableTypeLibrary Nothing <$> nestedSections
      ...
  in lib <|> ...

I also wrote a render function plus some tests to verify that parsing, followed by rendering, resulted in the original input.

Fold/filter

Once it was parsed, it was pretty easy to apply the changes. I wrote a regular expression to match dependency names, then filtered the parsed dependencies.

dependencyNameRegex :: Regex
dependencyNameRegex = mkRegex "^ *([a-zA-Z0-9\\-]+).*$"

matchDependencyName :: String -> Maybe T.DependencyName
matchDependencyName str = Just . T.DependencyName . pack =<< T.headMay =<< matchRegex dependencyNameRegex str

stripOneBuildDepends :: String -> Set T.DependencyName -> Maybe String
stripOneBuildDepends input dependencies =
  let output = intercalate "," . mapMaybe go . fmap unpack . splitOn "," . pack $ input
  in case not (null output) && all ((==) ' ') output of
      True -> Nothing
      False -> Just output
  where
    go x = case matchDependencyName x of
      Nothing -> Just x
      Just dep -> case Set.member dep dependencies of
        True -> Nothing
        False -> Just x

-- |Strip any dependencies from @build-depends@.
stripBuildDepends :: [String] -> Set T.DependencyName -> [String]
stripBuildDepends buildDepends dependencies = mapMaybe (\x -> stripOneBuildDepends x dependencies) buildDepends

I traversed each leaf by folding over the Sections.

stripNestedSection :: NestedSection -> Set T.DependencyName -> NestedSection
stripNestedSection nested dependencies = case nested of
  T.BuildDependsNestedSection numSpaces buildDepends -> T.BuildDependsNestedSection numSpaces (stripBuildDepends buildDepends dependencies)
  other -> other

stripNestedSections :: [NestedSection] -> Set T.DependencyName -> [NestedSection]
stripNestedSections nested dependencies = fmap (\x -> stripNestedSection x dependencies) nested

stripSection :: Section -> Set T.DependencyName -> Maybe T.CompilableName -> Section
stripSection section dependencies compilableMay = case (section, compilableMay) of
  (TargetSection T.CompilableTypeLibrary Nothing nested, Nothing) ->
    TargetSection T.CompilableTypeLibrary Nothing (stripNestedSections nested dependencies)
  (TargetSection typ (Just name) nested, Just T.Compilable {..}) | typ == compilableType && name == compilableName ->
    TargetSection typ (Just name) (stripNestedSections nested dependencies)
  (other, _) -> other

stripSections :: [Section] -> Set T.DependencyName -> Maybe T.Compilable -> [Section]
stripSections sections dependencies compilableMay =
  fmap (\x -> stripSection x dependencies compilableMay) sections

Simple and successful! Or so I thought…

Beware `common` stanzas (`fix` to the rescue!)

Turns out, I had completely forgotten about common stanzas! For those that aren’t familiar, here’s a primer. These are essentially placeholders for shared dependencies, language pragmas, GHC options, etc which can then be included with a call to import. Similar to anchors in YAML, they meant my previous strategy of folding over each individual section was not going to work anymore.

I had to change the AST to account for common top-level stanzas as well as import calls:

newtype CommonName = CommonName { unCommonName :: Text }
  deriving (Eq, Ord, Show)

data NestedSection
  = BuildDependsNestedSection Int [String]
  | ImportNestedSection Int [String]
  | OtherNestedSection Int [String]
  deriving (Eq, Ord, Show)

data Section
  = TargetSection T.CompilableType (Maybe T.CompilableName) [NestedSection]
  | CommonSection CommonName [NestedSection]
  | OtherSection [String]
  deriving (Eq, Ord, Show)

Parsing was very easy to add, and similar to the above. The big challenge was figuring out how to resolve an import call, especially since common stanzas can, themselves, have calls to import. In the end, I decided to return the import set when folding over the sections and recurse while that set was non-empty.

I added a type for stripping from common stanzas, and updated every strip-type function to return Set CommonName along with the updated portion it was responsible for:

data StripTarget
  = StripTargetBaseLibrary
  -- ^ The base library
  | StripTargetCompilable T.Compilable
  -- ^ Any @library@, @executable@, @test-suite@, @benchmark@, etc stanza.
  | StripTargetCommonStanza (Set CommonName)
  -- ^ Any @common@ stanza matching the set.

stripNestedSection :: NestedSection -> Set T.DependencyName -> (NestedSection, Set CommonName)

stripNestedSections :: [NestedSection] -> Set T.DependencyName -> ([NestedSection], Set CommonName)

stripSection :: Section -> Set T.DependencyName -> StripTarget -> (Section, Set CommonName)

Then, I changed stripSections to recurse using fix:

stripSections :: [Section] -> Set T.DependencyName -> Maybe T.Compilable -> [Section]
stripSections sections dependencies compilableMay =
  let run target = second mconcat . unzip . fmap (\x -> stripSection x dependencies target)
      firstTarget = maybe StripTargetBaseLibrary StripTargetCompilable compilableMay
      firstPass = run firstTarget sections
  in flip fix firstPass $ \recurse -> \case
       (final, none) | Set.null none -> final
       (next, common) -> recurse (run (StripTargetCommonStanza common) next)

In this way, I was able to ensure that nested calls to import would eventually strip out the relevant dependencies. However, I knew that this strategy probably might not be perfect, so I had one final feature to add.

Multiple Strategies (GADTs!)

I figured that since I had ended up writing two strategies for applying the changes, each with their own shortcomings, I might as well allow the user to specify which one they wanted to use. The issues with doing that were:

They had different input and output types
Since there are multiple targets per file, the changes need to be applied incrementally
Other parts of the code didn’t care how the changes were being applied, but were gathering confirmation etc from the user and needed to know that something was applying the changes

I ended up using a continuation-style GADT which would apply the changes incrementally and hide the implementation details from the user.

-- |Continuation GADT for applying changes to a cabal file.
data Apply (a :: T.ApplyStrategy) where
  ApplySafe :: FilePath -> GenericPackageDescription -> Endo GenericPackageDescription -> Apply 'T.ApplyStrategySafe
  ApplySmart :: FilePath -> [T.Section] -> Endo [T.Section] -> Apply 'T.ApplyStrategySmart

-- |Wrap 'Apply' in a data type so that it can be passed to functions without escaping the inner type.
data SomeApply = forall (a :: T.ApplyStrategy). SomeApply { unSomeApply :: Apply a }

I then wrote implementations for each and applied the incremental changes all at once when the file was overwritten:

-- |Iterate on a cabal file by pruning one target at a time. Return whether the command-line call to @prune-juice@ should fail.
runApply :: SomeApply -> T.Package -> Set T.DependencyName -> Maybe T.Compilable -> T.ShouldApply -> IO (Bool, SomeApply)
runApply (SomeApply ap) T.Package {..} dependencies compilableMay = \case
  T.ShouldNotApply -> do
    printDependencies
    pure (True, applyNoop)
  T.ShouldApply -> do
    printDependencies
    confirm "Apply these changes? (Y/n)" >>= \case
      False -> pure (False, applyNoop)
      True -> pure (False, applyOnce)
  T.ShouldApplyNoVerify -> do
    printDependencies
    pure (False, applyOnce)
  where
    printDependencies =
      -- print the output
    applyNoop = case ap of
      ApplySafe x y z -> SomeApply $ ApplySafe x y z
      ApplySmart x y z -> SomeApply $ ApplySmart x y z
    applyOnce = case ap of
      ApplySafe x y z -> SomeApply $ ApplySafe x y $ z <> Endo (\w -> stripGenericPackageDescription w dependencies compilableMay)
      ApplySmart x y z -> SomeApply $ ApplySmart x y $ z <> Endo (\w -> stripSections w dependencies compilableMay)

-- |Write the series of changes to the cabal file.
writeApply :: SomeApply -> IO ()
writeApply (SomeApply ap) = case ap of
  ApplySafe fp description endo -> writeGenericPackageDescription fp (appEndo endo description)
  ApplySmart fp parsed endo -> writeCabalSections fp (appEndo endo parsed)

Conclusion

In the end, I was extremely happy once again to be using Haskell for parsing, recursion, and dependent typing. In addition, I was able to implement an oft-requested feature!