Scripting interfaces enable users to automate tasks and customize software
workflows, but creating scripts traditionally requires programming expertise
and familiarity with specific APIs, posing barriers for many users. While Large
Language Models (LLMs) can generate code from natural language queries, runtime
code generation is severely limited by unverified code, security risks,
longer response times, and higher computational costs. To bridge this gap, we
propose an offline simulation framework to curate a software-specific skillset,
a collection of verified scripts, by exploiting LLMs and publicly available
scripting guides. Our framework comprises two components: (1) task creation,
using top-down functionality guidance and bottom-up API synergy exploration to
generate helpful tasks; and (2) skill generation with trials, refining and
validating scripts based on execution feedback. To efficiently navigate the
extensive API landscape, we introduce a Graph Neural Network (GNN)-based link
prediction model to capture API synergy, enabling the generation of skills
involving underutilized APIs and expanding the skillset’s diversity.
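For concreteness, a minimal sketch of such an API-synergy link predictor is given below, assuming a GCN-style encoder over an API co-usage graph (nodes are scripting APIs, edges are observed co-occurrences in collected scripts) and a dot-product decoder. The class, function names, hidden dimension, and training routine are illustrative assumptions, not the framework's actual implementation.

```python
# Hedged sketch: GNN link prediction over an API co-usage graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLinkPredictor(nn.Module):
    def __init__(self, num_apis, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_apis, hidden_dim)  # learnable API features
        self.w1 = nn.Linear(hidden_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, hidden_dim)

    def propagate(self, a_norm):
        # Two rounds of neighborhood aggregation (a_norm: normalized adjacency).
        h = F.relu(self.w1(a_norm @ self.embed.weight))
        return self.w2(a_norm @ h)

    def score(self, a_norm, pairs):
        # Dot-product decoder: higher score = stronger predicted API synergy.
        h = self.propagate(a_norm)
        return (h[pairs[:, 0]] * h[pairs[:, 1]]).sum(dim=-1)

def normalize_adjacency(edge_index, num_apis):
    # edge_index: LongTensor of shape (2, E) listing co-used API pairs.
    a = torch.zeros(num_apis, num_apis)
    a[edge_index[0], edge_index[1]] = 1.0
    # Symmetrize, add self-loops, and apply symmetric GCN normalization.
    a = torch.clamp(a + a.t() + torch.eye(num_apis), max=1.0)
    deg_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt[:, None] * a * deg_inv_sqrt[None, :]

def training_step(model, optimizer, a_norm, pos_pairs, neg_pairs):
    # Positives: API pairs co-used in existing scripts; negatives: sampled
    # pairs never observed together.
    optimizer.zero_grad()
    scores = model.score(a_norm, torch.cat([pos_pairs, neg_pairs]))
    labels = torch.cat([torch.ones(len(pos_pairs)), torch.zeros(len(neg_pairs))])
    loss = F.binary_cross_entropy_with_logits(scores, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, a high predicted score for an API pair that never co-occurs in the collected scripts can seed new tasks involving underutilized APIs, which is how the skillset's coverage would be broadened.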
Experiments with Adobe Illustrator demonstrate that our framework significantly
improves automation success rates, reduces response time, and lowers runtime
token costs compared to traditional runtime code generation. This is the first
attempt to use software scripting interfaces as a testbed for LLM-based
systems, highlighting the advantages of leveraging execution feedback in a
controlled environment and offering valuable insights into aligning AI
capabilities with user needs in specialized software domains.
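As a complementary illustration of the skill generation with trials component described above, the sketch below shows a generate-execute-refine loop driven by execution feedback. The callables (generate_script, execute_script, refine_script) and the trial budget are hypothetical placeholders, not the framework's actual interfaces.

```python
# Hedged sketch: curate one verified skill via execution-feedback trials.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TrialResult:
    success: bool
    feedback: str  # runtime errors or validation messages from the sandboxed run

def curate_skill(
    task: str,
    generate_script: Callable[[str], str],          # LLM: task -> candidate script
    execute_script: Callable[[str], TrialResult],   # sandboxed run in the target app
    refine_script: Callable[[str, str, str], str],  # LLM: task, script, feedback -> revised script
    max_trials: int = 3,
) -> Optional[str]:
    """Return a verified script for `task`, or None if all trials fail."""
    script = generate_script(task)
    for _ in range(max_trials):
        result = execute_script(script)
        if result.success:
            return script  # verified script joins the offline skillset
        script = refine_script(task, script, result.feedback)
    return None
```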